datascience.tables.Table.sample

Table.sample(k=None, with_replacement=True, weights=None)

Return a new table where k rows are randomly sampled from the original table.

Args:
`k` – (`int`) the number of rows to sample from the table. Defaults to the number of rows in the table.

`with_replacement` – (`bool`) If True (the default), samples `k` rows with replacement from the table; otherwise samples `k` rows without replacement.

`weights` – Array specifying the probability that the ith row of the table is sampled. Defaults to None, which samples each row with equal probability. `weights` must be a valid probability distribution, i.e. an array whose length equals the number of rows in the table and whose entries sum to 1.

Raises:
ValueError – if the length of `weights` does not equal the number of rows in the table, or if `weights` does not sum to 1.

Returns:
A new instance of `Table` with `k` sampled rows.
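
The sampling rules described above can be sketched in plain Python. This is only an illustration of the documented semantics, not the library's implementation: `sample_row_indices` is a hypothetical helper, its error messages are invented for the sketch, and it returns row indices rather than a `Table`.

```python
import math
import random


def sample_row_indices(num_rows, k=None, with_replacement=True,
                       weights=None, rng=None):
    """Return k row indices chosen according to the documented rules.

    Hypothetical helper for illustration; not part of the datascience package.
    """
    rng = rng or random.Random()
    if k is None:
        k = num_rows  # default: draw as many rows as the table has

    if weights is not None:
        weights = list(weights)
        if len(weights) != num_rows:
            raise ValueError("weights must have one entry per row")
        if not math.isclose(sum(weights), 1.0, abs_tol=1e-8):
            raise ValueError("weights must sum to 1")

    indices = list(range(num_rows))
    if with_replacement:
        # Draws are independent, so the same row may appear more than once.
        return rng.choices(indices, weights=weights, k=k)

    if k > num_rows:
        raise ValueError("cannot draw more rows than exist without replacement")
    if weights is None:
        # Uniform draw without replacement: every index appears at most once.
        return rng.sample(indices, k)

    # Weighted draw without replacement: pick one row at a time and remove
    # it (and its weight) from the pool so it cannot be picked again.
    pool, pool_weights, chosen = indices, weights, []
    for _ in range(k):
        pick = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        chosen.append(pool.pop(pick))
        pool_weights.pop(pick)
    return chosen
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes the draws in this sketch repeatable. Here `rng` is a parameter of the sketch only, not of `Table.sample`.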

```
>>> jobs = Table().with_columns(
...     'job',  make_array('a', 'b', 'c', 'd'),
...     'wage', make_array(10, 20, 15, 8))
>>> jobs
job  | wage
a    | 10
b    | 20
c    | 15
d    | 8
>>> jobs.sample()
job  | wage
b    | 20
b    | 20
a    | 10
d    | 8
>>> jobs.sample(with_replacement=True)
job  | wage
d    | 8
b    | 20
c    | 15
a    | 10
>>> jobs.sample(k = 2)
job  | wage
b    | 20
c    | 15
>>> ws =  make_array(0.5, 0.5, 0, 0)
>>> jobs.sample(k=2, with_replacement=True, weights=ws)
job  | wage
a    | 10
a    | 10
>>> jobs.sample(k=2, weights=make_array(1, 0, 1, 0))
Traceback (most recent call last):
...
ValueError: probabilities do not sum to 1
>>> jobs.sample(k=2, weights=make_array(1, 0, 0)) # Weights must be length of table.
Traceback (most recent call last):
...
ValueError: 'a' and 'p' must have same size
```
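
The first `ValueError` above occurs because the weights `1, 0, 1, 0` sum to 2 rather than 1. When starting from raw counts or scores, one option is to divide each entry by the total before passing the result as `weights`. A minimal sketch, assuming non-negative inputs; `normalize_weights` is a hypothetical helper, not part of the `datascience` package:

```python
def normalize_weights(raw):
    """Scale non-negative raw weights so they sum to 1 (hypothetical helper)."""
    raw = [float(w) for w in raw]
    if any(w < 0 for w in raw):
        raise ValueError("weights must be non-negative")
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw]


print(normalize_weights([1, 0, 1, 0]))  # [0.5, 0.0, 0.5, 0.0]
```

The normalized values could then be supplied as `weights`, for example via `make_array(*normalize_weights([1, 0, 1, 0]))`.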