datascience.tables.Table.sample
- Table.sample(k=None, with_replacement=True, weights=None)
Return a new table where k rows are randomly sampled from the original table.
- Args:
k – (int) The number of rows to sample from the table. Defaults to the number of rows in the table.
with_replacement – (bool) True by default. If True, samples k rows with replacement from the table; otherwise samples k rows without replacement.
weights – Array specifying the probability that the ith row of the table is sampled. Defaults to None, which samples each row with equal probability. weights must be a valid probability distribution, i.e. an array whose length equals the number of rows in the table and whose entries sum to 1.
- Raises:
- ValueError – if weights does not have length equal to the number of rows in the table, or if weights does not sum to 1.
- Returns:
A new instance of Table with k sampled rows.
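The rules described above (the default for k, sampling with or without replacement, and the checks on weights) can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: `sample_indices` is a hypothetical helper that returns row indices rather than a Table, and its error messages are made up for the sketch.

```python
import math
import random

def sample_indices(n, k=None, with_replacement=True, weights=None):
    """Hypothetical helper: draw k row indices from a table with n rows."""
    if k is None:
        k = n  # default: as many draws as there are rows
    if weights is not None:
        if len(weights) != n:
            raise ValueError("weights must have one entry per row")
        if not math.isclose(sum(weights), 1.0):
            raise ValueError("weights must sum to 1")
    if with_replacement:
        # Independent draws, so the same row may appear more than once.
        return random.choices(range(n), weights=weights, k=k)
    if weights is None:
        # Uniform draws without repeats.
        return random.sample(range(n), k)
    # Weighted draws without repeats: pick one index at a time and
    # remove it (and its weight) from the pool before the next draw.
    pool = list(range(n))
    remaining = list(weights)
    picked = []
    for _ in range(k):
        i = random.choices(range(len(pool)), weights=remaining, k=1)[0]
        picked.append(pool.pop(i))
        remaining.pop(i)
    return picked

print(sample_indices(4))                              # four indices, repeats possible
print(sample_indices(4, k=2, with_replacement=False)) # two distinct indices
```

With `with_replacement=False`, asking for more rows than are available (or more than have nonzero weight) raises an error in this sketch, since there is nothing left to draw.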
>>> jobs = Table().with_columns(
...     'job',  make_array('a', 'b', 'c', 'd'),
...     'wage', make_array(10, 20, 15, 8))
>>> jobs
job  | wage
a    | 10
b    | 20
c    | 15
d    | 8
>>> jobs.sample()
job  | wage
b    | 20
b    | 20
a    | 10
d    | 8
>>> jobs.sample(with_replacement=True)
job  | wage
d    | 8
b    | 20
c    | 15
a    | 10
>>> jobs.sample(k = 2)
job  | wage
b    | 20
c    | 15
>>> ws = make_array(0.5, 0.5, 0, 0)
>>> jobs.sample(k=2, with_replacement=True, weights=ws)
job  | wage
a    | 10
a    | 10
>>> jobs.sample(k=2, weights=make_array(1, 0, 1, 0))
Traceback (most recent call last):
    ...
ValueError: probabilities do not sum to 1
>>> jobs.sample(k=2, weights=make_array(1, 0, 0))  # Weights must be length of table.
Traceback (most recent call last):
    ...
ValueError: 'a' and 'p' must have same size
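The call with weights (1, 0, 1, 0) fails because those values sum to 2 rather than 1. One way to avoid this is to rescale raw weights before passing them in. The helper below is a minimal sketch; `normalize_weights` is a hypothetical name and not part of the datascience library.

```python
def normalize_weights(raw):
    """Hypothetical helper: scale non-negative weights so they sum to 1."""
    if any(w < 0 for w in raw):
        raise ValueError("weights must be non-negative")
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw]

probs = normalize_weights([1, 0, 1, 0])
print(probs)  # [0.5, 0.0, 0.5, 0.0]
```

The rescaled values form a valid probability distribution, so a call such as `jobs.sample(k=2, weights=make_array(0.5, 0, 0.5, 0))` satisfies the requirement that weights sum to 1.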