datascience.tables.Table.groups

Table.groups(labels, collect=None)

Group rows by multiple columns, then count or aggregate the others.

Args:

labels: list of column names (or indices) to group on

collect: a function applied to values in other columns for each group

Returns: A Table with each row corresponding to a unique combination of values in the columns specified in labels, where the first columns are those specified in labels, followed by a column of counts for each unique combination. If collect is provided, a Table is returned with all original columns, each containing values calculated by first grouping rows according to the values in the labels columns, then applying collect to each set of grouped values in the other columns.

Note:

The grouped columns will appear first in the result table. If collect does not accept arguments with one of the column types, that column will be empty in the resulting table.

>>> marbles = Table().with_columns(
...    "Color", make_array("Red", "Green", "Blue", "Red", "Green", "Green"),
...    "Shape", make_array("Round", "Rectangular", "Rectangular", "Round", "Rectangular", "Round"),
...    "Amount", make_array(4, 6, 12, 7, 9, 2),
...    "Price", make_array(1.30, 1.30, 2.00, 1.75, 1.40, 1.00))
>>> marbles
Color | Shape       | Amount | Price
Red   | Round       | 4      | 1.3
Green | Rectangular | 6      | 1.3
Blue  | Rectangular | 12     | 2
Red   | Round       | 7      | 1.75
Green | Rectangular | 9      | 1.4
Green | Round       | 2      | 1
>>> marbles.groups(["Color", "Shape"])
Color | Shape       | count
Blue  | Rectangular | 1
Green | Rectangular | 2
Green | Round       | 1
Red   | Round       | 2
>>> marbles.groups(["Color", "Shape"], sum)
Color | Shape       | Amount sum | Price sum
Blue  | Rectangular | 12         | 2
Green | Rectangular | 15         | 2.7
Green | Round       | 2          | 1
Red   | Round       | 11         | 3.05
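
The grouping behavior shown above can be approximated in plain Python. The sketch below is only an illustration of the semantics and is not how the datascience library implements groups; the helper name groups_sketch and the list-of-dicts row representation are assumptions made for this example.

```python
from collections import defaultdict


def groups_sketch(rows, labels, collect=None):
    """Illustrative stand-in for Table.groups; NOT the library's implementation.

    rows is a list of dicts (one per row), labels is a list of column names.
    """
    # Bucket rows by the tuple of values found in the label columns.
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[label] for label in labels)
        buckets[key].append(row)

    result = []
    # Emit groups in sorted key order, mirroring the order in the example output.
    for key in sorted(buckets):
        members = buckets[key]
        out = dict(zip(labels, key))
        if collect is None:
            # No collect function: report how many rows fell in this group.
            out["count"] = len(members)
        else:
            # Apply collect to each non-label column's grouped values.
            for column in members[0]:
                if column not in labels:
                    values = [r[column] for r in members]
                    out[f"{column} {collect.__name__}"] = collect(values)
        result.append(out)
    return result


# The marbles data from the example above, as a list of row dicts.
marbles = [
    {"Color": "Red", "Shape": "Round", "Amount": 4, "Price": 1.30},
    {"Color": "Green", "Shape": "Rectangular", "Amount": 6, "Price": 1.30},
    {"Color": "Blue", "Shape": "Rectangular", "Amount": 12, "Price": 2.00},
    {"Color": "Red", "Shape": "Round", "Amount": 7, "Price": 1.75},
    {"Color": "Green", "Shape": "Rectangular", "Amount": 9, "Price": 1.40},
    {"Color": "Green", "Shape": "Round", "Amount": 2, "Price": 1.00},
]

counts = groups_sketch(marbles, ["Color", "Shape"])
sums = groups_sketch(marbles, ["Color", "Shape"], sum)
```

Here counts holds one dict per unique (Color, Shape) pair with a "count" entry, and sums holds the same pairs with "Amount sum" and "Price sum" entries, matching the column layout of the two groups calls in the example.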