Data 8 `datascience` Reference¶

This notebook serves as an interactive, Data 8-friendly reference for the datascience library.

`Table` Functions and Methods¶

`Table()`¶

Create an empty table, usually to extend with data

[29]:

new_table = Table()
new_table

[29]:

[30]:

type(new_table)

[30]:

datascience.tables.Table

`Table.read_table()`¶

Table.read_table(filename)

Creates a table by reading the CSV file named filename (a string).

[31]:

trips = Table.read_table('https://raw.githubusercontent.com/data-8/textbook/gh-pages/data/trip.csv')
trips

[31]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
876419	413	8/5/2015 8:29	Civic Center BART (7th at Market)	72	8/5/2015 8:36	Townsend at 7th	65	269	Subscriber	94518
459672	408	9/18/2014 17:11	Harry Bridges Plaza (Ferry Building)	50	9/18/2014 17:17	Embarcadero at Sansome	60	429	Subscriber	94111
903647	723	8/25/2015 7:26	San Francisco Caltrain 2 (330 Townsend)	69	8/25/2015 7:38	Market at 10th	67	631	Subscriber	94025
452829	409	9/15/2014 8:29	Steuart at Market	74	9/15/2014 8:36	Market at 4th	76	428	Subscriber	94925
491023	224	10/9/2014 16:13	Santa Clara at Almaden	4	10/9/2014 16:17	San Jose Diridon Caltrain Station	2	144	Subscriber	94117
723352	519	4/13/2015 17:04	Howard at 2nd	63	4/13/2015 17:12	San Francisco Caltrain (Townsend at 4th)	70	629	Subscriber	94061
524499	431	10/31/2014 16:36	Townsend at 7th	65	10/31/2014 16:43	Civic Center BART (7th at Market)	72	630	Subscriber	94706
518524	389	10/28/2014 8:48	Market at Sansome	77	10/28/2014 8:54	2nd at South Park	64	458	Subscriber	94610
710070	11460	4/2/2015 18:13	Powell Street BART	39	4/2/2015 21:24	Powell Street BART	39	375	Subscriber	94107
793149	616	6/4/2015 5:26	Embarcadero at Bryant	54	6/4/2015 5:36	Embarcadero at Sansome	60	289	Subscriber	94105

... (99990 rows omitted)

`tbl.with_column`¶

tbl = Table()
tbl.with_column(name, values)
tbl.with_columns(n1, v1, n2, v2,...)

Creates a new table by adding a column with name name and values values to another table. name should be a string and values should have as many entries as there are rows in the original table. If values is a single value, then every row of that column has the value values.

In the examples below, we start with adding a column to the existing table trips with values being an array we construct from existing tables.

[32]:

trips.with_column(
    "Difference in terminal", abs(trips.column("Start Terminal") - trips.column("End Terminal"))
)

[32]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code	Difference in terminal
876419	413	8/5/2015 8:29	Civic Center BART (7th at Market)	72	8/5/2015 8:36	Townsend at 7th	65	269	Subscriber	94518	7
459672	408	9/18/2014 17:11	Harry Bridges Plaza (Ferry Building)	50	9/18/2014 17:17	Embarcadero at Sansome	60	429	Subscriber	94111	10
903647	723	8/25/2015 7:26	San Francisco Caltrain 2 (330 Townsend)	69	8/25/2015 7:38	Market at 10th	67	631	Subscriber	94025	2
452829	409	9/15/2014 8:29	Steuart at Market	74	9/15/2014 8:36	Market at 4th	76	428	Subscriber	94925	2
491023	224	10/9/2014 16:13	Santa Clara at Almaden	4	10/9/2014 16:17	San Jose Diridon Caltrain Station	2	144	Subscriber	94117	2
723352	519	4/13/2015 17:04	Howard at 2nd	63	4/13/2015 17:12	San Francisco Caltrain (Townsend at 4th)	70	629	Subscriber	94061	7
524499	431	10/31/2014 16:36	Townsend at 7th	65	10/31/2014 16:43	Civic Center BART (7th at Market)	72	630	Subscriber	94706	7
518524	389	10/28/2014 8:48	Market at Sansome	77	10/28/2014 8:54	2nd at South Park	64	458	Subscriber	94610	13
710070	11460	4/2/2015 18:13	Powell Street BART	39	4/2/2015 21:24	Powell Street BART	39	375	Subscriber	94107	0
793149	616	6/4/2015 5:26	Embarcadero at Bryant	54	6/4/2015 5:36	Embarcadero at Sansome	60	289	Subscriber	94105	6

... (99990 rows omitted)

We can also create a new table by adding two new columns with column name followed by the array values.

[33]:

cookies = Table()
cookies = cookies.with_columns(
    "Cookie", make_array("Sugar cookies", "Chocolate chip", "Red velvet", "Oatmeal raisin", "Peanut butter"),
    "Quantity", make_array(10, 15, 15, 10, 5)
)
cookies

[33]:

Cookie	Quantity
Sugar cookies	10
Chocolate chip	15
Red velvet	15
Oatmeal raisin	10
Peanut butter	5

[34]:

prices = make_array(1.00, 1.50, 1.75, 1.25, 1.00)
cookies = cookies.with_column("Price ($)", prices)
cookies

[34]:

Cookie	Quantity	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

In the last examples, we add a new column Delicious with one value “yes,” and we see every column has the same value.

[35]:

cookies.with_column("Delicious", "yes")

[35]:

Cookie	Quantity	Price ($)	Delicious
Sugar cookies	10	1	yes
Chocolate chip	15	1.5	yes
Red velvet	15	1.75	yes
Oatmeal raisin	10	1.25	yes
Peanut butter	5	1	yes

`tbl.column()`¶

tbl.column(column_name_or_index)

Outputs an array of values of the column column_name_or_index. column_name_or_index is a string of the column name or number which is the index of the column.

In the examples below, we start with an array of the Cookie column from the table cookies first by the column name then by using the index of the column.

[36]:

cookies.column("Cookie")

[36]:

array(['Sugar cookies', 'Chocolate chip', 'Red velvet', 'Oatmeal raisin',
       'Peanut butter'], dtype='<U14')

[37]:

cookies.column(0)

[37]:

array(['Sugar cookies', 'Chocolate chip', 'Red velvet', 'Oatmeal raisin',
       'Peanut butter'], dtype='<U14')

`tbl.num_rows`¶

Computes the number of rows in a table.

[38]:

trips.num_rows

[38]:

[39]:

cookies.num_rows

[39]:

`tbl.num_columns`¶

Computes the number of columns in a table.

[40]:

trips.num_columns

[40]:

[41]:

cookies.num_columns

[41]:

`tbl.labels`¶

Outputs the column labels in a table.

[42]:

trips.labels

[42]:

('Trip ID',
 'Duration',
 'Start Date',
 'Start Station',
 'Start Terminal',
 'End Date',
 'End Station',
 'End Terminal',
 'Bike #',
 'Subscriber Type',
 'Zip Code')

[43]:

cookies.labels

[43]:

('Cookie', 'Quantity', 'Price ($)')

`tbl.select()`¶

tbl.select(col1, col2, ...)

Creates a copy of a table with only the selected columns. Each column is the column name as a string or the integer index of the column.

Suppose we want to select the Trip ID, Duration, Bike #, and Zip Code columns from the trips table.

[44]:

trips.select("Trip ID", "Duration", "Bike #", "Zip Code")

[44]:

Trip ID	Duration	Bike #	Zip Code
876419	413	269	94518
459672	408	429	94111
903647	723	631	94025
452829	409	428	94925
491023	224	144	94117
723352	519	629	94061
524499	431	630	94706
518524	389	458	94610
710070	11460	375	94107
793149	616	289	94105

... (99990 rows omitted)

Similarly, we can use indexes to select columns. Remember to start indexing at 0.

[45]:

trips.select(0, 1, 8, 10).show(5)

Trip ID	Duration	Bike #	Zip Code
876419	413	269	94518
459672	408	429	94111
903647	723	631	94025
452829	409	428	94925
491023	224	144	94117

... (99995 rows omitted)

`tbl.drop()`¶

tbl.drop(col1, col2, ...)

Creates a copy of a table without the specified columns. Each column is the column name as a string or integer index.

[46]:

cookies.drop("Quantity")

[46]:

Cookie	Price ($)
Sugar cookies	1
Chocolate chip	1.5
Red velvet	1.75
Oatmeal raisin	1.25
Peanut butter	1

[47]:

trips.drop("End Date", "Subscriber Type")

[47]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Station	End Terminal	Bike #	Zip Code
876419	413	8/5/2015 8:29	Civic Center BART (7th at Market)	72	Townsend at 7th	65	269	94518
459672	408	9/18/2014 17:11	Harry Bridges Plaza (Ferry Building)	50	Embarcadero at Sansome	60	429	94111
903647	723	8/25/2015 7:26	San Francisco Caltrain 2 (330 Townsend)	69	Market at 10th	67	631	94025
452829	409	9/15/2014 8:29	Steuart at Market	74	Market at 4th	76	428	94925
491023	224	10/9/2014 16:13	Santa Clara at Almaden	4	San Jose Diridon Caltrain Station	2	144	94117
723352	519	4/13/2015 17:04	Howard at 2nd	63	San Francisco Caltrain (Townsend at 4th)	70	629	94061
524499	431	10/31/2014 16:36	Townsend at 7th	65	Civic Center BART (7th at Market)	72	630	94706
518524	389	10/28/2014 8:48	Market at Sansome	77	2nd at South Park	64	458	94610
710070	11460	4/2/2015 18:13	Powell Street BART	39	Powell Street BART	39	375	94107
793149	616	6/4/2015 5:26	Embarcadero at Bryant	54	Embarcadero at Sansome	60	289	94105

... (99990 rows omitted)

[48]:

trips.drop(3, 6, 8, 9, 10)

[48]:

Trip ID	Duration	Start Date	Start Terminal	End Date	End Terminal
876419	413	8/5/2015 8:29	72	8/5/2015 8:36	65
459672	408	9/18/2014 17:11	50	9/18/2014 17:17	60
903647	723	8/25/2015 7:26	69	8/25/2015 7:38	67
452829	409	9/15/2014 8:29	74	9/15/2014 8:36	76
491023	224	10/9/2014 16:13	4	10/9/2014 16:17	2
723352	519	4/13/2015 17:04	63	4/13/2015 17:12	70
524499	431	10/31/2014 16:36	65	10/31/2014 16:43	72
518524	389	10/28/2014 8:48	77	10/28/2014 8:54	64
710070	11460	4/2/2015 18:13	39	4/2/2015 21:24	39
793149	616	6/4/2015 5:26	54	6/4/2015 5:36	60

... (99990 rows omitted)

`tbl.relabel()`¶

tbl.relabel(old_label, new_label)

Modifies the table by changing the label of the column named old_label to new_label. old_label can be a string column name or an integer index.

[49]:

cookies

[49]:

Cookie	Quantity	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

[50]:

cookies.relabel("Quantity", "Amount remaining")

[50]:

Cookie	Amount remaining	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

[51]:

cookies.relabel(0, "Type")

[51]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

[52]:

cookies

[52]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

`tbl.show()`¶

tbl.show(n)

Displays the first n rows of a table. If no n is provided, displays all rows.

[53]:

trips.show(5)

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
876419	413	8/5/2015 8:29	Civic Center BART (7th at Market)	72	8/5/2015 8:36	Townsend at 7th	65	269	Subscriber	94518
459672	408	9/18/2014 17:11	Harry Bridges Plaza (Ferry Building)	50	9/18/2014 17:17	Embarcadero at Sansome	60	429	Subscriber	94111
903647	723	8/25/2015 7:26	San Francisco Caltrain 2 (330 Townsend)	69	8/25/2015 7:38	Market at 10th	67	631	Subscriber	94025
452829	409	9/15/2014 8:29	Steuart at Market	74	9/15/2014 8:36	Market at 4th	76	428	Subscriber	94925
491023	224	10/9/2014 16:13	Santa Clara at Almaden	4	10/9/2014 16:17	San Jose Diridon Caltrain Station	2	144	Subscriber	94117

... (99995 rows omitted)

`tbl.sort()`¶

tbl.sort(column_name_or_index, descending=False)

Sorts the rows in the table by the values in the column column_name_or_index in ascending order by default. Set descending=True to sort in descending order. column_name_or_index can be a string column label or an integer index.

[54]:

cookies

[54]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

[55]:

cookies.sort("Price ($)")

[55]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Peanut butter	5	1
Oatmeal raisin	10	1.25
Chocolate chip	15	1.5
Red velvet	15	1.75

[56]:

# sort in descending order
cookies.sort("Amount remaining", descending = True)

[56]:

Type	Amount remaining	Price ($)
Red velvet	15	1.75
Chocolate chip	15	1.5
Oatmeal raisin	10	1.25
Sugar cookies	10	1
Peanut butter	5	1

[57]:

# alphabetical order
cookies.sort(0)

[57]:

Type	Amount remaining	Price ($)
Chocolate chip	15	1.5
Oatmeal raisin	10	1.25
Peanut butter	5	1
Red velvet	15	1.75
Sugar cookies	10	1

`tbl.where()`¶

tbl.where(column, predicate)

Filters the table for rows where the predicate is true. predicate should be one of the provided are.<something> functions. column can be a string column label or an integer index. A list of available predicates can be found below.

[58]:

cookies.where("Amount remaining", are.above(10))

[58]:

Type	Amount remaining	Price ($)
Chocolate chip	15	1.5
Red velvet	15	1.75

[59]:

cookies.where(0, are.equal_to("Chocolate chip"))

[59]:

Type	Amount remaining	Price ($)
Chocolate chip	15	1.5

[62]:

# if predicate is a value, look for rows where the column == the value
# equivalent to cookies.where(1, are.eual_to(15))
cookies.where(1, 15)

[62]:

Type	Amount remaining	Price ($)
Chocolate chip	15	1.5
Red velvet	15	1.75

[63]:

cookies.where("Price ($)", are.below(1.25))

[63]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Peanut butter	5	1

`tbl.take()`¶

tbl.take(row_index, ...)

Returns a copy of the table with only the specified rows included. Rows are specified by their integer index, so 0 for the first, 1 for the second, etc.

[64]:

cookies

[64]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75
Oatmeal raisin	10	1.25
Peanut butter	5	1

[65]:

cookies.take(0)

[65]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1

[66]:

cookies.take(cookies.num_rows - 1)

[66]:

Type	Amount remaining	Price ($)
Peanut butter	5	1

[67]:

cookies.take(0, 1, 2)

[67]:

Type	Amount remaining	Price ($)
Sugar cookies	10	1
Chocolate chip	15	1.5
Red velvet	15	1.75

`Table` Visualizations¶

[68]:

actors = Table().read_table("https://github.com/data-8/textbook/raw/gh-pages/data/actors.csv")
actors

[68]:

Actor	Total Gross	Number of Movies	Average per Movie	#1 Movie	Gross
Harrison Ford	4871.7	41	118.8	Star Wars: The Force Awakens	936.7
Samuel L. Jackson	4772.8	69	69.2	The Avengers	623.4
Morgan Freeman	4468.3	61	73.3	The Dark Knight	534.9
Tom Hanks	4340.8	44	98.7	Toy Story 3	415
Robert Downey, Jr.	3947.3	53	74.5	The Avengers	623.4
Eddie Murphy	3810.4	38	100.3	Shrek 2	441.2
Tom Cruise	3587.2	36	99.6	War of the Worlds	234.3
Johnny Depp	3368.6	45	74.9	Dead Man's Chest	423.3
Michael Caine	3351.5	58	57.8	The Dark Knight	534.9
Scarlett Johansson	3341.2	37	90.3	The Avengers	623.4

... (40 rows omitted)

`tbl.scatter()`¶

tbl.scatter(x_column, y_column, fit_line=False)

Creates a scatter plot with x_column on the horizontal axis and y_column on the vertical axis. These labels can be column names as strings or integer indices. Set fit_line=True to include a line of best fit for the data. You can find more examples in the textbook.

[71]:

actors.scatter('Number of Movies', 'Total Gross')

../_images/reference-nb_datascience-reference_60_0.png

[73]:

actors.scatter(2, 3, fit_line=True)

../_images/reference-nb_datascience-reference_61_0.png

`tbl.plot()`¶

tbl.plot(x_column, y_column)

Plot a line graph with x_column on the horizontal axis and y_column on the vertical axis. Sorts the table in ascending order by values in x_column first. x_column and y_column can be column names as strings or integer indices.

[74]:

movies_by_year = Table.read_table('https://github.com/data-8/textbook/raw/gh-pages/data/movies_by_year.csv')
movies_by_year.show(3)

Year	Total Gross	Number of Movies	#1 Movie
2015	11128.5	702	Star Wars: The Force Awakens
2014	10360.8	702	American Sniper
2013	10923.6	688	Catching Fire

... (33 rows omitted)

[75]:

movies_by_year.plot('Year', 'Number of Movies')

../_images/reference-nb_datascience-reference_64_0.png

`tbl.barh()`¶

tbl.barh(categories)
tbl.barh(categories, values)

Plots a horizontal bar chart broken down by categories as the bars. If values is unspecified, one bar for each column of the table (except categories) is plotted. categories and values can be column names as strings or integer indices.

[76]:

cookies.barh("Type")

../_images/reference-nb_datascience-reference_66_0.png

[77]:

cookies.barh("Type", "Amount remaining")

../_images/reference-nb_datascience-reference_67_0.png

[78]:

cookies.barh(0, 2)

../_images/reference-nb_datascience-reference_68_0.png

`tbl.hist()`¶

tbl.hist(column)
tbl.hist(column, bins=...)

Plot a histogram of the values in column. Defaults to 10 bins of equal width. If bins is specified, it can be a number of bins to use (e.g. bins=25 will produce a histogram with 25 bins) or an array of values to use as bins (e.g. bins=make_array(1, 3, 4) will produce 2 bins: $[1, 3)$ and $[3, 4)$). column can be column names as strings or integer indices.

[79]:

actors.hist(3)

../_images/reference-nb_datascience-reference_70_0.png

[80]:

actors.hist("Gross")

../_images/reference-nb_datascience-reference_71_0.png

`Table.interactive_plots()`¶

Table.interactive_plots()

This function will change from static plots like the ones above to interactive plots made with plotly. If a plotting method has a plotly version, that method will be used instead.

[193]:

Table.interactive_plots()
actors.scatter("Total Gross", "Gross")

`Table.static_plots()`¶

Table.static_plots()

This function turns off plotly plots.

[194]:

Table.static_plots()
actors.scatter("Total Gross", "Gross")

../_images/reference-nb_datascience-reference_75_0.png

Advanced `Table` Functions¶

`tbl.apply()`¶

tbl.apply(function, column)
tbl.apply(function, col1, col2, ...)

Applies the function function to each element of the column column and returns the values returned as an array. If function takes more than one argument, you can specify multiple columns to use for each argument in order.

[65]:

actors.apply(np.average, "Number of Movies")

[65]:

array([41., 69., 61., 44., 53., 38., 36., 45., 58., 37., 38., 49., 60.,
       43.,  7., 31., 24., 50., 39., 79., 34., 63., 23., 34., 37., 36.,
       17., 50., 39., 42., 35., 36., 17., 46., 40., 32., 17., 29., 39.,
       26., 25., 36., 27., 43., 25., 21., 40., 35., 23., 25.])

[82]:

actors

[82]:

Actor	Total Gross	Number of Movies	Average per Movie	#1 Movie	Gross
Harrison Ford	4871.7	41	118.8	Star Wars: The Force Awakens	936.7
Samuel L. Jackson	4772.8	69	69.2	The Avengers	623.4
Morgan Freeman	4468.3	61	73.3	The Dark Knight	534.9
Tom Hanks	4340.8	44	98.7	Toy Story 3	415
Robert Downey, Jr.	3947.3	53	74.5	The Avengers	623.4
Eddie Murphy	3810.4	38	100.3	Shrek 2	441.2
Tom Cruise	3587.2	36	99.6	War of the Worlds	234.3
Johnny Depp	3368.6	45	74.9	Dead Man's Chest	423.3
Michael Caine	3351.5	58	57.8	The Dark Knight	534.9
Scarlett Johansson	3341.2	37	90.3	The Avengers	623.4

... (40 rows omitted)

The example below calculates the average gross for each movie by actor by applying a function that takes in the value of Total Gross and Number of Movies and returns their quotient.

[83]:

def average_gross(total_gross, num_movies):
    return total_gross / num_movies

actors.apply(average_gross, "Total Gross", "Number of Movies")

[83]:

array([118.82195122,  69.17101449,  73.25081967,  98.65454545,
        74.47735849, 100.27368421,  99.64444444,  74.85777778,
        57.78448276,  90.3027027 ,  86.68421053,  66.9244898 ,
        53.15666667,  73.8372093 , 451.84285714, 101.62580645,
       131.2125    ,  62.478     ,  79.67435897,  39.00379747,
        89.16764706,  46.70952381, 125.67826087,  84.86470588,
        76.40540541,  78.38888889, 165.63529412,  56.316     ,
        71.86153846,  65.12619048,  77.89428571,  75.425     ,
       157.75882353,  58.28913043,  67.0225    ,  83.15625   ,
       154.96470588,  89.83103448,  66.72564103,  99.25384615,
       102.308     ,  70.82777778,  94.26666667,  58.65348837,
       100.732     , 119.06190476,  61.5925    ,  70.36      ,
       106.86086957,  96.66      ])

`tbl.group()`¶

tbl.group(column_or_columns)
tbl.group(column_or_columns, func)

Groups a table by values in column_or_columns. If column_or_columns is an array, groups by each unique combination of elements in those columns. If func is specified, it should be a function that takes in an array of values and returns a single value. If unspecified, this defaults to the count of rows in the set.

[84]:

trips.group("Start Station")

[84]:

Start Station	count
2nd at Folsom	2302
2nd at South Park	2610
2nd at Townsend	3904
5th at Howard	2190
Adobe on Almaden	165
Arena Green / SAP Center	176
Beale at Market	2377
Broadway St at Battery St	2157
California Ave Caltrain Station	127
Castro Street and El Camino Real	339

... (60 rows omitted)

[85]:

trips.group("Start Station", np.mean).select(0,2)

[85]:

Start Station	Duration mean
2nd at Folsom	512.887
2nd at South Park	654.565
2nd at Townsend	755.176
5th at Howard	819.509
Adobe on Almaden	2522.5
Arena Green / SAP Center	1999.7
Beale at Market	679.602
Broadway St at Battery St	827.753
California Ave Caltrain Station	4403.29
Castro Street and El Camino Real	1221.86

... (60 rows omitted)

[86]:

trips.group("Start Station").sort("count", descending = True)

[86]:

Start Station	count
San Francisco Caltrain (Townsend at 4th)	7426
San Francisco Caltrain 2 (330 Townsend)	6114
Harry Bridges Plaza (Ferry Building)	4795
Temporary Transbay Terminal (Howard at Beale)	4212
Townsend at 7th	3925
2nd at Townsend	3904
Embarcadero at Sansome	3900
Steuart at Market	3872
Market at 10th	3370
Market at Sansome	3218

... (60 rows omitted)

[87]:

trips.group(['Start Station', 'End Station'])

[87]:

Start Station	End Station	count
2nd at Folsom	2nd at Folsom	22
2nd at Folsom	2nd at South Park	84
2nd at Folsom	2nd at Townsend	123
2nd at Folsom	5th at Howard	28
2nd at Folsom	Beale at Market	34
2nd at Folsom	Broadway St at Battery St	18
2nd at Folsom	Civic Center BART (7th at Market)	13
2nd at Folsom	Clay at Battery	70
2nd at Folsom	Commercial at Montgomery	46
2nd at Folsom	Davis at Jackson	8

... (1616 rows omitted)

`tbl.pivot()`¶

tbl.pivot(col1, col2)
tbl.pivot(col1, col2, values, collect)

Creates a pivot table with values in col1 as columns and values in col2 as rows. If values is unspecified, the values in the cells default to counts. If values is specified, it should be the label of a column whose values to pass as an array to collect, which should return a single value.

[88]:

more_cones = Table().with_columns(
    'Flavor', make_array('strawberry', 'chocolate', 'chocolate', 'strawberry', 'chocolate', 'bubblegum'),
    'Color', make_array('pink', 'light brown', 'dark brown', 'pink', 'dark brown', 'pink'),
    'Price', make_array(3.55, 4.75, 5.25, 5.25, 5.25, 4.75)
)

more_cones

[88]:

Flavor	Color	Price
strawberry	pink	3.55
chocolate	light brown	4.75
chocolate	dark brown	5.25
strawberry	pink	5.25
chocolate	dark brown	5.25
bubblegum	pink	4.75

[89]:

more_cones.pivot('Flavor', 'Color')

[89]:

Color	bubblegum	chocolate	strawberry
dark brown	0	2	0
light brown	0	1	0
pink	1	0	2

[90]:

more_cones.pivot('Flavor', 'Color', values='Price', collect=sum)

[90]:

Color	bubblegum	chocolate	strawberry
dark brown	0	10.5	0
light brown	0	4.75	0
pink	4.75	0	8.8

[91]:

more_cones.pivot(0, 1)

[91]:

Color	bubblegum	chocolate	strawberry
dark brown	0	2	0
light brown	0	1	0
pink	1	0	2

`tbl.join()`¶

tbl1.join(col1, tbl2)
tbl1.join(col1, tbl2, col2)

Performs a join of tbl1 on tbl2 where rows are only included if the value in col1 is present in both join columns. If col2 is unspecified, it is assumed to be the same label as col1.

[92]:

cones = Table().with_columns(
    'Flavor', make_array('strawberry', 'vanilla', 'chocolate', 'strawberry', 'chocolate'),
    'Price', make_array(3.55, 4.75, 6.55, 5.25, 5.75)
)
cones

[92]:

Flavor	Price
strawberry	3.55
vanilla	4.75
chocolate	6.55
strawberry	5.25
chocolate	5.75

[95]:

ratings = Table().with_columns(
    'Kind', make_array('strawberry', 'chocolate', 'vanilla', 'mint chip'),
    'Stars', make_array(2.5, 3.5, 4, 3)
)
ratings

[95]:

Kind	Stars
strawberry	2.5
chocolate	3.5
vanilla	4
mint chip	3

[97]:

# Joins cones on ratings. Note that the mint chip flavor doesn't appear since it's not in cones
rated = cones.join('Flavor', ratings, 'Kind')
rated

[97]:

Flavor	Price	Stars
chocolate	6.55	3.5
chocolate	5.75	3.5
strawberry	3.55	2.5
strawberry	5.25	2.5
vanilla	4.75	4

`tbl.sample()`¶

tbl.sample(k, with_replacement=True)

Returns a new table with k rows that were randomly sampled from the original table. If with_replacement is true, sampling occurs with replacement. For sampling without replacement, set with_replacement=False.

[98]:

# if you rerun this cell, you should get different results since the sample is random
rated.sample(2)

[98]:

Flavor	Price	Stars
chocolate	6.55	3.5
chocolate	6.55	3.5

Notice how the table below has more rows for certain flavors than the original rated table. This is because we are sampling with replacement, so you get theoretically get 5 of the same flavors!

[99]:

sampled_with_replacement = rated.sample(5)
sampled_with_replacement

[99]:

Flavor	Price	Stars
strawberry	5.25	2.5
strawberry	3.55	2.5
strawberry	3.55	2.5
chocolate	6.55	3.5
vanilla	4.75	4

[100]:

rated.sample(3, with_replacement = False)

[100]:

Flavor	Price	Stars
vanilla	4.75	4
strawberry	3.55	2.5
chocolate	6.55	3.5

String Methods¶

`str.split()`¶

string.split(separator)

Splits the string string into a list on each occurence of the substring separator. The occurences of separator are removed from the resulting list.

For example, the code below splits the string Data 8hiishifun. on the substring hi.

[101]:

example_string = "Data 8hiishifun."
example_string.split("hi")

[101]:

['Data 8', 'is', 'fun.']

[104]:

# split on .
another_string = "the.secret.message.is.123"
another_string.split(".")

[104]:

['the', 'secret', 'message', 'is', '123']

`str.join()`¶

string.join(array)

Combines each element of array into one string with string used to connect each element.

[105]:

fun_array = make_array("high", "great", "best")
"est ".join(fun_array)

[105]:

'highest greatest best'

[106]:

# you can join elements on the empty string to just merge the elements
some_strings = make_array("some", "list", "of", "strings")
"".join(some_strings)

[106]:

'somelistofstrings'

`str.replace()`¶

string.replace(old_string, new_string)

Replaces each occurrence of old_string in string with new_string.

[107]:

berkeley_string = "I saw 5 friends, 10 squirrels, and 20 people flyering on Sproul."
berkeley_string

[107]:

'I saw 5 friends, 10 squirrels, and 20 people flyering on Sproul.'

[108]:

berkeley_string.replace("friends", "frisbees")

[108]:

'I saw 5 frisbees, 10 squirrels, and 20 people flyering on Sproul.'

[110]:

# you can chain calls to .replace() since the return value is also a string
berkeley_string.replace("friends", "frisbees").replace("flyering on Sproul", "having a picnic on the Glade")

[110]:

'I saw 5 frisbees, 10 squirrels, and 20 people having a picnic on the Glade.'

Array Functions and Methods¶

[111]:

example_array = make_array(1, 3, 5, 7, 9)
example_array

[111]:

array([1, 3, 5, 7, 9])

`max()`¶

max(array)

Returns the maximum value of an array.

[112]:

max(example_array)

[112]:

`min()`¶

min(array)

Returns the minimum value of an array.

[113]:

min(example_array)

[113]:

`sum()`¶

sum(array)

Returns the sum of values in an array.

[114]:

sum(example_array)

[114]:

[115]:

sum(make_array(1, 2, 0, -10))

[115]:

-7

`abs()`¶

abs(num)
abs(array)

Take the absolute value of number or each number in an array.

[118]:

abs(-1)

[118]:

[119]:

new_arr = make_array(-3, -1, 5.2, 0.25, -4.9)
abs(new_arr)

[119]:

array([3.  , 1.  , 5.2 , 0.25, 4.9 ])

`round(num)`¶

round(num)
round(num, d)
np.round(array)
np.round(array, d)

Round number or array of numbers to the nearest integer. If d is specified, rounds to d places after the decimal. Use np.round to round arrays.

[124]:

round(3.14159)

[124]:

[125]:

round(3.14159, 3)

[125]:

3.142

[130]:

np.round(new_arr, 1)

[130]:

array([-3. , -1. ,  5.2,  0.2, -4.9])

`len()`¶

len(array)

Returns the length of an array.

[131]:

len(new_arr)

[131]:

`make_array()`¶

make_array(val1, val2, ...)

Creates a new array with the values passed.

[132]:

new_array = make_array(25, 16, 9, 4, 1)
new_array

[132]:

array([25, 16,  9,  4,  1])

`np.mean`¶

np.mean(array)
np.average(array)

Returns the mean of the values in an array.

[134]:

np.mean(new_array)

[134]:

11.0

[133]:

np.average(new_array)

[133]:

11.0

`np.std()`¶

np.std(array)

Returns the standard deviation of the values in an array.

[150]:

np.std(new_array)

[150]:

8.648699324175862

`np.diff()`¶

np.diff(array)

Returns an array with the pairwise differences between elements in the input array. The output will have length len(array) - 1 and will have elements $x_1 - x_0$, $x_2 - x_1$, $x_3 - x_2$, etc.

[135]:

np.diff(new_array)

[135]:

array([-9, -7, -5, -3])

[136]:

np.diff(make_array(1, 3, 5, 7))

[136]:

array([2, 2, 2])

`np.sqrt()`¶

np.sqrt(num)
np.sqrt(array)

Returns the square root of a number or an array of the square roots of each element in the input array.

[137]:

np.sqrt(4)

[137]:

2.0

[138]:

np.sqrt(new_array)

[138]:

array([5., 4., 3., 2., 1.])

`np.arange()`¶

np.arange(stop)
np.arange(start, stop)
np.arange(start, stop, step)

Returns an array of integers from start to stop incrementing by step. If start is unspecified, it is assumed be 0. If step is unspecified, it is assumed to be 1. The upper bound is exclusive, meaning that max(np.arange(10)) is 9.

[139]:

np.arange(0, 11)

[139]:

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

[140]:

np.arange(5)

[140]:

array([0, 1, 2, 3, 4])

[142]:

np.arange(0, 102, 2.7)

[142]:

array([ 0. ,  2.7,  5.4,  8.1, 10.8, 13.5, 16.2, 18.9, 21.6, 24.3, 27. ,
       29.7, 32.4, 35.1, 37.8, 40.5, 43.2, 45.9, 48.6, 51.3, 54. , 56.7,
       59.4, 62.1, 64.8, 67.5, 70.2, 72.9, 75.6, 78.3, 81. , 83.7, 86.4,
       89.1, 91.8, 94.5, 97.2, 99.9])

`array.item()`¶

array.item(num)

Returns the item at index num in an array (remember Python indices start at 0!).

[143]:

np.arange(0, 102, 2).item(1)

[143]:

[146]:

new_array.item(2)

[146]:

[147]:

new_array.item(len(new_array) - 1)

[147]:

`np.random.choice`¶

np.random.choice(array)
np.random.choice(array, n, replace=True)

Picks one or n of items from an array at random. By default, with replacement (set replace=False for without replacement).

[149]:

np.random.choice(new_array)

[149]:

[150]:

np.random.choice(new_array, 3)

[150]:

array([ 4,  4, 16])

[152]:

np.random.choice(np.arange(0, 102, 2), 10, replace=False)

[152]:

array([ 98,  22,  12,  56,  24,  54, 100,  52,  28,  88])

`np.count_nonzero()`¶

Returns the number of nonzero elements in an array. Because False values are considered zeros (as integers), this can also give you the number of Trues in an array of boolean valuea.

[153]:

another_array = make_array(0, 1, 2, 0, 4, 0, 1, 0, 0)
np.count_nonzero(another_array)

[153]:

[159]:

bools = make_array(True, False, True, False, False, True, False)
np.count_nonzero(bools)

[159]:

`np.append()`¶

np.append(array, item)

Returns a copy of the input array with item (must be the same type as the other entries in the array) appended to the end.

[160]:

new_array

[160]:

array([25, 16,  9,  4,  1])

[161]:

np.append(new_array, 1000)

[161]:

array([  25,   16,    9,    4,    1, 1000])

`percentile()`¶

percentile(percent, array)

Returns the value corresponding to the specified percentile of an array. percent should be in percentage form (i.e. 50 not 0.5).

[162]:

long_array = make_array(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)
long_array

[162]:

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])

[163]:

percentile(50, long_array)

[163]:

[164]:

percentile(90, long_array)

[164]:

`Table.where` Predicates¶

All of the predicates described below can be negated by preceding the name with not_. For example, we can find values not equal to a specific value using are.not_equal_to(value).

`are.equal_to()`¶

tbl.where(column, are.equal_to(value))

Filter leaves rows only where the value in column is equal to value.

[166]:

trips.where("Duration", are.equal_to(519))

[166]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
723352	519	4/13/2015 17:04	Howard at 2nd	63	4/13/2015 17:12	San Francisco Caltrain (Townsend at 4th)	70	629	Subscriber	94061
824979	519	6/27/2015 15:02	Japantown	9	6/27/2015 15:11	San Jose City Hall	10	660	Customer	nil
439946	519	9/5/2014 12:38	Yerba Buena Center of the Arts (3rd @ Howard)	68	9/5/2014 12:47	Civic Center BART (7th at Market)	72	452	Subscriber	94105
788261	519	6/1/2015 9:21	Powell at Post (Union Square)	71	6/1/2015 9:30	Steuart at Market	74	575	Subscriber	94108
560479	519	11/28/2014 14:20	South Van Ness at Market	66	11/28/2014 14:29	Powell at Post (Union Square)	71	609	Subscriber	94108
653797	519	2/23/2015 8:47	Market at 10th	67	2/23/2015 8:55	Yerba Buena Center of the Arts (3rd @ Howard)	68	546	Subscriber	94102
887134	519	8/12/2015 17:29	Civic Center BART (7th at Market)	72	8/12/2015 17:38	Mechanics Plaza (Market at Battery)	75	313	Subscriber	94103
482225	519	10/3/2014 16:41	Spear at Folsom	49	10/3/2014 16:50	Broadway St at Battery St	82	209	Subscriber	94133
681697	519	3/14/2015 10:19	Embarcadero at Sansome	60	3/14/2015 10:28	Harry Bridges Plaza (Ferry Building)	50	368	Customer	95120
912821	519	8/31/2015 17:00	Embarcadero at Folsom	51	8/31/2015 17:09	San Francisco Caltrain (Townsend at 4th)	70	354	Subscriber	94085

... (115 rows omitted)

`are.above()`¶

tbl.where(column, are.above(value))

Filter leaves rows only where the value in column is strictly greater than value.

[167]:

trips.where("Duration", are.above(1000))

[167]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
710070	11460	4/2/2015 18:13	Powell Street BART	39	4/2/2015 21:24	Powell Street BART	39	375	Subscriber	94107
589964	15097	1/3/2015 15:22	Embarcadero at Sansome	60	1/3/2015 19:33	Golden Gate at Polk	59	599	Customer	29225
831509	1057	7/2/2015 10:14	2nd at Folsom	62	7/2/2015 10:31	South Van Ness at Market	66	631	Subscriber	94114
442750	6084	9/8/2014 13:32	Embarcadero at Sansome	60	9/8/2014 15:14	Embarcadero at Sansome	60	368	Customer	474454
608714	19799	1/18/2015 10:07	University and Emerson	35	1/18/2015 15:37	San Francisco Caltrain (Townsend at 4th)	70	686	Customer	nil
711961	1026	4/4/2015 7:07	Davis at Jackson	42	4/4/2015 7:24	Harry Bridges Plaza (Ferry Building)	50	189	Subscriber	94111
833071	2314	7/4/2015 11:13	Market at 4th	76	7/4/2015 11:52	Washington at Kearny	46	602	Customer	94806
570731	1218	12/8/2014 23:51	MLK Library	11	12/9/2014 0:12	St James Park	13	299	Customer	95033
853698	1048	7/20/2015 10:53	Broadway St at Battery St	82	7/20/2015 11:10	Embarcadero at Sansome	60	636	Customer	91436
787510	3670	5/31/2015 10:47	Mountain View City Hall	27	5/31/2015 11:48	Castro Street and El Camino Real	32	713	Customer	94041

... (11576 rows omitted)

`are.above_or_equal_to()`¶

tbl.where(column, are.above_or_equal_to(value))

Filter leaves rows only where the value in column is greater than or equal to value.

[168]:

trips.where("Duration", are.above_or_equal_to(1000))

[168]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
710070	11460	4/2/2015 18:13	Powell Street BART	39	4/2/2015 21:24	Powell Street BART	39	375	Subscriber	94107
589964	15097	1/3/2015 15:22	Embarcadero at Sansome	60	1/3/2015 19:33	Golden Gate at Polk	59	599	Customer	29225
831509	1057	7/2/2015 10:14	2nd at Folsom	62	7/2/2015 10:31	South Van Ness at Market	66	631	Subscriber	94114
442750	6084	9/8/2014 13:32	Embarcadero at Sansome	60	9/8/2014 15:14	Embarcadero at Sansome	60	368	Customer	474454
608714	19799	1/18/2015 10:07	University and Emerson	35	1/18/2015 15:37	San Francisco Caltrain (Townsend at 4th)	70	686	Customer	nil
711961	1026	4/4/2015 7:07	Davis at Jackson	42	4/4/2015 7:24	Harry Bridges Plaza (Ferry Building)	50	189	Subscriber	94111
833071	2314	7/4/2015 11:13	Market at 4th	76	7/4/2015 11:52	Washington at Kearny	46	602	Customer	94806
570731	1218	12/8/2014 23:51	MLK Library	11	12/9/2014 0:12	St James Park	13	299	Customer	95033
853698	1048	7/20/2015 10:53	Broadway St at Battery St	82	7/20/2015 11:10	Embarcadero at Sansome	60	636	Customer	91436
787510	3670	5/31/2015 10:47	Mountain View City Hall	27	5/31/2015 11:48	Castro Street and El Camino Real	32	713	Customer	94041

... (11597 rows omitted)

`are.below()`¶

tbl.where(column, are.below(value))

Filter leaves rows only where the value in column is strictly less than value.

[170]:

trips.where("Duration", are.below(100))

[170]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
482797	65	10/4/2014 7:50	San Francisco Caltrain (Townsend at 4th)	70	10/4/2014 7:52	San Francisco Caltrain (Townsend at 4th)	70	430	Subscriber	95112
483052	81	10/4/2014 13:52	Harry Bridges Plaza (Ferry Building)	50	10/4/2014 13:53	Harry Bridges Plaza (Ferry Building)	50	306	Customer	nan
569620	84	12/8/2014 10:09	Civic Center BART (7th at Market)	72	12/8/2014 10:10	Civic Center BART (7th at Market)	72	326	Subscriber	94111
502332	79	10/16/2014 17:26	Beale at Market	56	10/16/2014 17:27	Temporary Transbay Terminal (Howard at Beale)	55	613	Subscriber	94602
604012	76	1/14/2015 15:18	Davis at Jackson	42	1/14/2015 15:19	Broadway St at Battery St	82	601	Subscriber	94107
704918	70	3/30/2015 22:51	Broadway St at Battery St	82	3/30/2015 22:52	Broadway St at Battery St	82	394	Subscriber	94107
513458	83	10/24/2014 8:50	2nd at Folsom	62	10/24/2014 8:51	Howard at 2nd	63	569	Subscriber	94107
696725	94	3/25/2015 8:47	Post at Kearny	47	3/25/2015 8:49	Washington at Kearny	46	516	Subscriber	94109
829817	86	7/1/2015 9:27	Market at Sansome	77	7/1/2015 9:28	2nd at South Park	64	292	Subscriber	94538
745895	73	4/29/2015 13:05	Yerba Buena Center of the Arts (3rd @ Howard)	68	4/29/2015 13:06	Yerba Buena Center of the Arts (3rd @ Howard)	68	380	Subscriber	94947

... (403 rows omitted)

`are.below_or_equal_to()`¶

tbl.where(column, are.below_or_equal_to(value))

Filter leaves rows only where the value in column is less than or equal to value.

[171]:

trips.where("Duration", are.below_or_equal_to(100))

[171]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
482797	65	10/4/2014 7:50	San Francisco Caltrain (Townsend at 4th)	70	10/4/2014 7:52	San Francisco Caltrain (Townsend at 4th)	70	430	Subscriber	95112
483052	81	10/4/2014 13:52	Harry Bridges Plaza (Ferry Building)	50	10/4/2014 13:53	Harry Bridges Plaza (Ferry Building)	50	306	Customer	nan
569620	84	12/8/2014 10:09	Civic Center BART (7th at Market)	72	12/8/2014 10:10	Civic Center BART (7th at Market)	72	326	Subscriber	94111
502332	79	10/16/2014 17:26	Beale at Market	56	10/16/2014 17:27	Temporary Transbay Terminal (Howard at Beale)	55	613	Subscriber	94602
604012	76	1/14/2015 15:18	Davis at Jackson	42	1/14/2015 15:19	Broadway St at Battery St	82	601	Subscriber	94107
704918	70	3/30/2015 22:51	Broadway St at Battery St	82	3/30/2015 22:52	Broadway St at Battery St	82	394	Subscriber	94107
513458	83	10/24/2014 8:50	2nd at Folsom	62	10/24/2014 8:51	Howard at 2nd	63	569	Subscriber	94107
696725	94	3/25/2015 8:47	Post at Kearny	47	3/25/2015 8:49	Washington at Kearny	46	516	Subscriber	94109
808199	100	6/15/2015 20:57	Post at Kearny	47	6/15/2015 20:58	2nd at South Park	64	537	Subscriber	94107
829817	86	7/1/2015 9:27	Market at Sansome	77	7/1/2015 9:28	2nd at South Park	64	292	Subscriber	94538

... (430 rows omitted)

`are.between()`¶

tbl.where(column, are.between(x, y))

Filter leaves rows only where the value in column is greather than or equal to x and less than y (i.e. in the interval $[x, y)$).

[172]:

trips.where("Duration", are.between(100, 200))

[172]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
437830	151	9/4/2014 9:13	Grant Avenue at Columbus Avenue	73	9/4/2014 9:15	Commercial at Montgomery	45	306	Subscriber	94104
436255	195	9/3/2014 11:53	2nd at Folsom	62	9/3/2014 11:57	Spear at Folsom	49	403	Subscriber	94107
585884	151	12/26/2014 13:34	Broadway St at Battery St	82	12/26/2014 13:37	Harry Bridges Plaza (Ferry Building)	50	576	Subscriber	94107
548322	191	11/17/2014 20:10	Yerba Buena Center of the Arts (3rd @ Howard)	68	11/17/2014 20:13	Market at Sansome	77	29	Subscriber	94705
594999	185	1/7/2015 17:53	San Antonio Caltrain Station	29	1/7/2015 17:56	San Antonio Shopping Center	31	176	Subscriber	94040
468534	194	9/24/2014 19:08	Mechanics Plaza (Market at Battery)	75	9/24/2014 19:11	Harry Bridges Plaza (Ferry Building)	50	443	Subscriber	94107
873710	169	8/3/2015 17:20	Broadway St at Battery St	82	8/3/2015 17:23	Embarcadero at Sansome	60	532	Subscriber	94114
853087	168	7/20/2015 7:27	Temporary Transbay Terminal (Howard at Beale)	55	7/20/2015 7:30	2nd at Folsom	62	418	Subscriber	94602
863019	162	7/27/2015 8:31	Temporary Transbay Terminal (Howard at Beale)	55	7/27/2015 8:34	Mechanics Plaza (Market at Battery)	75	504	Subscriber	94111
883134	173	8/10/2015 15:11	Embarcadero at Folsom	51	8/10/2015 15:14	Beale at Market	56	363	Subscriber	94117

... (5083 rows omitted)

`are.between_or_equal_to()`¶

tbl.where(column, are.between_or_equal_to(x, y))

Filter leaves rows only where the value in column is between or equal to x and y (i.e. in the interval $[x, y]$).

[173]:

trips.where("Duration", are.between_or_equal_to(100, 200))

[173]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
437830	151	9/4/2014 9:13	Grant Avenue at Columbus Avenue	73	9/4/2014 9:15	Commercial at Montgomery	45	306	Subscriber	94104
436255	195	9/3/2014 11:53	2nd at Folsom	62	9/3/2014 11:57	Spear at Folsom	49	403	Subscriber	94107
585884	151	12/26/2014 13:34	Broadway St at Battery St	82	12/26/2014 13:37	Harry Bridges Plaza (Ferry Building)	50	576	Subscriber	94107
548322	191	11/17/2014 20:10	Yerba Buena Center of the Arts (3rd @ Howard)	68	11/17/2014 20:13	Market at Sansome	77	29	Subscriber	94705
903735	200	8/25/2015 7:59	Temporary Transbay Terminal (Howard at Beale)	55	8/25/2015 8:02	Steuart at Market	74	453	Subscriber	94501
594999	185	1/7/2015 17:53	San Antonio Caltrain Station	29	1/7/2015 17:56	San Antonio Shopping Center	31	176	Subscriber	94040
468534	194	9/24/2014 19:08	Mechanics Plaza (Market at Battery)	75	9/24/2014 19:11	Harry Bridges Plaza (Ferry Building)	50	443	Subscriber	94107
873710	169	8/3/2015 17:20	Broadway St at Battery St	82	8/3/2015 17:23	Embarcadero at Sansome	60	532	Subscriber	94114
853087	168	7/20/2015 7:27	Temporary Transbay Terminal (Howard at Beale)	55	7/20/2015 7:30	2nd at Folsom	62	418	Subscriber	94602
863019	162	7/27/2015 8:31	Temporary Transbay Terminal (Howard at Beale)	55	7/27/2015 8:34	Mechanics Plaza (Market at Battery)	75	504	Subscriber	94111

... (5180 rows omitted)

`are.contained_in()`¶

tbl.where(column, are.contained_in(string_or_array))

Filter leaves rows only where the value in column is a substring of string_or_array if it is a string or an element of string_or_array if it is an array

[176]:

trips.where("Start Station", are.contained_in("2nd at Folsom San Antonio Caltrain Station"))

[176]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
436255	195	9/3/2014 11:53	2nd at Folsom	62	9/3/2014 11:57	Spear at Folsom	49	403	Subscriber	94107
831509	1057	7/2/2015 10:14	2nd at Folsom	62	7/2/2015 10:31	South Van Ness at Market	66	631	Subscriber	94114
877160	306	8/5/2015 16:33	2nd at Folsom	62	8/5/2015 16:39	Beale at Market	56	527	Subscriber	94602
768619	840	5/15/2015 11:35	2nd at Folsom	62	5/15/2015 11:49	Market at 10th	67	604	Subscriber	94903
594999	185	1/7/2015 17:53	San Antonio Caltrain Station	29	1/7/2015 17:56	San Antonio Shopping Center	31	176	Subscriber	94040
701211	252	3/27/2015 16:26	2nd at Folsom	62	3/27/2015 16:30	Spear at Folsom	49	405	Subscriber	94105
487432	561	10/7/2014 17:48	2nd at Folsom	62	10/7/2014 17:58	Commercial at Montgomery	45	342	Subscriber	94107
610970	808	1/20/2015 13:28	2nd at Folsom	62	1/20/2015 13:42	Harry Bridges Plaza (Ferry Building)	50	310	Subscriber	94025
753668	196	5/5/2015 11:48	2nd at Folsom	62	5/5/2015 11:52	Temporary Transbay Terminal (Howard at Beale)	55	533	Subscriber	94973
466551	222	9/23/2014 18:12	2nd at Folsom	62	9/23/2014 18:16	2nd at Townsend	61	620	Subscriber	94107

... (2578 rows omitted)

[178]:

trips.where("Start Terminal", are.contained_in(make_array(62, 29)))

[178]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
436255	195	9/3/2014 11:53	2nd at Folsom	62	9/3/2014 11:57	Spear at Folsom	49	403	Subscriber	94107
831509	1057	7/2/2015 10:14	2nd at Folsom	62	7/2/2015 10:31	South Van Ness at Market	66	631	Subscriber	94114
877160	306	8/5/2015 16:33	2nd at Folsom	62	8/5/2015 16:39	Beale at Market	56	527	Subscriber	94602
768619	840	5/15/2015 11:35	2nd at Folsom	62	5/15/2015 11:49	Market at 10th	67	604	Subscriber	94903
594999	185	1/7/2015 17:53	San Antonio Caltrain Station	29	1/7/2015 17:56	San Antonio Shopping Center	31	176	Subscriber	94040
701211	252	3/27/2015 16:26	2nd at Folsom	62	3/27/2015 16:30	Spear at Folsom	49	405	Subscriber	94105
487432	561	10/7/2014 17:48	2nd at Folsom	62	10/7/2014 17:58	Commercial at Montgomery	45	342	Subscriber	94107
610970	808	1/20/2015 13:28	2nd at Folsom	62	1/20/2015 13:42	Harry Bridges Plaza (Ferry Building)	50	310	Subscriber	94025
753668	196	5/5/2015 11:48	2nd at Folsom	62	5/5/2015 11:52	Temporary Transbay Terminal (Howard at Beale)	55	533	Subscriber	94973
466551	222	9/23/2014 18:12	2nd at Folsom	62	9/23/2014 18:16	2nd at Townsend	61	620	Subscriber	94107

... (2578 rows omitted)

`are.containing()`¶

tbl.where(column, are.containing(value))

Filter leaves rows only where the value in column contains the substring value.

[180]:

trips.where("End Station", are.containing("at"))

[180]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
876419	413	8/5/2015 8:29	Civic Center BART (7th at Market)	72	8/5/2015 8:36	Townsend at 7th	65	269	Subscriber	94518
459672	408	9/18/2014 17:11	Harry Bridges Plaza (Ferry Building)	50	9/18/2014 17:17	Embarcadero at Sansome	60	429	Subscriber	94111
903647	723	8/25/2015 7:26	San Francisco Caltrain 2 (330 Townsend)	69	8/25/2015 7:38	Market at 10th	67	631	Subscriber	94025
452829	409	9/15/2014 8:29	Steuart at Market	74	9/15/2014 8:36	Market at 4th	76	428	Subscriber	94925
491023	224	10/9/2014 16:13	Santa Clara at Almaden	4	10/9/2014 16:17	San Jose Diridon Caltrain Station	2	144	Subscriber	94117
723352	519	4/13/2015 17:04	Howard at 2nd	63	4/13/2015 17:12	San Francisco Caltrain (Townsend at 4th)	70	629	Subscriber	94061
524499	431	10/31/2014 16:36	Townsend at 7th	65	10/31/2014 16:43	Civic Center BART (7th at Market)	72	630	Subscriber	94706
518524	389	10/28/2014 8:48	Market at Sansome	77	10/28/2014 8:54	2nd at South Park	64	458	Subscriber	94610
793149	616	6/4/2015 5:26	Embarcadero at Bryant	54	6/4/2015 5:36	Embarcadero at Sansome	60	289	Subscriber	94105
681771	895	3/14/2015 11:46	Market at 10th	67	3/14/2015 12:01	Market at 4th	76	416	Subscriber	94107

... (78805 rows omitted)

`are.strictly_between()`¶

tbl.where(column, are.strictly_between(x, y))

Filter leaves rows only where the value in column is strictly greater than x and less than y (i.e. in the interval $(x, y)$).

[181]:

trips.where("Duration", are.strictly_between(100, 200))

[181]:

Trip ID	Duration	Start Date	Start Station	Start Terminal	End Date	End Station	End Terminal	Bike #	Subscriber Type	Zip Code
437830	151	9/4/2014 9:13	Grant Avenue at Columbus Avenue	73	9/4/2014 9:15	Commercial at Montgomery	45	306	Subscriber	94104
436255	195	9/3/2014 11:53	2nd at Folsom	62	9/3/2014 11:57	Spear at Folsom	49	403	Subscriber	94107
585884	151	12/26/2014 13:34	Broadway St at Battery St	82	12/26/2014 13:37	Harry Bridges Plaza (Ferry Building)	50	576	Subscriber	94107
548322	191	11/17/2014 20:10	Yerba Buena Center of the Arts (3rd @ Howard)	68	11/17/2014 20:13	Market at Sansome	77	29	Subscriber	94705
594999	185	1/7/2015 17:53	San Antonio Caltrain Station	29	1/7/2015 17:56	San Antonio Shopping Center	31	176	Subscriber	94040
468534	194	9/24/2014 19:08	Mechanics Plaza (Market at Battery)	75	9/24/2014 19:11	Harry Bridges Plaza (Ferry Building)	50	443	Subscriber	94107
873710	169	8/3/2015 17:20	Broadway St at Battery St	82	8/3/2015 17:23	Embarcadero at Sansome	60	532	Subscriber	94114
853087	168	7/20/2015 7:27	Temporary Transbay Terminal (Howard at Beale)	55	7/20/2015 7:30	2nd at Folsom	62	418	Subscriber	94602
863019	162	7/27/2015 8:31	Temporary Transbay Terminal (Howard at Beale)	55	7/27/2015 8:34	Mechanics Plaza (Market at Battery)	75	504	Subscriber	94111
883134	173	8/10/2015 15:11	Embarcadero at Folsom	51	8/10/2015 15:14	Beale at Market	56	363	Subscriber	94117

... (5056 rows omitted)

Miscellaneous Functions¶

`sample_proportions()`¶

sample_proportions(sample_size, model_proportions)

Samples sample_size objects from the distribution specified by model_proportions. sample_size should be an integer, model_proportions an array of probabilities that sum up to 1. It returns an array with the same size as model_proportions. Each item in the array corresponds to the proportion of times it was sampled out of the sample_size times.

[182]:

sample_proportions(100, [.5, .3, .2])

[182]:

array([0.32, 0.32, 0.36])

`minimize()`¶

minimize(function)

This function returns an array of values that minimize function. function should be a function that takes in a certain number of arguments and returns a number. The array returned by minimize is structured such that if each value in the array was passed into function as arguments, it would minimize the output value of function.

[190]:

def f(x, y):
    return 0.47 * x**2 + 1.23 * np.log(y)

minimize(f)

[190]:

array([ 5.17585792, -0.58835469])

Data 8 datascience Reference¶

Table Functions and Methods¶

Table()¶

Table.read_table()¶

tbl.with_column¶

tbl.column()¶

tbl.num_rows¶

tbl.num_columns¶

tbl.labels¶

tbl.select()¶

tbl.drop()¶

tbl.relabel()¶

tbl.show()¶

tbl.sort()¶

tbl.where()¶

tbl.take()¶

Table Visualizations¶

tbl.scatter()¶

tbl.plot()¶

tbl.barh()¶

tbl.hist()¶

Table.interactive_plots()¶

Table.static_plots()¶

Advanced Table Functions¶

tbl.apply()¶

tbl.group()¶

tbl.pivot()¶

tbl.join()¶

tbl.sample()¶

String Methods¶

str.split()¶

str.join()¶

str.replace()¶

Array Functions and Methods¶

max()¶

min()¶

sum()¶

abs()¶

round(num)¶

len()¶

make_array()¶

np.mean¶

np.std()¶

np.diff()¶

np.sqrt()¶

np.arange()¶

array.item()¶

np.random.choice¶

np.count_nonzero()¶

np.append()¶

percentile()¶

Table.where Predicates¶

are.equal_to()¶

are.above()¶

are.above_or_equal_to()¶

are.below()¶

are.below_or_equal_to()¶

are.between()¶

are.between_or_equal_to()¶

are.contained_in()¶

are.containing()¶

are.strictly_between()¶

Miscellaneous Functions¶

sample_proportions()¶

minimize()¶

Data 8 `datascience` Reference¶

`Table` Functions and Methods¶

`Table()`¶

`Table.read_table()`¶

`tbl.with_column`¶

`tbl.column()`¶

`tbl.num_rows`¶

`tbl.num_columns`¶

`tbl.labels`¶

`tbl.select()`¶

`tbl.drop()`¶

`tbl.relabel()`¶

`tbl.show()`¶

`tbl.sort()`¶

`tbl.where()`¶

`tbl.take()`¶

`Table` Visualizations¶

`tbl.scatter()`¶

`tbl.plot()`¶

`tbl.barh()`¶

`tbl.hist()`¶

`Table.interactive_plots()`¶

`Table.static_plots()`¶

Advanced `Table` Functions¶

`tbl.apply()`¶

`tbl.group()`¶

`tbl.pivot()`¶

`tbl.join()`¶

`tbl.sample()`¶

`str.split()`¶

`str.join()`¶

`str.replace()`¶

`max()`¶

`min()`¶

`sum()`¶

`abs()`¶

`round(num)`¶

`len()`¶

`make_array()`¶

`np.mean`¶

`np.std()`¶

`np.diff()`¶

`np.sqrt()`¶

`np.arange()`¶

`array.item()`¶

`np.random.choice`¶

`np.count_nonzero()`¶

`np.append()`¶

`percentile()`¶

`Table.where` Predicates¶

`are.equal_to()`¶

`are.above()`¶

`are.above_or_equal_to()`¶

`are.below()`¶

`are.below_or_equal_to()`¶

`are.between()`¶

`are.between_or_equal_to()`¶

`are.contained_in()`¶

`are.containing()`¶

`are.strictly_between()`¶

`sample_proportions()`¶

`minimize()`¶