## Resources

### Final Review!

• A comprehensive review of statistical concepts, covering the steps, examples, and purpose of topics such as hypothesis testing, confidence intervals, correlation, regression, classification, two-sample inference, the central limit theorem, Bayes' Rule, and more. Thanks to Francie McQuarrie for this!

### Tutoring Worksheets

• Worksheets from this semester's tutoring sections are available for review. To use them, download the desired zip file to your computer and double-click it to expand it into a folder. Then navigate to JupyterHub, click the "upload" button in the top right-hand corner, and select the files you want. (Please note: you must upload the files from inside the unzipped folder, not the zip file itself.) Once the files are uploaded to JupyterHub, open the notebook, which is a .ipynb file, and get to work! For technical help with uploading files, email Emma or come to office hours.

### Lab Slides

Lab slides for Nanxi and Katherine's section (MW, 1-3pm)

Midterms:

Finals:

### Staff Solutions

Please note that you will need to be signed in to your berkeley.edu email account as your default account to access the Google Drive folders.

### Discussion Video Walkthroughs

We've compiled a list of additional questions for our datasets here if you'd like more practice or want to do your own independent data investigation.

### Table Functions and Methods

In the examples in the left column, `np` refers to the NumPy module, as usual. Everything else is a function, a method, an example of an argument to a function or method, or an example of an object we might call the method on. For example, `tbl` refers to a table, `array` refers to an array, and `num` refers to a number. `array.item(0)` is an example call for the method `item`, and in that example, `array` is the name previously given to some array.

| Example Function Call | Chapter | Description |
| --- | --- | --- |
| `Table()` | 5 | Creates an empty table, usually to extend with data. |
| `Table().read_table(filename)` | 5 | Creates a table from a data file. |
| `tbl.with_column(name, values)`<br>`tbl.with_columns(n1, v1, n2, v2, ...)` | 5 | A table with an additional or replaced column or columns. `name` is a string for the name of a column, `values` is an array. |
| `tbl.column(column_name_or_index)` | 5 | The values of a column (an array). |
| `tbl.num_rows` | 5 | The number of rows in a table. |
| `tbl.num_columns` | 5 | The number of columns in a table. |
| `tbl.labels` | 5 | A list of the column labels in a table. |
| `tbl.select(col1, col2, ...)` | 5 | Creates a copy of a table with only the selected columns. Each column is given by its name or index. |
| `tbl.drop(col1, col2, ...)` | 5 | Creates a copy of a table without the selected columns. Each column is given by its name or index. |
| `tbl.relabel(old_label, new_label)` | 5 | Modifies the existing table in place, changing the column heading in the first argument to the second. |
| `tbl.relabeled(old_label, new_label)` | 5 | Returns a new table with the column heading in the first argument changed to the second. |
| `tbl.sort(column_name_or_index)` | 5.1 | Creates a copy of a table sorted by the values in a column. Defaults to ascending order unless the optional argument `descending=True` is included. |
| `tbl.where(column, predicate)` | 5.2 | A table of the rows for which the column satisfies some predicate. See Table.where predicates below. |
| `tbl.take(row_indices)` | 5.2 | A table with only the rows at the given indices. `row_indices` is an array of indices. |
| `tbl.scatter(x_column, y_column)` | 6 | Draws a scatter plot consisting of one point for each row of the table. Note that `x_column` and `y_column` must be strings specifying column names. |
| `tbl.barh(categories)`<br>`tbl.barh(categories, values)` | 6.1 | Displays a horizontal bar chart with one bar for each category in a column, with length proportional to the corresponding frequency. The `values` argument is unnecessary if the table has only a column of categories and a column of values. |
| `tbl.hist(column, units, bins)` | 6.2 | Generates a histogram of the numerical values in a column. `units` and `bins` are optional arguments, used to label the axes and group the values into intervals (bins), respectively. Bins have the form [a, b). |
| `tbl.apply(function, column)` | 7.1 | Returns an array of values resulting from applying a function to each item in a column. |
| `tbl.group(column_or_columns, func)` | 7.2, 7.3 | Groups rows by unique values, or combinations of values, in one or more columns. Multiple columns must be entered in array or list form. Other values are aggregated by count (default) or by the optional argument `func`. |
| `tbl.pivot(col1, col2, vals, collect)`<br>`tbl.pivot(col1, col2)` | 7.3 | A pivot table where each unique value in `col1` has its own column and each unique value in `col2` has its own row. Counts the rows in each cell, or aggregates the values from a third column `vals` with some function `collect`. With the default `vals` and `collect`, the cells contain counts. |
| `tblA.join(colA, tblB, colB)`<br>`tblA.join(colA, tblB)` | 7.4 | Generates a table with the columns of `tblA` and `tblB`, containing rows for all values of a column that appear in both tables. Default `colB` is `colA`. `colA` and `colB` must be strings specifying column names. |
| `tbl.sample(n)`<br>`tbl.sample(n, with_replacement)` | 9 | A new table where `n` rows are randomly sampled from the original table. Default is with replacement. For sampling without replacement, use the argument `with_replacement=False`. For a non-uniform sample, provide a third argument `weights=distribution`, where `distribution` is an array or list containing the probability of each row. |
| `proportions_from_distribution(tbl, prop_col, n)` | 10.1 | Returns a copy of `tbl` with an additional column `Random Sample` containing the proportions of an `n`-sized random sample, drawn using the proportions in `prop_col`. |
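
To make the row-level meaning of `where`, `group`, and `join` concrete, here is a minimal plain-Python sketch that imitates what those three methods compute on a tiny made-up data set. It is not how the table library implements them. The helper names (`where_sketch`, `group_counts_sketch`, `group_apply_sketch`, `join_sketch`), the `cones` and `ratings` data, and the list-of-dictionaries representation are all assumptions made for illustration, and row order or column labels may differ from what the real methods return.

```python
from collections import Counter

# Made-up rows standing in for a small table with columns "Flavor" and "Price".
cones = [
    {"Flavor": "strawberry", "Price": 3.55},
    {"Flavor": "chocolate", "Price": 4.75},
    {"Flavor": "chocolate", "Price": 5.25},
    {"Flavor": "vanilla", "Price": 5.25},
]

# A second made-up table whose "Kind" column matches some "Flavor" values.
ratings = [
    {"Kind": "chocolate", "Stars": 5},
    {"Kind": "vanilla", "Stars": 3},
]


def where_sketch(rows, column, predicate):
    """Hypothetical helper: keep the rows whose value in `column` passes the test."""
    return [row for row in rows if predicate(row[column])]


def group_counts_sketch(rows, column):
    """Hypothetical helper: count the rows for each unique value in `column`."""
    counts = Counter(row[column] for row in rows)
    return sorted(counts.items())


def group_apply_sketch(rows, column, value_column, func):
    """Hypothetical helper: aggregate `value_column` within each group using `func`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[value_column])
    return sorted((key, func(values)) for key, values in groups.items())


def join_sketch(rows_a, col_a, rows_b, col_b):
    """Hypothetical helper: pair up rows whose values in col_a and col_b match.

    Values that appear in only one of the two tables are dropped.
    """
    joined = []
    for a in rows_a:
        for b in rows_b:
            if a[col_a] == b[col_b]:
                merged = dict(a)
                merged.update({k: v for k, v in b.items() if k != col_b})
                joined.append(merged)
    return joined


cheap = where_sketch(cones, "Price", lambda price: price < 5)
flavor_counts = group_counts_sketch(cones, "Flavor")
max_prices = group_apply_sketch(cones, "Flavor", "Price", max)
rated = join_sketch(cones, "Flavor", ratings, "Kind")
```

In this sketch, `rated` has no strawberry row because strawberry never appears in `ratings`, which is the "appear in both tables" rule from the `join` description.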

### Array Functions and Methods

| Example Function Call | Chapter | Description |
| --- | --- | --- |
| `max(array)` | 3.3 | Returns the maximum value of an array. |
| `min(array)` | 3.3 | Returns the minimum value of an array. |
| `sum(array)` | 3.3 | Returns the sum of the values in an array. |
| `abs(num)`, `np.abs(array)` | 3.3 | Takes the absolute value of a number or of each number in an array. |
| `round(num)`, `np.round(array)` | 3.3 | Rounds a number or an array of numbers to the nearest integer. |
| `len(array)` | 3.3 | Returns the length (number of elements) of an array. |
| `make_array(val1, val2, ...)` | 4.4 | Makes a NumPy array with the values passed in. Values must be the same data type. |
| `np.average(array)`, `np.mean(array)` | 4.4 | Returns the average of the values in an array. |
| `np.diff(array)` | 4.4 | Returns a new array of size `len(array)-1` with elements equal to the differences between adjacent elements: val_2 - val_1, val_3 - val_2, etc. |
| `np.sqrt(array)` | 4.4 | Returns an array with the square root of each element. |
| `np.arange(start, stop, step)`<br>`np.arange(start, stop)`<br>`np.arange(stop)` | 4.5 | An array of numbers starting with `start`, going up in increments of `step`, and going up to but excluding `stop`. When `start` and/or `step` are left out, default values are used in their place. Default `step` is 1; default `start` is 0. |
| `array.item(index)` | 4.6 | Returns the item at position `index` in an array (remember, Python indices start at 0!). |
| `np.random.choice(array, n)`<br>`np.random.choice(array)` | 8 | An array of items selected at random with replacement from an array. Default number of items is 1 if `n` is not specified. |
| `np.count_nonzero(array)` | 8 | Counts the number of non-zero (or `True`) elements in an array. |
| `np.append(array, item)` | 8.2 | Returns a copy of the input array with `item` (which must be the same type as the other entries in the array) appended to the end. |
| `percentile(percentile, array)` | 11.1 | Returns the item at the given percentile of an array: the smallest value that is at least as large as that percentage of the values. |
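
The following short sketch exercises a few of the NumPy calls listed above on made-up values, so the defaults and the "excluding stop" rule are easy to check. The variable names are illustrative only. The `percentile_sketch` helper is hypothetical: it assumes the course's `percentile` function picks the smallest value that is at least as large as the given percentage of the values, and it is not that function's actual implementation.

```python
import math

import numpy as np

# np.arange: start at 0, step by 2, stop before 10.
steps = np.arange(0, 10, 2)            # 0, 2, 4, 6, 8
default_start = np.arange(4)           # start defaults to 0, step to 1: 0, 1, 2, 3

# np.diff: differences between adjacent elements, one fewer than the input.
gaps = np.diff(np.array([1, 4, 9, 16]))   # 3, 5, 7

# array.item: index 0 is the first element.
first = steps.item(0)

# np.count_nonzero: comparing an array gives True/False values, and True counts as non-zero.
num_above_three = np.count_nonzero(steps > 3)

# np.append: returns a new array and leaves the original unchanged.
longer = np.append(steps, 10)

# np.random.choice: three items drawn at random, with replacement.
draws = np.random.choice(steps, 3)


def percentile_sketch(p, values):
    """Hypothetical helper: smallest value at least as large as p percent of the values."""
    ordered = sorted(values)
    if p == 0:
        return ordered[0]
    k = math.ceil(p * len(ordered) / 100)   # how many values must be at or below the answer
    return ordered[k - 1]


sizes = [12, 17, 6, 9, 7]
eightieth = percentile_sketch(80, sizes)
median = percentile_sketch(50, sizes)
```

With five values, the 80th percentile under this definition is the fourth smallest value and the 50th percentile is the third smallest.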