The following are some general concepts and motivations behind the choices made in teaching Data 8. They explain why particular decisions were made about the topics covered and the technologies used.

Major goals of Data 8

  • Accessibility and Equity: Students from all backgrounds should be able to take Data 8. As such, the course has no prerequisites in statistics or programming; only basic high-school algebra is necessary.

  • Diversity: Data 8 can be taken by students from any major across campus, and should be acceptable as a potential prerequisite for many majors in statistics, math, or computing. Currently, students from more than 50 different majors enroll each semester, with no single major making up more than 20% of the class.

  • Pedagogical Clarity: Data 8 is designed to teach introductory programming first, then statistics through a computational lens, and to conclude with basic methods in inference.

  • Scalability: The course must meet the growing demand from students in order to be widely accessible at Berkeley. Data 8 has grown to be taken by more than 1200 students each semester.

Core concepts / inspirations

  • Leverage the combination of Computer Science and Statistics

  • Come away with practical data science skills applicable to any domain

  • Be able to conduct robust inference from limited data

  • Be able to run experiments and test hypotheses

  • Know how to choose the right statistical tools for different tasks

  • Quantify and understand uncertainty in data

  • Harness the power of computation and simulation in conducting data science

  • Illustrate the above concepts with real-world data from a variety of domains

Large decisions made in teaching Data 8

  • Shield the students from the topics that take away from the core concepts noted above

  • Aim the course at anybody, not just statistics or CS majors. Thus, Data 8 does not have a statistics or programming prerequisite.

    • The course begins by teaching basic programming in Python.

    • Use the datascience module instead of teaching (complex) package-specific APIs

  • Use a JupyterHub so that students do not have to set up their own environments, which also gives every student an equitable computing environment.

  • Provide pre-collected/cleaned data, allowing students to avoid data-cleaning

  • Explore complex probabilistic and statistical concepts through simulation

Topics such as industry-adopted packages or data-cleaning are covered in subsequent courses, and a more advanced formalization of the CS and statistical concepts occurs in later classes like Data 100 or Data 102.
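As a rough illustration of what approaching a statistical concept through simulation can look like, the sketch below estimates a p-value by repeatedly simulating a fair coin rather than using a closed-form formula. It is not taken from the Data 8 materials: the coin-flip scenario, the function names simulate_heads and empirical_p_value, and the parameter values are all assumptions made for illustration, and it uses only Python's standard random module rather than the datascience module.

```python
import random


def simulate_heads(num_flips, rng):
    """Count heads in num_flips tosses of a fair coin (hypothetical helper)."""
    return sum(rng.random() < 0.5 for _ in range(num_flips))


def empirical_p_value(observed_heads, num_flips, num_simulations=10_000, seed=0):
    """Estimate how often a fair coin lands at least as far from
    num_flips / 2 heads as the observed count did."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    expected = num_flips / 2
    observed_distance = abs(observed_heads - expected)

    at_least_as_extreme = 0
    for _ in range(num_simulations):
        simulated = simulate_heads(num_flips, rng)
        if abs(simulated - expected) >= observed_distance:
            at_least_as_extreme += 1

    # Proportion of simulated experiments at least as extreme as the observation
    return at_least_as_extreme / num_simulations


# Illustrative (made-up) observation: 60 heads in 100 flips
p_value = empirical_p_value(observed_heads=60, num_flips=100)
print(f"Empirical p-value: {p_value:.3f}")
```

The point of the sketch is that the only tools required are a loop, a random number generator, and a comparison, which is roughly the level of programming an introductory course can assume after its first few weeks.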