Course Description

Foundations of Data Science combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It also delves into social issues surrounding data analysis such as privacy and study design.


This course does not have any prerequisites beyond high-school algebra. The curriculum and format are designed specifically for students who have not previously taken statistics or computer science courses. Students with some prior experience in either statistics or computing are welcome to enroll and will find much of interest due to the innovative nature of the course. Students who have taken several statistics or computer science courses should instead take a more advanced course.

Materials & Resources

Our primary text is an online book called Computational and Inferential Thinking: The Foundations of Data Science. This text was written for the course by the course instructors.

The computing platform for the course is hosted at datahub.berkeley.edu. Students find it convenient to use their own computer for the course. If you do not have adequate access to a personal computer, we have machines available for you; please contact your lab GSI.


You are not alone in this course; the staff and instructors are here to support you as you learn the material. It's expected that some aspects of the course will take time to master, and the best way to master challenging material is to ask questions. For questions, use Piazza. We will also hold virtual office hours for real-time discussions. Small-group tutoring sessions will be available for students in need of additional support to develop confidence with core concepts. In past semesters, students who attended have found these sessions to be a great use of their time. Details about sign-ups will be available later in the term.


The rest of this page details the policies that will be enforced in the Summer 2020 offering of this course. These policies are subject to change until the beginning of the semester and throughout the remainder of the course, at the judgement of the course staff.

All times listed below are in Pacific Daylight Time (PDT).

Waitlisted Students

If you are on the waitlist, you must still do all coursework and complete labs and homework by the deadlines. We will not offer extensions if you are admitted into the course later, so it is your responsibility to stay up to date on the assignments.

You are welcome to attend discussions, labs, and live lecture sessions while you are on the waiting list. Pre-recorded lecture videos will also be provided to you. Discussion and lab sessions held over Zoom may feel crowded in the first week of class, but will free up as the semester progresses.

Unfortunately, doing all the coursework is not a guarantee of enrollment. You will only be enrolled if there is space in a lab. Enrollment will proceed by CalCentral.


The majority of the course's core content is taught via short, pre-recorded lecture videos. These videos are organized into modules, each corresponding to a foundational topic in data science. Each module contains between an hour and a half and two hours' worth of videos. Two modules will be released each week: one on Sunday, and one on Tuesday. Students are responsible for watching these videos on their own time. We encourage you to start as soon as possible so as not to fall behind.

Live Lecture Sessions

Live lecture sessions will be held on Mondays and Wednesdays from 10 to 11am over Zoom. These lectures will be used to highlight and review vital concepts introduced by the modules. Accompanying notebooks with examples will typically be provided to students. Recordings of these sessions will be provided, though students are highly encouraged to attend in real time.

Data Exploration Sessions

The materials covered each week in lecture videos, labs and discussions will be presented and discussed in the context of real-world data analyses every Friday from 10-11am. This lecture series will teach students to apply the course’s core concepts to real data science problems. In particular, these sessions will model the individual components of the final project.

Recordings of these sessions will be provided, as will all notebooks, datasets, and slide decks. However, we recommend that students attend in real time. Participation is encouraged!


Lab assignments are a required part of the course and will be released twice a week on Sundays and Tuesdays. To get credit for each lab, you must finish the entire lab and pass all autograder tests. Labs released on Sundays are due on Tuesdays at 11:59 PM. Labs released on Tuesdays are due on Thursdays at 11:59 PM.

Labs are held on Mondays and Wednesdays at 11 am, 1 pm, 3 pm, and 5 pm over Zoom. You may attend any lab, regardless of the section you have officially enrolled in.


Just like labs, discussion sections are an integral part of the course. Students are expected to either attend discussion sections held on Tuesdays and Thursdays or review the associated discussion recordings and worksheets. We will not be recording attendance or checking for worksheet completion.

If you have questions related to discussion materials but cannot make one of the scheduled sections, post on the relevant Piazza thread or attend office hours.

Homework and Projects

Homework assignments are a required part of the course. They will be released on Mondays and Thursdays, and will be due on Thursdays and Sundays. Each student must submit each homework independently, but is allowed to discuss problems with other students and course staff. See the "Learning Cooperatively" section below.

Data science is about analyzing real-world datasets, and so a series of projects involving real data is a required part of the course. On each project, you may work with a single partner. Both partners will receive the same score.

Midterm Exam

The midterm exam will be held on Friday, July 17 from 10:00 a.m. to 12:00 p.m. Please note the date and time carefully.

There will be no alternate midterm exam. Unless you have accommodations as determined by the university and approved by the instructors, you must take the midterm at the date and time provided here. If you have accommodations, please provide the formal notification to your lab GSI before the end of the first week of classes.

Final Project

There will be a final project in place of a final exam. Students will work in pairs to showcase their newfound skills in a cumulative data exploration project to critically analyze a dataset. Project guidelines will be released after the midterm on Piazza.


Grades will be assigned using the following weighted components:

Activity Grade
Lab 10%
Homework 20%
Projects 20%
Midterm 20%
Final Project 30%
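
As a rough illustration of how these weights combine, the Python sketch below computes a weighted overall score from per-category scores. The function name `overall_score` and the sample scores are hypothetical, and this is only an assumption about the arithmetic, not the staff's actual grading code. It says nothing about letter-grade bins.

```python
# Weights copied from the table above; together they sum to 100%.
WEIGHTS = {
    "Lab": 0.10,
    "Homework": 0.20,
    "Projects": 0.20,
    "Midterm": 0.20,
    "Final Project": 0.30,
}


def overall_score(category_scores):
    """Return the weighted overall score.

    `category_scores` maps each activity name to a fraction in [0, 1],
    for example 0.9 for 90%. (Hypothetical helper for illustration.)
    """
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example = {
    "Lab": 1.0,
    "Homework": 0.9,
    "Projects": 0.85,
    "Midterm": 0.8,
    "Final Project": 0.9,
}
print(round(overall_score(example), 3))
```

Because the final project carries the largest weight, a change of ten percentage points there moves the overall score by three points, compared with one point for the same change in labs.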

In past semesters of Data 8, more than 40% of the students received grades in the A+/A/A- range and more than 35% received grades in the B+/B/B- range.

Grade bins will not be released during the semester. The instructors will set them at the end of the semester, after all grades are in. Until then, no staff member, including the instructors, knows what the bins will be, so please do not ask us.

Grades for homework, projects, and labs will be posted about one week after the assignment's due date. Each assignment has two scores, one from OK and one from Gradescope, which are added to give your total assignment score. Solutions to the assignment and common mistakes will also be posted on Piazza. It is up to the student to check the solutions and request a regrade before the regrade deadline. Regrade requests for coding questions on OK should be emailed to the lab TA, and regrade requests for written Gradescope questions can be made on Gradescope. Any regrade request submitted past the deadline will not be considered; this is to enforce the same deadline across all students, so please do not delay in reviewing your work.

For the midterm exam, there will be a regrade submission window. Again, it is up to the student to look at the solutions and common mistakes before submitting a regrade. Requests where a rubric item was incorrectly selected or not selected will be reviewed, but any regrade requests that ask to change the rubric or for partial credit will be ignored. Any regrade request will result in a complete regrade of the exam, so the score may increase or decrease. Please be sure that the regrade requests are reasonable and not frivolous.

Late Submission

Late submissions of labs will not be accepted under any circumstances. The same goes for homework and projects, unless you have relevant university accommodations. If you have such accommodations, please provide the formal notification to your lab GSI before the end of the first week of classes.

Your two lowest homework scores and your two lowest lab scores will be dropped in the calculation of your overall grade. There will be no alternate due dates for assignments missed due to illness, other commitments, and so on. The drops are intended to cover those situations.
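
The drop policy can be sketched in a few lines of Python. This sketch assumes that every assignment in a category counts equally and that the category score is a plain average of the remaining scores. The syllabus does not state this, and the function name `average_with_drops` is hypothetical.

```python
def average_with_drops(scores, drops=2):
    """Average a list of scores after discarding the `drops` lowest ones.

    Assumes equal weighting of assignments (an assumption, not course policy).
    """
    if len(scores) <= drops:
        raise ValueError("need more scores than the number of drops")
    kept = sorted(scores)[drops:]  # remove the lowest `drops` scores
    return sum(kept) / len(kept)


# A missed assignment (0) and one weak score (50) are both dropped.
print(average_with_drops([0, 50, 80, 90, 100]))
```

Under these assumptions, a missed assignment scored as zero does not affect the category average as long as it is among the two lowest scores.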

Projects will be accepted up to 2 days (48 hours) late. Projects submitted less than 24 hours after the deadline will receive 2/3 credit, and projects submitted at least 24 but less than 48 hours after the deadline will receive 1/3 credit. Projects submitted 48 hours or more after the deadline will receive no credit.
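
The late-project schedule amounts to a simple step function, sketched below in Python. The function name `project_credit_multiplier` is hypothetical, and treating a submission exactly 24 hours late as falling in the 1/3 bracket is an assumption about the boundary.

```python
from fractions import Fraction


def project_credit_multiplier(hours_late):
    """Return the fraction of earned credit kept for a project.

    `hours_late` is the time since the deadline in hours; zero or a
    negative value means the project was on time.
    """
    if hours_late <= 0:
        return Fraction(1)      # on time: full credit
    if hours_late < 24:
        return Fraction(2, 3)   # less than 24 hours late
    if hours_late < 48:
        return Fraction(1, 3)   # at least 24 but less than 48 hours late
    return Fraction(0)          # 48 hours or more: no credit


for hours in (0, 5, 30, 48):
    print(hours, project_credit_multiplier(hours))
```

For instance, a project that would have earned 90 points on time would keep 60 of them if submitted 5 hours late, and 30 if submitted 30 hours late.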

Learning Cooperatively

We encourage you to discuss course content with your friends and classmates as you are working on your assignments. No matter your academic background, you will learn more if you work alongside others than if you work alone. Ask questions, answer questions, and share ideas liberally.

If some emergency takes you away from the course for an extended period, or if you decide to drop the course for any reason, please don't just disappear silently! You should inform your lab GSI and your project partner (if you have one) immediately, so that nobody is expecting you to do something you can't finish.

Academic Honesty

You must write your answers in your own words, and you must not share your completed work. The exception to this rule is that you can share everything related to a project with your project partner (if you have one) and turn in one project between the two of you.

Make a serious attempt at every assignment yourself. If you get stuck, read the textbook and go over the lectures and lab discussion. After that, go ahead and discuss any remaining doubts with others, especially the course staff. That way you will get the most out of the discussion.

It is important to keep in mind the limits to collaboration. As noted above, you and your friends are encouraged to discuss course content and approaches to problem solving. But you are not allowed to share your code or answers with other students. Doing so is academically dishonest, and it doesn't help them either. It sets them up for trouble on upcoming assignments and on the midterm exam.

You are also not permitted to turn in answers or code that you have obtained from others. Not only is such copying dishonest, it circumvents the pedagogical goals of an assignment. You must solve problems with the resources made available in the course.

Please read Berkeley's Code of Conduct carefully. Penalties for cheating in Data 8 are severe and include reporting to the Center for Student Conduct. They might also include an F in the course or even dismissal from the university. It's just not worth it!

When you need help, reach out to the course staff using Piazza, in office hours, and/or during live labs and discussions. You are not alone in Data 8! Instructors and staff are here to help you succeed. We expect that you will work with integrity and with respect for other members of the class, just as the course staff will work with integrity and with respect for you.

Finally, know that it's normal to struggle. Berkeley has high standards, which is one of the reasons its degrees are valued. Everyone struggles even though many try not to show it. Even if you don't learn everything that's being covered, you'll be able to build on what you do learn, whereas if you cheat you'll have nothing to build on. You aren't expected to be perfect; it's ok not to get an A.

A Parting Thought

The main goal of the course is that you should learn, and have a fantastic experience doing so. Please keep that goal in mind throughout the semester. Welcome to Data 8!