Course Description

Foundations of Data Science combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social issues surrounding data analysis such as privacy and design.


This course has no prerequisites beyond high-school algebra. The curriculum and format are designed specifically for students who have not previously taken statistics or computer science courses. Students with some prior experience in either statistics or computing are welcome to enroll, though they may find some parts of the course slow. Students who have taken both statistics and computer science courses should instead take a more advanced course.

Materials & Resources

Our primary text is an online book called Computational and Inferential Thinking: The Foundations of Data Science. This text was written for the course by the course instructors.

The computing platform for the course is hosted at datahub.berkeley.edu. Many students find it convenient to use their own computers for the course. If you do not have adequate access to a personal computer, we have machines available for you; please contact the instructor.


You are not alone in this course; the staff and instructors are here to support you as you learn the material. Some aspects of the course will take time to master, and the best way to master challenging material is to ask questions. For online questions, use Piazza. We will also hold office hours for in-person discussion.


Labs are held twice a week and are a required part of the course; they should be submitted during your lab session. To receive credit, you must attend lab, work on the lab assignment until you finish or the lab period ends, and get checked off by a course staff member. Labs will be released on Sunday and Tuesday nights. If you don't want to attend lab in person, you may complete a lab assignment remotely, but to receive credit you must finish it by 11:59pm on Tuesday (for the Sunday lab) or Thursday (for the Tuesday lab). If you attend lab, you can still get credit even if you don't finish all of the lab problems. However, if you work remotely, you must finish the entire lab to receive credit. Each person must submit each lab independently, but you are welcome to collaborate with other students in your lab room.


Small-group tutoring sections will be available to a subset of students who sign up for them during the second week of classes. If you have not programmed before, these sections are an excellent use of your time. Details about sign-ups will be shared in lecture and posted here. Tutoring sessions are held at BIDS (Berkeley Institute for Data Science) unless your tutor tells you otherwise. BIDS is in Doe Library, immediately to your left if you walk in through the Memorial Glade entrance.


Data science is about analyzing real-world data sets, so a series of projects involving real data is a required part of the course. You may work with one partner on each project, and we strongly recommend that you find a partner in your lab section.

Twice-a-week homework assignments are a required part of the course. Each student must submit each homework independently, but you are allowed to discuss problems with other students.


The midterm exam will be held in class (during the lecture period) on Friday, July 14. The final exam will be held from 2pm to 5pm on Thursday, August 10. Unless you have accommodations as determined by the university or permission from the instructor, you must take the midterm and the final at the dates and times provided here. Please check your course schedule and make sure that you have no conflicts with these exams.


There are no explicit points given for participation in this offering of Data 8. However, fulfilling the optional participation requirement will allow you to drop your lowest 2 homework grades and your lowest 3 lab grades.

Participation is earned by attending lecture. Attendance will begin to count in week 2; the first week is optional.

To meet the requirement, you must attend at least 4 hours of lecture in each of at least 6 weeks, starting with week 2. Not attending could save you 20-30 hours, which is roughly the time it takes to complete all of the homework and labs. Attendance will be taken via a Google Form. Students caught faking attendance will fail the course. If you intend to earn attendance credit but can't attend due to unforeseen circumstances, please contact the instructor rather than trying to subvert the attendance system.
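The attendance rule counts qualifying weeks rather than total hours, which is easy to misread. The sketch below is a hypothetical self-check, not the official attendance system; it assumes you keep your own record as a dictionary mapping week number to lecture hours attended (the function and parameter names are made up for illustration).

```python
def meets_attendance_requirement(hours_by_week, min_hours=4, min_weeks=6, first_week=2):
    """Hypothetical self-check of the participation rule.

    hours_by_week: dict mapping week number -> lecture hours attended that week.
    A week qualifies only if it is week `first_week` or later AND has at least
    `min_hours` hours; the rule is met when at least `min_weeks` weeks qualify.
    """
    qualifying_weeks = [
        week for week, hours in hours_by_week.items()
        if week >= first_week and hours >= min_hours
    ]
    return len(qualifying_weeks) >= min_weeks
```

For example, 4 hours in every week from 1 through 7 gives six qualifying weeks (2 through 7), which meets the rule, while the same pattern ending at week 6 gives only five. Extra hours in one week do not make up for a short week elsewhere.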


Grades will be assigned using the following weighted components:

Activity    Weight
Lab         15%
Homework    25%
Projects    20%
Midterm     10%
Final       30%

Every assignment is weighted equally in its category. For example, there are 10 labs, so each one is worth 1.5% of your total grade.
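To make the weighting concrete, here is a minimal sketch of how the category weights combine. It rests on assumptions not stated in this syllabus: every score is on a 0-100 scale, each category's grade is the plain average of its assignments, and when the participation requirement is met the dropped scores are simply removed before averaging. The names `course_grade` and `category_average` are hypothetical. Under these assumptions, 10 labs sharing 15% gives 15% / 10 = 1.5% per lab, matching the example above.

```python
# Category weights from the grading table (fractions of the total grade).
WEIGHTS = {"lab": 0.15, "homework": 0.25, "projects": 0.20, "midterm": 0.10, "final": 0.30}


def category_average(scores, drop=0):
    """Average of scores (each 0-100) after removing the `drop` lowest ones.

    Assumes more scores than drops; an empty result would divide by zero.
    """
    kept = sorted(scores)[drop:]
    return sum(kept) / len(kept)


def course_grade(labs, homework, projects, midterm, final, participated=False):
    """Hypothetical weighted course percentage.

    If the participation requirement is met, the lowest 3 labs and the
    lowest 2 homework scores are dropped before averaging.
    """
    lab_drop, hw_drop = (3, 2) if participated else (0, 0)
    averages = {
        "lab": category_average(labs, lab_drop),
        "homework": category_average(homework, hw_drop),
        "projects": category_average(projects),
        "midterm": midterm,
        "final": final,
    }
    return sum(WEIGHTS[name] * averages[name] for name in WEIGHTS)
```

For instance, a student who scored 0 on three labs and two homework assignments but 100 on everything else would get about 90.5 without the participation drops and about 100 with them, since the drops remove exactly those zeros.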

The course will not be curved, but further details of grading criteria may not be announced until the end of the course. It is certainly possible for all students to receive high grades in this course if all of you show mastery of the material on exams and complete all assignments.

Learning Cooperatively

With the obvious exception of exams, we encourage you to discuss all of the course activities with your friends and classmates as you are working on them. You will definitely learn more in this class if you work with others than if you do not. Ask questions, answer questions, and share ideas liberally.

Since you're working collaboratively, keep your project partner and the course staff informed. If a medical or personal emergency takes you away from the course for an extended period, or if you decide to drop the course for any reason, please don't just disappear silently! Let your project partner know, so that nobody is depending on you to do something you can't finish.

Academic Honesty

Cooperation has a limit, however. You should not share your code or answers directly with other students. Doing so doesn't help them; it just sets them up for trouble on exams. Feel free to discuss the problems with others beforehand, but not the solutions. Please complete your own work and keep it to yourself. The exception to this rule is that you can share everything related to a project with your project partner and turn in one project between you.

Penalties for cheating are severe: they range from a zero on the assignment or exam up to dismissal from the University for a second offense.

Rather than copying someone else's work, ask for help. You are not alone in this course! The course staff is here to help you succeed. If you invest the time to learn the material and complete the projects, you won't need to copy any answers.

Late Submission

If you want to receive credit for an assignment that you will turn in after the deadline, you must ask your GSI before the deadline. Otherwise, late homework and labs will not be accepted. Late projects will be accepted for half credit up to 48 hours after the deadline. Extensions will be granted only in advance of the deadline, and only in exceptional circumstances.

A Parting Thought

This page shouldn't end with a list of penalties for cheating or lateness, because penalties and grades aren't the purpose of the course. We actually just want you to learn. Please keep that goal in mind throughout the semester. Welcome to Data 8.