Foundations of Data Science combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It also delves into social issues surrounding data analysis such as privacy and design.
This course does not have any prerequisites beyond high-school algebra. The curriculum and format are designed specifically for students who have not previously taken statistics or computer science courses. Students with some prior experience in either statistics or computing are welcome to enroll and will find much of interest due to the innovative nature of the course. Students who have taken several statistics or computer science courses should instead take a more advanced course.
Our primary text is an online book called Computational and Inferential Thinking: The Foundations of Data Science. This text was written for the course by the course instructors.
The computing platform for the course is hosted at datahub.berkeley.edu. Students find it convenient to use their own computer for the course. If you do not have adequate access to a personal computer, we have machines available for you; please contact your lab GSI.
You are not alone in this course; the staff and instructors are here to support you as you learn the material. It's expected that some aspects of the course will take time to master, and the best way to master challenging material is to ask questions. For online questions, use Piazza. We will also hold office hours for in-person discussions. Small-group tutoring sessions will be available for students in need of additional support to develop confidence with core concepts. In past semesters, students who attended have found these sessions to be a great use of their time. Details about sign-ups will be available later in the term.
The rest of this page details the policies that will be enforced in the Spring 2019 offering of this course. These policies are subject to change, both before the semester begins and during it, at the discretion of the course staff.
If you are on the waiting list, you must still do all coursework and complete labs and homework by the deadlines. We will not offer extensions if you are admitted into the course later, so it is your responsibility to stay up to date on the assignments.
You are welcome to attend lecture and lab while you are on the waiting list. Rooms may feel a little crowded in the first week.
Unfortunately, doing all the work is not a guarantee of enrollment. You will only be enrolled if there is space in your lab. Enrollment will proceed through CalCentral.
The only lab you can attend is the one in which you are enrolled in CalCentral. You are required to attend lab in the first week of classes. Please check CalCentral to make sure you know when and where your lab is.
The weekly lab session has two components: questions and discussion (not using the computer) about recent material, and a lab assignment that develops skills with computational and inferential concepts. These lab assignments are a required part of the course and will be released on Monday mornings.
Lab sessions are not webcast. The set of questions covered in lab will be posted; for the related discussion, please attend the session.
You can get credit for each lab assignment in one of the two ways described below:
Attend your own assigned lab section, make progress substantial enough for your work to be checked off by course staff, and submit your lab (even if it is incomplete) by the end of the lab period. If you were checked off, your submitted work need not be complete to receive full credit. However, you can be checked off only after the discussion portion of the lab, and only if you attended and participated in the entire discussion.
Complete the lab on your own and submit the completed lab by Wednesday morning at 8:59 a.m. If you choose this route, you must finish the entire lab and pass all autograder tests to receive credit. Because missing lab means missing group discussion of important course concepts, we recommend that you don't use this option except in weeks when you are physically unable to come to lab. If you have finished your lab early, you can still attend and participate in the discussion.
Weekly homework assignments are a required part of the course. Each student must submit each homework independently, but is allowed to discuss problems with other students and course staff. See the "Learning Cooperatively" section below.
Data science is about analyzing real-world data sets, and so a series of projects involving real data are a required part of the course. On each project, you may work with a single partner; your partner must be from your assigned lab section. Both partners will receive the same score.
If you submit a homework or project 24 hours before the deadline or earlier, you will receive 1 bonus point on that assignment.
The midterm exam will be held on Friday, March 15 from 7 p.m. to 9 p.m. Please note the date and time carefully. Rooms will be announced closer to the date.
The final exam is required for a passing grade, and will be held on Tuesday, May 14 from 3 p.m. to 6 p.m. (Exam Group 7). Rooms will be announced closer to the date. Please double-check your course schedule to make sure that you have no conflicting finals.
There will be no alternate exams. Unless you have accommodations as determined by the university and approved by the instructors, you must take the midterm and the final at the dates and times provided here. If you have accommodations, please provide the formal notification to your lab GSI before the end of the second week of classes.
Grades will be assigned using the following weighted components:
In past semesters of Data 8, more than 40% of the students received grades in the A+/A/A- range and more than 35% received grades in the B+/B/B- range.
Late submissions of labs will not be accepted under any circumstances. The same goes for homework and projects, unless you have relevant university accommodations. If you have such accommodations, please provide the formal notification to your lab GSI before the end of the second week of classes.
Your two lowest homework scores and your lowest lab score will be dropped in the calculation of your overall grade. There will be no alternate due dates for assignments missed due to illness, other commitments, and so on. The drops are intended to cover those situations.
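As a rough illustration of how the drops work, the sketch below removes the lowest scores from a list before anything else is computed. The function name `drop_lowest` and the example scores are hypothetical, and the sketch assumes every assignment in a category is scored on the same scale; it does not show how the remaining scores are combined into a grade.

```python
def drop_lowest(scores, n_drop):
    """Return the scores that remain after removing the n_drop lowest ones.

    Assumption for illustration: all scores in the list are on the same scale,
    so "lowest" can be decided by comparing the raw numbers.
    """
    if n_drop <= 0:
        return sorted(scores)
    # Sort ascending and skip the first n_drop entries (the lowest scores).
    return sorted(scores)[n_drop:]


# Hypothetical scores, including a homework missed entirely (scored 0).
homework_scores = [10, 7, 0, 9, 8]
lab_scores = [1, 0, 1, 1]

kept_homework = drop_lowest(homework_scores, 2)  # two lowest homework scores dropped
kept_labs = drop_lowest(lab_scores, 1)           # lowest lab score dropped
```

In this made-up example, the missed homework and the missed lab are among the dropped scores, which is the situation the drops are meant to cover.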
Projects will be accepted up to 2 days (48 hours) late. Projects submitted less than 24 hours after the deadline will receive 2/3 credit, and projects submitted between 24 and 48 hours after the deadline will receive 1/3 credit. Projects submitted 48 hours or more after the deadline will receive no credit.
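The late-project schedule above can be sketched as a small function. This is only an illustration: the name `project_credit_fraction` is hypothetical, and treating a submission exactly 24 hours late as falling in the 1/3 tier is an assumption, since the policy text does not spell out that boundary.

```python
from fractions import Fraction


def project_credit_fraction(hours_late):
    """Return the fraction of project credit for a given lateness in hours.

    Assumption for illustration: exactly 24 hours late counts as the 1/3 tier.
    """
    if hours_late <= 0:
        return Fraction(1)       # on time: full credit
    if hours_late < 24:
        return Fraction(2, 3)    # less than 24 hours late
    if hours_late < 48:
        return Fraction(1, 3)    # between 24 and 48 hours late
    return Fraction(0)           # 48 hours or more late: no credit


# Example: a project submitted 30 hours after the deadline.
credit = project_credit_fraction(30)
```

Using `Fraction` keeps the 2/3 and 1/3 tiers exact instead of rounding them to decimals.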
We encourage you to discuss course content with your friends and classmates as you are working on your weekly assignments. No matter what your academic background, you will definitely learn more in this class if you work with others than if you do not. Ask questions, answer questions, and share ideas liberally.
If some emergency takes you away from the course for an extended period, or if you decide to drop the course for any reason, please don't just disappear silently! You should inform your lab GSI and your project partner (if you have one) immediately, so that nobody is expecting you to do something you can't finish.
You must write your answers in your own words, and you must not share your completed work. The exception to this rule is that you can share everything related to a project with your project partner (if you have one) and turn in one project between the two of you.
Make a serious attempt at every assignment yourself. If you get stuck, read the textbook and go over the lectures and lab discussion. After that, go ahead and discuss any remaining doubts with others, especially the course staff. That way you will get the most out of the discussion.
It is important to keep in mind the limits to collaboration. As noted above, you and your friends are encouraged to discuss course content and approaches to problem solving. But you are not allowed to share your code or answers with other students. Doing so is academically dishonest, and it also doesn't help them: it just sets them up for trouble on the next assignment and on exams.
You are also not permitted to turn in answers or code that you have obtained from others. Not only is such copying dishonest, it misses the point of the assignments, which is not for you to find the answers somewhere and send them along to the staff. It is for you to figure out how to solve the problems, with the support available in the course.
Please read Berkeley's Code of Conduct carefully. Penalties for cheating in Data 8 are severe and include reporting to the Center for Student Conduct. They might also include an F in the course or even dismissal from the university. It's just not worth it.
All you have to do is ask staff for help when you need it. You are not alone in Data 8! Instructors and staff are here to help you succeed. We expect that you will work with integrity and with respect for other members of the class, just as the course staff will work with integrity and with respect for you.
The main goal of the course is that you should learn, and have a fantastic experience doing so. Please keep that goal in mind throughout the semester. Welcome to Data 8.