Foundations of Data Science combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social issues surrounding data analysis such as privacy and design.
This course does not have any prerequisites beyond high-school algebra. The curriculum and format are designed specifically for students who have not previously taken statistics or computer science courses. Students with some prior experience in either statistics or computing are welcome to enroll and will find much of interest due to the innovative nature of the course. Students who have taken several statistics or computer science courses should instead take a more advanced course.
Lectures will be held every day for the first seven weeks of the course. Lectures will last from 9am to 10am from Monday to Thursday, and from 9am to 11am on Fridays. Lecture attendance is not mandatory, though lectures will not be webcast this summer. All materials presented in a lecture will be posted promptly after that lecture.
Our primary text is an online book called Computational and Inferential Thinking: The Foundations of Data Science. This text was written for the course by the course instructors.
The computing platform for the course is hosted at summer.datahub.berkeley.edu. Students find it convenient to use their own computer for the course. If you do not have adequate access to a personal computer, we have machines available for you; please contact your GSI.
You are not alone in this course; the staff and instructors are here to support you as you learn the material. It's expected that some aspects of the course will take time to master, and the best way to master challenging material is to ask questions. For online questions, use Piazza. We will also hold office hours for in-person discussions.
The rest of this page details the policies that will be enforced in the Summer 2018 offering of this course. These policies are subject to change at any point before or during the course, at the judgement of the course staff.
Twice-weekly labs are a required part of the course. Labs will be released Saturday nights and Monday nights. You can get credit for each lab in one of the two ways described below:
Attend your own assigned lab section, make progress substantial enough for your work to be checked off by course staff, and submit your lab (even if it is incomplete) by the end of the lab period. Note that your submitted work need not be complete in order to receive full credit if you were checked off.
Complete the lab by 11:59pm on Sunday night (for labs released Saturday) or Tuesday night (for labs released Monday). If you choose this route, you must finish the entire lab to receive credit. The course staff does not encourage this option; it is recommended only if you are sure that you will not be able to attend lab in a given week.
Bi-Weekly homework assignments are a required part of the course. Each student must submit each homework independently but is allowed to discuss problems with other students and course staff. See the "Learning Cooperatively" section below.
Data science is about analyzing real-world datasets, and so a series of projects involving real data is a required part of the course. On each project, you may work with a single partner; your partner must be from your assigned lab section.
The midterm exam will be held during the class period on Friday, July 13th. Rooms will be announced closer to the date.
The final exam will be held from 5pm to 8pm on Thursday, August 9th. Rooms will be announced closer to the date.
Unless you have accommodations as determined by the university and approved by the instructor, you must take the midterm and the final at the dates and times provided here. Please check your course schedule and make sure that you have no conflicts with these exams. If the final exam conflicts with an exam for another course, let the instructors know and they will see what can be done.
Grades will be assigned using the following weighted components:
Every assignment is weighted equally in its category. For example, there are 3 projects, so each project is worth (25/3)% of your grade.
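As a small illustration of the arithmetic above, the sketch below computes the share of the overall grade carried by a single assignment in an equally weighted category. The function name `per_assignment_weight` is hypothetical; the only figures taken from this page are the 25% project category and the 3 projects.

```python
from fractions import Fraction


def per_assignment_weight(category_percent, n_assignments):
    """Percent of the overall grade carried by one assignment in a category.

    Assumes every assignment in the category is weighted equally.
    """
    if n_assignments <= 0:
        raise ValueError("a category needs at least one assignment")
    return Fraction(category_percent, n_assignments)


# Projects: 25% of the grade split evenly across 3 projects.
project_weight = per_assignment_weight(25, 3)
print(project_weight)         # exact value as a fraction of a percent
print(float(project_weight))  # about 8.33 percent per project
```

Using `Fraction` keeps the value exact, so the three project weights add back up to exactly 25.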
Overall, in past semesters of Data 8, more than 40% of the students have received grades in the A+/A/A- range, and more than 35% have received grades in the B+/B/B- range.
Late submissions of labs will not be accepted under any circumstances. The same goes for homework, unless you have relevant DSP accommodations that provide a two-day extension on homework assignments.
Your lowest two homework scores and your lowest lab score will be dropped in the calculation of your overall grade.
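A minimal sketch of how dropping the lowest scores could work, assuming every assignment in a category is scored out of 100 and weighted equally. The function name `mean_after_drops` and the sample scores are made up for illustration; this is not the course staff's grading code.

```python
def mean_after_drops(scores, n_drop):
    """Average the scores that remain after discarding the n_drop lowest."""
    if n_drop >= len(scores):
        raise ValueError("cannot drop every score in the category")
    kept = sorted(scores)[n_drop:]  # ascending sort, so the lowest come first
    return sum(kept) / len(kept)


# Hypothetical scores out of 100.
homework_scores = [100, 50, 80, 90, 0]
lab_scores = [100, 0, 100, 100]

print(mean_after_drops(homework_scores, n_drop=2))  # lowest two homeworks dropped -> 90.0
print(mean_after_drops(lab_scores, n_drop=1))       # lowest lab dropped -> 100.0
```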
Projects will be accepted up to 2 days (48 hours) late; a project submitted less than 24 hours after the deadline will receive 2/3 credit, a project submitted between 24 and 48 hours after the deadline will receive 1/3 credit, and a project submitted 48 hours or more after the deadline will receive no credit.
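The late-project schedule above can be sketched as a small function. The name `project_late_credit` is hypothetical, and two boundary readings are assumptions: a project submitted exactly 24 hours late is treated as falling in the 1/3 band (since only submissions less than 24 hours late earn 2/3), and an on-time project is treated as earning full credit.

```python
from fractions import Fraction


def project_late_credit(hours_late):
    """Fraction of project credit kept, given hours past the deadline."""
    if hours_late <= 0:
        return Fraction(1)     # on time (assumed to mean full credit)
    if hours_late < 24:
        return Fraction(2, 3)  # less than 24 hours late
    if hours_late < 48:
        return Fraction(1, 3)  # between 24 and 48 hours late
    return Fraction(0)         # 48 hours or more late


for hours in (0, 5, 24, 47.5, 48):
    print(hours, project_late_credit(hours))
```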
If you are on the waitlist, please do all coursework and attend labs and lectures in accordance with the deadlines.
If you enroll or join the waitlist after Week 1, you must contact your assigned Lab TA (see Staff Page) by email as soon as you add the course. The email must include your full name, Student ID number, date you added the course, and your @berkeley.edu email address. Once we receive your email, we will explain the prorating policy.
With the obvious exception of exams, we encourage you to discuss all of the course activities with your friends and classmates as you are working on them. You will definitely learn more in this class if you work with others than if you do not. Ask questions, answer questions, and share ideas liberally.
If some emergency takes you away from the course for an extended period, or if you decide to drop the course for any reason, please don't just disappear silently! You should inform your GSI and your project partner (if you have one), so that nobody is expecting you to do something you can't finish.
Collaboration has a limit, however. You should not share your code or answers directly with other students. Doing so doesn't help them; it just sets them up for trouble on exams.
Make a serious attempt at the assignment yourself, and then discuss your doubts with others. In this way you, and they, will get more out of the discussion.
Please write up your answers in your own words and don't share your completed work. The exception to this rule is that you can share everything related to a project with your project partner and turn in one project between you.
Penalties for cheating are severe: they range from a zero grade for the assignment or exam up to an F in the course, or even dismissal from the University.
Rather than copying someone else's work, ask for help. You are not alone in this course! The course staff is here to help you succeed. We expect that you will work with integrity and with respect for other members of the class, just as the course staff will work with integrity and with respect for you.
The main goal of Data 8 is that you should learn, and have a fantastic experience doing so. Please keep that goal in mind throughout the semester. Welcome to Data 8.