We have weekly lectures Wednesday mornings 9-9:50 am PST. On Tuesdays, you may attend one of the two lab sessions, 1-4 pm or 7-10 pm PST. You should always attend the same lab session (either 1 pm or 7 pm) along with your teammates, unless you have a conflict and let the course instructors know.
We also have TA recitations on Thursdays 7–8:30 pm PST, homework help on Thursdays 8:30–10 pm PST, and instructor office hours on Wednesdays 2:30–3:30 pm PST. Attendance at these three activities is not required.
Tuesday and Thursday meetings are normally in Chen 130, and Wednesday meetings are in Chen 100. Wednesday meetings are in Broad 200.
If we need to do remote learning for any portion of the term, class meetings will be at this Zoom link. This includes the first week of the term.
The lab sessions are spent working on the week’s homework, which always includes working with real data sets with your teammates. You are expected to be working diligently during this time, and it is a golden opportunity to do so. The course staff will be there to help you.
Lessons and lesson exercises
Prior to each lab session, you must go through the lessons listed on the schedule page for the week. These will give you the requisite skills you need to work on the homework problems of the week. To verify completion of the lessons and to identify points of confusion ahead of time, you will individually need to commit a small exercise to the team repository in the GitHub group. Place this in the
lesson_exercises/ directory in a file named
## is the number of the lesson. Lesson exercises are due at noon PST on the Sunday before the lab session.
The lesson exercises are not graded for correctness, but for thoughtfulness. A perfectly reasonable answer to a problem in a lesson exercise is, “I am really having trouble with this concept. Can you please go over it in class?”
The BE/Bi 103 GitHub group
A GitHub group is set up for the class. You will be part of the group through your GitHub account. You will have access to your personal repository for solo homework problems and to a team repository for all other homework and lesson exercise submission. All homeworks and lesson exercises are submitted by pushing to the appropriate GitHub repository.
There are weekly homework assignments. These consist almost entirely of working up real data sets, though there are some theoretical aspects and there may be some analysis of fabricated data.
Data analysis is almost always a collaborative effort in both research and industry. Therefore, you will be assigned to teams of three (possibly with some teams of four depending on course enrollment). For most problems, you will submit your homework as a team. For problems marked “SOLO,” you will not collaborate with your team or other students in the class, but submit your own work. The following homework policies apply.
Each homework has a defined due date and time. For most homeworks, this is Friday at 5 pm PST. Your team must tag your completed homework in your team’s homework repository before this time. For solo submissions, you must tag your completed homework in your own repository before this time.
The commit containing your final submission must be tagged with
## is the two-digit homework assignment number (e.g., 04). Failure to properly tag your homework will result in a 5% deduction from your point total. This sounds harsh, but we have a lot of problem sets to manage and have automated scripts to do it. These will fail if you do not properly name your files.
Each homework problem must be committed both as a single Jupyter notebook and as that notebook converted to HTML. (To convert a notebook to HTML, you can use Jupyter’s menu: File ⟶ Export Notebook As… ⟶ Export Notebook to HTML.) The file names must be
hw#.#.html. For example, homework problem 3.2 is in the files
When you submit an assignment, the notebook and the rendered HTML must match. Furthermore, the cells should be executed in order.
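As a rough self-check before tagging, the sketch below looks for HTML files that follow the hw#.#.html pattern but have no notebook beside them. It assumes the notebook shares the same stem with the standard .ipynb extension, and the function name find_unpaired_html is hypothetical. It is not the course staff's script, and it does not check that the notebook and HTML contents match.

```python
import re
from pathlib import Path

# Pattern for the HTML name given in the text: hw#.#.html (e.g., hw3.2.html).
HTML_PATTERN = re.compile(r"^hw(\d+)\.(\d+)\.html$")


def find_unpaired_html(file_names):
    """Return HTML file names that lack a notebook with the same stem.

    Assumption: the notebook for hw#.#.html is named hw#.#.ipynb.
    """
    names = set(file_names)
    unpaired = []
    for name in sorted(names):
        if HTML_PATTERN.match(name):
            # Swap the .html suffix for .ipynb to get the expected notebook.
            notebook = Path(name).with_suffix(".ipynb").name
            if notebook not in names:
                unpaired.append(name)
    return unpaired


# hw3.1.html has no notebook next to it, so it is reported.
print(find_unpaired_html(["hw3.2.html", "hw3.2.ipynb", "hw3.1.html"]))
```

To run it on a real directory, you could pass in the names returned by os.listdir for that directory.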
If you wrote any other files that are necessary for your homework, such as Stan files, they should live in the same directory as your Jupyter notebook and HTML file. Additionally, you should also display any code you wrote in external files in your Jupyter notebook/HTML file. You should not, however, commit .hpp or .o files, nor executables. You generally should not commit files containing samples either, unless necessary for explanatory purposes or some other custom analysis you are considering.
All code you wrote to do your assignment must be included in the notebook. Code from imported packages that you did not write (e.g., modules distributed by the class instructors) need not be displayed in the notebook. We will often run the code in your notebook; all code must run to get credit.
Since we are often running your code to check it, you must have the data sets be accessed in the standard way for the class. That is to say, the following code (or something similar that sets up the correct directory structure) must be at the top of each submitted notebook that uses a data set.
import os, sys

if "google.colab" in sys.modules:
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
When accessing files within your notebooks, do it with something like this:
filename = os.path.join(data_path, 'name_of_datafile.csv').
All of your results must be clearly explained and all graphics clearly presented and embedded in the Jupyter notebook.
Any mathematics in your homework must render clearly and properly with MathJax. This essentially means that your equations must be written in correct LaTeX.
Where appropriate, you need to give a detailed discussion of the analysis choices you have made. As an example, if you choose a particular graphical method for model comparison, explain why you chose it.
To make it clearer how to construct your assignments (and this is good practice in general in your own workflows), you should follow these guidelines.
Each code cell should do only one task or define only one, simple function.
Do not have any adjacent code cells. Thus, you should have explanatory text describing what is happening in the code cells. This text should not just explain the code, but your reasoning behind why you are doing the calculation.
Show all equations.
Use Markdown headers to delineate sections of your notebook. In this class, this at least means using headers to delineate parts of the problem.
Because this is important to make your work clear, the TAs will deduct points if these rules are not followed.
There is seldom a single right way to analyze a set of data. You are encouraged to try different approaches to analysis. If you perform an analysis and find problems with it, clearly write what the problems are and how they came about. Even if your analysis does not completely work, but you demonstrate that you thought carefully about it and understand its difficulties, you will get nearly full credit.
You should also include attribution in your homework submission: who on the team did what. While different people on the team may do different parts of the homework, I encourage you to work together on all parts of the homework. At the very least, you personally must understand all of the steps taken in the homework solutions and be able to repeat them by yourself.
Throughout the term, your team will have six “grace days” for late homeworks. For example, your team can submit homework 1 two days late, homework 6 three days late, and homework 8 one day late. After that, no more late homeworks will be accepted. No grace days may be applied to the last two homeworks due to the end-of-term schedule. The instructor will not grant fewer than six grace days, but reserves the right to grant more.
For solo assignments, you will have a total of two grace days for the term, again with no grace days applied to the last two homeworks.
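The grace-day arithmetic can be sketched as follows. This is only an illustration of the counting in the example above. The function name remaining_grace_days is hypothetical, and the sketch does not model the rule that no grace days apply to the last two homeworks.

```python
TEAM_GRACE_DAYS = 6  # team allotment stated in the text


def remaining_grace_days(days_late_by_homework, total=TEAM_GRACE_DAYS):
    """Return the grace days left after the given late submissions.

    days_late_by_homework maps a homework number to the days it was late.
    Raises ValueError if the allotment is exceeded.
    """
    used = sum(days_late_by_homework.values())
    if used > total:
        raise ValueError(f"{used} days used exceeds the {total} allotted")
    return total - used


# The example from the text: homework 1 two days late, homework 6 three
# days late, and homework 8 one day late.
print(remaining_grace_days({1: 2, 6: 3, 8: 1}))  # prints 0
```

For solo assignments, the same count applies with total=2.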
85% of your grade is determined from homework. Everyone on your team will get the same grade on the team homeworks, but naturally the solo homeworks may have different scores.
15% of your grade is determined from submission of your lesson exercises and participation in the lab sessions. You are expected to work together with your team members and course instructors with your full attention during the lab sessions.
Collaboration policy and Honor Code
Some of the data we will use in this course is unpublished, generously given to us by researchers both from Caltech and from other institutions. They have given us their data in good faith that it will be used only in this class. It is therefore imperative that you do not distribute outside of this class any data set that I ask you to keep within it.
Since most of the homework is done in assigned teams, you obviously should collaborate heavily with the other members of your team. You are free to discuss the homework with other teams, including via Ed, but the work you submit must be the work of your own team. Solo homework problems are to be done entirely on your own.
You may not consult solutions of homework problems from previous editions of this course.
You are free to consult references, literature, websites, blogs, etc., outside of the materials presented in class. (The exceptions are places where homework problems are completely worked out, such as previous editions of this or other courses, or other students’ work.) In fact, you are encouraged to do so. If you do, you must properly cite the sources in your homework. Be warned: doing homework by Google fishing will not work! The problems are too open ended and the techniques are too varied.
Excused absences and extensions
Under certain circumstances, missed lab or lecture sessions will be excused and extensions given on the homework without costing grace days. They must be requested from the course instructor.
You are free to contact the course staff via email at any time, but we encourage you to use the class Ed page for questions related to course topics and homework. Most of our mass communication with you will be through Ed, so be sure to set your Ed account to give you email alerts if necessary.
When posting on Ed, please follow these guidelines:
Tag your question or comment with the appropriate category (e.g., “Lesson12” or “HW5”).
If you have a question about a coding bug, make every attempt to provide a minimal example that demonstrates the problem. A minimal example strips out all other details beyond what is necessary to reproduce the problem or bug. Posting error messages without code is seldom helpful.
A code snippet posted as text is better than a screenshot.
If you feel that posting a minimal example will result in showing too much of your answer to your classmates, you can post your question on Ed privately so that only the course staff can see it.
While you are free to post anonymously to your classmates (course staff will always know who posts), we encourage you to post with your real name. This can spur discussions among students, which can be productive.
Course staff strives to answer questions quickly, but students should also answer one another's questions when they can. This also spurs more conversation and results in faster answers to questions.
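To illustrate the minimal-example guideline above, here is what such a post might contain. The scenario is invented for illustration: the question is why a list changes after being passed to a function. Everything unrelated to that question has been stripped out, and the snippet runs on its own.

```python
# Minimal example: why does `values` change after calling `normalize`?

def normalize(data):
    total = sum(data)
    for i in range(len(data)):
        data[i] = data[i] / total  # modifies the caller's list in place
    return data


values = [1.0, 3.0]
result = normalize(values)
print(values)  # the original list was changed too
```

Because the snippet is short and self-contained, anyone reading the post can paste it into a notebook cell and see the behavior directly.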