Lecture 0
Duke University
STA 199 Summer 2026: Session 2
Adapted from slides by Mine Çetinkaya-Rundel, Katie Solarz & John Zito
May 13, 2026
Instructor
Josh Lim
josh.lim@duke.edu
TA, Lab Instructor
Kenna Roberts
makenna.roberts@duke.edu











First half:
Data science
Second half:
Statistical thinking
Quantifying our uncertainty about that knowledge.
Campaign manager: What is the probability that our candidate wins the election?
(A flurry of analysis takes place.)
Data scientist: Our best guess is 54%.
Campaign manager: How reliable is that estimate? How confident are we in that? What’s the margin of error?
Parallel Universe 1
Data scientist: It’s 54% give or take 3%.
Parallel Universe 2
Data scientist: It’s 54% give or take 20%.
It’s all about decision-making under uncertainty
The manager is going to make wildly different decisions about campaign strategy and spending depending on how uncertain the environment is.
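One way to see where a "give or take" figure can come from is the usual large-sample margin of error for a proportion. This is a minimal sketch, not the analysis from the dialogue: it treats the 54% as if it were a polled vote share from a simple random sample (a simplification), uses a 95% confidence level, and assumes two hypothetical sample sizes, 1,000 and 25.

```r
# Approximate margin of error for a sample proportion:
#   z * sqrt(p_hat * (1 - p_hat) / n)
margin_of_error <- function(p_hat, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for 95%
  z * sqrt(p_hat * (1 - p_hat) / n)
}

p_hat <- 0.54  # the best guess from the dialogue

moe_large <- margin_of_error(p_hat, n = 1000)  # hypothetical large poll
moe_small <- margin_of_error(p_hat, n = 25)    # hypothetical small poll

round(c(large_poll = moe_large, small_poll = moe_small), 3)
```

Under these assumptions the large poll lands near 3 percentage points and the small one near 20, which mirrors the two parallel universes: same best guess, very different uncertainty.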



https://sta199-su26-2.github.io
Roughly one per lecture
Graded for good-faith attempt, not accuracy
Practice this week; graded thereafter
At least one commit to your AE repo by 10:45am on the day of lecture
Complete 80% for full lecture attendance credit
Lab sessions formally take place on Mondays and Thursdays following lecture (11:00am - 12:15pm)
Labs are to be started during the lab session & completed at home by the posted due date
I encourage you to make the most of lab sessions, as you have access to both your peers and Kenna, the course TA, during this time
Due dates (typically):
Monday Lab: Due Wednesday at 11:59 PM
Thursday Lab: Due Sunday at 11:59 PM
Discussion with classmates = 🤩 ; Copying = ❌
Lowest lab score is dropped
Two exams, each 25%
Midterm: June 1, during lecture + lab (tentatively)
Final: June 24, 9am - 12pm
You will be permitted a “cheat sheet” (both sides of a single 8.5” x 11” piece of paper)
Caution
It’s possible the midterm gets bumped to June 2; this will be communicated by next Wednesday, 5/20. The final exam date is above my pay grade & cannot be changed. If you cannot take the exams on these dates, please have a discussion with me today.
Dataset of your choice, method of your choice
Teamwork
Presentation and write-up
Presentations will take place in the last lab (June 22)
Interim deadlines, peer review on content, peer evaluation for team contribution
Some lab sessions allocated to project progress
Caution
Final presentation date cannot be changed; you must complete the project and participate in project presentations to pass this class.
| Category | Percentage |
|---|---|
| Labs | 20% |
| Project | 20% |
| Exam 1 | 25% |
| Exam 2 | 25% |
| Application Exercises | 5% |
| Lab Attendance | 5% |
See course syllabus for how the final letter grade will be determined.
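As a rough illustration of how the category weights combine, here is a minimal sketch of a weighted average. The category scores are hypothetical, and the calculation ignores details such as the dropped lowest lab and the letter-grade cutoffs, which are set in the syllabus.

```r
# Category weights from the grading table, as proportions
weights <- c(labs = 0.20, project = 0.20, exam_1 = 0.25, exam_2 = 0.25,
             application_exercises = 0.05, lab_attendance = 0.05)

# Hypothetical category scores out of 100 (illustration only)
scores <- c(labs = 92, project = 88, exam_1 = 80, exam_2 = 85,
            application_exercises = 100, lab_attendance = 100)

# Sanity check: the weights should cover 100% of the grade
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Weighted average, matching scores to weights by name
overall <- sum(weights * scores[names(weights)])
overall
```

With these made-up scores the overall percentage works out to 87.25.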
Josh: Old Chem 203
Kenna: Time & Location TBD
All linked from the course website:
The Student Disability Access Office (SDAO) is available to ensure that students are able to engage with their courses and related assignments.
I am committed to making all course materials accessible, and I’m always open to feedback on how to do this better!
If you need testing accommodations
Make sure I get a letter, and make your appointments in the Testing Center now.
Labs: discussing and helping one another is fine; sharing your solutions via text, email, AirDrop, carrier pigeon, or any other method, and / or copying from others is not permitted;
Exams: collaboration of any kind is completely forbidden
Projects: collaboration of all kinds is enthusiastically encouraged within your team; between teams, it’s the same as labs; do not directly share your materials or copy from others.
AI tools for code:
!= correct / good code.
AI tools for narrative: Absolutely not!
AI tools for learning: Sure, but be careful/critical!
Caution
Exception: Use of AI tools is completely forbidden during lab sessions. When you are in lab, you have far better tools / resources available to you - Kenna, our TA, and each other! Blatant disregard for this policy will result in a 0 for the current lab assignment.
To uphold the Duke Community Standard:
I will not lie, cheat, or steal in my academic endeavors;
I will conduct myself honorably in all my endeavors; and
I will act if the Standard is compromised.
The text below can be found at this link, which is a post on the website of a personal injury law firm. Suppose we want to investigate the validity of their claim… What data might we want? What methods are appropriate? How will we perform our analysis? How can we best communicate our findings?



On 4/22/2025, TidyTuesday posted a tidy version of the raw data analyzed in Harper and Palayew’s (2019) study, “The annual cannabis holiday and fatal traffic crashes,” available here.
Let’s load this data into R…



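library(ggplot2)

# Assumes daily_accidents_420 is already loaded, with columns
# fatalities_count (numeric) and e420 (the 4/20 indicator).
ggplot(daily_accidents_420, aes(x = fatalities_count, fill = e420)) +
  geom_histogram() +
  # One panel per group; free y-axes because the groups differ greatly in size
  facet_wrap(~ e420, ncol = 1, scales = "free_y") +
  labs(
    title = "Histogram of Daily Traffic Accident Fatality Count, 1992-2016",
    subtitle = "On the 'High Holiday' (4/20), vs. Not",
    x = "Daily Traffic Accident Fatality Count (1992 - 2016)",
    y = "Count of Observations"
  ) +
  theme_minimal() +
  theme(legend.position = "none")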




60 four-year-olds randomly assigned to 3 conditions (20 each):
Then researchers measured executive function (EF) immediately after.
Three tasks:
Z-scores for the first 3 EF tasks were summed into a composite EF score.
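To make "summing z-scores into a composite" concrete, here is a minimal sketch with made-up numbers. The column names `task_1` to `task_3` and all the scores are hypothetical placeholders, not data from the study.

```r
# Hypothetical raw scores for six children on three EF tasks
ef <- data.frame(
  task_1 = c(3, 5, 2, 4, 6, 1),
  task_2 = c(10, 14, 9, 12, 15, 8),
  task_3 = c(0, 1, 0, 1, 1, 0)
)

# scale() centers each column at 0 and divides by its standard deviation,
# which puts tasks measured on different scales on a common footing
ef_z <- scale(ef)

# Composite EF score: the sum of the three z-scores for each child
ef$composite <- rowSums(ef_z)
ef
```

Standardizing first keeps a task with a wide raw range (like `task_2` here) from dominating the sum, but the composite is only as meaningful as the tasks that feed into it.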
No baseline!
Executive function was only measured after the experiment — we don’t know if the groups were comparable before watching anything.
What metrics?
Tower of Hanoi scored 0 or 1. HTKS is a children’s game. Are these really measuring “brain impairment”?


More on this tomorrow - basically, it is the Google Drive of coding!
Find AE 00 on the course website!