Lecture 1
Duke University
STA 199 Summer 2026: Session 2
Adapted from slides by Mine Çetinkaya-Rundel, Katie Solarz & John Zito
May 14, 2026
Getting to know you survey: if you have not completed it yet, please do so ASAP
Course operation
Doing data science
tidyverse and friends
By the end of the course, you will be able to…
Computational reproducibility:
Scientific replication:
Our tools will help you achieve the first, which is necessary (but not sufficient!) for the second.
What does it mean for a data analysis to be “reproducible”?
Short-term goals:
Long-term goals:
Some best practices from the American Statistical Association







Option 1:
Sit back and enjoy the show!
Option 2:
Go to your container and launch RStudio.

Packages: Fundamental units of reproducible R code, including reusable R functions, the documentation that describes how to use them, and sample data
As of 13 May 2026, there are 23,731 R packages available on CRAN (the Comprehensive R Archive Network)
We’re going to work with a small (but important) subset of these!
install.packages(), once per system:
Note
We already pre-installed many of the packages you’ll need for this course, so you might go the whole semester without needing to run install.packages()!
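As a rough sketch of the install-once, load-every-session pattern: the helper name install_if_missing is hypothetical (it is not part of the course materials), and palmerpenguins is only an assumed package name for the penguins data. The built-in stats package stands in so the example runs without downloading anything.

```r
# Hypothetical helper: install a package only when it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needed once per system
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers an install.
install_if_missing("stats")

# Assumed package name for the penguins data; uncomment to try it.
# install_if_missing("palmerpenguins")

# Loading happens once per R session.
library(stats)
```

Because the course containers come with packages pre-installed, the install branch would usually be skipped there.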
library(), once per session:
penguins data frame
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>
bill_length_mm
[1] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1
[14] 38.6 34.6 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3
[27] 40.6 40.5 37.9 40.5 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6
[40] 39.8 36.5 40.8 36.0 44.1 37.0 39.6 41.1 37.5 36.0 42.3 39.6 40.1
[53] 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6 35.7 41.3 37.6 41.1 36.4
[66] 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5 42.8 40.9 37.2
[79] 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9 35.7
[92] 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8
[105] 37.9 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6
[118] 37.3 35.7 41.1 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1
[131] 38.5 43.1 36.8 37.5 38.1 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1
[144] 40.7 37.3 39.0 39.2 36.6 36.0 37.8 36.0 41.5 46.1 50.0 48.7 50.0
[157] 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5 48.4 45.8 49.3 42.0
[170] 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8 48.2 50.0
[183] 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0
[209] 43.8 45.5 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5
[222] 50.7 47.7 46.4 48.2 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5
[235] 47.4 50.0 44.9 50.8 43.4 51.3 47.5 52.1 47.5 52.2 45.5 49.5 44.5
[248] 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2 49.1 47.3 46.8 41.7 53.4
[261] 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8 47.2 NA 46.8
[274] 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0 51.3
[287] 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2
[300] 50.6 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5
[313] 47.6 52.0 46.9 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5
[326] 49.8 48.1 51.4 45.7 50.7 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8
[339] 45.7 55.8 43.5 49.6 50.8 50.2
flipper_length_mm
Typing the column name by itself fails because R cannot find an object called flipper_length_mm. This can be fixed by using penguins$flipper_length_mm.
[1] 181 186 195 NA 193 190 181 195 193 190 186 180 182 191 198 185
[17] 195 197 184 194 174 180 189 185 180 187 183 187 172 180 178 178
[33] 188 184 195 196 190 180 181 184 182 195 186 196 185 190 182 179
[49] 190 191 186 188 190 200 187 191 186 193 181 194 185 195 185 192
[65] 184 192 195 188 190 198 190 190 196 197 190 195 191 184 187 195
[81] 189 196 187 193 191 194 190 189 189 190 202 205 185 186 187 208
[97] 190 196 178 192 192 203 183 190 193 184 199 190 181 197 198 191
[113] 193 197 191 196 188 199 189 189 187 198 176 202 186 199 191 195
[129] 191 210 190 197 193 199 187 190 191 200 185 193 193 187 188 190
[145] 192 185 190 184 195 193 187 201 211 230 210 218 215 210 211 219
[161] 209 215 214 216 214 213 210 217 210 221 209 222 218 215 213 215
[177] 215 215 216 215 210 220 222 209 207 230 220 220 213 219 208 208
[193] 208 225 210 216 222 217 210 225 213 215 210 220 210 225 217 220
[209] 208 220 208 224 208 221 214 231 219 230 214 229 220 223 216 221
[225] 221 217 216 230 209 220 215 223 212 221 212 224 212 228 218 218
[241] 212 230 218 228 212 224 214 226 216 222 203 225 219 228 215 228
[257] 216 215 210 219 208 209 216 229 213 230 217 230 217 222 214 NA
[273] 215 222 212 213 192 196 193 188 197 198 178 197 195 198 193 194
[289] 185 201 190 201 197 181 190 195 181 191 187 193 195 197 200 200
[305] 191 205 187 201 187 203 195 199 195 210 192 205 210 187 196 196
[321] 196 201 190 212 187 198 199 201 193 203 187 197 191 203 202 194
[337] 206 189 195 207 202 193 210 198
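The difference between a bare column name and $ access can be sketched with a small stand-in data frame. The name penguins_small is hypothetical; its values are copied from the first five rows of the penguins output shown earlier, so the example runs without any extra packages.

```r
# Stand-in data frame built from the first five rows shown earlier.
# penguins_small is a hypothetical name used only for this illustration.
penguins_small <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Adelie", "Adelie"),
  bill_length_mm = c(39.1, 39.5, 40.3, NA, 36.7),
  flipper_length_mm = c(181L, 186L, 195L, NA, 193L)
)

# Asking for the column by itself fails: no object with that name exists.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  flipper_length_mm,
  error = function(e) conditionMessage(e)
)
print(result)

# Using $ tells R to look for the column inside the data frame.
flippers <- penguins_small$flipper_length_mm
print(flippers)
```

The same pattern applies to the full data frame, where penguins$flipper_length_mm returns all 344 values.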
function(argument)
Functions are (most often) verbs, followed by what they will be applied to in parentheses:
mean()
Let’s compute the average of a set of numbers:
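A minimal sketch of calling mean(), using made-up numbers plus two bill lengths taken from the penguins output. The object names numbers, average, bills, and bill_average are chosen here for illustration only.

```r
# Collect a set of numbers into a vector with c(), then apply the verb mean().
numbers <- c(2, 4, 6)
average <- mean(numbers)
print(average)  # 4

# With a missing value, mean() returns NA unless na.rm = TRUE is supplied.
bills <- c(39.1, 39.5, NA)
print(mean(bills))  # NA
bill_average <- mean(bills, na.rm = TRUE)
print(bill_average)
```

The na.rm argument matters for the penguins data, since some measurements are recorded as NA.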
Object documentation can be accessed with ?
Packages are installed with the install.packages() function and loaded with the library() function, once per session:
Your containers come “fully loaded,” so you may not have to install any new packages.
Data frames: like the spreadsheets of R

Use ? to get help with objects (like data frames and functions):
Use $ to access columns
Note
Generally, you need to use the $ to tell R where to find that column.
Use <- or equals sign = to save objects
Note
Check your environment pane for the saved object!
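Saving objects might look like the sketch below. The object names flipper_example, bill_example, and ratio are hypothetical, and the values are taken from the first row of the penguins output.

```r
# Save a value with the assignment arrow.
flipper_example <- 181

# The equals sign also saves an object when used at the top level.
bill_example = 39.1

# Saved objects can be reused by name in later computations.
ratio <- flipper_example / bill_example
print(ratio)

# exists() checks whether an object with the given name has been saved.
print(exists("ratio"))  # TRUE
```

In RStudio, objects saved at the top level like this should also appear in the Environment pane.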
Note
If you have trouble understanding what a message is saying, there is a high chance someone has explained the message online.
If data analysis was cooking…
Installing a package would be like buying ingredients from the store
Loading a package would be like getting the ingredients out of your pantry and setting them on your countertop to be used
aka the package you’ll hear about the most…


GitHub is the home for your Git-based projects on the internet – like Dropbox but much, much better
We will use GitHub as a platform for web hosting and collaboration (and as our course management system!)


with human readable messages





Option 1:
Sit back and enjoy the show!
Note
You’ll need to stick to this option if you haven’t yet accepted your GitHub invite and don’t have a repo created for you.
Option 2:
Go to the course GitHub organization and clone the ae-your-github-username repo to your container.
Find your application exercise repo, which will always be named using the convention assignment_title-your-github-username, e.g., ae-kgsolarz or lab-1-kgsolarz.
Click on the green “Code” button, make sure SSH is selected, and copy the repo URL.

In RStudio, File > New Project > From Version Control > Git
Paste the repo URL copied in the previous step, press Tab to auto-fill the project name, then click Create Project
If you haven’t done Lab 0, for one time only, type yes in the pop-up dialogue
Never accepted GitHub invite → Look for it in your email and accept it
Cloning repo fails → Review/redo Lab 0 steps for setting up SSH key
Still no luck? Stay after class today, come by my office hours this weekend, or post on Ed for help
Option 1:
Sit back and enjoy the show!
Note
If you chose (or had to choose) this option for the previous tour, or if you couldn’t clone your repo for any reason, you’ll need to stick to this option.
Option 2:
Go to RStudio and open the document ae-01-meet-the-penguins.qmd.

Once we made changes to our Quarto document, we
went to the Git pane in RStudio
staged our changes by clicking the checkboxes next to the relevant files
committed our changes with an informative commit message
pulled from GitHub to make sure we had the latest version of our repo
pushed our changes to our application exercise repos
confirmed on GitHub that we could see our changes pushed from RStudio