What gets counted counts and Introduction to R

Dr. Nathaniel Cline

Agenda

1. What Gets Counted?

2. Reproducible data analysis

3. R and RStudio

4. Review and to do

Counting

  • What gets counted reflects the priorities, biases, interests, and politics of those in charge of counting

  • Examples:

    • GDP data adjustments
    • 1875 State Census of Massachusetts
    • Gender binaries on forms
    • Police killings
    • Maternal deaths

Quantitative data and power

Without quantitative research, Oakley explains, “it is difficult to distinguish between personal experience and collective oppression.”

Reflections on the readings


What did you walk away with from the readings?


What is the relationship between quantitative data and power?


What are the tradeoffs involved in transforming human experience into a number?


Reproducibility checklist

What does it mean for a data analysis to be “reproducible”?

Near-term goals:

  • Are the tables and figures reproducible from the code and data?
  • Does the code actually do what you think it does?
  • In addition to what was done, is it clear why it was done?

Long-term goals:

  • Can the code be used for other data?
  • Can you extend the code to do other things?
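
One way to work toward these goals is to put an analysis step in a function and check it against a known answer. The sketch below is only an illustration: `summarize_column` is a hypothetical helper, not something from the course materials, and it uses the `mtcars` and `iris` datasets that ship with R.

```r
# Hypothetical helper: summarize one numeric column of any data frame.
summarize_column <- function(data, column) {
  values <- data[[column]]
  # Fail early if the column is not numeric, so mistakes are visible.
  stopifnot(is.numeric(values))
  data.frame(
    column = column,
    n      = sum(!is.na(values)),
    mean   = mean(values, na.rm = TRUE),
    sd     = sd(values, na.rm = TRUE)
  )
}

# The same code works on two different datasets.
mpg_summary   <- summarize_column(mtcars, "mpg")
sepal_summary <- summarize_column(iris, "Sepal.Length")

# A small check that the code does what we think it does.
stopifnot(mpg_summary$n == nrow(mtcars))

print(rbind(mpg_summary, sepal_summary))
```

Because the function takes the data and the column name as arguments, it can be reused on other data, and the `stopifnot()` line records one expectation about its behavior.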

Toolkit for reproducibility

  • Scriptability \(\rightarrow\) R
  • Literate programming (code, narrative, output in one place) \(\rightarrow\) R Markdown
  • Version control \(\rightarrow\) Git / GitHub

R and RStudio

  • R is an open-source statistical programming language
  • R is also an environment for statistical computing and graphics
  • It’s easily extensible with packages

  • RStudio is a convenient interface for R called an IDE (integrated development environment), e.g. “I write R code in the RStudio IDE”
  • RStudio is not a requirement for programming with R, but it’s very commonly used by R programmers, data scientists, economists, data journalists and others

R packages

  • Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.

  • As of September 2023, there are over 19,000 R packages available on CRAN (the Comprehensive R Archive Network).

  • We’re going to work with a small (but important) subset of these!

RStudio tour

A short list (for now) of R essentials

  • Functions are (most often) verbs, followed by what they will be applied to in parentheses:
do_this(to_this)
do_that(to_this, to_that, with_those)
  • Packages are installed once with the install.packages function, then loaded with the library function once per session:
install.packages("package_name")
library(package_name)
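
As a concrete version of the patterns above, here is a minimal sketch using functions that ship with R. The object names (`scores`, `average`, `rounded`, `package_name`) are made up for illustration, and the `stats` package is used only because it is always installed, so nothing is downloaded.

```r
# A function is a verb applied to its argument(s).
scores  <- c(2, 4, 9)
average <- mean(scores)                   # do_this(to_this)
rounded <- round(3.14159, digits = 2)     # do_that(to_this, with_those)

print(average)
print(rounded)

# Install only if the package is missing, then load it for this session.
package_name <- "stats"  # ships with R, so the install step is skipped here
if (!requireNamespace(package_name, quietly = TRUE)) {
  install.packages(package_name)
}
# character.only = TRUE lets library() take the name from a variable.
library(package_name, character.only = TRUE)
```

In everyday use you would usually type the package name directly, as in `library(package_name)`; the `requireNamespace()` check is just one way to avoid reinstalling a package every time a script runs.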

R essentials (continued)

  • Columns (variables) in data frames are accessed with $:
dataframe$var_name
  • Object documentation can be accessed with the ? operator:
?mean
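
To make these two ideas concrete, here is a short sketch with a small made-up data frame. The `survey` data and its column names are invented for illustration only.

```r
# A tiny, made-up data frame with two columns (variables).
survey <- data.frame(
  respondent = c("a", "b", "c"),
  age        = c(34, 27, 45)
)

# $ pulls out one column as a vector.
ages <- survey$age
print(ages)
print(mean(survey$age))

# $ can also create a new column.
survey$age_months <- survey$age * 12
print(survey)

# Help pages open in interactive sessions; help("mean") is equivalent to ?mean.
if (interactive()) {
  help("mean")
}
```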

tidyverse

  • The tidyverse is an opinionated collection of R packages designed for data science
  • All packages share an underlying philosophy and a common grammar
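
As a rough illustration of the "common grammar", the sketch below chains three dplyr verbs on the built-in `mtcars` data. It assumes the dplyr package (part of the tidyverse) is installed and that your R version supports the native pipe `|>` (R 4.1 or later); the object name `mpg_by_cyl` is made up.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  # Each verb takes a data frame and returns a data frame,
  # so the steps read top to bottom.
  mpg_by_cyl <- mtcars |>
    filter(hp > 100) |>            # keep cars with more than 100 horsepower
    group_by(cyl) |>               # one group per cylinder count
    summarize(
      mean_mpg = mean(mpg),        # average miles per gallon in each group
      n_cars   = n()               # number of cars in each group
    )

  print(mpg_by_cyl)
} else {
  message("dplyr is not installed; run install.packages(\"tidyverse\") first.")
}
```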

Quarto and R Markdown

  • rmarkdown and the various packages that support it enable R users to write their code and prose in reproducible computational documents

  • In the past we would generally refer to R Markdown documents (with .Rmd extension), e.g. “Do this in your R Markdown document”

  • These days Quarto is the next-generation tool, and we will use .qmd files

Quarto and R Markdown (continued)


  • Fully reproducible reports: each time you render (knit), the analysis is run from the beginning

  • Simple markdown syntax for text

  • Code goes in chunks, defined by three backticks; narrative goes outside of chunks
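
To show what such a document looks like, the sketch below uses R to write a minimal .qmd file to a temporary folder and print it. The file name `example.qmd`, the title, and the chunk contents are all made up for illustration; in practice you would create the file in RStudio rather than from code.

```r
# Three backticks mark the start and end of a code chunk.
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",
  "title: \"My first report\"",
  "format: html",
  "---",
  "",
  "Narrative text goes outside of chunks, written in *markdown*.",
  "",
  paste0(fence, "{r}"),   # chunk opens: three backticks plus {r}
  "x <- 2",
  "x * 3",
  fence,                  # chunk closes: three backticks
  ""
)

# Write the document to a temporary folder and show its contents.
qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)
cat(readLines(qmd_path), sep = "\n")

# Rendering requires Quarto to be installed, for example from a terminal:
#   quarto render example.qmd
```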

Tour: Quarto

Environments

Tip

The environment of your Quarto (or R Markdown) document is separate from the Console!

Remember this, and expect it to bite you a few times as you’re learning to work with QMD (R Markdown)!

Environments (continued)

First, run the following in the console:


x <- 2
x * 3


All looks good, eh?

Next, add the following to an R chunk in your QMD document and render it:


x * 3


What happens? Why the error?
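
The sketch below imitates this situation inside a single R session. It is an analogy rather than what Quarto literally does: rendering starts a fresh R session, while here two separate environments stand in for the console and the document. The names `console_env` and `render_env` are made up for illustration.

```r
# Two separate environments standing in for the console and the document.
console_env <- new.env(parent = baseenv())
render_env  <- new.env(parent = baseenv())

# "In the console": x is defined, so x * 3 works.
assign("x", 2, envir = console_env)
console_result <- eval(quote(x * 3), envir = console_env)
print(console_result)

# "In the document": x was never defined there, so the same code fails.
render_result <- tryCatch(
  eval(quote(x * 3), envir = render_env),
  error = function(e) e
)
print(conditionMessage(render_result))

# The fix: define x inside the document itself, before it is used.
assign("x", 2, envir = render_env)
fixed_result <- eval(quote(x * 3), envir = render_env)
print(fixed_result)
```

The takeaway is that every object a chunk uses has to be created somewhere earlier in the same document, not only in the console.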

How will we use QMD?

  • Every assignment / report / project / etc. is a QMD document

  • You’ll always have a template QMD document to start with

  • The amount of scaffolding in the template will decrease over the semester

Review

1. What Gets Counted?

2. Reproducible data analysis

3. R and RStudio

To do

1. Read

Ch. 8, “Importing Data”;

Ch. 4, “Data Transformation”;

Ch. 6, “Data Tidying”

in: Wickham, Çetinkaya-Rundel, and Grolemund, R for Data Science

2. Do

Assignment 3