Data Analysis for IR 101: R Scripting

Participants are assumed to have experience manipulating spreadsheet data but little or no experience working with R scripts. You will learn how to write scripts in RStudio to describe and automate the same data operations you already know how to do in Excel. Each week you will be introduced to a topic, work through group exercises during class, work through more advanced exercises during a live problem session, and develop a course-long personal project.

1 Learning objectives

  • Learn how to use RStudio to develop, test, and execute automated data processes
  • Learn the basic commands from R’s tidyverse packages to replicate common spreadsheet operations, including data summaries
  • Learn the basics of the R language, such as arrays, strings, and control flow

2 Requirements of the class

  • Elapsed time: 6 weeks
  • Weekly online synchronous sessions (four lessons and four problem sessions, plus one wrap-up)
    • Lessons: Mondays, 3:00-4:30 pm ET. During this session, we will introduce new topics, demonstrate them, and provide time for students to experiment with them.
    • Problem sessions: Wednesdays, 3:00-4:00 pm ET. During this session, students will work through a problem set, alone or in groups, and ask questions as they arise. No new topics will be introduced.
  • Required tasks each week: Work on homework for approximately five hours each week outside of class
  • Final project
    • Throughout the course, students will be encouraged to think about which project they will choose
    • Students will be given one week to work on their project
    • We devote the last class to student discussions of their projects
    • Students will turn in their project at the end of the course
  • Exit interview: Students will arrange a 30-minute discussion with the professors about R, the course, and their project.

3 Before the first live session

You (or your IT group, if your computer is locked down by your organization) need to follow the instructions on the setup page. In short, the setup consists of the following:

  1. Install R (>= v4.4.2)
  2. Install RStudio (>= 2024.12.0)
    • Set your RStudio preferences.
  3. Install R packages:
    • tidyverse
    • tidylog

4 Schedule for the course

4.1 Week 1: Basics of data manipulation

4.1.1 Lesson: February 3 at 3pm

Agenda
We will discuss (and demonstrate where appropriate) each of the following. You will also have the opportunity to try out some of the commands.
  • Introduction to R & RStudio
  • Comparison of R and Excel
  • Demonstrate the process of working with R & RStudio
  • Introduction to the basics
    • Data import and export: read_csv() and write_csv()
    • Data tools: select(), filter(), mutate()
    • The pipe operator (|>)
    • Basic math operations
    • Basic logic (AND, OR, NOT) operations
  • Familiarize yourself with RforIR.com
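
As a preview of how the basics in this agenda fit together, here is a minimal sketch. The data, column names, and the full-time threshold of 12 credits are hypothetical and chosen only for illustration; the in-class demonstration may use different data.

```r
library(tidyverse)

# Hypothetical enrollment data, built in place so the example is self-contained.
students <- tibble(
  id      = c(1, 2, 3, 4),
  level   = c("UG", "UG", "GR", "UG"),
  credits = c(15, 6, 9, 12),
  gpa     = c(3.2, 2.1, 3.8, 3.6)
)

# Export to CSV and read it back (a temporary file stands in for a real path).
csv_path <- tempfile(fileext = ".csv")
write_csv(students, csv_path)
students_in <- read_csv(csv_path, show_col_types = FALSE)

# The pipe (|>) passes the result of each step to the next one.
full_time_ug <- students_in |>
  filter(level == "UG" & credits >= 12) |>   # & is AND; | is OR; ! is NOT
  mutate(quality_points = credits * gpa) |>  # basic math creates a new column
  select(id, credits, quality_points)        # keep only the columns we need
```

Each step mirrors a spreadsheet action: filter() hides rows, mutate() adds a formula column, and select() hides columns.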
To-do after class
  • Work through each of the lessons (below) before this week’s problem session.
  • After working through the lessons, work through this week’s homework (below).
Resources

4.1.2 Problem session: February 5 at 3pm

The content of this session will be determined by student interests and needs. We will not cover any new material; the time is set aside for students to work on the homework assignment. If students have questions about the week’s lessons, we will be glad to address those as well.

4.2 Week 2: Pivot tables

4.2.1 Lesson: February 10 at 3pm

Agenda
  • Grouping and ungrouping
  • Summarize and mutate within groups
    • Functions: group_by(), summarize(), arrange(), head()
    • Calculations: mean(), median(), max(), min()
    • Pivoting with spread() and gather()
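
The grouping, summarizing, and pivoting steps in this agenda can be sketched as follows. The table and its column names are hypothetical. Note one substitution: the pivot uses pivot_wider(), which current tidyr recommends in place of the superseded spread(); the spread() call is shown in a comment for comparison.

```r
library(tidyverse)

# Hypothetical enrollment counts by college and term.
enroll <- tibble(
  college   = c("Arts", "Arts", "Business", "Business", "Science", "Science"),
  term      = c("Fall", "Spring", "Fall", "Spring", "Fall", "Spring"),
  headcount = c(120, 110, 200, 190, 150, 160)
)

# One summary row per college, sorted, keeping the two largest.
by_college <- enroll |>
  group_by(college) |>
  summarize(
    mean_headcount = mean(headcount),
    max_headcount  = max(headcount)
  ) |>
  arrange(desc(mean_headcount)) |>
  head(2)

# mutate() within groups keeps every row and adds each term's share
# of its college's total; ungroup() removes the grouping afterwards.
with_share <- enroll |>
  group_by(college) |>
  mutate(share = headcount / sum(headcount)) |>
  ungroup()

# Colleges as rows and terms as columns, like a spreadsheet pivot table.
wide <- enroll |>
  pivot_wider(names_from = term, values_from = headcount)
# Older equivalent with the function named in the agenda:
# wide <- enroll |> spread(key = term, value = headcount)
```

The difference between summarize() and mutate() is the main point: the first collapses each group to one row, the second keeps all rows.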
To-do after class
  • Work through each of the lessons (below) before this week’s problem session.
  • After working through the lessons, work through this week’s homework (below).
Resources

4.2.2 Problem session: February 12 at 3pm

The content of this session will be determined by student interests and needs. We will not cover any new material; the time is set aside for students to work on the homework assignment. If students have questions about the week’s lessons, we will be glad to address those as well.

4.3 Week 3: Data joins

4.3.1 Lesson: February 24 at 3pm

Agenda
  • Left joins
  • Primary & foreign keys
  • ER diagrams
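
A minimal sketch of a left join on a primary key and foreign key pair is shown below. Both tables and all column names are hypothetical; the in-class demonstration uses IPEDS data instead.

```r
library(tidyverse)

# Hypothetical tables. students$id is a primary key (unique per row);
# enrollments$student_id is a foreign key that points back to it.
students <- tibble(
  id   = c(1, 2, 3),
  name = c("Ana", "Ben", "Cai")
)
enrollments <- tibble(
  student_id = c(1, 1, 3, 4),
  course     = c("STAT101", "ENG200", "BIO150", "HIS110")
)

# A primary key should have no duplicates.
key_is_unique <- !any(duplicated(students$id))

# left_join() keeps every row of the left table (enrollments) and adds
# the matching columns from students; rows with no match get NA.
joined <- enrollments |>
  left_join(students, by = c("student_id" = "id"))
```

Here student 4 has an enrollment but no row in students, so its name is NA after the join. Checking for such rows is a quick way to spot a broken foreign key.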
To-do after class
  • Work through each of the lessons (below) before this week’s problem session.
  • After working through the lessons, work through this week’s homework (below).
Resources
  • Lessons
  • In-class demonstration (IPEDS data)
  • Homework (at end of IPEDS R script)
  • In-class

4.3.2 Problem session: February 26 at 3pm

The content of this session will be determined by student interests and needs. We will not cover any new material; the time is set aside for students to work on the homework assignment. If students have questions about the week’s lessons, we will be glad to address those as well.

4.4 Week 4: Putting it all together

4.4.1 Lesson: March 3 at 3pm

Agenda
  • Discuss and demonstrate the basics of the for loop
  • Discuss and demonstrate how to put data tables in a workbook (as a communications tool)
  • Demonstrate several integrated R scripts that show a range of examples of what can be accomplished with the tools that we have learned about in this class
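
The basics of the for loop can be sketched with base R alone. The term names and headcounts below are hypothetical.

```r
# Hypothetical data: headcounts by term.
terms     <- c("Fall", "Spring", "Summer")
headcount <- c(1200, 1150, 300)

# Pre-allocate the result, then fill it one element at a time.
labels <- character(length(terms))
for (i in seq_along(terms)) {
  labels[i] <- paste0(terms[i], ": ", headcount[i], " students")
}

# Many loops can also be written as one vectorized call.
labels_vectorized <- paste0(terms, ": ", headcount, " students")
```

seq_along() is used instead of 1:length(terms) because it behaves correctly when the vector is empty.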
To-do after class
  • Work through the lesson (below) before this week’s problem session.
  • Skim the university example that you already have and attempt to interpret it.
  • After working through the lesson, work through this week’s homework (below).
Resources

4.4.2 Problem session: March 5 at 3pm

The content of this session will be determined by student interests and needs. We will not cover any new material; the time is set aside for students to work on the homework assignment. If students have questions about the week’s lessons, we will be glad to address those as well.

4.5 Week 5: Project development

Students work on their personal or institutional project.

4.6 Week 6: Wrap-up

4.6.1 Discussion: March 17 at 3pm

Submit & present final project; schedule discussion time with professors

  • Turn in R script for the final project
  • Present a summary in class
  • Schedule a meeting for a course and project wrap-up discussion