Pivoting, groups, and functions

Week 2 In-class Demo

1 Introduction

In this week’s demo, we will focus on the benefits of the tidyverse for data transformation and summarization. We will use a sample survey dataset to demonstrate how to pivot tables and summarize data.

1.1 How to follow along with this document

Using this document
This is a convenience for learning, not how you will run R commands yourself outside of class. We will use this approach to learn, but not to do real work.
Using RStudio
This is how you will do work (including your homework and personal project), so you should get used to using this environment as quickly as possible. Click on this link to download the files (R project, R script, data, and folder structure) that you need to do all of this in RStudio.

Let’s take a look at how to use each of these two approaches.

1.2 Using this document

Within this document are blocks of R code. You can edit and execute this code as a way of practicing your R skills:

  • Edit the code that is shown in the box. Click on the Run Code button.
  • Make further edits and re-run that code. You can do this as often as you’d like.
  • Click the Start Over button to bring back the original code if you’d like.
  • If the code has a “Test your knowledge” header, then you are given instructions and can get feedback on your attempts.

1.3 Using RStudio

If you are working through this demo in RStudio, we encourage you to follow the steps below. (You don’t need to do any of this if you are executing the commands within this document; it is all handled for you.)

Download the ZIP file
Use the link to download the ZIP file. Then unzip it.
Open the Rproj (R project) file
Double-click on the file week1.Rproj to open RStudio. Or, if RStudio is already open, use File/Open Project and open that file.
Clean up the workspace
When beginning a new project in RStudio, it’s always a good idea to remove active, existing data by doing the following:
rm(list = ls())

In the above, ls() lists all objects in the workspace, and rm() deletes them.

An alternative, and perhaps better, practice is to just restart R using the Session menu, but this is a quick way to clean up most things.

1.4 Load packages

Next, we load the necessary packages for our analysis. The tidyverse package provides a suite of tools for data manipulation, tidylog gives us detailed messages about our data transformations, and knitr and kableExtra help with table formatting.

library(tidyverse)
library(tidylog)
library(knitr)
library(kableExtra)

If library(kableExtra) fails because the package is not installed (we haven’t used it before), install it first and then re-run the library() calls above.

install.packages("kableExtra")

2 The demonstration

2.1 Load and inspect raw data

We begin by loading the raw survey data from a CSV file. This data contains responses from students rating their abilities.

Next, we inspect the structure of the raw data to understand its format and contents.
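The course data file is not reproduced in this document, so the following is only a minimal sketch of the load-and-inspect step. The CSV text, the column names (ID, Program, Creativity, Leadership), and the program labels are hypothetical stand-ins, not the real survey.

```r
library(tidyverse)

# Hypothetical CSV content standing in for the course data file;
# the real file name and columns are not shown in this document.
csv_text <- "ID,Program,Creativity,Leadership
1,MBA,4,5
2,MBA,3,4
3,MSBA,5,3
4,MSBA,4,2"

# I() tells read_csv() to treat the string as literal data rather than a path.
# With a real file you would pass its path instead.
survey_raw <- read_csv(I(csv_text), show_col_types = FALSE)

# glimpse() prints one line per column: its name, type, and first few values
glimpse(survey_raw)
```

With the actual project files, the only change needed should be replacing `I(csv_text)` with the path to the CSV inside the project folder.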

2.2 Pivot data for analysis

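Survey data often stores each item in its own column, with the item labels as column names, which is cumbersome to analyze. We pivot the data to a longer format so that the item labels sit in a single column and the responses in another, making it easier to analyze.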
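As a rough illustration of this pivot, the sketch below uses a small made-up table. The column names (ID, Program, Creativity, Leadership) and the output column names Item and Value are assumptions chosen for the example.

```r
library(tidyverse)

# Hypothetical wide survey data: one row per student, one column per rated trait
survey_raw <- tribble(
  ~ID, ~Program, ~Creativity, ~Leadership,
  1,   "MBA",    4,           5,
  2,   "MBA",    3,           4,
  3,   "MSBA",   5,           3,
  4,   "MSBA",   4,           2
)

# Move the trait columns into two columns:
# Item holds the former column name, Value holds the response
survey_long <- survey_raw %>%
  pivot_longer(
    cols      = c(Creativity, Leadership),
    names_to  = "Item",
    values_to = "Value"
  )

survey_long
```

Each student now contributes one row per item, so 4 students and 2 items give 8 rows.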

2.3 Summary statistics

We calculate summary statistics for the survey values to get an overall sense of the data distribution.

We also summarize the item labels to understand the distribution of responses across different survey items.

Notice that this information isn’t very useful because Item is a character (text) column rather than a factor, which is R’s categorical type.
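To make the contrast concrete, here is a small sketch on made-up long-format data (the columns ID, Program, Item, and Value are assumed names). It calls base R's `summary()` on a numeric column and on a character column.

```r
library(tidyverse)

# Hypothetical long-format survey data (one row per student per item)
survey_long <- tibble(
  ID      = rep(1:4, each = 2),
  Program = rep(c("MBA", "MSBA"), each = 4),
  Item    = rep(c("Creativity", "Leadership"), times = 4),
  Value   = c(4, 5, 3, 4, 5, 3, 4, 2)
)

# Numeric column: min, quartiles, median, mean, and max
summary(survey_long$Value)

# Character column: only Length, Class, and Mode are reported,
# which says nothing about how often each item appears
summary(survey_long$Item)
```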

2.4 Convert categorical columns to factors

To better analyze categorical data, we convert the Item and Program columns to factors. This helps in summarizing and visualizing the data.
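One way to do this conversion is with `mutate()` and `factor()`, sketched here on made-up data with assumed column names (ID, Program, Item, Value).

```r
library(tidyverse)

# Hypothetical long-format survey data (one row per student per item)
survey_long <- tibble(
  ID      = rep(1:4, each = 2),
  Program = rep(c("MBA", "MSBA"), each = 4),
  Item    = rep(c("Creativity", "Leadership"), times = 4),
  Value   = c(4, 5, 3, 4, 5, 3, 4, 2)
)

# Convert the two text columns to factors (categorical variables)
survey_long <- survey_long %>%
  mutate(
    Item    = factor(Item),
    Program = factor(Program)
  )

# summary() on a factor reports a count per level
summary(survey_long$Item)
summary(survey_long$Program)
```

By default `factor()` orders the levels alphabetically; pass `levels =` if a different order matters for tables or plots.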

2.5 Summary statistics by Item

We calculate summary statistics by item to understand how students rate their abilities across different traits.
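A per-item summary can be sketched with `group_by()` and `summarize()`. The data and the column names (Item, Value, and the summary columns n, mean_value, sd_value) are illustrative assumptions, and the statistics chosen here may differ from those used in class.

```r
library(tidyverse)

# Hypothetical long-format survey data (one row per student per item)
survey_long <- tibble(
  ID      = rep(1:4, each = 2),
  Program = factor(rep(c("MBA", "MSBA"), each = 4)),
  Item    = factor(rep(c("Creativity", "Leadership"), times = 4)),
  Value   = c(4, 5, 3, 4, 5, 3, 4, 2)
)

# One row per item with the number of responses, their mean, and their spread
item_summary <- survey_long %>%
  group_by(Item) %>%
  summarize(
    n          = n(),
    mean_value = mean(Value),
    sd_value   = sd(Value)
  )

item_summary
```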

If you are interested, you can find a very nice guide to making beautiful tables at this page.

2.6 Group by Program and Item

We group the data by program and item to find the average response value for each combination. This helps us compare responses across different programs.

We then summarize the item averages to get an overview of the data.
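The grouping step might look like the following sketch. The data, the program labels, and the names item_averages and avg_value are made up for illustration.

```r
library(tidyverse)

# Hypothetical long-format survey data (one row per student per item)
survey_long <- tibble(
  ID      = rep(1:4, each = 2),
  Program = factor(rep(c("MBA", "MSBA"), each = 4)),
  Item    = factor(rep(c("Creativity", "Leadership"), times = 4)),
  Value   = c(4, 5, 3, 4, 5, 3, 4, 2)
)

# Average response for every Program x Item combination;
# .groups = "drop" returns an ungrouped tibble
item_averages <- survey_long %>%
  group_by(Program, Item) %>%
  summarize(avg_value = mean(Value), .groups = "drop")

# Overview of the result: level counts for the factors,
# distribution statistics for avg_value
summary(item_averages)
```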

2.7 Pivot for comparison across programs

To compare item averages across programs, we pivot the data to a wider format so that each program becomes its own column.

We then summarize the wide item averages to understand the distribution of responses across programs.
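A minimal sketch of this reshaping with `pivot_wider()` follows. The program labels (MBA, MSBA), the averages, and the column name avg_value are hypothetical.

```r
library(tidyverse)

# Hypothetical Program x Item averages (one row per combination)
item_averages <- tibble(
  Program   = rep(c("MBA", "MSBA"), each = 2),
  Item      = rep(c("Creativity", "Leadership"), times = 2),
  avg_value = c(3.5, 4.5, 4.5, 2.5)
)

# Spread the programs across columns: one row per item, one column per program
item_averages_wide <- item_averages %>%
  pivot_wider(names_from = Program, values_from = avg_value)

item_averages_wide

# Each program is now a numeric column, so summary() describes
# the distribution of item averages within each program
summary(item_averages_wide)
```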

Now let’s take a look at a formatted display of this wide data:
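One possible way to produce such a display is with `kable()` from knitr and `kable_styling()` from kableExtra. The sketch below uses made-up wide data, and the caption and styling options are assumptions rather than the settings used in class.

```r
library(tidyverse)
library(knitr)
library(kableExtra)

# Hypothetical wide item averages: one row per item, one column per program
item_averages_wide <- tibble(
  Item = c("Creativity", "Leadership"),
  MBA  = c(3.5, 4.5),
  MSBA = c(4.5, 2.5)
)

# Build an HTML table rounded to two decimals, then apply kableExtra styling
wide_table <- item_averages_wide %>%
  kable(
    format  = "html",
    digits  = 2,
    caption = "Average response by item and program (toy data)"
  ) %>%
  kable_styling(full_width = FALSE)

wide_table
```

Setting `format = "html"` explicitly keeps the example independent of where it is run; inside a rendered document the format is usually detected for you.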

2.8 Filter and visualize data

We filter the data to focus on the creativity trait and arrange it in descending order of average response value.
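The filter-and-sort step can be sketched as below. The data is made up, and the exact item label ("Creativity") and column names are assumptions; the label in the real data may be spelled differently.

```r
library(tidyverse)

# Hypothetical Program x Item averages (one row per combination)
item_averages <- tibble(
  Program   = rep(c("MBA", "MSBA"), each = 2),
  Item      = rep(c("Creativity", "Leadership"), times = 2),
  avg_value = c(3.5, 4.5, 4.5, 2.5)
)

# Keep only the creativity rows, then sort from highest to lowest average
creativity_averages <- item_averages %>%
  filter(Item == "Creativity") %>%
  arrange(desc(avg_value))

creativity_averages
```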

Finally, we visualize the item averages to gain insights into student confidence across different programs and traits.
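The chart type used in class is not shown in this document; a grouped bar chart is one reasonable choice, sketched here with ggplot2 on made-up averages and assumed column names.

```r
library(tidyverse)

# Hypothetical Program x Item averages (one row per combination)
item_averages <- tibble(
  Program   = rep(c("MBA", "MSBA"), each = 2),
  Item      = rep(c("Creativity", "Leadership"), times = 2),
  avg_value = c(3.5, 4.5, 4.5, 2.5)
)

# One group of bars per item, one bar per program within each group
p <- ggplot(item_averages, aes(x = Item, y = avg_value, fill = Program)) +
  geom_col(position = "dodge") +
  labs(
    x     = "Survey item",
    y     = "Average response",
    title = "Average self-rating by item and program (toy data)"
  )

print(p)
```

Placing programs side by side within each item makes it easy to see where programs differ most on a given trait.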

3 Conclusion

In this demo, we have seen how to use the tidyverse to transform and summarize data efficiently. We have also visualized the data to gain insights into student confidence across different programs and traits.