rm(list = ls())
Pivoting, groups, and functions
Week 2 In-class Demo
1 Introduction
In this week’s demo, we will focus on the benefits of the tidyverse for data transformation and summarization. We will use a sample survey dataset to demonstrate how to pivot tables and summarize data.
1.1 How to follow along with this document
- Using this document
  - This is a convenience for learning. It is not what you will be doing when you are running R commands yourself outside of class! Thus, while we will use this approach for learning, it is not for doing work.
- Using RStudio
  - This is how you will do work (including your homework and personal project), so you should get used to using this environment as quickly as possible. Click on this link to download the files (R project, R script, data, and folder structure) that you need to do all of this in RStudio.
Let’s take a look at how to use each of these two approaches.
1.2 Using this document
Within this document are blocks of R code. You can edit and execute this code as a way of practicing your R skills:
- Edit the code that is shown in the box. Click on the Run Code button.
- Make further edits and re-run that code. You can do this as often as you’d like.
- Click the Start Over button to bring back the original code if you’d like.
- If the code has a “Test your knowledge” header, then you are given instructions and can get feedback on your attempts.
1.3 Using RStudio
If you want to work through this demo in RStudio, we encourage you to do the following. (You don’t need to do any of this if you are executing the commands within this document; it is all handled for you.)
- Download the ZIP file
  - Use the link to download the ZIP file. Then unzip it.
- Open the Rproj (R project) file
  - Double-click on the file week1.Rproj to open RStudio. Or, if RStudio is already open, use File / Open Project and open that file.
- Clean up the workspace
  - When beginning a new project in RStudio, it’s always a good idea to remove active, existing data by running rm(list = ls()). Here, ls() lists all objects in the workspace, and rm() deletes them. An alternative, and perhaps better, practice is to just restart R using the Session menu, but this is a quick way to clean up most things.
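To see what the cleanup step does without touching your real workspace, here is a minimal sketch that applies the same ls() and rm() pattern to a scratch environment. The environment and the object names x and y are made up for illustration.

```r
# Scratch environment so the sketch leaves your actual workspace alone
scratch <- new.env()
assign("x", 1:3, envir = scratch)
assign("y", "hello", envir = scratch)

# ls() returns the names of all objects in the environment
objects_before <- ls(scratch)
print(objects_before)

# rm(list = ...) deletes every object whose name is in the list
rm(list = ls(scratch), envir = scratch)

objects_after <- ls(scratch)
print(objects_after)
```

Calling rm(list = ls()) with no environment argument does the same thing to the global workspace.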
1.4 Load packages
Next, we load the necessary packages for our analysis. The tidyverse package provides a suite of tools for data manipulation, tidylog gives us detailed messages about our data transformations, and knitr helps with table formatting.
library(tidyverse)
library(tidylog)
library(knitr)
library(kableExtra)
You might have to install kableExtra as we haven’t used it before.
install.packages("kableExtra")
2 The demonstration
2.1 Load and inspect raw data
We begin by loading the raw survey data from a CSV file. This data contains responses from students rating their abilities.
Next, we inspect the structure of the raw data to understand its format and contents.
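The file name and column layout of the course data are not shown here, so the following is only a sketch of the load-and-inspect step. It writes a small made-up survey to a temporary CSV and reads it back. The columns StudentID, Program, Creativity, and Leadership, and the program labels, are assumptions for illustration.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for the course CSV: one row per student, one column per item
csv_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(
    StudentID  = 1:4,
    Program    = c("MBA", "MBA", "MSBA", "MSBA"),
    Creativity = c(4, 5, 3, 4),
    Leadership = c(3, 4, 5, 5)
  ),
  csv_path
)

# Load the raw data from the CSV file
survey_raw <- read_csv(csv_path, show_col_types = FALSE)

# Inspect the structure: column names, types, and the first few values
glimpse(survey_raw)
```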
2.2 Pivot data for analysis
Survey data often has item labels in columns, which can be cumbersome. We pivot the data so that item labels are in a single column, making it easier to analyze.
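As a rough sketch of this pivot, assume the raw data has one column per survey item. The toy data and the output column names Item and Value below are assumptions; the course data may use different names.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide survey data: one column per item
survey_raw <- tibble(
  StudentID  = 1:4,
  Program    = c("MBA", "MBA", "MSBA", "MSBA"),
  Creativity = c(4, 5, 3, 4),
  Leadership = c(3, 4, 5, 5)
)

# Move the item columns into a label column (Item) and a rating column (Value)
survey_long <- survey_raw %>%
  pivot_longer(
    cols      = c(Creativity, Leadership),
    names_to  = "Item",
    values_to = "Value"
  )

print(survey_long)
```

Each student now contributes one row per item, which is the shape that grouping and summarizing functions expect.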
2.3 Summary statistics
We calculate summary statistics for the survey values to get an overall sense of the data distribution.
We also summarize the item labels to understand the distribution of responses across different survey items.
Notice that this information isn’t very useful because Item is a text column instead of the more appropriate categorical type.
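One way to reproduce these two summaries is with base R's summary(), shown here on made-up long-format data. The column names and values are assumptions.

```r
library(dplyr)

# Hypothetical long-format survey data
survey_long <- tibble(
  Program = rep(c("MBA", "MBA", "MSBA", "MSBA"), each = 2),
  Item    = rep(c("Creativity", "Leadership"), times = 4),
  Value   = c(4, 3, 5, 4, 3, 5, 4, 5)
)

# Numeric column: minimum, quartiles, mean, and maximum
value_summary <- summary(survey_long$Value)
print(value_summary)

# Character column: only length, class, and mode are reported
item_summary <- summary(survey_long$Item)
print(item_summary)
```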
2.4 Convert categorical columns to factors
To better analyze categorical data, we convert the Item and Program columns to factors. This helps in summarizing and visualizing the data.
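A minimal sketch of the conversion, again on made-up data, uses mutate() with factor(). The data values are assumptions; Item and Program are the column names used in the text.

```r
library(dplyr)

# Hypothetical long-format survey data with text columns
survey_long <- tibble(
  Program = rep(c("MBA", "MBA", "MSBA", "MSBA"), each = 2),
  Item    = rep(c("Creativity", "Leadership"), times = 4),
  Value   = c(4, 3, 5, 4, 3, 5, 4, 5)
)

# Convert the two categorical columns from character to factor
survey_long <- survey_long %>%
  mutate(
    Item    = factor(Item),
    Program = factor(Program)
  )

# summary() now reports a count for each factor level
item_counts <- summary(survey_long$Item)
print(item_counts)
```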
2.5 Summary statistics by Item
We calculate summary statistics by item to understand how students rate their abilities across different traits.
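A per-item summary along these lines can be sketched with group_by() and summarize(). The toy data and the choice of statistics are assumptions, not necessarily the ones used in class.

```r
library(dplyr)

# Hypothetical long-format survey data
survey_long <- tibble(
  Program = rep(c("MBA", "MBA", "MSBA", "MSBA"), each = 2),
  Item    = rep(c("Creativity", "Leadership"), times = 4),
  Value   = c(4, 3, 5, 4, 3, 5, 4, 5)
)

# One row of statistics per survey item
item_stats <- survey_long %>%
  group_by(Item) %>%
  summarize(
    n    = n(),
    mean = mean(Value),
    sd   = sd(Value),
    min  = min(Value),
    max  = max(Value)
  )

print(item_stats)
```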
If you are interested, you can find a very nice guide to making beautiful tables at this page.
2.6 Group by Program and Item
We group the data by program and item to find the average response value for each combination. This helps us compare responses across different programs.
We then summarize the item averages to get an overview of the data.
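Grouping by two columns works the same way as grouping by one. The sketch below uses made-up data, and the result name item_averages and column name Average are assumptions.

```r
library(dplyr)

# Hypothetical long-format survey data
survey_long <- tibble(
  Program = rep(c("MBA", "MBA", "MSBA", "MSBA"), each = 2),
  Item    = rep(c("Creativity", "Leadership"), times = 4),
  Value   = c(4, 3, 5, 4, 3, 5, 4, 5)
)

# Average rating for every Program and Item combination
item_averages <- survey_long %>%
  group_by(Program, Item) %>%
  summarize(Average = mean(Value), .groups = "drop")

print(item_averages)

# Overview of the resulting table
summary(item_averages)
```

Setting .groups = "drop" returns an ungrouped table, so later steps are not silently applied per group.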
2.7 Pivot for comparison across programs
To compare item averages across programs, we pivot the data to put the programs as column headers.
We then summarize the wide item averages to understand the distribution of responses across programs.
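Going from long to wide is the reverse of the earlier pivot. Here is a sketch with pivot_wider() on made-up averages; the program labels and numbers are assumptions.

```r
library(dplyr)
library(tidyr)

# Hypothetical item averages, one row per Program and Item combination
item_averages <- tibble(
  Program = c("MBA", "MBA", "MSBA", "MSBA"),
  Item    = c("Creativity", "Leadership", "Creativity", "Leadership"),
  Average = c(4.5, 3.5, 3.5, 5)
)

# Spread the programs across the columns: one row per item
item_averages_wide <- item_averages %>%
  pivot_wider(names_from = Program, values_from = Average)

print(item_averages_wide)

# Each program is now its own numeric column, so summary() describes each one
summary(item_averages_wide)
```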
Now let’s take a look at a formatted display of this wide data:
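A formatted display could be produced with knitr's kable(), sketched here on a made-up wide table. The data and the caption text are assumptions.

```r
library(knitr)

# Hypothetical wide table of item averages, one column per program
item_averages_wide <- data.frame(
  Item = c("Creativity", "Leadership"),
  MBA  = c(4.5, 3.5),
  MSBA = c(3.5, 5)
)

# Render as a formatted table, rounding numbers to two decimals
formatted <- kable(
  item_averages_wide,
  digits  = 2,
  caption = "Average rating by item and program"
)

formatted
```

For HTML output, the result can be piped into kableExtra's kable_styling() for additional styling.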
2.8 Filter and visualize data
We filter the data to focus on the creativity trait and arrange it in descending order of average response value.
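The filter-and-sort step might look like the following sketch, which uses made-up averages. The item label "Creativity" and the column names are assumptions about how the course data is coded.

```r
library(dplyr)

# Hypothetical item averages, one row per Program and Item combination
item_averages <- tibble(
  Program = c("MBA", "MBA", "MSBA", "MSBA"),
  Item    = c("Creativity", "Leadership", "Creativity", "Leadership"),
  Average = c(4.5, 3.5, 3.5, 5)
)

# Keep only the creativity rows, highest average first
creativity_ranked <- item_averages %>%
  filter(Item == "Creativity") %>%
  arrange(desc(Average))

print(creativity_ranked)
```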
Finally, we visualize the item averages to gain insights into student confidence across different programs and traits.
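One possible visualization is a grouped bar chart with ggplot2. The chart type, labels, and toy data below are assumptions; the plot used in class may differ.

```r
library(ggplot2)

# Hypothetical item averages, one row per Program and Item combination
item_averages <- data.frame(
  Program = c("MBA", "MBA", "MSBA", "MSBA"),
  Item    = c("Creativity", "Leadership", "Creativity", "Leadership"),
  Average = c(4.5, 3.5, 3.5, 5)
)

# Side-by-side bars: one cluster per item, one bar per program
p <- ggplot(item_averages, aes(x = Item, y = Average, fill = Program)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average self-rating by item and program",
    x     = "Survey item",
    y     = "Average rating"
  )

p
```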
3 Conclusion
In this demo, we have seen how to use the tidyverse to transform and summarize data efficiently. We have also visualized the data to gain insights into student confidence across different programs and traits.