The following is a handy trick when we want to find the proportion of cases that meet a condition.
First, consider this vector (it could also be a list) that specifies the class of each student in a group:
student_class <- c("freshman", "sophomore", "junior", "senior",
"senior", "freshman", "junior")
str(student_class)
chr [1:7] "freshman" "sophomore" "junior" "senior" "senior" "freshman" ...
As the displayed results show, student_class is a character vector.
The following tests whether each individual element of the vector is equal to "freshman". Before looking at the results, note the following:
student_class is a vector and "freshman" is a character string.
R is smart enough to interpret the statement correctly: it compares each element of the vector with the string in turn, without getting confused by the mismatch between a vector and a single character string.
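A minimal sketch of that test, assuming the student_class vector defined earlier. The name is_freshman is only illustrative and does not come from the text; the commented line shows what R prints for this data.

```r
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# Compare every element of the vector with the single string "freshman".
# is_freshman is an illustrative name, not one used in the text.
is_freshman <- student_class == "freshman"
is_freshman
#> [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
```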
Let’s be clear about what the above output says: “The first and sixth values of the vector are equal to "freshman" and the other values are not.”
The return value is a vector of logical values.
Now let’s convert those logical values to numeric values:
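One way to write this conversion, assuming the same student_class vector, is to wrap the comparison in as.numeric(). The name freshman_flags is a hypothetical one chosen for this sketch.

```r
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# as.numeric() turns TRUE into 1 and FALSE into 0.
# freshman_flags is an illustrative name, not one used in the text.
freshman_flags <- as.numeric(student_class == "freshman")
freshman_flags
#> [1] 1 0 0 0 0 1 0
```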
This might be unexpected, but let’s look at the results in a bit more detail:
student_class == "freshman" produces a vector of logical values.
Calling as.numeric() on a vector converts all of its elements into numeric values.
Each logical value becomes either a 0 (for FALSE) or a 1 (for TRUE). All of the 1 values correspond to those values in student_class that are equal to "freshman".
Thus, what we have is a numeric vector with seven values, two of which are 1s.
Now, suppose that we want to know the percentage of all the vector elements that are equal to "freshman". In plain English, we want to know the percentage of students who are freshmen.
Look at that vector of 1 and 0 values above. If we want to determine the percentage of 1 values in the vector, then we can use the mean() function on it:
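As a sketch, again assuming the student_class vector from earlier, the calculation could look like this. Because every value is 0 or 1, the mean is the count of 1s divided by the length of the vector. The name prop_freshman is illustrative only.

```r
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# The mean of a 0/1 vector is (number of 1s) / (number of elements), here 2 / 7.
# prop_freshman is an illustrative name, not one used in the text.
prop_freshman <- mean(as.numeric(student_class == "freshman"))
prop_freshman
#> [1] 0.2857143
```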
That is, about 28.6% of the students (2/7, more precisely) are freshmen.
R provides a shortcut for performing this conversion and calculation: it allows you to skip the conversion from logical to numeric! Look at the following:
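A sketch of the shortcut, under the same assumption about student_class: mean() is applied directly to the logical vector, and R treats TRUE as 1 and FALSE as 0. The name prop_shortcut is hypothetical.

```r
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# mean() coerces the logical vector to numbers itself, so as.numeric() is not needed.
# prop_shortcut is an illustrative name, not one used in the text.
prop_shortcut <- mean(student_class == "freshman")
prop_shortcut
#> [1] 0.2857143
```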
Either of the last two methods for calculating the mean is correct. We just wanted to show you how R provides shortcuts for common needs.