Finding the proportion of cases that meet a condition

The following is a handy trick for finding the proportion of cases that meet a condition.

First, consider this vector (it could also be a list) that specifies the class of each student in a small group:

student_class <- c("freshman", "sophomore", "junior", "senior", 
                   "senior", "freshman", "junior")
str(student_class)
 chr [1:7] "freshman" "sophomore" "junior" "senior" "senior" "freshman" ...

As the displayed results show, student_class is a character vector.

The following tests whether each individual element of the vector is equal to "freshman". Before looking at the results, note the following:

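R compares each element of student_class with "freshman" in turn. Although student_class has seven elements and "freshman" is a single character string, R does not treat this as a mismatch: it recycles the one-element string to the length of the vector, so every element is compared against "freshman".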

student_class == "freshman"
[1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE

Let’s be clear about what the above output says:

“The first and sixth values of the vector are equal to "freshman" and the other values are not.”

The return value is a vector of logical values.
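To make the recycling behaviour concrete, here is a small self-contained sketch. It redefines student_class so it runs on its own, and the names is_freshman and is_freshman_explicit are illustrative, not part of the original example. It compares the result with an explicit rep() of the string (which is what recycling amounts to here) and checks the type and length of the result.

```r
# Same data as above, redefined so this block runs on its own
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# The single string "freshman" is recycled to match the vector's length
is_freshman <- student_class == "freshman"

# Writing the recycling out by hand gives the same logical vector
is_freshman_explicit <- student_class == rep("freshman", length(student_class))
identical(is_freshman, is_freshman_explicit)
# [1] TRUE

# One logical value per student
class(is_freshman)
# [1] "logical"
length(is_freshman)
# [1] 7
```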

Now let’s convert those logical values to numeric values:

as.numeric(student_class == "freshman")
[1] 1 0 0 0 0 1 0
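One consequence of this conversion is worth seeing on its own: because TRUE becomes 1 and FALSE becomes 0, summing the converted vector counts the matches. A minimal sketch, where n_freshmen is an illustrative name:

```r
# Same data as above, redefined so this block runs on its own
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# TRUE becomes 1 and FALSE becomes 0
as.numeric(c(TRUE, FALSE))
# [1] 1 0

# Adding up the 1s and 0s counts how many students are freshmen
n_freshmen <- sum(as.numeric(student_class == "freshman"))
n_freshmen
# [1] 2
```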

This might be unexpected, but the conversion follows a simple rule: as.numeric() turns each TRUE into 1 and each FALSE into 0.

Thus, what we have is a numeric vector with seven values, two of which are 1s (one for each freshman).

Now, suppose that we want to know the percentage of all the vector elements that are equal to "freshman"; in plain English, we want to know the percentage of students who are freshmen.

Tip

The mean() function

The mean() function calculates the average of a vector of values, as follows:

mean(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
[1] 5.5

What this function does is count the number of elements, add up their values, and then divide the sum by the count.
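The description above can be checked by computing the mean by hand with sum() and length() and comparing it with mean(). Treat this as a conceptual sketch: R's built-in mean() may use a slightly more careful floating-point algorithm internally, so the comparison uses all.equal() rather than an exact equality test. The name manual_mean is illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Count the elements, add up their values, divide the sum by the count
manual_mean <- sum(x) / length(x)
manual_mean
# [1] 5.5

# Compare with the built-in function (tolerant numeric comparison)
isTRUE(all.equal(manual_mean, mean(x)))
# [1] TRUE
```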

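Look at that vector of 1 and 0 values above. Adding up 1s and 0s simply counts the 1s, so the mean of such a vector is the number of 1s divided by the total number of elements, which is exactly the proportion of 1s. So, to determine the proportion (and hence the percentage) of 1 values in the vector, we can use the mean() function on it: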

mean(as.numeric(student_class == "freshman"))
[1] 0.2857143

That is, about 28.6% of the students (2/7, more precisely) are freshmen.
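Spelling out the same calculation as "count divided by total" shows why the mean of a 0/1 vector is a proportion, and multiplying by 100 gives the percentage quoted above. A sketch with illustrative names (is_freshman, prop_freshmen):

```r
# Same data as above, redefined so this block runs on its own
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")
is_freshman <- as.numeric(student_class == "freshman")

# Number of 1s divided by number of elements: 2 / 7
prop_freshmen <- sum(is_freshman) / length(is_freshman)
prop_freshmen
# [1] 0.2857143

# Same value as mean(), up to floating-point tolerance
isTRUE(all.equal(prop_freshmen, mean(is_freshman)))
# [1] TRUE

# As a percentage, rounded to one decimal place
round(100 * prop_freshmen, 1)
# [1] 28.6
```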

R provides a shortcut for this conversion and calculation: mean() accepts a logical vector directly, treating TRUE as 1 and FALSE as 0, so you can skip the explicit conversion from logical to numeric. Look at the following:

mean(student_class == "freshman")
[1] 0.2857143

Both of the last two methods for calculating the mean are correct and give the same result. We just wanted to show you how R provides shortcuts for common needs.
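Because any comparison that yields a logical vector can be passed to mean(), the same shortcut should work for other conditions too. The sketch below applies it with the base R operators %in% and != to the same data; the particular conditions are chosen only for illustration.

```r
# Same data as above, redefined so this block runs on its own
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "senior", "freshman", "junior")

# Proportion of students who are juniors or seniors (4 of 7)
mean(student_class %in% c("junior", "senior"))
# [1] 0.5714286

# Proportion of students who are not freshmen (5 of 7)
mean(student_class != "freshman")
# [1] 0.7142857
```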