Advanced lesson – Finding the proportion of cases that meet a condition

The following is a handy trick when we want to find the proportion of cases that meet a condition.

First, consider this vector (it could also be a list) that specifies the class year of each student in a group:
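A minimal sketch of such a vector follows. Only its length (seven) and the positions of "freshman" (first and sixth) come from this lesson; the remaining values are placeholders chosen for illustration.

```r
# Hypothetical data: the lesson fixes only the length (7) and that
# elements 1 and 6 are "freshman"; the other values are assumptions.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

student_class         # print the vector
class(student_class)  # "character"
```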

As the displayed results show, student_class is a character vector.

The following tests whether each individual element of the vector is equal to "freshman". Before executing this, note the following:

R interprets the statement correctly even though one side is a vector and the other is a single character string: it compares each element of the vector with the string in turn.
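The comparison might look like the sketch below. The vector contents are the same illustrative placeholders as before, and the name `is_freshman` is hypothetical, introduced only to hold the result.

```r
# Hypothetical data: only the length and the positions of "freshman"
# (1 and 6) come from the lesson.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Compare every element with the single string "freshman".
is_freshman <- student_class == "freshman"
is_freshman  # TRUE at positions 1 and 6, FALSE everywhere else
```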

Let’s be clear about what the above output says: “The first and sixth values of the vector are equal to "freshman" and the other values are not.” The return value is a vector of logical values.

This might be unexpected, but let’s look at the results in a bit more detail:
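The step described here appears to be a conversion of the logical vector to numbers. The sketch below assumes `as.numeric()` is used for that conversion; the data values and the name `freshman_numeric` are illustrative, not taken from the lesson.

```r
# Hypothetical data: only the length and the positions of "freshman"
# (1 and 6) come from the lesson.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# TRUE becomes 1 and FALSE becomes 0 when a logical vector is
# converted to numeric.
freshman_numeric <- as.numeric(student_class == "freshman")
freshman_numeric  # 1 0 0 0 0 1 0
```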

Thus, what we have is a numeric vector with seven values, two of which are 1s and the rest 0s.

Now, suppose that we want to know the percentage of all the vector elements that are equal to "freshman". In plain English, we want to know the percentage of students who are freshmen.

Tip

The mean() function

The mean() function calculates the average of a vector of values, as follows:
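A small illustration, using made-up numbers (the vector `example_values` is not from the lesson):

```r
# Hypothetical values chosen so the arithmetic is easy to check by hand.
example_values <- c(2, 4, 6, 8)

# (2 + 4 + 6 + 8) / 4
example_mean <- mean(example_values)
example_mean  # 5
```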

What this function does is count the number of elements, add up the values of the elements, and then divide the sum by the count.

Look at that vector of 1 and 0 values above. If we want to determine the percentage of 1 values in the vector, then we can use the mean() function on it:
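A sketch of that calculation, again with placeholder data and assuming `as.numeric()` for the conversion step (the name `prop_explicit` is hypothetical):

```r
# Hypothetical data: only the length and the positions of "freshman"
# (1 and 6) come from the lesson.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Convert to 1s and 0s, then average: (1+0+0+0+0+1+0) / 7
prop_explicit <- mean(as.numeric(student_class == "freshman"))
prop_explicit  # 0.2857143
```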

That is, about 28.6% of the students (2/7, more precisely) are freshmen.

R provides a shortcut for performing this conversion and calculation: mean() treats TRUE as 1 and FALSE as 0 on its own, so you can skip the explicit conversion from logical to numeric. Look at the following:
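The shortcut can be sketched as follows, with the same placeholder data (the name `prop_shortcut` is hypothetical):

```r
# Hypothetical data: only the length and the positions of "freshman"
# (1 and 6) come from the lesson.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# mean() coerces the logical vector to numbers itself.
prop_shortcut <- mean(student_class == "freshman")
prop_shortcut  # 0.2857143
```

The same pattern works for any condition that yields a logical vector.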

Either of the last two methods for calculating the mean is correct. We just wanted to show you how R provides shortcuts for common needs.