Advanced lesson – Finding the proportion of cases that meet a condition
The following is a handy trick when we want to find the proportion of cases that meet a condition.
First, consider this vector (it could also be a list) that specifies the class of each student in a set of students:
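A vector consistent with the rest of this lesson (seven students, with freshmen in the first and sixth positions) can be sketched as follows. The other class labels are placeholders chosen for illustration, not data from the lesson.

```r
# Illustrative data: only the length (7) and the "freshman" positions
# (1 and 6) come from the lesson; the other labels are assumptions.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

student_class          # display the seven labels
class(student_class)   # "character"
```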
As the displayed results show, student_class is a character vector.
The following tests whether each individual element of the vector is equal to "freshman". Before executing this, note the following:
- student_class is a vector and "freshman" is a character string.
- The operator between the two values consists of two equal signs, not one. This means that it is a test of equality and not an assignment operator.
- R is smart enough that it interprets the statement correctly without getting confused about the incompatibility between vector and character string: it compares each element of the vector, in turn, with the single string.
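A minimal sketch of that test, assuming the illustrative seven-element vector (only the two "freshman" positions are taken from the lesson; is_freshman is a name introduced here):

```r
# Illustrative data; the non-freshman labels are assumptions.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Element-wise test of equality: one logical value per student.
is_freshman <- student_class == "freshman"
is_freshman
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
```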
Let’s be clear about what the above output says: “The first and sixth values of the vector are equal to "freshman" and the other values are not.” The return value is a vector of logical values.
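Next, the logical vector can be passed to as.numeric(). The sketch below again assumes the illustrative vector, whose non-freshman labels are placeholders; freshman_flags is a name introduced here.

```r
# Illustrative data; the non-freshman labels are assumptions.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Convert the logical results to numbers: TRUE becomes 1, FALSE becomes 0.
freshman_flags <- as.numeric(student_class == "freshman")
freshman_flags
# [1] 1 0 0 0 0 1 0
```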
This might be unexpected, but let’s look at the results in a bit more detail:
- We know (from above) that student_class == "freshman" produces a vector of logical values.
- We saw in a previous section that we can use as.numeric() on a vector in order to convert all of its elements into numeric values.
- The result of all of this is a numeric vector in which all values are either a 0 or a 1. All of the 1 values correspond to those values in student_class that are equal to "freshman".
Thus, what we have is a numeric vector with seven values, two of which are 1s.
Now, suppose that we want to know the percentage of all the vector elements that are equal to "freshman". In plain English, we want to know the percentage of students who are freshmen.
The mean() function
The mean() function calculates the average of a vector of values, as follows:
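A small sketch with made-up numbers (the vector values is hypothetical and used only to illustrate the calculation):

```r
# Hypothetical values, chosen only for illustration.
values <- c(3, 5, 10)

mean(values)                  # 6
sum(values) / length(values)  # 6, the same calculation done by hand
```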
What this function does is count the elements, add up their values, and then divide the sum by the count.
Look at that vector of 1 and 0 values above. If we want to determine the percentage of 1 values in the vector, then we can use the mean() function on it:
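A minimal sketch, assuming the illustrative seven-element vector (its non-freshman labels are placeholders):

```r
# Illustrative data; the non-freshman labels are assumptions.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# The mean of a 0/1 vector is the proportion of 1 values.
freshman_flags <- as.numeric(student_class == "freshman")
mean(freshman_flags)
# [1] 0.2857143
```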
That is, about 28.6% of the students (2/7, more precisely) are freshmen.
R provides a shortcut for performing this conversion and calculation: it allows you to skip the conversion from logical to numeric, because mean() treats each TRUE as 1 and each FALSE as 0. Look at the following:
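A minimal sketch of the shortcut, once more assuming the illustrative vector with placeholder labels for the non-freshmen:

```r
# Illustrative data; the non-freshman labels are assumptions.
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# mean() accepts the logical vector directly.
shortcut <- mean(student_class == "freshman")
shortcut
# [1] 0.2857143

# Same result as the explicit conversion.
explicit <- mean(as.numeric(student_class == "freshman"))
all.equal(shortcut, explicit)
# [1] TRUE
```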
Either of the last two methods for calculating the mean is correct. We just wanted to show you how R provides shortcuts for common needs.