Advanced lesson – Finding the proportion of cases that meet a condition
The following is a handy trick when we want to find the proportion of cases that meet a condition.
First, consider this vector (it could also be a list) that specifies the class of each student in a set of students:
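A minimal sketch of such a vector follows. The specific class labels are hypothetical; the only properties taken from this lesson are that there are seven students and that the first and sixth are freshmen.

```r
# Hypothetical data: seven students, with "freshman" in positions 1 and 6
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Display the vector; the quoted values indicate character data
print(student_class)

class(student_class)   # "character"
length(student_class)  # 7
```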
As the displayed results show, `student_class` is a character vector.
The following tests whether each individual element of the vector is equal to `"freshman"`. Before executing this, note the following:
- `student_class` is a vector and `"freshman"` is a single character string.
- The operator between the two values consists of two equal signs, not one. This means that it is a test of equality and not an assignment operator.
- R interprets the statement correctly despite the difference in length between the vector and the single string: it compares the string against each element of the vector in turn.
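As a sketch of that test, assuming the same hypothetical seven-student vector (only the positions of the two freshmen come from this lesson):

```r
# Hypothetical data: seven students, with "freshman" in positions 1 and 6
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Element-wise equality test: one TRUE/FALSE result per student
is_freshman <- student_class == "freshman"
is_freshman
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
```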
Let’s be clear about what the above output says: “The first and sixth values of the vector are equal to `"freshman"` and the other values are not.” The return value is a vector of logical values.
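Next, the comparison can be wrapped in `as.numeric()`. The sketch below again assumes the hypothetical seven-student vector; the name `freshman_flags` is introduced here only for illustration.

```r
# Hypothetical data: seven students, with "freshman" in positions 1 and 6
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Convert the logical results of the comparison into numbers
freshman_flags <- as.numeric(student_class == "freshman")
freshman_flags
# [1] 1 0 0 0 0 1 0
```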
This might be unexpected, but let’s look at the results in a bit more detail:
- We know (from above) that `student_class == "freshman"` produces a vector of logical values.
- We saw in a previous section that we can use `as.numeric()` on a vector in order to convert all of its elements into numeric values. For logical values, `TRUE` becomes `1` and `FALSE` becomes `0`.
- The result of all of this is a numeric vector in which every value is either `0` or `1`. All of the `1` values correspond to those values in `student_class` that are equal to `"freshman"`.
Thus, what we have is a numeric vector with seven values, two of which are `1`s.
Now, suppose that we want to know the percentage of all the vector elements that are equal to `"freshman"`. In plain English, we want to know the percentage of students who are freshmen.
The `mean()` function
The `mean()` function calculates the average of a vector of values, as follows:
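For instance, a minimal sketch using a hypothetical numeric vector named `scores` (both the name and the values are made up for illustration):

```r
# Hypothetical values chosen so the arithmetic is easy to check by hand
scores <- c(2, 4, 6, 8)

mean(scores)
# [1] 5
```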
What this function does is count the number of elements, add up the values of the elements, and then divide the sum by the count.
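Those three steps can be reproduced by hand with `length()` and `sum()`. This is only a sketch on a hypothetical vector named `scores`, intended to show that the manual calculation matches `mean()`.

```r
# Hypothetical values for illustration
scores <- c(2, 4, 6, 8)

n_elements  <- length(scores)       # count the elements: 4
total       <- sum(scores)          # add up the values: 20
manual_mean <- total / n_elements   # divide the sum by the count: 5

manual_mean == mean(scores)
# [1] TRUE
```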
Look at that vector of `1` and `0` values above. If we want to determine the percentage of `1` values in the vector, then we can use the `mean()` function on it:
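A sketch of that calculation, assuming the same hypothetical seven-student vector with freshmen in the first and sixth positions:

```r
# Hypothetical data: seven students, with "freshman" in positions 1 and 6
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# Average of the 0/1 values = share of elements equal to "freshman"
proportion <- mean(as.numeric(student_class == "freshman"))
proportion
# [1] 0.2857143
```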
That is, about 28.6% of the students (2/7, more precisely) are freshmen.
R provides a shortcut for this calculation: because `mean()` treats `TRUE` as 1 and `FALSE` as 0, you can skip the explicit conversion from logical to numeric. Look at the following:
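A sketch of the shortcut on the same hypothetical vector, with a check that it agrees with the longer form:

```r
# Hypothetical data: seven students, with "freshman" in positions 1 and 6
student_class <- c("freshman", "sophomore", "junior", "senior",
                   "sophomore", "freshman", "junior")

# mean() applied directly to the logical vector, with no as.numeric() step
shortcut <- mean(student_class == "freshman")
shortcut
# [1] 0.2857143

# Same result as the explicit conversion
long_form <- mean(as.numeric(student_class == "freshman"))
shortcut == long_form
# [1] TRUE
```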
Either of the last two methods for calculating the mean is correct. We just wanted to show you how R provides shortcuts for common needs.