Defining a factor
In IR work, we often report on class standing: first year (or freshman), sophomore, junior, and senior for a four-year undergraduate program. Stored as strings, these don't sort in the right order. Alphabetically, freshman comes first, then junior, then senior, then sophomore. We can tell R the correct order by converting the column to a factor. For more info on factors, see this page.
Let’s create this data frame to play with for a bit:
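A minimal sketch of such a data frame, assuming a hypothetical `df` with a student `id` column and a `class` column stored as plain strings (the values are made up for illustration):

```r
# Hypothetical example data: class standing stored as plain character strings
df <- data.frame(
  id    = 1:6,
  class = c("senior", "freshman", "junior", "sophomore", "freshman", "senior"),
  stringsAsFactors = FALSE  # keep class as character, not factor
)

str(df)
```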
We can sort by a column with the arrange() function from dplyr, but on a text column it will sort alphabetically.
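To see the problem, here is a short sketch that assumes the dplyr package is installed and reuses the same hypothetical data. Because `class` is still a character column, arrange() puts sophomore last:

```r
library(dplyr)

# Same hypothetical data as before, with class as character
df <- data.frame(
  id    = 1:6,
  class = c("senior", "freshman", "junior", "sophomore", "freshman", "senior"),
  stringsAsFactors = FALSE
)

# arrange() on a character column sorts alphabetically
sorted <- arrange(df, class)
sorted$class
```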
That's not usually what we want, so we can convert class to a factor with the mutate() function. We'll overwrite the existing data frame with the new one, using df <- ...
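One way the conversion might look, as a sketch assuming dplyr and the hypothetical data from earlier. factor() takes the desired order through `levels`, and `ordered = TRUE` makes it an ordered factor:

```r
library(dplyr)

df <- data.frame(
  id    = 1:6,
  class = c("senior", "freshman", "junior", "sophomore", "freshman", "senior"),
  stringsAsFactors = FALSE
)

# Overwrite df, turning class into an ordered factor whose levels
# are listed in the order we want them to sort
df <- df %>%
  mutate(class = factor(class,
                        levels  = c("freshman", "sophomore", "junior", "senior"),
                        ordered = TRUE))

levels(df$class)
is.ordered(df$class)
```

Be aware that factor() quietly turns any value not listed in `levels` into NA. A misspelling or a value like "first year" would therefore need to be added to the levels (or recoded) first.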
The important parts of the mutate() call are to list the levels in the right order and then to set ordered to TRUE. Now when we sort df by class, the rows will appear in the right order.
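Continuing the same hypothetical sketch (dplyr assumed installed), arrange() now follows the level order instead of the alphabet:

```r
library(dplyr)

df <- data.frame(
  id    = 1:6,
  class = c("senior", "freshman", "junior", "sophomore", "freshman", "senior"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(class = factor(class,
                        levels  = c("freshman", "sophomore", "junior", "senior"),
                        ordered = TRUE))

# Sorting a factor uses its level order, not alphabetical order
sorted <- arrange(df, class)
as.character(sorted$class)
```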
This trick is particularly useful when we make a chart or table where the reader expects a particular order. You will see much more of factors as you progress in your work with R.
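As a small illustration of the table case, here is a base R sketch on the same hypothetical data. table() reports counts in level order, so the output reads freshman through senior. Plotting packages such as ggplot2 likewise use the level order for a discrete axis.

```r
# Hypothetical data, converted to an ordered factor with base R
df <- data.frame(
  class = c("senior", "freshman", "junior", "sophomore", "freshman", "senior"),
  stringsAsFactors = FALSE
)
df$class <- factor(df$class,
                   levels  = c("freshman", "sophomore", "junior", "senior"),
                   ordered = TRUE)

# Counts come out in level order rather than alphabetical order
counts <- table(df$class)
counts
```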