class(9.6)
[1] "numeric"
So far, we have only worked with numbers in R. But there are many other kinds of values that you will encounter in your R journeys.
The main types of values that you’ll encounter in R are:
Numeric: numbers, e.g., 1, 3.5, 1e5 (which is scientific notation for 100,000)
Character: free-form text values, e.g., "California", "John Doe", "XJ1784"
Logical (Boolean): binary values corresponding to TRUE and FALSE
You can use the class() function to ask what type of object a value is. For example, the class of 9.6 is “numeric”. So is the class of -5 and 1e7 (which is scientific notation for 10,000,000).
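A minimal sketch of these three checks; the comments show what R prints at the console:

```r
# Ask R for the class of three different numbers
class(9.6)   # [1] "numeric"
class(-5)    # [1] "numeric"
class(1e7)   # [1] "numeric"
```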
You can also use the class() function to ask the class of the value stored in a variable:
Identify the class of y (which contains the value 7):
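One way to do this, as a small sketch that first defines y with the value 7:

```r
# Store the value 7 in y, then ask for the class of the stored value
y <- 7
class(y)   # [1] "numeric"
```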
If your object has class "numeric", you can do mathematical computations with it:
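For instance, here is a short sketch using a numeric variable y that contains 7. The particular computations are just illustrations I picked:

```r
# y is numeric, so ordinary arithmetic works
y <- 7
y + 3   # [1] 10
y * 2   # [1] 14
y / 2   # [1] 3.5
```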
Many datasets will contain text as well as numbers! In R, text has a “character” type.
The following contain examples of character type values:
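As an illustration, here are the three text values from the list of types earlier, each surrounded by quotes:

```r
# Each quoted value is a character value
"California"
"John Doe"
"XJ1784"

# class() confirms the type
class("XJ1784")   # [1] "character"
```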
And like numbers, you can save character type values in a variable:
To view the contents of our character variable, you just type its name as usual:
And I can ask what class it has using the class function:
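A minimal sketch of these three steps. The variable name fruit and the value "banana" are placeholders I chose for illustration:

```r
# Save a character value in a variable (fruit is a hypothetical name)
fruit <- "banana"

# View its contents by typing its name
fruit          # [1] "banana"

# Ask what class it has
class(fruit)   # [1] "character"
```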
What is the difference between a variable name and a character value? Character values are always surrounded by quotes, whereas variable names are not.
So if I try to type banana into my R console without the quotes, R will think I am referring to a variable name called banana and I will get a mildly rude error because I haven’t defined any variables called banana:
The object 'banana' not found error means that I’ve accidentally referred to a variable in my code (banana, in this case) that doesn’t exist because I haven’t defined it!
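Here is a rough sketch of the difference. It assumes no variable called banana exists in your session, and it wraps the unquoted name in tryCatch() (a function we haven't covered) only so that the error message is captured instead of stopping the script:

```r
# Quoted: a character value, no variable needed
"banana"   # [1] "banana"

# Unquoted: R looks for a variable called banana and fails.
# tryCatch() stores the error message in msg rather than halting.
msg <- tryCatch(banana, error = function(e) conditionMessage(e))
msg        # the message says that object 'banana' was not found
```

At the console you would normally just type banana and read the error directly.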
Does the answer to the question above surprise you? Remember that whenever a value is surrounded by quotes, it is a character. It doesn’t matter whether the value contains a number or not!
What do you think will happen when you try to do mathematical operations with character (text) variables?
Let’s define a character variable and try to add the number 1 to it:
Error in char + 1: non-numeric argument to binary operator
This Error in char + 1 : non-numeric argument to binary operator error will become very familiar to you in time. It is R’s very unhelpful way of telling us that we cannot do mathematical operations with non-numeric (e.g., character) values. Bummer.
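A small sketch that reproduces this error. The variable name char comes from the error message above, but the value "hello" is my own placeholder, and tryCatch() is used only to capture the message without stopping the script:

```r
# Define a character variable (the value "hello" is a hypothetical example)
char <- "hello"

# Adding 1 to a character value fails; capture the error message
msg <- tryCatch(char + 1, error = function(e) conditionMessage(e))
msg   # [1] "non-numeric argument to binary operator"
```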
So if we can’t do math with character values, what’s the point of them?
The purpose of character values is to store categorical and text information, which we will often use to do things like creating groups in our data (e.g., separating people according to the state in which they live).
Next up are the “logical” (“Boolean”) type values. These are fairly simple because there are only two of them: TRUE and FALSE.
For your logical value to be recognized as a logical value it must be in all caps. As if you’re yelling (LIKE THIS). If you don’t yell loud enough, R will complain. For instance, if I only yell the first letter, like this:
R says Error: object 'True' not found, which, if you were paying attention earlier, is code for “there is no variable named True”. R is trying to find a variable called True and it’s failing to do so, which is unsurprising… because you haven’t defined one! It doesn’t know you’re trying to use a logical TRUE value, because you didn’t use all caps.
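To see the difference side by side, here is a short sketch. It assumes no variable called True has been defined, and uses tryCatch() only to capture the error message rather than stop the script:

```r
# All caps: R recognizes the logical value
TRUE   # [1] TRUE

# Only the first letter capitalized: R looks for a variable called True
msg <- tryCatch(True, error = function(e) conditionMessage(e))
msg    # the message says that object 'True' was not found
```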
As with everything else, we can use class() to ask the class of logical values, and unsurprisingly, it tells us that the class of logical values is “logical”:
[1] "logical"
[1] "logical"
[1] "logical"
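The calls behind these three outputs aren't shown here, so the following is only one set of calls that would print the same three lines. It assumes a variable logical_var that contains TRUE:

```r
# A logical variable (assumed to contain TRUE)
logical_var <- TRUE

class(TRUE)          # [1] "logical"
class(FALSE)         # [1] "logical"
class(logical_var)   # [1] "logical"
```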
What do you think will happen when we try to do mathematical operations with logical values? Let’s try:
[1] -2
[1] 0.2
Interestingly, it seems to work (unlike when we tried to do mathematical operations with character values)… But what is it doing?
If you could choose any numbers to convert TRUE and FALSE to, what would you choose? I would probably choose TRUE to be 1 and FALSE to be 0. Fortunately for me, this is exactly what R does.
When they are involved in mathematical operations, logical values are converted to their numeric binary counterpart values of 0 (FALSE) and 1 (TRUE).
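A quick sketch of this conversion in action. These particular expressions are my own illustrations, not the ones from the chunk above:

```r
# The numeric counterparts of the two logical values
as.numeric(TRUE)    # [1] 1
as.numeric(FALSE)   # [1] 0

# In arithmetic, the conversion happens automatically
logical_var <- TRUE
logical_var + 1             # [1] 2
FALSE * 10                  # [1] 0
sum(c(TRUE, FALSE, TRUE))   # [1] 2 (counts the TRUE values)
```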
If you replaced logical_var (which contains TRUE) with 1 and FALSE with 0 in the code chunk above, does the output make sense now?
Before you run the following code, predict which of the following four computations will work and what their output will be.
Only the third computation is valid.
The first computation doesn’t work because "TRUE" is a character type (since it is surrounded by quotes) and you can’t add characters and numbers.
The second computation doesn’t work because you can’t add character values to one another.
The third computation does work because when used in a mathematical operation, the Boolean/logical value FALSE is treated as 0.
The fourth computation doesn’t work because "TRUE" is a character type (since it is surrounded by quotes) and you can’t add characters to anything, including logical values.
Let’s define a numeric variable.
Let’s try to convert the numeric object to a character type using the as.character() function. As you may have guessed, as.character() tries to convert whatever object is given inside its parentheses (i.e., its “argument”) to a character type.
Did it work? Notice that the 12.5 has some quotes around it now. That means that it’s not a numeric value anymore. It’s now a text (character) value that contains a number. This means that you can’t do math with it.
But just to prove it to you, I’m going to try anyway (and unsurprisingly, I get an error):
Error in as.character(numeric_var) * 5: non-numeric argument to binary operator
To confirm that as.character(numeric_var) is indeed a character, I can apply the class() function to the character value created by as.character(numeric_var) by placing as.character(numeric_var) inside the class() parentheses:
This code is “nesting” the class() and as.character() functions.
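Putting these steps together as a small sketch, with numeric_var holding the value 12.5 used in this section:

```r
# Define the numeric variable
numeric_var <- 12.5

# Convert it to a character value (note the quotes in the output)
as.character(numeric_var)          # [1] "12.5"

# Nest the call inside class() to check the type of the converted value
class(as.character(numeric_var))   # [1] "character"
```

R evaluates the inner function first, so class() receives the character value "12.5".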
Do you think that running the as.character(numeric_var) code has modified the original numeric_var object at all (i.e., does using as.character() on a variable actually convert that variable to a character type… or does it just print out the character type version of the variable)?
You can check by just outputting the numeric_var object by typing its name:
Notice there are no quotes, so it’s still a numeric-type object. We can also confirm this using the class function:
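A short sketch of this check, again assuming numeric_var holds 12.5:

```r
numeric_var <- 12.5

# This prints the character version but does not save it anywhere
as.character(numeric_var)   # [1] "12.5"

# The original variable is unchanged
numeric_var                 # [1] 12.5
class(numeric_var)          # [1] "numeric"
```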
If we wanted to update the numeric_var object so that it had a character type, we would need to “reassign” it to the output of as.character(numeric_var). This would overwrite the old numeric_var and replace it with the new character version. I don’t want to do this though, so I’m not going to run this code.
# To overwrite numeric_var with a character version, run:
numeric_var <- as.character(numeric_var)
numeric_var
[1] "12.5"
Just as there is an as.character() function, there is also an as.numeric() function (there’s also an as.logical() function, but I don’t think I’ve ever actually used it).
Rather than bore you to bits by outlining all of the possible conversions you can do with as.numeric() and as.character(), you’re going to do it for me. Use the as.numeric() and as.character() functions to fill in the following table (I’ve already filled in the first row for you):
value | Original type | as.character(value) | as.numeric(value) |
---|---|---|---|
12.5 | numeric | "12.5" | 12.5 |
TRUE | logical | | |
FALSE | logical | | |
"howdy" | character | | |
"99" | character | | |
"1,200" | character | | |
Pay close attention to which value entries have quotes and which values do not.
Did any of these results surprise you?
When you run as.numeric("howdy") or as.numeric("1,200"), you should get an NA value, which is a missing value, along with a warning:
Unlike an error, which means that your code did not actually run, when you get a warning, your code has run, but R is telling you it’s not happy with you. When you get a warning, it’s a good idea to take a pause and consider that perhaps your code may not have done what you expected.
The warning NAs introduced by coercion appears when you try to convert a character value to a number and R can’t make sense of it. A character value can only be converted to a number if it contains a number without any additional characters, as you should have seen when filling in your table above.
This means that this works:
But this does not:
1,200 may look like a number, but the presence of the comma , means that R cannot parse the number inside the quotes. What is obvious to us is not always obvious to our computer overlords.
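The working and failing conversions described above can be sketched like this. The last line uses suppressWarnings() only to keep the check quiet; it isn't something you need for the conversion itself:

```r
# A character value that contains only a number converts cleanly
as.numeric("99")      # [1] 99

# These both return NA, along with the "NAs introduced by coercion" warning
as.numeric("1,200")
as.numeric("howdy")

# Confirm that the result really is a missing value
is.na(suppressWarnings(as.numeric("1,200")))   # [1] TRUE
```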
If you do want to convert a character containing a formatted number, such as "1,200", to a numeric type, you can use the parse_number() function from the “readr” R library. You’ll learn more about libraries in future chapters, so don’t worry about running this code now; I just wanted to let you know that this exists!
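For the curious, here is a hedged sketch. The readr call only runs if readr happens to be installed on your machine, and the base R line is an alternative I'm adding for comparison (it simply deletes the comma before converting):

```r
# Base R alternative (my addition): remove the comma, then convert
as.numeric(gsub(",", "", "1,200"))   # [1] 1200

# readr's parse_number(), skipped if readr is not installed
if (requireNamespace("readr", quietly = TRUE)) {
  readr::parse_number("1,200")       # [1] 1200
}
```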
NA values
Let’s talk briefly about the NA value (missing values). They are everywhere. You will often find that once they make their way into your data in R, missing values have a way of permeating your existence.
A missing value, NA, is a special type of object. Like TRUE and FALSE, your NA must be in all caps (i.e., you must yell when you type it).
For example, this is the NA value:
But R thinks that the lowercase version, na, is a variable (and R then complains when I type na because I haven’t defined a variable called na):
NA values are annoying mostly because the result of almost any mathematical operation with an NA is also NA:
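A small sketch of NA in action. The specific computations are my own examples, the last part assumes no variable called na exists, and tryCatch() is there only to capture the error message:

```r
# The missing value itself
NA       # [1] NA

# Arithmetic with NA gives NA back
NA + 1   # [1] NA
NA * 5   # [1] NA

# Lowercase na is treated as a variable name, which doesn't exist
msg <- tryCatch(na, error = function(e) conditionMessage(e))
msg      # the message says that object 'na' was not found
```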
Armed with the knowledge that character values will be converted to NA when you apply as.numeric(), but numeric values can be converted to character values using as.character() just fine, try the following exercise.
Without running the following pieces of code, which of the following pieces of code will work, and what do you think the output will be?
Let’s go ahead and create two brand-new numeric variables, x and y:
I’m now going to ask R some questions about x and y.
First question: “Is x equal to 2?”
R answered “Yes!” But in R, “Yes!” is TRUE.
To ask a question of equality, we used two equals symbols ==.
Next question: “Is x less than or equal to 1?”
Again, R came through with an answer (this time FALSE). To ask a question of “less than or equal to”, we used a “less than” symbol < followed by an equals symbol =, giving me <=.
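These first two questions can be sketched as follows. I'm assuming x contains 2, which is the value that makes the answers above come out as TRUE and then FALSE:

```r
# Assumed value: x is 2
x <- 2

# "Is x equal to 2?"
x == 2   # [1] TRUE

# "Is x less than or equal to 1?"
x <= 1   # [1] FALSE
```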
Although both == and <= kind of look like the assignment operators = and <-, they’re not affiliated in any way.
== and <= are “question asking” operators, or “logical operators” if you want to sound fancy (they’re called “logical operators” because they always result in a TRUE or FALSE logical result).
Before, we asked if x was equal to 2 (x == 2), but we can also ask whether x is equal to y:
As well as “is x not equal to y” using the “not equal to” logical question operator of an exclamation point followed by an equals symbol != (not equals):
In fact, for any logical question we ask, we can ask its inverse by placing the original question in parentheses and prefacing it with a !. So the following is another way to ask x != y:
The parentheses make it clear that we want the inverse of the entire question x == y (not just of x). Strictly speaking, R applies ! after ==, so it reads !x == y the same way, but the version with parentheses is much easier for humans to read.
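Here is a sketch of these three questions. I'm assuming x is 2 and, since the value of y isn't stated here, picking a hypothetical value of 5 for y:

```r
# Assumed values (y = 5 is a hypothetical choice)
x <- 2
y <- 5

# "Is x equal to y?"
x == y      # [1] FALSE

# "Is x not equal to y?"
x != y      # [1] TRUE

# The inverse of "is x equal to y?", which asks the same thing as x != y
!(x == y)   # [1] TRUE
```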
Here are some more questions:
“Is x strictly greater than y?”
“Is x greater than or equal to y?”
“Is x strictly less than y?”
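These questions can be sketched the same way. As before, x is assumed to be 2 and y = 5 is a hypothetical value chosen for illustration, so your answers will differ if your y is different:

```r
# Assumed values (y = 5 is a hypothetical choice)
x <- 2
y <- 5

# "Is x strictly greater than y?"
x > y    # [1] FALSE

# "Is x greater than or equal to y?"
x >= y   # [1] FALSE

# "Is x strictly less than y?"
x < y    # [1] TRUE
```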
It’s almost like we’re talking to R and it’s replying! This is going to be really important later, since we will use these kinds of logical operations to filter to various subsets of our data based on logical conditions/questions.