3  Types

Learn R live!

Prefer to learn via live instruction? Register for my Introduction to R for Data Analysis seminar via Instats on January 15-16 2025.

3.1 Common object types

So far, we have only worked with numbers in R. But there are many other kinds of values that you will encounter in your R journeys.

The main types of values that you’ll encounter in R are:

  • Numeric: numbers, e.g., 1, 3.5, 1e5 (which is scientific notation for 100,000)

  • Character: free-form text values, e.g., "California", "John Doe", "XJ1784"

  • Logical (Boolean): binary values corresponding to TRUE and FALSE

3.1.1 Numeric values

You can use the class() function to ask what type of object a value is. For example, the class of 9.6 is “numeric”

class(9.6)
[1] "numeric"

So is the class of -5

class(-5)
[1] "numeric"

and 1e7 (which is scientific notation for 10,000,000)

class(1e7)
[1] "numeric"

You can also use the class() function to ask the class of the value stored in a variable:

y <- 2 * 3 + 1
y
[1] 7

Identify the class of y (which contains the value 7):

class(y)
[1] "numeric"

If your object has class "numeric", you can do mathematical computations with it:

y + 2
[1] 9
y^3
[1] 343

Identify the class (type) of the value 99.9

class(99.9)
[1] "numeric"

3.1.2 Character values

Many datasets will contain text as well as numbers! In R, text has a “character” type.

The following contain examples of character type values:

"banana"
[1] "banana"
"I really like owls"
[1] "I really like owls"

And like numbers, you can save character type values in a variable:

char_var <- "my first character variable"

To view the contents of our character variable, you just type its name as usual:

char_var
[1] "my first character variable"

And I can ask what class it has using the class function:

class(char_var)
[1] "character"

What is the difference between a variable name and a character value? Character values are always surrounded by quotes, whereas variable names are not.

So if I try to type banana into my R console without the quotes, R will think I am referring to a variable name called banana and I will get a mildly rude error because I haven’t defined any variables called banana:

banana
Error in eval(expr, envir, enclos): object 'banana' not found

The object 'banana' not found means that I’ve accidentally referred to a variable in my code (banana, in this case) that doesn’t exist because I haven’t defined it!

What will be the class of the following variable that contains the value "1" with quotes (as opposed to 1 without quotes)?

var_one <- "1"

It’s a character value

class(var_one)
[1] "character"

Note the quotes when I print out the variable:

var_one
[1] "1"

Does the answer to the question above surprise you? Remember, that whenever a value is surrounded by quotes, it is a character. It doesn’t matter whether the value contains a number or not!

3.1.2.1 Mathematical computations with character values

What do you think will happen when you try to do mathematical operations with character (text) variables?

Let’s define a character variable and try to add the number 1 to it:

# define a character variable
char <- "hello"
# try to add 1 to it
char + 1
Error in char + 1: non-numeric argument to binary operator

This Error in char + 1 : non-numeric argument to binary operator error will become very familiar to you in time. This error is a very unhelpful way that R tells us that we cannot do mathematical operations with non-numeric (e.g., character) values. Bummer.

So if we can’t do math with character values, what’s the point of them?

The purpose of character values is to store categorical and text information, which we will often use to do things like creating groups in our data (e.g., separating people according to the state in which they live).

3.1.3 Logical values

Next up are the “logical” (“Boolean”) type values. These are fairly simple because there are only two of them: TRUE and FALSE.

TRUE
[1] TRUE
FALSE
[1] FALSE

For your logical value to be recognized as a logical value it must be in all caps. As if you’re yelling (LIKE THIS). If you don’t yell loud enough, R will complain. For instance, if I only yell the first letter, like this:

True
Error in eval(expr, envir, enclos): object 'True' not found

R says Error: object 'True' not found, which, if you were paying attention earlier, is code for “there is no variable named True. R is trying to find a variable called True and it’s failing to do so which is unsurprising… because you haven’t defined one! It doesn’t know you’re trying to use a logical TRUE value, because you didn’t use all caps.

As with everything else, we can use class() to ask the class of logical values, and unsurprisingly, it tells us that the class of logical values are “logical”:

class(TRUE)
[1] "logical"
class(FALSE)
[1] "logical"
logical_var <- TRUE
class(logical_var)
[1] "logical"

3.1.3.1 Mathematical computations with logical values

What do you think will happen when we try to do mathematical operations with logical values? Let’s try:

# Try to subtract 3 from logical_var
logical_var - 3
[1] -2
# Try to add 0.2 to FALSE
FALSE + 0.2
[1] 0.2

Interestingly, it seems to work (unlike when we tried to do mathematical operations with character values)… But what is it doing?

If you could choose any numbers to convert TRUE and FALSE to, what would you choose? I would probably choose TRUE to be 1 and FALSE to be 0. Fortunately for me, this is exactly what R does.

When they are involved in mathematical operations, logical values are converted to their numeric binary counterpart values of 0 (FALSE) and 1 (TRUE).

If you replaced logical_var (which contains TRUE) with 1 and FALSE with 0 in the code chunk above, does the output make sense now?

Before you run the following code, predict which of the following four computations will work and what their output will be.

"TRUE" * 4
"banana" + "apple"
FALSE + 5
TRUE + "TRUE"

Only the third computation is valid.

The first computation doesn’t work because "TRUE" is a character type (since it is surrounded by quotes) and you can’t add characters and numbers.

"TRUE" * 4
Error in "TRUE" * 4: non-numeric argument to binary operator

The second computation doesn’t work because you can’t add character values to one another.

"banana" + "apple"
Error in "banana" + "apple": non-numeric argument to binary operator

This third computation does work because when used in a mathematical operation, the Boolean/logical value FALSE is treated as 0

FALSE + 5
[1] 5

The fourth computation doesn’t work because "TRUE" is a character type (since it is surrounded by quotes) and you can’t add characters to anything, including logical values.

TRUE + "TRUE"
Error in TRUE + "TRUE": non-numeric argument to binary operator

3.2 Type conversions

Let’s define a numeric variable.

numeric_var <- 12.5

Let’s try to convert the numeric object to a character type using the as.character() function. As you may have guessed, as.character() tries to convert whatever object given inside its parentheses (i.e., its “argument”) to a character type.

# apply as.character() to numeric_var
as.character(numeric_var)
[1] "12.5"

Did it work? Notice that the 12.5 has some quotes around it now. That means that it’s not a numeric value anymore. It’s now a text (character) value that contains a number. This means that you can’t do math with it.

But just to prove it to you, I’m going to try anyway (and unsurprisingly, I get an error):

as.character(numeric_var) * 5
Error in as.character(numeric_var) * 5: non-numeric argument to binary operator

To confirm that as.character(numeric_var) is indeed a character, I can apply the class() function to the character value created by as.character(numeric_var) by placing as.character(numeric_var) inside the class() parentheses:

class(as.character(numeric_var))
[1] "character"

This code is “nesting” the class() and as.character() functions.

Do you think that running the as.character(numeric_var) code has modified the original numeric_var object at all (i.e., does using as.character() on a variable actually convert that variable to a character type… or does it just print out the character type version of the variable)?

You can check by just outputting the numeric_var object by typing its name:

numeric_var
[1] 12.5

Notice there are no quotes, so it’s still a numeric-type object. We can also confirm this using the class function:

class(numeric_var)
[1] "numeric"

If we wanted to update the numeric_var object so that it had a character type, we would need to “reassign” it to the output of as.character(numeric_var). This would overwrite the old numeric_var and replace it with the new character version. I don’t want to do this though, so I’m not going to run this code.

# To overwrite numeric_var with a character version, run:
numeric_var <- as.character(numeric_var)
numeric_var
[1] "12.5"

Just as there is an as.character() function, there is also an as.numeric() function (there’s also an as.logical() function, but I don’t think I’ve ever actually had used it)

Rather than bore you to bits by outlining all of the possible conversions you can do with as.numeric() and as.character(), you’re going to do it for me. Use the as.numeric() and as.character() functions to fill in the following table (I’ve already filled in the first row for you):

value Original type as.character(value) as.numeric(value)
12.5 numeric "12.5" 12.5
TRUE logical
FALSE logical
"howdy" character
"99" character
"1,200" character

Pay close attention to which value entries have quotes and which values do not.

Did any of these results surprise you?

When you run as.numeric("howdy") or as.numeric("1,200"), you should get an NA value, which is a missing value, along with a warning:

as.numeric("howdy")
Warning: NAs introduced by coercion
[1] NA

Unlike an error, which means that your code did not actually run, when you get a warning, your code has run, but R is telling you it’s not happy with you. When you get a warning, it’s a good idea to take a pause and consider that perhaps your code may not have done what you expected.

The warning NAs introduced by coercion happens when you try to convert characters to numbers. Characters cannot be converted to numbers, unless the character contains a number without any additional characters, as you should have seen when filling in your table above.

This means that this works:

as.numeric("99")
[1] 99

But this does not:

as.numeric("1,200")
Warning: NAs introduced by coercion
[1] NA

1,200 may look like a number, but the presence of the comma , means that R cannot parse the number inside the quotes. What is obvious to us is not always obvious to our computer overlords.

Extracting numeric values from characters

If you do want to convert a character containing a number, such as "1,200" to a numeric type, you can use the parse_number() function from the “readr” R library. You’ll learn more about libraries in future chapters, so don’t worry about running this code now–I just wanted to let you know that this exists!

# uncomment and run the next line of code to install the "readr" library:
# install.packages("readr")

library(readr)
parse_number("1,200")
[1] 1200
parse_number("I have 49 bananas")
[1] 49

3.3 NA values

Let’s talk briefly about the NA value (missing values). They are everywhere. You will often find that once they make their way into your data in R, missing values have a way of permeating your existence.

A missing value, NA, is a special type of object. Like TRUE and FALSE, your NA must be in all caps (i.e., you must yell when you type it).

For example, this is the NA value:

NA
[1] NA

But R thinks that the lowercase version, na, is a variable (and R then complains when I type na because I haven’t defined a variable called na):

na
Error in eval(expr, envir, enclos): object 'na' not found

NA values are annoying mostly because the result of any mathematical operation with an NA is always NA:

NA + 5
[1] NA
NA * 0
[1] NA

Armed with the knowledge that character values will be converted to NA when you apply as.numeric(), but numeric values can be converted to character values using as.character() just fine, try the following exercise.

Without running the following pieces of code, which of the following pieces of code will work, and what do you think the output will be?

as.numeric("TRUE") + 3
as.character(TRUE + 12)
as.character(as.numeric("35"))
as.numeric("TRUE") + 3
Warning: NAs introduced by coercion
[1] NA
as.character(TRUE + 12)
[1] "13"
as.character(as.numeric("35"))
[1] "35"

3.4 Asking questions with logical operations

Let’s go ahead and create two band-new numeric variables, x and y:

x <- 2
y <- 4

I’m now going to ask R some questions about x and y.

First question: “Is x equal to 2?”

x == 2
[1] TRUE

R answered “Yes!” But in R, “Yes!” is TRUE.

To ask a question of equality, we used two equals symbols ==.

Next question: “Is x less than or equal to 1?”

x <= 1
[1] FALSE

Again, R came through with an answer (this time FALSE). To ask a question of “less than or equal to”, we used a “less than” symbol < followed by an equals symbol =, giving me <=.

Although both == and <= kind of look like the assignment operators = and <-, they’re not affiliated in any way.

== and <= are “question asking” operators, or “logical operators” if you want to sound fancy (they’re called “logical operators” because they always result in a TRUE or FALSE logical result).

Before we asked if x was equal to 2 (x == 2), but we can also ask whether x is equal to y:

x == y
[1] FALSE

As well as “is x not equal to y” using the “not equal to” logical question operator of an exclamation point followed by an equals symbol != (not equals):

x != y
[1] TRUE

In fact, for any logical question we ask, we can ask its inverse by placing the original question in parentheses and prefacing it with a !. So the following is another way to ask x != y:

!(x == y)
[1] TRUE

The parentheses are necessary to tell R that we want the inverse of the entire question x == y (not just of x).

Here are some more questions:

“Is x strictly greater than y?”

x > y
[1] FALSE

“Is x greater than or equal to y?”

x >= y
[1] FALSE

“Is x strictly less than y?”

x < y
[1] TRUE

It’s almost like we’re talking to R and it’s replying! This is going to be really important later, since we will use these kinds of logical operations to filter to various subsets of our data based on logical conditions/questions.