2  R Fundamentals

2.1 Doing math with R

Who needs a calculator when you have R? I legitimately use R as a basic calculator all the time. And while R can do a lot more than just compute 1 + 1, it’s worth taking a moment to discuss the basic mathematical operations in R.

Here are some helpful math symbols in R:

  • Parentheses: (, )
  • Exponents: ^ or **
  • Multiply: *
  • Divide: /
  • Add: +
  • Subtract: -
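
As a quick sketch of how these symbols combine (the numbers are just my own illustration), R follows the usual order of operations: parentheses first, then exponents, then multiplication and division, then addition and subtraction.

```r
# multiplication is computed before addition
2 + 3 * 4
# [1] 14

# parentheses force the addition to happen first
(2 + 3) * 4
# [1] 20

# ^ and ** are two ways of writing the same exponent
2 ** 3
# [1] 8
```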

To follow along with the code examples in this chapter (and in this book in general), I recommend creating a new Quarto document and practicing writing your own code in its code chunks. You can run the code in the console either by pressing the green play/arrow button in the top right corner of a code chunk or by using the Command+Return shortcut (Ctrl+Enter on Windows). Feel free to write your own notes in the Quarto document outside the code chunks. I also recommend compiling/rendering your Quarto document every now and then!

Some basic mathematical computations you can do in R include power calculations:

(3 + 5)^2
[1] 64

Division:

2 / 7
[1] 0.2857143

Note that R doesn’t really care about spaces, so this is the same as

2/7
[1] 0.2857143

But my recommendation is to always place a single space on either side of mathematical operators (e.g., *, +, -, and /), with the exception of the power operator ^, so:

2 + 1
[1] 3
5 * 3
[1] 15
5^2
[1] 25

When writing code, even if the language itself doesn’t require certain syntax like spaces, it is a good idea to choose a syntax style and stick with it.

You can place multiple computations in the same code chunk, like this:

```{r}
5 + 109
(4 + 2) * 4
```

When a code chunk contains multiple pieces of code, they will all be computed separately when you compile your document and the output will look like this:

5 + 109
[1] 114
(4 + 2) * 4
[1] 24

2.2 Code comments

When you have multiple pieces of code in a single code chunk (or even a single piece of code), it is recommended that you use code comments to explain what your code is doing. Since R treats everything inside a code chunk as code, if you want to write some text comments inside a code chunk, you can tell R that your text is not code by placing a # symbol at the beginning of your text like this:

# compute 4 times 5
4 * 5
[1] 20

R will ignore anything that follows a # symbol on the same line. So in the above code chunk, R will ignore the first line with the code comment “compute 4 times 5”, and then it will compute the R code on the next line, 4 * 5.
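
Comments don’t have to sit on their own line. Here is a small sketch (my own example, not a required style) showing a comment placed at the end of a line of code, and a line of code that is switched off by putting a # in front of it:

```r
# a comment on its own line
4 * 5
# [1] 20

4 * 5  # a comment at the end of a line of code is ignored too
# [1] 20

# 4 * 100 is never computed, because this whole line starts with #
```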

Code comments are really helpful for explaining what your code is doing. I usually reserve the text outside code chunks for more general discussion of my data, analysis, and results and I reserve code comments inside code chunks for explaining my code itself. Since I tend to forget the reasons behind certain decisions I made in my code, adding explanations in code comments helps me remember my motivations and intentions days, months, or even years later.

2.3 Scientific notation

When doing mathematical calculations in R, very quickly you are going to start encountering some very strange-looking output. For example, if I compute

1 / 70000
[1] 1.428571e-05

or

12^15
[1] 1.540702e+16

you can see that my output looks a little strange.

When a number is very big or very small, R gets lazy and decides that it doesn’t want to print all of its digits. Rather than just making up random numbers, R is printing these numbers in scientific notation: the number after the e tells you which power of 10 to multiply by. For example, 2e-05 means 2 × 10^-5, or “0.00002”, i.e., there is a 2 in the 5th decimal place. On the other hand, 2e+05 (with a + instead of a -) means 2 × 10^5, or 200000, i.e., “2” with five 0’s after it. So the 1.428571e-05 that R printed for 1 / 70000 is just 0.00001428571.
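
If you want to convince yourself that scientific notation is just a different way of writing the same number, you can check it in R directly. The sketch below also uses format(), a base R function that this chapter doesn’t otherwise cover (functions are introduced in Section 2.4), to ask for the digits in full. Note that format() returns the number as text rather than as a numeric value.

```r
# scientific notation is just another way of writing the same number
2e-05 == 0.00002
# [1] TRUE
2e+05 == 200000
# [1] TRUE

# ask R to write the numbers out in full instead of in scientific notation
format(1 / 70000, scientific = FALSE)
format(12^15, scientific = FALSE)
```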

No commas allowed!

Note that R doesn’t allow for commas in numbers. If you want to write a large number, you have to remove the comma:

# this is fine
70000
[1] 70000
# this is not fine -- note the "error" message
70,000
Error: <text>:1:3: unexpected ','
1: 70,
      ^

2.4 Mathematical functions

While being able to do addition, subtraction, and multiplication is super awesome, sometimes you will need to use more complex mathematical operations in your computations, such as the logarithm, exponential, and square root. Fortunately, there are functions in R that let you compute these operations.

A function is a piece of R code that is referenced using an alias or a name. A function typically takes an “argument”, such as a number, and it does something to the argument, such as compute the logarithm, and then it returns the result.

To apply a function to a value, you write the name of the function (e.g., log), followed by some parentheses (), inside which you provide the argument or value that you want to apply the function to, as in: log(2).

# compute the square root of 2
sqrt(2)
[1] 1.414214
# compute the natural log (base e) of 2
log(2)
[1] 0.6931472
# compute e^2
exp(2)
[1] 7.389056
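
A few more examples may help here (the particular numbers are my own). By default, log() computes the natural logarithm, which is why it undoes exp(). Some functions also accept additional arguments, such as the base argument of log().

```r
# log() is the natural logarithm (base e), so it undoes exp()
log(exp(2))
# [1] 2

# a second argument, base, changes the base of the logarithm
log(100, base = 10)
# [1] 2

# the output of a function can be used inside a larger computation
sqrt(16) + 1
# [1] 5
```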

2.5 Defining variables

One of the main features of coding in R is defining “objects” or “variables” (I use these terms interchangeably). Creating a variable essentially involves giving a value a name, allowing you to reference that value later. When we are ready to load some actual data, we will give that data a name by storing it in a variable.

Let’s store the value 1 in a variable called my_variable using the assignment operator: my_variable <- 1. Think of the assignment operator <- as an arrow, pointing from the value on the right to the variable name on the left.

my_variable <- 1 

Note that when you define a variable, no output is shown.

You can view the value of my_variable by writing its name:

my_variable
[1] 1

You can think of my_variable as an alias for the value 1. This means that anything that I could do to the value 1, I can now do to my_variable, such as adding 2 to it:

my_variable + 2
[1] 3

R is case-sensitive

R is case-sensitive, which means that I must write my variable name exactly as I defined it, including the capitalization. For example, the following will yield an error:

my_Variable
Error in eval(expr, envir, enclos): object 'my_Variable' not found

because the variable is called my_variable, not my_Variable.

Defining variables using =

Another way to define a variable is using “=”.

Below, I create another_variable and assign it the value 3:

another_variable = 3
another_variable 
[1] 3

However, convention in the R community prefers the use of the <- assignment operator over the = assignment operator. So while = will work just fine, it is less common among seasoned R programmers.
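
To see that the two assignment operators do the same job here, you can define one variable with each and compare them. The variable names below are hypothetical, chosen just for this sketch.

```r
# both lines store the value 3 in a variable
arrow_variable <- 3
equals_variable = 3

# the two variables contain exactly the same value
identical(arrow_variable, equals_variable)
# [1] TRUE
```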

Whenever we do a mathematical calculation using numeric values, we create a new numeric value. For example, the computation

1 + 1
[1] 2

creates the value 2.

You can also assign the output of a mathematical calculation to a variable.

# assign the output of 1 + 1 to the variable one_plus_one
one_plus_one <- 1 + 1
one_plus_one
[1] 2

It is important to make the distinction that one_plus_one does not contain the mathematical expression 1 + 1. Instead, it contains the numerical value 2, which is the output of the expression 1 + 1.

one_plus_one doesn’t remember that it was created by computing 1 + 1; it just knows that the value it contains is 2.

Define a new object prod that contains the product of 5 and 2. Print out prod by writing its name.

prod <- 5 * 2
prod
[1] 10

2.5.1 Overwriting variables

Below I define my_number to be a variable containing the numeric value 5.

my_number <- 5

Next, I define a new variable called result that contains the product of my_number and 7 and I print it out:

result <- my_number * 7
result
[1] 35

Here, result is defined based on the value of my_number.

What do you think would happen to result if I redefine my_number to now contain 8?

# update the value of my_number to be 8
my_number <- 8

Do you think result will have changed? Try it yourself in RStudio or click the “Answer” tab below.

What happens to result?

# define result using `my_number`
result <- my_number * 7
result
[1] 35
# modify my_number
my_number <- 8

result does not change:

result
[1] 35

When we defined result <- my_number * 7, my_number contained 5, so we assigned result the output of my_number * 7, which is 35.

Once it has been defined, result forgets all about my_number; it just remembers the value 35.

This means that changing my_number after having defined result will have no effect on result. There is no link between the two variables, even though result was originally defined using my_number!
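
If you do want result to reflect the new value of my_number, you need to re-run the line of code that defines it. Here is the whole sequence as a sketch, starting from my_number equal to 5:

```r
my_number <- 5
result <- my_number * 7
result
# [1] 35

# changing my_number leaves result untouched
my_number <- 8
result
# [1] 35

# re-running the assignment recomputes result using the new value
result <- my_number * 7
result
# [1] 56
```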

Without running the code below, guess what the output/result will be:

value <- 1
computed_result <- value * 10 + 3^2
value <- value + 2
computed_result 

Note that the first three lines of code all involve defining variables and so no output is shown when these are run. The final line of code will print out the value of computed_result.

The second line computed_result <- value * 10 + 3^2 defines computed_result using value. Then the third line value <- value + 2 updates value. Since computed_result is assigned to the output of value * 10 + 3^2, which is 19, it doesn’t care when value is subsequently updated, and so the computed_result is still just equal to 19:

value <- 1
computed_result <- value * 10 + 3^2
value <- value + 2
computed_result 
[1] 19

2.5.2 Variable names

While you can give your variables almost any name you like, there are a few rules that you need to follow.

While variable names can contain letters, numbers, underscores, and periods, the recommended convention specifies that variable names should contain purely lowercase letters and numbers, with words separated by underscores.

For example, var_name and my_var are considered “good” variable names, whereas varName, VarName, and var.name are not.

Note that variable names cannot begin with numbers or underscores. If you try to create variables whose names are illegal, you will get an error, such as:

1plus1 <- 1 + 1
Error: <text>:1:2: unexpected symbol
1: 1plus1
     ^
_var <- 1 + 1
Error: <text>:1:2: unexpected symbol
1: _var
     ^

Which of the following are valid R variable names? Which ones are good variable names?

min_height
max.height
_age
MaxLength
min-length
2widths

  • min_height: this is a good variable name

  • max.height: this is a valid variable name, but not necessarily a “good” variable name (words should be separated with _, not .)

  • _age: this is not a valid variable name (variable names cannot start with _)

  • MaxLength: this is a valid variable name, but not necessarily a “good” variable name (words should be lowercase and separated with underscores)

  • min-length: this is not a valid variable name (R reads - as the minus sign, so it would try to subtract length from min; words should be separated with _ instead)

  • 2widths: this is not a valid variable name (variable names cannot start with numbers)
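
If you would like R to check these answers for you, one possible approach is sketched below. It relies on pieces this chapter hasn’t introduced (quoted text, the c() function for grouping several values together, and the base R function make.names()), so treat it as an optional aside rather than part of the exercise. The variable names candidate_names and is_valid are my own.

```r
# the candidate names from the exercise, stored as text
candidate_names <- c("min_height", "max.height", "_age",
                     "MaxLength", "min-length", "2widths")

# make.names() rewrites text into a syntactically valid name,
# so a name that comes back unchanged was already valid
is_valid <- make.names(candidate_names) == candidate_names
is_valid
# [1]  TRUE  TRUE FALSE  TRUE FALSE FALSE
```

Note that this only checks whether a name is valid, not whether it is a “good” name in the sense of the naming convention.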