print("This program is brought to you by caffeine and StackOverflow.")
R essentials
Introduction
This note provides a pragmatic introduction to the programming language R. Its purpose is to give you all the tools you need to tackle the R mini project and to serve as a reference for the R coursework. It is aimed at people who are not very familiar with R or who need a quick refresher on key concepts. Even if you are familiar with R, do have a look at the note, in particular at the last section “Project submission”, which provides some guidance on the submission of your R coursework.
Overview
In this note you will learn:
- What is R, and why bother
- How to install R and RStudio
- Basic data structures in R
- R functions
- Conditional statements and loops
- How to load, manipulate, and visualise data
- How to create an RMarkdown file for your R mini project
Approach
This note is designed to be a “hands-on” self-study tool. It introduces you to some concepts and gives illustrative code snippets along the way.
You can copy these code snippets by clicking on the icon that appears at the right end of the code block when you hover over it with your cursor. Simply paste the code into your R console to reproduce whatever is shown in this note. Sometimes, code snippets are hidden, like below:
print("Peek-a-boo!")
You can expand them by clicking on the icon. You are invited to come up with a solution yourself before looking at the example code. Sometimes, code examples go beyond the basic concepts introduced in the text. This is done solely for your viewing pleasure and, hopefully, to get you excited about R. Generally, this note does not constitute examinable material, but it should serve as a toolkit and reference to bring you up to speed and to support you with the R workshop exercises and the R mini project.
Hopefully, the interactive nature of this note will pique your interest in the introduced concepts and motivate you to explore a bit further on your own. In my opinion, this is the best way to learn a new programming language.
The presentation is deliberately kept light and short and is thus not at all a comprehensive summary of R. Several excellent and far more detailed resources exist. If you are interested, have a look at the free ebook Advanced R by Hadley Wickham, or the official Introduction to R document, or just google it; the internet is a vast and knowledgeable place.
What is R?
R is a free, open-source programming language primarily used for statistical computing, data analysis and data visualisation. It is available for a wide range of operating systems such as Linux, MacOS and Windows. Aside from the base functionality that comes with the standard R distribution, R provides a plethora of packages (\(> 20'000\) at the time of writing), most of which are hosted on the Comprehensive R Archive Network (CRAN). Each R package is a collection of functions, along with documentation and sometimes datasets, that extends R and makes it a versatile and powerful tool for statistical computing and data analysis.
Why use R?
R is an excellent choice for data analysis and statistics due to its flexibility, mature package ecosystem, and its prowess in data manipulation and data visualisation. Its extensive ecosystem of packages, such as ggplot2 for visualisations and dplyr for data manipulation, makes R a powerful tool for statistical applications in academia and industry alike. Additionally, being open-source with a strong community (have a look at this slightly outdated post on R queries on StackExchange) means that R is continuously evolving with new tools and techniques, making it a go-to for reproducible research and cutting-edge data science.
Installing R and RStudio
Installing R
You can download the latest version of R on CRAN – make sure it is compatible with your operating system (OS) before installing. If you are using an older computer, you might have to download an older version of R from the archive. Just find your operating system in the archive and choose the R version that is right for you – if in doubt, google.
Running R from the command line interface
Once you have R installed, you can use R’s command line interface to interactively run code. This is the most minimalistic approach that may sometimes come in handy when you just want to try something quickly.
On MacOS and Linux you can do this by typing R into your terminal; on Windows, type it into your command prompt. This launches an R session, and you can now type your R commands into the window.
If typing R in your terminal or command prompt does not launch R, it is likely that R has not been added to the PATH. PATH is an environment variable that basically tells your OS where to look for executables such as R. If your R installation is not on the PATH, your computer cannot find it and the command fails.
On MacOS and Linux, you can check the path to your installation by typing which R into your terminal. On my Mac, for example, this command yields /usr/local/bin/R, which means that I would have to add the path /usr/local/bin/ to the PATH variable.
On Windows, the R executable is typically located at C:\Program Files\R\R-x.x.x\bin. You can locate your R installation through Windows Explorer. Then, right-click on This PC (or My Computer) and select Properties. Click Advanced system settings > Environment Variables. Under System Variables, find the PATH variable, select it, and click Edit. Click New and add the path to the R bin folder (e.g., C:\Program Files\R\R-x.x.x\bin).
Before messing with your PATH variable, make sure that this is indeed the reason you cannot run R; Google is your friend.
Installing RStudio
While we can now run R from our command line interface, we can only do so line by line. This may quickly become tedious or confusing if we are working on larger or more complex tasks. For this reason, we want to add an integrated development environment (IDE), which is essentially a software application that sits on top of R, to improve our workflow.
In this course we use RStudio, which is also free and is arguably standard in the R community. To install RStudio, visit their website and install the RStudio version that is right for your OS. Again, if you are using an older computer, you might have to install an older version. You will have to google which version is compatible with your version of R and OS.
The RStudio window
Once you have launched RStudio, you will see that the RStudio window has 3 distinct panes, but we will add a fourth, namely a pane for our R scripts. To do so, click File > New File > R Script, or click on the icon, which will open a new R script in the top left of your RStudio window.
We now have the following four panes:
- Top left: The source pane; In this pane we have an R script, in which we can write a series of R commands to accomplish a certain task.
- Bottom left: The console pane; This is where we can interactively execute code.
- Top right: The environment pane; In the environment tab, we can see our current R working environment with all saved R objects.
- Bottom right: The output pane; This pane has different tabs such as Files, which we can use to navigate to and load R scripts or data, the Help tab, which provides helpful resources and documentation, and the Plots tab, which displays plots from our current session.
Installing R packages
R packages are collections of functions and sometimes datasets, along with documentation. You can install any R package from CRAN by using the command install.packages("packagename"), where "packagename" should be replaced with the name of the package you want to install. For example, to install the ggplot2 package, we would type:
# installing an R package
install.packages("ggplot2")
In RStudio, you can alternatively install packages by navigating to the Packages tab in the output pane and clicking on the Install icon.
Exercise: Running your first R script
1. Open RStudio
Find the software RStudio on your computer and launch it.
2. Create a course folder
In the bottom right pane, click on the Files tab, navigate to a location of your choice, such as Desktop, and create a folder for this course, e.g. “ST227”, which will hold all the scripts and data that you create for this course. You can create the folder by clicking the New Folder icon.
3. Open an R script
Open a new script either by clicking File > New File > R Script, or by clicking the icon and choosing R Script. Save it under an informative name, e.g. ST227_exercise_1.R, by clicking File > Save As… or by clicking the icon.
4. Run commands
We are now in a position to add our first R commands to our script. For example, we can plot a histogram of simulated normal random variables like so:
# histogram of normal random variables
set.seed(1)
normal_vars <- rnorm(500)
x <- seq(from = -6, to = 6, by = 0.01)
normal_density <- dnorm(x)
hist(normal_vars, freq = FALSE, xlim = c(-4,4))
lines(x, normal_density, col = "red", lwd = 2)
Once you have this code in your script, you can run it either by pasting it into your console or by running it directly from the script. To do so, select the code that you wish to run and press the Run icon, or press Cmd-Enter (MacOS) or Ctrl-Enter (Windows). This should produce a histogram of the 500 simulated values, with the standard normal density drawn in red, in the Plots tab of the bottom right pane.
If you are curious about what is happening in this example, feel free to investigate by typing ? followed by the command you are interested in into the console, e.g.
# accessing documentation of a function
? rnorm
This will open the documentation of the function rnorm in the Help tab of your bottom right pane. You can access documentation like this for any R function, which is helpful if you already know the name of the function you are interested in. If you do not know which R function to use, you can type ?? followed by a search term, and R will search the documentation of all installed packages for it. For example, you could type:
# scanning documentations for an expression
?? histogram
to search the documentation of all your installed packages for the word histogram.
5. Quit the R session
Before quitting your session, make sure you have saved your work. You can quit your R session by typing:
q()
into your console. This will prompt the following response:
Save workspace image? [y/n/c]:
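If you answer y, all objects that you defined in this session are saved to a file called .RData in your working directory and restored the next time you start R from there. Usually, it is best to answer n so that you start with a clean workspace next time. Answering c cancels and returns you to your session.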
Basic data structures in R
In this section, we will learn about the basic data structures that we are most likely to encounter in this course. If you want to dig deeper, have a look at this chapter from the book “Advanced R” by Hadley Wickham.
Names and values
Before we jump into data structures and how to manipulate them, we first need to understand how to create and call objects in R. The symbol for variable assignment in R is <- and the syntax is:
x <- c(1, 2, 3)
Under the hood, when we run the above command, R creates an object, in this case a vector containing the values \(1,2,3\), and binds this object to a name, x. Now, when we call x by typing it in the console, R looks for the object that is bound to the name and returns it:
x
[1] 1 2 3
You can also use the more conventional = symbol for variable assignment, however it is common practice to use the <- symbol for variable assignment and the = for named function arguments (more on this in the Functions & Control Flow section), and we too shall adhere to this.
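To see what binding a name to an object means in practice, here is a small sketch (the names x and y are only for illustration). When two names are bound to the same object and we modify one of them, R makes a modified copy instead of changing the shared object, so the other name is unaffected. This behaviour is usually called copy-on-modify; the “Names and values” chapter of Advanced R discusses it in detail.

```r
# bind the vector c(1, 2, 3) to the name x
x <- c(1, 2, 3)
# bind the same object to a second name, y
y <- x
# modifying y makes R copy the object first, so x is not affected
y[1] <- 10
x
# [1] 1 2 3
y
# [1] 10  2  3
```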
Vectors
The basic data structures in R are vectors. You can think of vectors as one-dimensional containers of data. Even when you think you work with single elements, such as a real number, you are working with a vector of length one. In R, there are two different types of vectors: atomic vectors and lists. Atomic vectors require all of their elements to be of the same type, while lists can contain elements of any type, even lists. Vectors have two common properties that are relevant to us:
- typeof(v): This tells us what type the vector is
- length(v): This tells us how many elements are stored in the vector
We will look at three common kinds of atomic vectors, namely numeric, character and logical vectors as well as lists.
Numeric Vectors
Numeric vectors are vectors that contain numbers, either integers or real numbers (called doubles in R).
The simplest way to create a numeric, or any other, vector is the c() function, which combines individual values into a vector. Other ways to generate a vector that you may encounter are the seq() function, which creates a sequence of numbers, and the rep() function, which repeats values.
# creating numeric vectors
x <- c(1, 2, 3)
y <- seq(from = -10, to = 10, by = 2)
z <- rep(x, times = 2, each = 3)
x
[1] 1 2 3
y
[1] -10 -8 -6 -4 -2 0 2 4 6 8 10
z
[1] 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3
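The colon operator, which appears in several examples below (e.g. 1:3), is a shortcut for integer sequences with step one, and seq() can also fix the number of elements via its length.out argument instead of the step size. A short sketch, which also uses typeof() and length() from above; note that 1:5 gives an integer vector, while the numbers in c(1, 2, 3) are stored as doubles.

```r
# the colon operator creates integer sequences with step 1
a <- 1:5
a
# [1] 1 2 3 4 5
# seq() can fix the number of elements instead of the step size
b <- seq(from = 0, to = 1, length.out = 5)
b
# [1] 0.00 0.25 0.50 0.75 1.00
# typeof() and length() reveal the type and size of a vector
typeof(a)   # "integer"
typeof(b)   # "double"
length(b)   # 5
```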
Vector arithmetic
Simple arithmetic in R works as we would expect:
# a simple calculator
1 + 2
[1] 3
(3 * 4) - 7
[1] 5
2 * cos(2 * pi) - exp(1)^log(2)
[1] 2.220446e-16
You can check out ? Arithmetic or the R language documentation (Section 2) for more information on available operators. The last result is \(2.220446 \times 10^{-16}\), a very small number that is equal to R’s machine precision:
# R's machine precision
.Machine$double.eps
[1] 2.220446e-16
which is the distance between one and the next larger number that can be represented in R. Since R rounds the result of an arithmetic operation to the nearest representable number, this means that if \[0 \leq \delta < \texttt{.Machine\$double.eps}/2\,,\] then \(1+\delta = 1\) in R:
# computer maths
delta <- 0.999 * (.Machine$double.eps / 2)
1 + delta == 1
[1] TRUE
For our purposes, this means that results of the order of .Machine$double.eps can be treated as zero: they are rounding errors rather than meaningful values. If you wish to understand this in more depth, have a look at the Wikipedia page on machine epsilon.
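In practice, this means that testing two computed numbers for exact equality with == can fail even when they are mathematically equal. A minimal sketch using base R's all.equal(), which compares numbers up to a small tolerance (by default roughly 1.5e-8); wrapping it in isTRUE() guarantees a plain TRUE or FALSE, which is what we want inside conditions.

```r
# 0.1, 0.2 and 0.3 cannot be represented exactly in binary floating point
0.1 + 0.2 == 0.3
# [1] FALSE
# the difference is tiny, of the order of .Machine$double.eps
(0.1 + 0.2) - 0.3
# compare up to a tolerance instead; isTRUE() turns the result into TRUE/FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
# [1] TRUE
```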
On vectors, operations act elementwise, which is called vectorisation in computer science. For example:
# vectorised operations
x <- c(1, 2, 3)
x^x
[1] 1 4 27
A peculiar behaviour of R is recycling. That is, if two vectors involved in an operation are not of the same length, the shorter vector is repeated to match the length of the longer one.
# R's recycling behaviour
(1:3) + (2:9)
Warning in (1:3) + (2:9): longer object length is not a multiple of shorter object length
[1] 3 5 7 6 8 10 9 11
If the length of the longer vector is a multiple of the length of the shorter, this is done without warning.
# recycling without warning
(0:1)^(0:5)
[1] 1 1 0 1 0 1
(0:5)^(0:2)
[1] 1 1 4 1 4 25
Generally, the use of recycling is error-prone and is, in my opinion, best avoided. An exception is when the shorter vector has length one.
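Recycling a length-one vector is exactly what makes expressions like x * 2 work. For longer vectors, one way (our suggestion, not the only one) to guard against accidental recycling is to check the lengths explicitly with stopifnot() before combining them:

```r
x <- c(1, 2, 3)
# a length-one vector is recycled to the length of x, without warning
x * 2
# [1] 2 4 6
x + 10
# [1] 11 12 13
# for longer vectors, an explicit length check stops with an error
# instead of silently recycling
y <- c(4, 5, 6)
stopifnot(length(x) == length(y))
x + y
# [1] 5 7 9
```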
We conclude with a collection of useful functions for numeric vectors that you might come across. These are:
- sum(x)
- mean(x)
- var(x)
- range(x)
- length(x)
You can probably guess what they do, but as always you can check by typing ? sum etc. in your console.
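As a quick illustration on a small, made-up vector x; note that var() computes the sample variance, i.e. it divides by \(n-1\) rather than \(n\).

```r
x <- c(2, 4, 6, 8)
sum(x)     # 20
mean(x)    # 5
var(x)     # sum of squared deviations 20, divided by n - 1 = 3: 6.666667
range(x)   # 2 8, i.e. the minimum and maximum
length(x)  # 4
```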
Character Vectors
We said in the previous section that we can think of numeric vectors as containers that hold numeric elements. This poses the question whether we can populate vectors with other types of data, such as text – and the answer is yes!
The basic data type in R that captures text is called character. We create a character value by enclosing text in double quotes " " or single quotes ' ', for example:
my_name <- "Peter Pan"
my_name
[1] "Peter Pan"
The construction of character vectors is very similar to numeric vectors.
my_family <- c("Peter Pan", "Captain Hook", "Tinkerbell")
parrot <- rep("Patience, Iago, patience.", times = 3)
We can combine elements of character vectors using the paste function, which may be useful when we need to define variable names:
numbered_vars <- paste("x", 1:4, sep = "")
numbered_vars
[1] "x1" "x2" "x3" "x4"
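Two variations on paste are worth knowing: paste0() is a shorthand for paste() with sep = "", and the collapse argument glues all elements of the result into a single string (the exercise below uses it to build a search pattern). A short sketch, together with nchar(), which counts the characters of each element:

```r
# paste0() is paste() with sep = "", so this matches numbered_vars above
paste0("x", 1:4)
# [1] "x1" "x2" "x3" "x4"
# collapse joins all elements of the result into a single string
paste("x", 1:4, sep = "", collapse = " + ")
# [1] "x1 + x2 + x3 + x4"
# nchar() counts the characters in each element
nchar(c("Peter Pan", "Tinkerbell"))
# [1]  9 10
```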
Logical Vectors
We can also construct vectors containing a special type of data whose values are either TRUE or FALSE, called logicals in R. These often arise from operations where we essentially ask whether a certain statement is true or false. For example:
# logical data types in the wild
x <- 1
y <- 2
x < y
[1] TRUE
is.character(x)
[1] FALSE
The symbols that we use to construct such true-or-false statements from comparisons are called comparison operators. These are:
- ==: Check for equality
- !=: Check whether two elements are not equal
- >, >=: Check whether one element is greater than (or equal to) the other
- <, <=: Check whether one element is smaller than (or equal to) the other
We can also chain together multiple such statements using so-called logical operators:
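- |: Logical OR operator, applied elementwise
- ||: Logical OR operator for single values; evaluates the expressions in order and stops as soon as one is TRUE
- &: Logical AND operator, applied elementwise
- &&: Logical AND operator for single values; evaluates the expressions in order and stops as soon as one is FALSE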
The functions any and all are useful to check whether any or all elements of a logical vector are true.
# chaining together comparisons using logical operators
set.seed(1)
nums <- runif(10)
extremes <- nums < 0.1 | nums > 0.9
any(extremes)
[1] TRUE
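To make the difference between the elementwise and the single-value operators concrete, here is a short sketch on made-up vectors a and b. It also shows a handy trick: in arithmetic, TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE entries and mean() gives their proportion. Note that in R 4.3.0 and later, && and || give an error if either side has length greater than one.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)
# & and | work elementwise on whole vectors
a & b
# [1]  TRUE FALSE FALSE
a | b
# [1] TRUE TRUE TRUE
# && and || expect single values, e.g. in if () conditions
x <- 5
x > 0 && x < 10
# [1] TRUE
# TRUE counts as 1 and FALSE as 0
sum(a)   # number of TRUE entries: 2
mean(a)  # proportion of TRUE entries: 0.6666667
all(a)   # FALSE, since not every entry is TRUE
```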
Lists
Sometimes we may want to group data of different types and lengths into a single object. In R, we can do this with a list, which we can construct with the list function.
# do not try this at home
a_bad_recipe <- list(
dish_name = "Chocolate Lava Cake",
prep_time_minutes = 30,
ingredients = c("dark chocolate", "butter", "flour", "sugar", "eggs"),
ingredient_quantities = c(200, 175, 30, 125, 3),
calories_per_serving = 450.75,
is_vegan = FALSE,
steps = list(
step_1 = "Melt chocolate and butter",
step_2 = "Mix in whisked egg + sugar, and flour",
baking_time_minutes_min_max = c(12, 20),
baking_temp_C = 180
)
)
a_bad_recipe
$dish_name
[1] "Chocolate Lava Cake"
$prep_time_minutes
[1] 30
$ingredients
[1] "dark chocolate" "butter" "flour" "sugar"
[5] "eggs"
$ingredient_quantities
[1] 200 175 30 125 3
$calories_per_serving
[1] 450.75
$is_vegan
[1] FALSE
$steps
$steps$step_1
[1] "Melt chocolate and butter"
$steps$step_2
[1] "Mix in whisked egg + sugar, and flour"
$steps$baking_time_minutes_min_max
[1] 12 20
$steps$baking_temp_C
[1] 180
We can also construct lists without explicitly naming their components in the construction. If we wish, we can assign and change names using the names() function:
# Creating unnamed list and assigning names using `names()`
another_list <- list(rnorm(10), letters, cars[1:10,])
names(another_list) <- c("Random numbers", "Letters", "Cars Data")
another_list
$`Random numbers`
[1] -0.8204684 0.4874291 0.7383247 0.5757814 -0.3053884 1.5117812
[7] 0.3898432 -0.6212406 -2.2146999 1.1249309
$Letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
$`Cars Data`
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
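Lists can also be modified after they have been created. A minimal sketch with a hypothetical list called shopping: assigning to a new name adds an element, and assigning NULL to an existing name removes that element.

```r
# a small hypothetical list for illustration
shopping <- list(item = "eggs", quantity = 6)
# add a new element by assigning to a new name
shopping$organic <- TRUE
# remove an element by assigning NULL to it
shopping$quantity <- NULL
names(shopping)   # now "item" and "organic"
length(shopping)  # 2
```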
Matrices & Arrays
So far, we have only been concerned with one-dimensional data structures. We can extend these data structures to two or more dimensions in R using matrices and arrays. We construct matrices as follows:
# construction of a matrix
a_matrix <- matrix(data = 1:12, nrow = 3, ncol = 4)
a_matrix
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
data here is the data that we want to populate our matrix with, whereas nrow and ncol specify the number of rows and columns, respectively. Note that R fills matrices in column-major order, meaning they are filled column-wise from left to right by default. If our data is too short to populate the whole matrix, R again uses recycling – caution is advised.
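If we would rather fill a matrix row by row, we can set byrow = TRUE. A short sketch, which also uses dim() to query the number of rows and columns:

```r
# byrow = TRUE fills the matrix row by row instead of column by column
row_filled <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
row_filled
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# dim() returns the number of rows and columns
dim(row_filled)
# [1] 2 3
```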
# recycling -- it's a dangerous world out there
another_matrix <- matrix(data = 1:4, nrow = 3, ncol = 4)
another_matrix
[,1] [,2] [,3] [,4]
[1,] 1 4 3 2
[2,] 2 1 4 3
[3,] 3 2 1 4
You can also construct, or append, matrices by adding vectors or matrices column-wise or row-wise with cbind or rbind respectively.
# using cbind and rbind to construct and append matrices
patched_matrix <- cbind(1:3, rep(10, 3),
seq(from = .1, to = 10, length.out = 3),
runif(3))
large_patched_matrix <- rbind(another_matrix, patched_matrix)
large_patched_matrix
[,1] [,2] [,3] [,4]
[1,] 1 4 3.00 2.0000000
[2,] 2 1 4.00 3.0000000
[3,] 3 2 1.00 4.0000000
[4,] 1 10 0.10 0.4820801
[5,] 2 10 5.05 0.5995658
[6,] 3 10 10.00 0.4935413
Matrices come with many useful operators and functions, such as matrix multiplication %*% and the transpose t(); for linear algebra, see also ?solve, ?crossprod, ?qr, ?eigen and ?svd.
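As a small illustration, here is a sketch using a made-up 2-by-2 matrix A and vector b: t() transposes, %*% multiplies matrices, and solve(A, b) solves the linear system \(Ax = b\). We check the solution with all.equal rather than ==, because of the floating-point issues discussed earlier.

```r
# a 2-by-2 matrix, filled column-wise: first column (2, 1), second column (0, 3)
A <- matrix(c(2, 1, 0, 3), nrow = 2)
b <- c(1, 2)
# transpose and matrix product
t(A)
A %*% A
# solve(A, b) solves the linear system A x = b
x <- solve(A, b)
x
# [1] 0.5 0.5
# check: A %*% x reproduces b, up to floating-point error
isTRUE(all.equal(as.vector(A %*% x), b))
# [1] TRUE
```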
Arrays are extensions of matrices to arbitrary dimensions. We construct arrays as follows:
# constructing an array
an_array <- array(data = 1:27, dim = c(3,3,3))
where the dim argument specifies the number of elements in each dimension.
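Elements of an array are accessed with one index per dimension, separated by commas, just like the row and column indices of a matrix. A short sketch on the array defined above (recreated here so that the block runs on its own):

```r
an_array <- array(data = 1:27, dim = c(3, 3, 3))
dim(an_array)
# [1] 3 3 3
# one index per dimension: row 1, column 2, third "layer"
an_array[1, 2, 3]
# [1] 22
# leaving indices empty selects everything along that dimension
an_array[, , 1]   # the first 3 x 3 layer, containing the numbers 1 to 9
```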
Matrices and arrays, unlike lists or dataframes, can only hold a single data type. R will try to convert data types to meet this requirement, and if it succeeds, it does so silently, without an error or a warning.
# changing data types of a matrix
another_matrix[1,1] <- FALSE
another_matrix
[,1] [,2] [,3] [,4]
[1,] 0 4 3 2
[2,] 2 1 4 3
[3,] 3 2 1 4
another_matrix[3,4] <- my_name
another_matrix
[,1] [,2] [,3] [,4]
[1,] "0" "4" "3" "2"
[2,] "2" "1" "4" "3"
[3,] "3" "2" "1" "Peter Pan"
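The same conversion rules apply whenever we combine values of different types into an atomic vector: R picks the most flexible type present, in the order logical, integer, double, character. A short sketch; explicit conversion functions such as as.numeric() exist too, and they return NA (with a warning) for values they cannot convert.

```r
# combining types coerces everything to the most flexible type present:
# logical -> integer -> double -> character
typeof(c(TRUE, 2L))   # "integer"
typeof(c(TRUE, 2.5))  # "double"
typeof(c(1, "a"))     # "character"
c(TRUE, FALSE, 3)
# [1] 1 0 3
# explicit conversion; "abc" is not a number, so it becomes NA with a warning
as.numeric(c("1.5", "abc"))
```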
Dataframes
A dataframe in R is a matrix-like object in which all entries of a column must have the same data type, but the data type may vary between columns. Dataframes are arguably the most common data structure for data analysis in R. You can think of a dataframe as a list of equal-length vectors, which is what it secretly is. For example, consider the iris dataset that comes with R. It is a dataframe, but when we check its type, R returns:
# dataframes are lists
typeof(iris)
[1] "list"
If we want to make sure that we are indeed dealing with a dataframe, we can use the attributes function. Attributes are metadata, stored as a named list, that any R object can have. For example:
# attributes of a dataframe
attributes(iris)
$names
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
[127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
[145] 145 146 147 148 149 150
We see that dataframes also have a names attribute. We can inspect, or change, the column names using names or colnames, and the row names using rownames.
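In practice, we usually build our own dataframes with the data.frame function. A minimal sketch with hypothetical data (the names scores, name and score are only illustrative), showing a few functions to query and modify a dataframe:

```r
# build a small dataframe from two equal-length vectors
scores <- data.frame(
  name  = c("Ann", "Ben", "Cat"),
  score = c(71, 64, 88)
)
class(scores)          # "data.frame"
is.data.frame(scores)  # TRUE
nrow(scores)           # 3
ncol(scores)           # 2
names(scores)          # "name" "score"
# rename the second column
colnames(scores)[2] <- "exam_score"
# str() gives a compact overview of the columns and their types
str(scores)
```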
Accessing elements
We have now seen how we can construct and manipulate different data structures. Oftentimes, we want to access particular elements of our data. For this we have two different techniques, accessing by name and accessing by index.
Accessing by name
For objects whose components have been named, we can use these names to access the components.
Lists
In named lists, we access named elements using the syntax list_name$element_name. For example:
# accessing list elements by name
a_bad_recipe$ingredients
[1] "dark chocolate" "butter" "flour" "sugar"
[5] "eggs"
This is equivalent to list_name[["element_name"]]:
# accessing list elements by name
all(a_bad_recipe$ingredients == a_bad_recipe[["ingredients"]])
[1] TRUE
list_name[["element_name"]] returns the element with name element_name, while list_name["element_name"] returns a list containing the element that binds to element_name.
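The difference between [[ ]] and [ ] matters as soon as we want to compute with the element. A minimal sketch with a hypothetical list called recipe:

```r
recipe <- list(dish = "Pancakes", servings = 4)
# [[ ]] returns the element itself, here a numeric vector
class(recipe[["servings"]])
# [1] "numeric"
# [ ] returns a list of length one that contains the element
class(recipe["servings"])
# [1] "list"
# only the result of [[ ]] can be used directly in arithmetic
recipe[["servings"]] * 2
# [1] 8
```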
Matrices
If matrices have either row or column names specified, we can access elements of such a matrix by name. The syntax is matrix_name["row_name", "column_name"]. If we want a whole row or column, we can access it by matrix_name["row_name",] and matrix_name[,"column_name"], respectively.
# accessing elements of a named matrix
payoffs <- c("Both get 1yr",
"B gets 3yrs, A free",
"A gets 3yrs, B free",
"Both get 2 yrs")
prisoners_dilemma <- matrix(payoffs, nrow = 2, ncol = 2)
colnames(prisoners_dilemma) <- c("B stays silent", "B testifies")
rownames(prisoners_dilemma) <- c("A stays silent", "A testifies")
prisoners_dilemma["A testifies", "B testifies"]
[1] "Both get 2 yrs"
Dataframes
Since dataframes have both list-like and matrix-like behaviour, we can use both the dataframe_name$column_name and the dataframe_name[, "column_name"] syntax to access their columns. If a dataframe has named rows, we can also use dataframe_name["row_name", "column_name"], just like for matrices.
# Access elements by row and column name
pets <- c("cat", "dog", "brachiosaurus", "cactus")
people <- c("Jana", "Jane", "June")
set.seed(1)
pet_ratings <- as.data.frame(matrix(sample(1:10,
size = length(pets) * length(people),
replace = TRUE),
nrow = length(people)))
rownames(pet_ratings) <- people
colnames(pet_ratings) <- pets
pet_ratings$dog
[1] 1 2 7
pet_ratings["June", c("cat", "cactus")]
cat cactus
June 7 10
Accessing by index
Very often, our data objects are not named or we may want to access a selection of particular elements across different dimensions. We can do this by using indices.
For vectors, matrices, arrays and dataframes alike, we can do this with []. Let’s revisit the iris dataframe:
# select elements by numeric index
i_1 <- c(4,1,3)
i_2 <- 2:4
iris[i_1, i_2]
Sepal.Width Petal.Length Petal.Width
4 3.1 1.5 0.2
1 3.5 1.4 0.2
3 3.2 1.3 0.2
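Inside [ , ], the comma separates the indices for the different dimensions: the index before the comma selects rows and the index after it selects columns (see ? dim).
We can also select all elements except a given selection by using negative indices, i.e. by putting the - operator in front of the index vector. If we want all elements along a particular dimension, we just leave that index empty.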
# select only the last 5 observations
number_obs <- nrow(iris)
all_but_last_five <- 1:(number_obs-5)
iris[-all_but_last_five,]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
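Negative indices work the same way for vectors. A short sketch on a made-up vector v; tail() and head() are convenient shortcuts for the last or first few elements (or rows), so tail(iris, 5) returns the same five rows as above.

```r
v <- c(10, 20, 30, 40, 50)
# drop the first element
v[-1]
# [1] 20 30 40 50
# drop the first and the last element
v[-c(1, length(v))]
# [1] 20 30 40
# tail() returns the last elements, head() the first ones
tail(v, 2)
# [1] 40 50
```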
Finally, we can also access elements using logical vectors as indices.
# Access elements using logical vectors
janas_favourites <- pet_ratings["Jana",] >= 5
janas_favourites
cat dog brachiosaurus cactus
Jana TRUE FALSE FALSE TRUE
pets[janas_favourites]
[1] "cat" "cactus"
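Logical indexing is most useful when the logical vector comes from a comparison. A short sketch on a made-up vector marks; which() returns the positions of the TRUE entries instead of the entries themselves, and the same idea selects rows of a dataframe such as iris:

```r
marks <- c(3, 8, 1, 9)
# a comparison yields a logical vector of the same length
marks > 5
# [1] FALSE  TRUE FALSE  TRUE
# using it as an index keeps the elements where it is TRUE
marks[marks > 5]
# [1] 8 9
# which() returns the positions of the TRUE entries
which(marks > 5)
# [1] 2 4
# the same idea selects rows of a dataframe: iris flowers with long sepals
long_sepals <- iris[iris$Sepal.Length > 7.5, ]
```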
Loading data
There exist several ways to load data stored in different file formats. Here we will only look at two common formats: .csv files and .xlsx files. You can almost certainly load any other data format that you encounter; just google it, StackOverflow is a great resource.
First, let us download a dataset from the internet. This is a fictional dataset which investigates the effect of alcohol on perceived attractiveness, from Andy Field’s Discovering Statistics Using R and RStudio.
# Loading fictional beer goggles dataset from internet
url <- "https://www.discovr.rocks/csv/goggles.csv"
file_name <- "goggles.csv"
download.file(url = url, destfile = file_name, quiet = TRUE)
This downloads the file into the current working directory, which you can check via getwd(). It is good practice to set the appropriate working directory at the beginning of an R script.
Next, we can load the file using the read.table function, where we specify the symbol that separates data, in our case “,”, since we are loading a csv (Comma Separated Values) file.
# Load dataset as dataframe
beer_goggles <- read.table("goggles.csv", sep = ",", header = TRUE)
head(beer_goggles)
id facetype alcohol attractiveness
1 vfnoxj Attractive Placebo 6
2 hqfxap Attractive Placebo 7
3 obicov Attractive Placebo 6
4 oobiyc Attractive Placebo 7
5 snafxn Attractive Placebo 6
6 vihqnn Attractive Placebo 5
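Since .csv files are so common, base R also has read.csv, which is read.table with sep = "," and header = TRUE (among a few other defaults) already set. A self-contained sketch that writes a small, hypothetical dataframe toy to a temporary file with write.csv and reads it back:

```r
# write a small hypothetical dataframe to a temporary .csv file
tmp_file <- tempfile(fileext = ".csv")
toy <- data.frame(id = c("a", "b"), value = c(1.5, 2.5))
write.csv(toy, tmp_file, row.names = FALSE)
# read it back; equivalent here to read.table(tmp_file, sep = ",", header = TRUE)
toy_back <- read.csv(tmp_file)
toy_back
# delete the temporary file once we are done with it
unlink(tmp_file)
```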
The code below investigates the influence of alcohol consumption on perceived attractiveness in this dataset.
# Compute mean and 25% and 75% quantile of attractiveness
library("dplyr")
beer_goggles_summary <- beer_goggles %>%
group_by(facetype, alcohol) %>%
summarise(
attractive_mean = mean(attractiveness),
attractive_lq = quantile(attractiveness, 0.25),
attractive_uq = quantile(attractiveness, 0.75)
)
# Plot mean perceived attractiveness and 25% and 75% quantiles
library("ggplot2")
ggplot(beer_goggles_summary, aes(x = alcohol,
y = attractive_mean,
color = facetype,
group = facetype)) +
geom_point(size = 3) +
geom_line() +
geom_errorbar(aes(ymin = attractive_lq,
ymax = attractive_uq),
width = 0.5) +
labs(x = "Alcohol consumption",
y = "Perceived attractiveness") +
theme_minimal()
To read Excel files, we shall use the readxl package, which we can install by install.packages("readxl"). This package also contains some example datasets, which we can use for our demonstration.
# Load .xlsx file using `readxl`
library("readxl")
data_location <- readxl_example("datasets.xlsx")
example_data <- as.data.frame(read_excel(data_location, sheet = 3))
head(example_data)
weight feed
1 179 horsebean
2 160 horsebean
3 136 horsebean
4 227 horsebean
5 217 horsebean
6 168 horsebean
We can now visualise the influence of different feed types on weight using boxplots by running the code below.
# plotting influence of feed type on weight
ggplot(example_data, aes(x = feed, y = weight)) +
geom_boxplot(fill = "lightblue", color = "black") +
labs(x = "Feed Category",
y = "Weight",
title = "Weight per Feed Category") +
theme_minimal()
In the examples above, we have used the R packages dplyr and ggplot2. This is solely for illustration purposes, and you are not expected to know how to work with these.
Exercise: Loading and manipulating a dataset
We start by opening RStudio and creating a new file, which we call exercise_2.R. In the script, we use # to add comments to our code. Before starting to work on our data, it is good practice to set the working directory and load all packages that we wish to use in this script. You can check your current directory by typing getwd() and set it using the setwd() function. For example:
##########################################
## ST227 R Exercise 2: Data manipulation ##
##########################################
# set working directory
my_path <- "" # replace with path to your ST227 folder here
setwd(my_path)
# load packages
library("readxl")
# Problem 1: loading a dataset ...
1. Download a dataset
Go to whatdotheyknow and save the dataset FoI 5067 ii.xlsx in your ST227 folder. This dataset contains a breakdown, by course, of the predicted A-level, IB, and AP grades of students from China (Mainland) who were offered admission.
Have a look at the data in Excel. How is it organised, are there column headers?
2. Load dataset
We want to load the second sheet of our dataset using the read_xlsx function from the readxl package and convert the loaded data into a dataframe.
# loading the dataset
admission_data <- read_xlsx("FoI 5067 ii.xlsx", sheet = 2)
# convert to dataframe
admission_data <- as.data.frame(admission_data)
# inspect first 5 rows
head(admission_data, n = 5)
In RStudio, you can also browse through the data by clicking on admission_data in your top right pane.
3. Manipulate data
Let us simplify the data a little. First, let us check which qualification types occur in this dataset and how often. We can use the unique and table functions for this.
# looking at unique qualification types
types <- admission_data$"Qualification Type"
unique_types <- unique(types)
# proportion of rows per qualification type
unique_types_perc <- table(types) / length(types)
Let us focus on the “International Baccalaureate Diploma”, which makes up the majority of the dataset. That is, we only want rows for which “Qualification Type” is “International Baccalaureate Diploma”.
# subsetting data
indices <- types == unique(types)[1]  # assumes the IB Diploma is the first entry of unique_types
admission_subdata <- admission_data[indices, c(1,4)]
Note that the “Predicted Grade” column has type character, so we need to convert it to type double.
# converting a column of type `character` to type `double`.
grades <- admission_subdata$"Predicted Grade"
typeof(grades[1])
admission_subdata$"Predicted Grade" <- as.double(grades)
Now we can check the average predicted grade per programme. We can either do this one by one, subsetting the data and using the mean function, or, more efficiently, using aggregate.
# computing mean predicted grade per programme
aggregate(`Predicted Grade` ~ Programme,
data = admission_subdata,
FUN = mean)
These are a lot of programmes. Let us further simplify the data by only selecting programmes that have a substantial maths component. We can do this by specifying a collection of words that occur in the names of maths-oriented programmes, and then searching the Programme column for these words using grepl.
# selecting only programmes with maths component
maths_words <- c("Mathematics", "Finance", "Actuarial", "Economics")
pattern <- paste(maths_words, collapse = "|")
maths_inds <- grepl(pattern, admission_subdata$Programme)
admission_maths <- admission_subdata[maths_inds, ]
aggregate(`Predicted Grade` ~ Programme,
data = admission_maths,
FUN = mean)
Finally, let us visualise the distribution of predicted grades for the maths-oriented programmes.
# Boxplot of predicted grades per mathsy programme
par(mar = c(10, 4, 4, 1), cex.axis = 0.5)
boxplot(`Predicted Grade` ~ Programme, data = admission_maths,
main = "Boxplot of Predicted Grade per Programme",
xlab = "",
ylab = "Predicted Grade",
col = "lightblue",
border = "black",
las = 2)
4. Save the data and script
Let us now save our cleaned-up data. A convenient way to store R objects for later use in R is the .RData format, which we write using the save function.
# save data as .RData
save(admission_data,
admission_subdata,
admission_maths,
file = "admission.RData")
We can check whether this has worked by first deleting all objects in our workspace using rm and then re-loading the data using load.
# Removing all objects from workspace
rm(list = ls())
# loading admission datasets
load(file="admission.RData")
# check data is loaded
ls()
Functions & Control Flow
Functions
In R, functions are objects that take a set of inputs, and give some outputs based on a series of statements. We have already come across a number of functions, for example + or mean.
When we are working on a complex task, functions are a great way to structure our program and to automate repetitive tasks. This makes our workflow more efficient, more readable and less error-prone than writing one large file with a long series of consecutive commands.
In R, we can define a function as follows:
# defining a function
function_name <- function(arguments){
# function body
}
We see that a function consists of the following components:
- A function_name, which we use to call the function
- An enumeration of function arguments, which we supply to the function
- The function body, where we transform our inputs in some way
For example, we can define
# function to compute the square of a number
square <- function(x){
x_squared <- x * x
return(x_squared)
}
square(2)
[1] 4
The return statement is optional in R: a function returns the value of the last expression evaluated in its body, so we could also simply call the object that we want to return at the end of the function body, e.g.
# function without explicit return statement
square <- function(x){
x * x
}
We can construct functions with as many arguments as we want, all separated by commas. It is also possible to supply default values to some or all of our arguments. For example:
# define a function with three arguments
chat <- function(my_name, your_name, controversial = FALSE){
our_chat <- paste("Hello ",
your_name,
"! ",
"I am ",
my_name,
". ",
sep = "")
if(controversial){
our_chat <- paste(our_chat,
"I like pineapple on pizza.",
sep = "")
}else{
our_chat <- paste(our_chat,
"Would you like to see pictures of my cat?",
sep = "")
}
return(our_chat)
}
chat("Romeo", "Juliet")
[1] "Hello Juliet! I am Romeo. Would you like to see pictures of my cat?"
Since the argument controversial has a default value, we do not need to supply it. If we wish to supply another value, we simply call:
chat("Romeo", "Juliet", TRUE)
[1] "Hello Juliet! I am Romeo. I like pineapple on pizza."
We can also supply arguments by name, in which case the order in which they appear does not matter:
chat(controversial = FALSE, your_name = "Juliet", my_name = "Romeo")
[1] "Hello Juliet! I am Romeo. Would you like to see pictures of my cat?"
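The same two ideas, default values and matching arguments by name, in a minimal sketch; raise_to is a hypothetical function written only for illustration.

```r
# one required argument (x) and one argument with a default value (exponent)
raise_to <- function(x, exponent = 2){
  x^exponent
}
raise_to(3)                    # exponent takes its default value: 9
raise_to(2, 3)                 # arguments matched by position: 8
raise_to(exponent = 3, x = 2)  # arguments matched by name: 8
```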
Control Flow Structures
Control flow is a term from computer science that describes the order in which the statements of a program are executed. We will look at two control flow structures, namely conditional statements and iterations.
Conditional statements, or if-else statements, take the form
# a basic if-else statement
if(condition){
# do something if condition == TRUE
}else{
# do something if condition == FALSE
}
In the code above, R first evaluates the condition of the if statement. This condition can either be an explicitly defined logical variable or an expression like x > 2, but it must always evaluate to a single logical value, TRUE or FALSE. Depending on whether the condition is TRUE or FALSE, R then evaluates the if(){} or the else{} block.
We can also chain multiple if-else statements together using else if(){} as follows:
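A concrete version of the template above, wrapped in a small hypothetical function describe so that it can be called with different inputs. The condition has to be a single TRUE or FALSE: a missing value (NA) causes an error, and since R 4.2.0 so does a condition of length greater than one.

```r
# a concrete if-else statement: x > 2 evaluates to a single TRUE or FALSE
describe <- function(x){
  if(x > 2){
    result <- "greater than 2"
  }else{
    result <- "at most 2"
  }
  result
}
describe(5)   # "greater than 2"
describe(1)   # "at most 2"
# describe(NA) would fail with: missing value where TRUE/FALSE needed
```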
# multiple conditional statements
if(x > 0){
# do something if x > 0
}else if(x == 0){
# do something if x == 0
}else{
# do something if neither x > 0 nor x == 0, i.e. x < 0
}
We can also use the if(){} statement on its own, without an else{}.
# A conditional statement without else-clause
if(happy_and_know_it){
print("Clap your hands")
}
We can also nest conditional statements like so:
# A nested conditional statement
if(happy_and_know_it){
if(really_want_to_show_it){
print("Clap your hands")
}
}
The second control flow structure we consider is iteration, or looping. Loops are programming structures that repeat a block of code, either a fixed number of times or until a specific condition is met. The two most common loops are for-loops and while-loops.
A for-loop is used to iterate over the values in a vector. Its basic structure is:
# basic structure of a for-loop
for(item in vector){
# do something
}
For example, we can use it to compute the Fibonacci sequence, which is given by
\[ F_0 = 0, \quad F_1 = 1, \quad F_n = F_{n-1} + F_{n-2} \quad (n \geq 2), \]
as follows:
# compute first n values of fibonacci sequence
fibonacci <- function(n){
sequence <- rep(NaN, n)
for(i in 1:n){
if(i == 1){
sequence[i] <- 0
}else if(i == 2){
sequence[i] <- 1
}else{
sequence[i] <- sequence[i - 1] + sequence[i - 2]
}
}
return(sequence)
}
fibonacci(5)
[1] 0 1 1 2 3
Contrary to the for-loop, which iterates over a finite collection of elements and terminates thereafter, a while-loop keeps iterating for as long as a certain condition is TRUE, potentially indefinitely.
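In fibonacci we loop over the indices 1:n, but a for-loop can also run over the values of a vector directly. A minimal sketch with hypothetical values:

```r
# loop over the values of a numeric vector and keep a running total
values <- c(38, 42, 45)
total <- 0
for(value in values){
  total <- total + value   # add the current value to the running total
}
total                      # 125, the same as sum(values)
# loop over the elements of a character vector
for(name in c("Romeo", "Juliet")){
  print(paste("Hello", name))
}
```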
It has the basic structure:
# basic structure of a while-loop
while(condition){
## do something as long as condition == TRUE
}
R does not check whether a while-loop might run forever, so make sure that your loops do terminate. The loop below, for example, would run forever.
# this stunt is performed by trained professionals, do not try this at home
while(TRUE){
print("I should have known better")
}
If you run the code above, you will have to stop execution forcibly. In RStudio, you can do this by pressing Esc in the console (Ctrl + C in an R session running in a terminal) or by clicking the red stop icon at the top of the console pane. If RStudio is frozen, you will have to force quit RStudio, which might cause you to lose unsaved work.
A more sensible illustration of a while-loop is provided by the function below, which approximates the golden ratio, the limit of consecutive quotients from the Fibonacci sequence:
\[\varphi = \lim_{n \to \infty} \frac{F_n}{F_{n-1}}\]
# compute the golden ratio
golden_ratio <- function(error_tol = 1e-9, max_iter = 1000){
iter_count <- 0
ratios <- rep(0,2)
fibonacci_last_two <- c(0,1)
error <- Inf
while(error > error_tol && iter_count < max_iter){
new_fibonacci <- sum(fibonacci_last_two)
new_ratio <- new_fibonacci / fibonacci_last_two[1]
ratios[2] <- ratios[1]
ratios[1] <- new_ratio
fibonacci_last_two[2] <- fibonacci_last_two[1]
fibonacci_last_two[1] <- new_fibonacci
error <- abs(diff(ratios))
iter_count <- iter_count + 1
}
return(c(ratios[1], iter_count))
}
golden_ratio()
[1] 1.618034 25.000000
Note that we have added an iteration counter to ensure that the while-loop terminates eventually. Even though we know that consecutive quotients from the Fibonacci sequence converge, including an iteration counter is good practice, as there is no guarantee that our implementation is correct.
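The same safety-counter pattern works whenever the number of iterations is not known in advance. As a further sketch (the function name first_fibonacci_above is hypothetical), the while-loop below walks along the Fibonacci sequence until it first exceeds a threshold:

```r
# return the first Fibonacci number that exceeds threshold
first_fibonacci_above <- function(threshold, max_iter = 1000){
  previous <- 0
  current <- 1
  iter_count <- 0
  # keep going while the condition holds and the safety counter allows it
  while(current <= threshold && iter_count < max_iter){
    next_value <- previous + current
    previous <- current
    current <- next_value
    iter_count <- iter_count + 1
  }
  # note: if max_iter is reached first, current may not exceed threshold
  current
}
first_fibonacci_above(1000)  # 1597
```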
Data Visualisation
In this section we will look at some basic ways to visualise data using R. One of the appeals of R is its prowess in producing high-quality plots with very simple code. One could go as far as to say that plots made with R’s ggplot2 package have become the gold standard for statistical analysis and research. If you intend to continue working with R in your academic or professional future, I highly encourage you to familiarise yourself with this package.
In this note, we content ourselves with the plotting functionality of base R, which is still highly versatile and a lot simpler than ggplot2.
Line and scatter plots
Line and scatter plots are produced using R’s plot function. The basic inputs are:
- The \(x\) and \(y\) coordinates of the points that we wish to plot. If we only supply one vector, then R plots these values as \(y\)-coordinates against the vector indices.
- An optional type argument, which specifies the type of plot that we want to create: “p” (points, the default) creates a scatter plot, “l” creates a line plot and “o” overplots points on the lines. Other types are available, but these are the most useful.
- An optional col argument, which specifies the colouring of the plot.
- Optional xlab, ylab and main arguments, which specify labels for the \(x\)-axis, the \(y\)-axis and the main title, respectively.
You are encouraged to have a look at the plot documentation (?plot) to check out other features. If there is a particular plot that you want to achieve, check the documentation or search online: most likely it is possible and has been done before.
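To compare the three type options described above, here is a minimal sketch that draws the same hypothetical points three times side by side. par(mfrow = c(1, 3)) splits the plotting window into one row with three panels; the last line switches back to a single panel.

```r
# hypothetical points: y = x^2 for x = 1, ..., 10
xs <- 1:10
ys <- xs^2
par(mfrow = c(1, 3))   # three plots side by side
plot(xs, ys, type = "p", main = 'type = "p"')
plot(xs, ys, type = "l", main = 'type = "l"')
plot(xs, ys, type = "o", main = 'type = "o"', col = "blue",
     xlab = "x", ylab = "y")
par(mfrow = c(1, 1))   # back to one plot per window
```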
Let us look at some plots now. We start with a simple scatter plot.
# A simple scatter plot
set.seed(1)
xs <- rnorm(100)
plot(xs)
Here, we have only supplied one set of values, so R plots the values against the indices of the supplied vector.
We can also plot two vectors against each other, for example in a QQ-plot.
# A simple QQ-plot
n <- 200
set.seed(1)
xs <- rnorm(n)
sorted_xs <- sort(xs)
quantile_positions <- (1:n) / (n + 1)
theoretical_quantiles <- qnorm(quantile_positions)
plot(theoretical_quantiles, sorted_xs,
xlab = "Theoretical quantiles",
ylab = "Sample quantiles",
main = "Normal QQ-plot")
lines(theoretical_quantiles, theoretical_quantiles,
col = "red",
lty = 2,
lwd = 2)
Here we added a \(45\)-degree line to an existing plot using the lines function. You can similarly add points using the points function.
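Like lines, the points function only adds to a plot that already exists, so plot must be called first (otherwise R reports that plot.new has not been called yet). A minimal sketch that marks a few values on the standard normal density:

```r
# start with a line plot of the standard normal density
x_grid <- seq(from = -3, to = 3, length.out = 100)
plot(x_grid, dnorm(x_grid), type = "l",
     xlab = "x", ylab = "density", main = "Standard normal density")
# then add filled red points at a few selected x-values
x_marks <- c(-2, -1, 0, 1, 2)
points(x_marks, dnorm(x_marks), col = "red", pch = 19)
```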
Let us now make a line plot. The basic syntax is plot(x, y, type = "l"). To show you what is possible, and for your later reference, I have added a custom axis and a legend to the plot.
# A line plot
xs <- seq(from = 0, to = 2 * pi, length.out = 1000)
cosins <- cos(xs)
sins <- sin(xs)
xticks <- seq(from = 0, to = 2 * pi, by = pi/2)
xlabs <- c(0,
expression(pi/2),
expression(pi),
expression(3*pi/2),
expression(2*pi))
plot(xs, cosins,
type = "l",
xlab = "x",
ylab = "y",
main = "Plot of the sine and cosine function",
xaxt = "n")
lines(xs, sins, col = 3, lty = 2)
axis(1, at = xticks, labels = xlabs)
legend("right",
legend = c("cos(x)", "sin(x)"),
col = c(1,3),
lty = c(1, 2),
cex = .5)
Histograms and density plots
We can plot a histogram with the hist function. The basic syntax is hist(x). The two most useful optional arguments are breaks and freq. breaks sets the bins of the histogram: either a single number, which R treats only as a suggestion for the number of bins, or a vector giving the exact break points. freq specifies whether the histogram depicts counts (TRUE) or densities (FALSE).
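The effect of breaks and freq can be inspected without drawing anything: with plot = FALSE, hist returns the counts, densities and break points it would use. In the sketch below, the bins of width 0.5 are an arbitrary choice made to cover the range of the simulated data.

```r
set.seed(1)
xs <- rnorm(100)
# explicit break points covering the whole range of the data
bin_edges <- seq(from = floor(min(xs)), to = ceiling(max(xs)), by = 0.5)
# plot = FALSE returns the histogram object without drawing it
h <- hist(xs, breaks = bin_edges, plot = FALSE)
h$counts                          # number of observations per bin
sum(h$counts)                     # 100
# with freq = FALSE the bar heights are h$density, so the bar areas sum to 1
sum(h$density * diff(h$breaks))
# draw both versions side by side
par(mfrow = c(1, 2))
hist(xs, breaks = bin_edges, freq = TRUE, main = "freq = TRUE")
hist(xs, breaks = bin_edges, freq = FALSE, main = "freq = FALSE")
par(mfrow = c(1, 1))
```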
Rather than plotting a histogram to depict the empirical distribution of a random variable, we may sometimes want to plot a smooth estimate of its density. We can do this using the density function in R, which provides a kernel density estimate.
As before, we can overlay other line plots and even histograms over an existing histogram plot. Below is an example.
# A histogram
set.seed(1)
xs <- rnorm(100)
smoothed_xs <- density(xs)
x_grid <- seq(from = -4, to = 4, length.out = 1000)
densities <- dnorm(x_grid)
hist(xs,
col = "lightblue",
freq = FALSE,
xlab = "x",
main = "Histogram with smoothed and theoretical density")
lines(smoothed_xs, col = "black", lwd = 2)
lines(x_grid, densities, col = "magenta", lwd = 2)
legend("topright",
legend = c("Histogram",
"Smoothed density",
"Theoretical density"),
col = c("lightblue", "black", "magenta"),
pch = c(15, NA, NA),
lty = c(NA, 1, 1),
lwd = c(NA, 2, 2),
cex = .5)
Box plots
Another option to depict the distribution of a random variable is via a boxplot. In R the basic syntax for a boxplot of a single variable is boxplot(x).
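For a single variable, boxplot.stats returns the numbers behind the box, which can help when reading a boxplot. The data in the sketch below are hypothetical, with one deliberately large value: the box spans the lower and upper hinges (roughly the quartiles) around the median, the whiskers extend to the most extreme points within 1.5 box lengths of the box, and anything further out is drawn as a separate point.

```r
# hypothetical data with one unusually large value
values <- c(1:9, 30)
boxplot(values, main = "A single boxplot", col = "lightblue")
# lower whisker, lower hinge, median, upper hinge, upper whisker
box_info <- boxplot.stats(values)
box_info$stats   # 1.0 3.0 5.5 8.0 9.0
box_info$out     # 30, drawn as an individual point
```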
What is really useful in R’s boxplot function is that we can simultaneously produce boxplots across groups from a dataframe. We have already come across a boxplot in our exercise on data manipulation. Recall the dataset admission_maths:
load(file="admission.RData")
head(admission_maths, n = 5)
Programme Predicted Grade
13 BSc in Actuarial Science 40
15 BSc in Economics 45
16 BSc in Economics 45
17 BSc in Economics 44
18 BSc in Economics 45
We can now produce a boxplot of the grade distribution per programme using a so-called formula, which consists of two variable names from our dataframe, separated by a ~ symbol. In our case this is `Predicted Grade` ~ Programme. The first component specifies the variable of interest, the second the grouping variable. We had to enclose the variable Predicted Grade in backticks (`) as it contains a space character. Below is the full boxplot:
# A grouped boxplot
par(mar = c(10, 4, 4, 1), cex.axis = 0.5)
boxplot(`Predicted Grade` ~ Programme, data = admission_maths,
main = "Boxplot of Predicted Grade per Programme",
xlab = "",
ylab = "Predicted Grade",
col = "lightblue",
border = "black",
las = 2)
The par function above allows one to set graphical parameters, such as the margins of a plot, which we had to change to accommodate the long labels of our Programme variable.
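Settings made with par stay in effect for all later plots on the same device. A common idiom is to store the old values, which par returns invisibly when it changes something, and restore them afterwards. A minimal sketch, using the built-in ToothGrowth dataset as a stand-in for admission_maths:

```r
# par() returns the previous values of the parameters it changes
old_par <- par(mar = c(10, 4, 4, 1), cex.axis = 0.5)
# draw the plots that need the wider bottom margin
boxplot(len ~ supp, data = ToothGrowth, las = 2)
# restore the previous settings so later plots are not affected
par(old_par)
```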
Exporting graphics from R
To export graphics in R, we can either navigate to the output pane, select the Plots tab and then click on the Export icon.
Alternatively, we can run the following series of commands.
Before the plot, we run:
# exporting plots: before plotting
pdf(file = my_path,
width = 4,
height = 4)
where my_path should point to the directory where you want to save the plot, and include the name of the plot and the file suffix. For example, this could be “/Users/LSE/ST227/a_plot.pdf”. The function that opens the file must match the file type: pdf() for pdf, jpeg() for jpg and png() for png files. For pdf(), the optional width and height arguments are given in inches; for png() and jpeg(), they are given in pixels by default.
Next we run our plot, and any other low level plotting commands, such as modifying the axes, adding a legend or a line etc.
# exporting plots: run your plots
hist(rnorm(100), freq = FALSE)
lines(seq(-4,4,0.01), dnorm(seq(-4,4,0.01)))
Finally, after plotting we run
# exporting plots: after plotting
dev.off()
Project submission
For the R mini project, you will be asked to produce a pdf file of your work. This file will contain:
- code that you used to compute your answers
- plots
- plain text explanations
At first sight, the simplest way to do this is to write an R script with everything that you need to solve the problems and generate plots, and then copy and paste code and pictures into a Word file, which you save as a pdf. While there is nothing inherently wrong with this approach, it feels somewhat sloppy and is time-consuming.
A very simple way to combine code, output and plain text, is via RMarkdown. RMarkdown is a file format that allows one to easily combine plain text with embedded R code and the resulting output.
It is written in Markdown, an easy-to-write plain text format, but don’t worry: to the extent that we will be using RMarkdown, it will feel just like writing normal text.
Creating an RMarkdown file
To create a new RMarkdown file in RStudio, we simply click File > New File > RMarkdown or click the New File icon and then select RMarkdown. This will open the window shown in Figure 2.
Here, we can name our file, add an Author name, and set the date.
For the mini project, submissions are anonymised, so do not write your name in the author field. The file name should be your candidate number followed by “_ST227_Rproject.Rmd”.
We then select pdf as our output format, which should open a file like this:
To render RMarkdown documents, we need to install some packages. Simply paste
install.packages("rmarkdown")
install.packages("knitr")
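install.packages("pandoc")
into your Console to do so.
You only need to install these packages once; they need not be included in your markdown document. Rendering to pdf additionally requires a LaTeX installation. If you do not have one, the tinytex package offers a lightweight option: run install.packages("tinytex") and then tinytex::install_tinytex() once in your Console.
Next, before rendering, we need to save our document. Then, to render, we simply click on the Knit icon or click File > Knit Document. This should create a pdf file with some default output as illustrated below: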
Now that we know how to render a markdown document let’s have a closer look at the actual RMarkdown file. At the top of the file you will see what is called a YAML header, that looks like this
---
title: "R mini project"
author: "your candidate number"
output: pdf_document
date: "2024-09-24"
---
This header contains metadata that describes the document, such as the title, author, output format, and other settings. For our purposes, we simply choose a suitable title, add our candidate number as the author (or delete the author field altogether) and set output to pdf_document.
Immediately below, you will find the code chunk
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
which is an initial setup chunk used to configure global options for the document. Here, echo = TRUE ensures that the code of each chunk is displayed in the output (i.e., printed alongside the output of the code), while include=FALSE hides the setup chunk itself. We leave this unchanged.
Markdown basics
We can now populate our RMarkdown document with text, such as explanations or interpretations of our code, as well as code chunks. Let’s have a look at these in turn.
Text
We can simply write plain text in our RMarkdown document and it will be rendered on the pdf. For example, the RMarkdown file
Headers
To format our output we might want to add headers to our document. Headers consist of one to six # symbols followed by the header text. For example,
# This is a level one header
## This is a level two header
renders:
This is a level one header
This is a level two header
Italic text is achieved by enclosing the text to be emphasised in *, like *this is italic* and bold text is achieved by enclosing text in **, like **this is a bold statement**.
Lists
You can generate numbered and unnumbered lists like
A numbered list:
1. First item
2. Second item
An unnumbered list:
- Some item
- Another item
are displayed as:
A numbered list:
1. First item
2. Second item
An unnumbered list:
- Some item
- Another item
LaTeX
Finally, if you are familiar with LaTeX, you can easily include it in your markdown document too. Anything enclosed in a pair of single $ characters is understood as inline TeX math, and anything enclosed in a pair of $$ is understood as display math. For example:
Numbers that are greater than $10'000$ in absolute value scare me. Hence, I avoid looking at the set:
$$S = \left\{x \in \Re: \sqrt{|x|}> 100 \right\}$$
close to bedtime.
would render:
Numbers that are greater than \(10'000\) in absolute value scare me. Hence, I avoid looking at the set: \[S = \left\{x \in \Re: \sqrt{|x|}> 100 \right\}\] close to bedtime.
And that is all the Markdown syntax that we will need. If you want to know more, have a look at Pandoc’s Markdown documentation or Quarto’s Markdown basics.
Code chunks
The whole point of RMarkdown is that we can easily embed code and code output into our document. This is done via so-called code chunks, which look like this:
```{r chunk-label, chunk-options}
#some R code
```
Code chunks have the following components:
- Each chunk of code is enclosed in ```{r} and ```, which specify the beginning and end of the code block and the programming language of said code.
- The chunk-label is optional, but helps with error tracking and other useful things that are beyond our requirements.
- The chunk-options allow us to specify how each code chunk behaves in terms of execution, output and appearance.
The most relevant chunk-options for us are:
- eval = TRUE/FALSE: Determines whether the code chunk should be evaluated (run). If FALSE, the code is shown but not executed.
- echo = TRUE/FALSE: Controls whether the code itself is displayed in the document. If FALSE, the code is hidden, but the results are shown.
- include = TRUE/FALSE: If FALSE, both the code and its output are hidden, but the code is still run.
- warning = TRUE/FALSE: If FALSE, any warnings generated by the chunk are suppressed.
- message = TRUE/FALSE: If FALSE, messages produced by the code are not displayed.
- error = TRUE/FALSE: If FALSE (the default when knitting an RMarkdown document), an error in the chunk stops the document from knitting. If TRUE, errors are shown in the output and knitting continues.
- fig.width / fig.height: Control the width and height (in inches) of plots generated by the chunk.
You can check out other chunk options in the knitr documentation.
Exercise: Creating a mock submission
On November 27th, you will be asked to submit a mock R mini project on Moodle. This submission will not be graded; its purpose is solely to help you familiarise yourself with the submission process ahead of your actual mini project.
1. Download Data
Go to the Moodle page for ST227 and click on Introduction to R and R Mock Submission. There you should find a dataset called ST227 R mock submission data and a pdf file called ST227 R Mock Submission, which contains the same questions as this exercise.
Download the data into your designated ST227 folder.
2. Creating an RMarkdown document
Although you are free to choose how you combine code, text, plots and other outputs in your R mini project submission, we will do it via an RMarkdown document here. This is the most elegant and, in my opinion, simplest way to do so.
In RStudio, we click File > New File > RMarkdown and, in the pop-up window of Figure 2, set ST227 R mock submission as the title, your candidate number as the author, and choose pdf as the output format.
Delete everything below the code chunk
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Save the document in your ST227 folder. The file name should be your candidate number followed by “_ST227_R_mock_submission.Rmd”. Make sure your document renders as expected by clicking File > Knit Document.
3. Loading and inspecting data
We want to load the data using the readxl package. For this, we create a new code chunk
```{r loading-data}
# load package
library("readxl")
# load data
unknown_data <- read_xlsx(path = "ST227_R_mock_submission_data.xlsx")
# convert to dataframe
unknown_data <- as.data.frame(unknown_data)
# print first five observations
head(unknown_data, n = 5)
```
4. Data analysis and plotting
Here, we are dealing with a dataset that contains the number of awarded science degrees and the relative search volume for avocado toast per year. You can find more information on this dataset here if you are interested.
Let us test for correlation between science_degrees and avocado_toast_searches then:
```{r testing-correlation}
# Pearson's test for correlation
with(unknown_data,
cor.test(science_degrees, avocado_toast_searches,
method = "pearson")
)
```
Let us plot the data. This plot is a bit more involved and solely for your viewing pleasure; you are not expected to produce such plots in ST227.
```{r plotting-data, fig.height=6, fig.width=8}
# plotting science degrees and relative avocado toast searches
with(unknown_data, {
par(mar = c(5, 4, 4, 4) + 0.3)
plot(year, science_degrees,
type = "o",
ylab = "Awarded science degrees",
xlab = "Year",
main = "Awarded science degrees and Avocado toast searches",
lwd = 2,
bty = "n",
xaxt = "n")
par(new = TRUE)
plot(year, avocado_toast_searches,
type = "o",
axes = FALSE,
bty = "n",
xlab = "",
ylab = "",
col = "red",
lty = 2,
lwd = 2)
axis(side=4,
at = pretty(range(avocado_toast_searches)),
col.axis = "red")
mtext("Rel. search volume",
side = 4,
line = 3,
col = "red")
axis(1, at = year, labels = year)
mtext("Year", side = 1, line = 3)
}
)
```
5. Submission
Feel free to add other analyses or plots to this document. Once you are happy with your submission, render the file. You should now have a pdf file named <your_candidate_number>_ST227_R_mock_submission.pdf. Go to Moodle, navigate to the R mock submission, and submit your file in the designated submission window. Well done! I hope you see the advantages RMarkdown can offer compared to copy pasting outputs into a Word document.