Section 3 Variables and vectors

We can assign a numerical value to what we refer to as a variable, and then use the variable within various R commands. For example

x <- 3

defines a variable called x, which takes the value 3. You won’t see any output when you type this command, but if you type the variable name on its own, R will tell you its value:

x
## [1] 3

We can then use the variable in other commands, e.g.:

2 * x
## [1] 6

R is case sensitive: the variable x is not the same as the variable X.
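As a small illustration of case sensitivity, the sketch below defines a second variable X alongside x. The name X and the value 10 are chosen here purely for the example.

```r
x <- 3
X <- 10   # X is a separate variable; assigning to it does not change x
x
## [1] 3
X
## [1] 10
x + X
## [1] 13
```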

3.1 Vectors

We can define a vector variable using the command c(), with a list of the elements of the vector, separated by commas, inside the brackets. For example, to create a vector of the numbers 2, 4, 6, 8, 10 and assign it to a variable y, type

y <- c(2, 4, 6, 8, 10)

We can do element-wise operations with two vectors. For example:

z <- c(3, 5, 7, 9, 11)
y + z
## [1]  5  9 13 17 21
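Other arithmetic operators work element by element in the same way. The following sketch reuses the vectors y and z from above; the particular operations shown are just examples.

```r
y <- c(2, 4, 6, 8, 10)
z <- c(3, 5, 7, 9, 11)
y * z     # multiply corresponding elements
## [1]   6  20  42  72 110
z - y     # subtract corresponding elements
## [1] 1 1 1 1 1
2 * y     # a single number is applied to every element
## [1]  4  8 12 16 20
```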

3.1.1 Sequences of integers

A convenient way to create a sequence of integers (as a vector) is to use the colon operator :, for example

3:10
## [1]  3  4  5  6  7  8  9 10

and we can assign the result to a vector variable in the usual way.

x <- 3:10
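As a quick check on what : produces, the sketch below counts the elements of x with length() (a function not otherwise used in this section) and shows that putting the larger number first gives a decreasing sequence.

```r
x <- 3:10
length(x)   # number of elements in x
## [1] 8
10:3        # the sequence counts down when the first number is larger
## [1] 10  9  8  7  6  5  4  3
```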

3.2 Testing for equality and inequalities

Given a vector such as

x
## [1]  3  4  5  6  7  8  9 10

we can test to see if elements of this vector equal a particular value, e.g.

x == 4
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

which produces another vector, where the i-th element is TRUE if the i-th element of x is equal to 4, and FALSE otherwise. Similarly, we can test for an inequality, for example

x < 5
## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

If we sum the result, each TRUE is counted as a 1, and each FALSE is counted as a 0, so we can find out how many elements of x satisfy the inequality (or equality):

sum(x < 5)
## [1] 2
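The same idea works with R's other comparison operators, such as != (not equal to) and >= (greater than or equal to). A short sketch, assuming x is still the vector 3:10:

```r
x <- 3:10
x != 4        # TRUE wherever the element differs from 4
## [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
x >= 8        # TRUE wherever the element is at least 8
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
sum(x >= 8)   # count how many elements are at least 8
## [1] 3
```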

3.3 Subsetting vectors

Suppose we have first defined a vector x:

x <- c(12, 14, 16, 18, 20)

We use square brackets [] to extract elements of x. For example, to get the third element we do

x[3]
## [1] 16

Exercise 3.1 If x has been defined as

x <- c(12, 14, 16, 18, 20)

predict which elements of x would be returned with the following, then try these commands in R:

x[2:4]
x[c(1, 3, 5)]
x[-4]

We can also replace elements of x, for example

x[2] <- 0
x
## [1] 12  0 16 18 20
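Several elements can be replaced in one command by giving a vector of positions on the left and a vector of new values on the right. In this sketch, the positions and the replacement values 100 and 500 are arbitrary choices for illustration.

```r
x <- c(12, 14, 16, 18, 20)
x[c(1, 5)] <- c(100, 500)   # replace the first and fifth elements
x
## [1] 100  14  16  18 500
```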

3.3.1 Logical subsetting

Given our definition of x, if we first do

x <- c(12, 14, 16, 18, 20)
x < 15
## [1]  TRUE  TRUE FALSE FALSE FALSE

we see that TRUE is returned in position i if the i-th element of x is less than 15. We can use this to extract the elements of x that satisfy the condition of being less than 15:

x[x < 15]
## [1] 12 14
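The vector of TRUE and FALSE values can also be stored in a variable first and then used inside the square brackets. A minimal sketch, in which the variable name big and the threshold 16 are hypothetical choices:

```r
x <- c(12, 14, 16, 18, 20)
big <- x >= 16   # logical vector: TRUE where the element is at least 16
big
## [1] FALSE FALSE  TRUE  TRUE  TRUE
x[big]           # keep only the elements where big is TRUE
## [1] 16 18 20
sum(big)         # how many elements were kept
## [1] 3
```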

3.4 Character strings

We can make vectors whose elements are text (known as strings or character strings) rather than numbers.

x <- c("Monday", "Tuesday", "Wednesday")
x
## [1] "Monday"    "Tuesday"   "Wednesday"

The quote marks " " are important here: if, for example, we tried

y <- Monday

we would get the message Error: object 'Monday' not found: R would attempt to find a variable with the name Monday, rather than assigning the string "Monday" to the variable y.
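To make the difference concrete, the sketch below first creates a variable that happens to be named Monday (a hypothetical variable with an arbitrary value, defined only for this example). Without quotes, R copies the value of that variable; with quotes, R stores the text itself.

```r
Monday <- "the first working day"   # hypothetical variable, for illustration
y <- Monday      # no quotes: y gets the value of the variable Monday
y
## [1] "the first working day"
y <- "Monday"    # quotes: y gets the string "Monday"
y
## [1] "Monday"
```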

3.5 Factors

In statistical modelling, we often work with categorical variables: for example, a patient’s symptoms might be recorded as one of “none”, “mild”, “moderate”, or “severe”. In R, we can have factor variables that are similar to strings, but which carry additional information about the possible levels. We create these with the factor() command. For example

x <- factor(c("mild", "mild", "none", "severe"))
x
## [1] mild   mild   none   severe
## Levels: mild none severe

Note that when we display our factor x, we do not see quotes, and the levels are also displayed.

When defining a factor, it may be helpful to specify all the possible levels, even if some levels have not been observed. We specify these in the factor command:

x <- factor(c("mild", "mild", "none", "severe"),
            levels = c("none", "mild", "moderate", "severe"))
x
## [1] mild   mild   none   severe
## Levels: none mild moderate severe

(Note that the first two lines in the input display are a single command: the line break after the comma at the end of the first line does not end the command, because the opening bracket of factor( has not yet been closed.)
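One way to see the effect of specifying all the levels is to inspect the factor with levels() and table(). Neither function is used elsewhere in this section, so treat this as a sketch: levels() returns the possible levels in the order given, and table() counts the observations at each level, including levels that were never observed.

```r
x <- factor(c("mild", "mild", "none", "severe"),
            levels = c("none", "mild", "moderate", "severe"))
levels(x)   # the possible levels, in the order we specified
## [1] "none"     "mild"     "moderate" "severe"
table(x)    # counts per level; "moderate" appears with a count of 0
## x
##     none     mild moderate   severe 
##        1        2        0        1 
```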

3.6 The Environment window

In RStudio, you can see all the variables defined in your workspace in the Environment window. The Environment window will also list any data sets and functions that you have created; you can click on these for more details.
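The same information is available from the console: the base R function ls() returns the names of the objects currently defined in your workspace. A minimal sketch, assuming two variables x and y have been defined (what else is listed will depend on your own session):

```r
x <- 3
y <- c(2, 4, 6, 8, 10)
ls()   # character vector of object names; should include "x" and "y"
```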

Exercise 3.2 Suppose we want to create a vector called responses with three elements: yes, no and no.

  1. Create the vector responses as a vector of character strings.
  2. How would you define responses, if you instead wanted it to be a factor, with levels yes, no and undecided?