Section 4 Variables and vectors

We can assign a numerical value to a variable and then use that variable within R commands. For example,

x <- 3

defines a variable called x, which takes the value 3. We won’t see any output when we type this command, but if we type the variable name on its own, R will tell us its value:

x
## [1] 3

We can then use the variable in other commands, e.g.:

2 * x
## [1] 6
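
As a further small sketch, a variable can be used to define another variable, and assigning to an existing name overwrites its old value. The names a and b are chosen only for illustration.

```r
a <- 3        # a takes the value 3
b <- 2 * a    # b is computed from the current value of a
b
## [1] 6
a <- 10       # assigning again overwrites the old value of a
a + b         # b is unchanged: it was computed when a was 3
## [1] 16
```

Note that b does not update automatically when a changes; it keeps the value it was given at the time of assignment.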

4.1 Vectors

We can define a vector variable using the command c(), listing the elements of the vector, separated by commas, inside the brackets. For example, to create a vector of the numbers 2, 4, 6, 8, 10 and assign it to a variable y, type

y <- c(2, 4, 6, 8, 10)

We can do element-wise operations with two vectors. For example:

z <- c(3, 5, 7, 9, 11)
y + z
## [1]  5  9 13 17 21
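
Other arithmetic operators work element by element in the same way. The sketch below redefines y and z so that it can be run on its own, and also shows that an operation with a single number is applied to every element.

```r
y <- c(2, 4, 6, 8, 10)
z <- c(3, 5, 7, 9, 11)
y * z   # multiply corresponding elements
## [1]   6  20  42  72 110
z - y   # subtract corresponding elements
## [1] 1 1 1 1 1
y / 2   # a single number is applied to every element
## [1] 1 2 3 4 5
```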

4.1.1 Sequences of integers

A convenient way to create a sequence of consecutive integers (as a vector) is to use the : operator, for example

3:10
## [1]  3  4  5  6  7  8  9 10

and we can assign the result to a vector variable in the usual way.

x <- 3:10
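
Two further uses of : may be handy, shown here as a small sketch: the end points can be variables, and if the first number is larger than the second, the sequence counts down. The variable name n is just for illustration.

```r
n <- 5
1:n      # end points can be variables
## [1] 1 2 3 4 5
10:3     # counts down when the first number is larger
## [1] 10  9  8  7  6  5  4  3
```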

4.2 Subsetting vectors

Suppose we have first defined a vector x:

x <- c(12, 14, 16, 18, 20)

We use square brackets [] to extract elements of x. For example, to get the third element we do

x[3]
## [1] 16

Exercise 4.1 If x has been defined as

x <- c(12, 14, 16, 18, 20)

predict which elements of x would be returned with the following, then try these commands in R:

x[2:4]
x[c(1, 3, 5)]
x[-4]

We can also replace elements of x, for example

x[2] <- 0
x
## [1] 12  0 16 18 20
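
Several elements can be replaced in one command by putting a vector of positions inside the square brackets. A minimal sketch, starting again from the original x (the replacement values 0 and 100 are arbitrary):

```r
x <- c(12, 14, 16, 18, 20)
x[c(1, 5)] <- c(0, 100)   # replace the first and fifth elements
x
## [1]   0  14  16  18 100
```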

4.2.1 Logical subsetting

A comparison such as x < 15 is applied to each element of x and returns a vector of TRUE and FALSE values. With x reset to its original definition:

x <- c(12, 14, 16, 18, 20)
x < 15
## [1]  TRUE  TRUE FALSE FALSE FALSE

We see that TRUE is returned in position i if the i-th element of x is less than 15. We can use this to extract the elements of x that satisfy the condition of being less than 15:

x[x < 15]
## [1] 12 14
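
The same idea works with other comparison operators, and the logical vector can be stored in a variable of its own. In this sketch the name big is only illustrative.

```r
x <- c(12, 14, 16, 18, 20)
big <- x >= 16    # TRUE where the element is at least 16
big
## [1] FALSE FALSE  TRUE  TRUE  TRUE
x[big]            # keep only the elements marked TRUE
## [1] 16 18 20
x[x == 14]        # == tests for equality
## [1] 14
```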

4.3 Character strings

We can make vectors whose elements are text (known as strings or character strings) rather than numbers.

x <- c("Monday", "Tuesday", "Wednesday")
x
## [1] "Monday"    "Tuesday"   "Wednesday"

The quote marks " " are important here: if, for example, we tried

y <- Monday

we would get the message Error: object 'Monday' not found, because R would attempt to find a variable with the name Monday, rather than assigning the string "Monday" to the variable y.
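
To see the difference between the two forms, the following sketch first defines a variable named Monday (purely for illustration), so that both assignments run without error:

```r
Monday <- 1       # a variable that happens to be called Monday
y <- Monday       # no quotes: y gets the value of the variable
y
## [1] 1
y <- "Monday"     # quotes: y gets the character string
y
## [1] "Monday"
```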

4.4 Factors

In statistical modelling, we often work with categorical variables: for example, a patient’s symptoms might be recorded as one of “none”, “mild”, “moderate”, or “severe”. In R, we can have factor variables that are similar to strings, but which carry additional information about the possible levels. We create these with the factor() command. For example:

x <- factor(c("mild", "mild", "none", "severe"))
x
## [1] mild   mild   none   severe
## Levels: mild none severe

Note that when we display the factor x, we do not see quotes, and the levels are also displayed.

When defining a factor, it may be helpful to specify all the possible levels, even if some levels have not been observed. We specify these in the factor command:

x <- factor(c("mild", "mild", "none", "severe"),
            levels = c("none", "mild", "moderate", "severe"))
x
## [1] mild   mild   none   severe
## Levels: none mild moderate severe

(Note that the first two lines in the input display are a single command: because the brackets of factor() have not yet been closed at the end of the first line, R continues reading the command on the next line.)
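
As a brief sketch of what the extra level information gives us, the base R functions levels() and table() can be applied to the factor. The unobserved level "moderate" is still listed, and table() reports it with a count of 0.

```r
x <- factor(c("mild", "mild", "none", "severe"),
            levels = c("none", "mild", "moderate", "severe"))
levels(x)      # all possible levels, in the order we gave them
## [1] "none"     "mild"     "moderate" "severe"
table(x)       # counts per level; "moderate" appears with count 0
```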

4.5 The Environment window

In RStudio, you can see all the variables defined in your workspace in the Environment window. The Environment window will also list any data sets and functions that you have created; you can click on these for more details.

Exercise 4.2 Suppose we want to create a vector called responses with three elements: yes, no and no.

  1. Create the vector responses as a vector of character strings.
  2. How would you define responses, if you instead wanted it to be a factor, with levels yes, no and undecided?

4.6 Further reading

Strings can be quite difficult to work with. For example, the character strings "Monday", "monday", "Mon" might all be intended to mean the same thing, but R will not treat them as being equal to each other:

"Monday" == "monday"
## [1] FALSE
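
One possible workaround, given here only as a preview, is to convert both strings to lower case with the base R function tolower() before comparing them. This handles differences in capitalisation, but not abbreviations such as "Mon".

```r
tolower("Monday") == tolower("monday")
## [1] TRUE
tolower("Mon") == tolower("monday")   # abbreviations still differ
## [1] FALSE
```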

We will study working with strings in a later section, but see also Chapter 14 of R for Data Science.