Section 4 Variables and vectors
We can assign a numerical value to what we refer to as a variable, and then use the variable within various R commands. For example, we can define a variable called x, which takes the value 3. We won’t see any output when we make an assignment like this, but if we type the variable name on its own, R will tell us its value:
## [1] 3
We can then use the variable in other commands, e.g.:
## [1] 6
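The input commands for these outputs are not reproduced above. A minimal sketch of what they might look like follows, assuming the assignment operator <- (as used in the factor example later in this section). The command that produced 6 is not shown, so 2 * x is only one possibility.

```r
# Assign the value 3 to a variable called x (no output is printed)
x <- 3

# Typing the variable name on its own prints its value
x

# Use x in another command; 2 * x is one way to obtain 6
2 * x
```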
4.1 Vectors
We can define a vector variable using the command c(), with the elements of the vector listed inside the brackets and separated by commas. For example, we can create a vector of the numbers 2, 4, 6, 8, 10 and assign it to a variable y.
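A sketch of the command described here, assuming the <- assignment operator:

```r
# Create a vector of the numbers 2, 4, 6, 8, 10 with c() and assign it to y
y <- c(2, 4, 6, 8, 10)

# Typing the name prints all five elements
y
```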
We can also do element-wise operations with two vectors, where the operation is applied to each pair of corresponding elements. For example:
## [1] 5 9 13 17 21
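The second vector used to produce this output is not shown. As an illustrative assumption, adding a hypothetical vector z with elements 3, 5, 7, 9, 11 to y gives the same result. Multiplication works element by element in the same way.

```r
y <- c(2, 4, 6, 8, 10)
z <- c(3, 5, 7, 9, 11)   # hypothetical second vector, same length as y

# Each element of y is added to the corresponding element of z
y + z   # gives 5 9 13 17 21

# Element-wise multiplication
y * z   # gives 6 20 42 72 110
```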
4.2 Subsetting vectors
Suppose we have first defined a vector x.
We use square brackets [] to extract elements of x. For example, to get the third element we put its position, 3, inside the square brackets:
## [1] 16
We can also replace elements of x; for example:
## [1] 12 0 16 18 20
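The definition of x is not shown above. The sketch below is consistent with both outputs, assuming (hypothetically) that x started as 12, 14, 16, 18, 20 and that the second element was replaced with 0.

```r
# Hypothetical starting vector, consistent with the outputs shown above
x <- c(12, 14, 16, 18, 20)

# Square brackets extract an element by its position: the third is 16
x[3]

# Assigning to x[2] replaces the second element
x[2] <- 0
x   # gives 12 0 16 18 20
```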
4.3 Character strings
We can make vectors whose elements are text (known as strings or character strings) rather than numbers.
## [1] "Monday" "Tuesday" "Wednesday"
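A sketch of a command that would produce this output, assuming the variable name y used in the next paragraph:

```r
# Each element is a character string, enclosed in quote marks
y <- c("Monday", "Tuesday", "Wednesday")
y

# class() confirms the vector holds strings
class(y)   # "character"
```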
The quote marks " " are important here: if, for example, we tried to create the same vector without them, we would get the message Error: object 'Monday' not found. R would attempt to find a variable with the name Monday, rather than assigning the string "Monday" to the variable y.
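To see this error without stopping a script, one option (an illustration, not taken from the original text) is to wrap the unquoted version in tryCatch() and capture the message with conditionMessage():

```r
# Without quote marks, R looks for objects called Monday, Tuesday, Wednesday
msg <- tryCatch(
  c(Monday, Tuesday, Wednesday),
  error = function(e) conditionMessage(e)
)

# The captured error message, e.g. "object 'Monday' not found"
msg
```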
4.4 Factors
In statistical modelling, we often work with categorical variables; for example, a patient’s symptoms might be recorded as one of “none”, “mild”, “moderate” or “severe”. In R, we can have factor variables that are similar to strings, but which carry additional information about the possible levels. We create these with the factor() command. For example:
## [1] mild mild none severe
## Levels: mild none severe
Note that when we display our vector of factors x, we do not see quotes, and the levels are also displayed (by default, these are the distinct values, in alphabetical order).
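A sketch of a command that would give this output, following the same pattern as the factor() call with explicit levels in this section. The levels() function returns the levels of a factor.

```r
# Create a factor from a vector of character strings
x <- factor(c("mild", "mild", "none", "severe"))
x

# The levels were taken from the observed values, in alphabetical order
levels(x)   # "mild" "none" "severe"
```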
When defining a factor, it may be helpful to specify all the possible levels, even if some levels have not been observed. We specify these with the levels argument of the factor() command:
x <- factor(c("mild", "mild", "none", "severe"),
            levels = c("none", "mild", "moderate", "severe"))
x
## [1] mild mild none severe
## Levels: none mild moderate severe
(Note that the first two lines of the input above form a single command: because the bracket opened by factor( has not been closed at the end of the first line, R treats the next line as a continuation of the same command.)
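One practical consequence of declaring all the levels (an illustration, not given in the original text): functions such as table() then report a count of zero for a level that was never observed, and nlevels() counts every declared level.

```r
x <- factor(c("mild", "mild", "none", "severe"),
            levels = c("none", "mild", "moderate", "severe"))

# Counts for every declared level, in the order given by levels =
table(x)   # none 1, mild 2, moderate 0, severe 1

# Number of declared levels, including the unobserved "moderate"
nlevels(x) # 4
```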
4.5 The Environment window
In RStudio, you can see all the variables defined in your workspace in the Environment window. The Environment window will also list any data sets and functions that you have created; you can click on these for more details.
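From the console, the base R function ls() gives a similar listing: it returns the names of the objects in the current workspace. A small sketch, using variables like those defined earlier in this section:

```r
x <- 3
y <- c(2, 4, 6, 8, 10)

# Names of the objects defined in the workspace (includes "x" and "y")
ls()
```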
Exercise 4.2 Suppose we want to create a vector called responses with three elements: yes, no and no.
- Create the vector responses as a vector of character strings.
- How would you define responses if you instead wanted it to be a factor, with levels yes, no and undecided?
4.6 Further reading
Strings can be quite difficult to work with. For example, the character strings "Monday", "monday" and "Mon" might all be intended to mean the same thing, but R will not treat them as being equal to each other:
## [1] FALSE
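The comparison that produced FALSE is not shown. A sketch using the == operator is below, with tolower() as one illustrative (assumed, not from the original text) way to ignore differences in case.

```r
# == compares strings exactly, so differences in case or length matter
"Monday" == "monday"   # FALSE
"Monday" == "Mon"      # FALSE

# Converting both strings to lower case first makes the first pair equal
tolower("Monday") == tolower("monday")   # TRUE
```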
We will study working with strings in a later section; see also Chapter 14 of R for Data Science.