1 R Practical 1: Brexit

1.1 Introduction

The spreadsheet Brexit.csv, available on Blackboard, contains data from the 2016 referendum on the UK’s membership of the European Union.

Each row represents one local authority district in the UK. The columns are as follows.

  • Region: a geographical region of the UK in which the district is located;
  • Area: the name of the local authority district;
  • remain: the percentage of votes in the district to remain in the European Union;
  • unemployed: the percentage of residents in the district, aged 16 to 74, who were unemployed in 2011;
  • level4: the percentage of residents in the district, aged 16 to 74, with qualifications at level 4 and above in 2011 (an A-level is a level 3 qualification);
  • medianage: the median age of residents in the district, in 2011.

(To simplify this practical, we have omitted the data for Northern Ireland and Gibraltar. Their remain vote percentages were 55.78% and 95.51% respectively.)

Your task is to investigate whether there is any relationship between the remain vote percentage in each district and the other variables.

1.2 Tasks

  1. If you haven’t already done so, create a folder for this module on your computer (or on your U: drive if you are on campus). Inside that folder, create another folder called Practical 1.

  1. Download the files Brexit.csv and Practical1.Rmd from Blackboard, and put them into your Practical 1 folder. Open the file Brexit.csv to inspect it.

  1. Open the R Markdown document Practical1.Rmd in RStudio.
    • Change the author to your name.
    • Run the first code chunk to load the tidyverse package (click on the green arrow).

Your solutions to the remaining tasks should all go in this R Markdown document, with one code chunk per task.


  1. Import the data Brexit.csv into R, storing it as a data frame called brexit. Inspect the first ten rows. Check this against the data in Excel.
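The import step can be sketched as follows. This is only an illustration under stated assumptions: it uses read_csv() from the tidyverse (read.csv() from base R would also work), and it first writes the built-in mtcars data to a temporary file so that there is a CSV file to read. The names csv_file and cars are made up for this sketch; for the practical, you would give the file name Brexit.csv and store the result as brexit.

```r
library(tidyverse)

# Write a built-in data set to a temporary CSV file, purely so that this
# sketch has a file to read. In the practical the CSV file already exists.
csv_file <- tempfile(fileext = ".csv")
write_csv(mtcars, csv_file)

# read_csv() returns a data frame; store it under a name of your choice.
cars <- read_csv(csv_file)

# Inspect the first ten rows.
head(cars, n = 10)
```

A file name with no folder in front of it is looked up in the working directory, which for a code chunk in an R Markdown document is normally the folder containing the .Rmd file.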

  1. What were the lower quartile, median and upper quartile of the remain vote percentage across the 380 districts?

You can get the median and quartiles using the summary() command. Here is an example.
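As an illustration on different data, the sketch below summarises the mpg column of mtcars, a data set built into R. The choice of data set and the name quartiles are assumptions made for this sketch, not part of the practical.

```r
# mtcars is a data set built into R; mpg is one of its numeric columns.
# The "1st Qu.", "Median" and "3rd Qu." entries are the values of interest.
summary(mtcars$mpg)

# quantile() returns just the three requested values.
quartiles <- quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
quartiles
```

The `$` symbol extracts a single column from a data frame, so the same pattern applies to any numeric column.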


  1. Find the remain vote percentage in Sheffield. (You will need to select the row of the brexit data frame in which the Area column takes the value Sheffield.)

You could just search through the spreadsheet, but you should practise using the filter() command to select rows from a data frame. Here is an example.
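A minimal sketch of filter(), using the mpg data set that comes with the tidyverse rather than the Brexit data. The name a4_rows is made up for this illustration.

```r
library(tidyverse)

# mpg is an example data set supplied with the tidyverse (in ggplot2).
# Keep only the rows in which the model column takes the value "a4".
a4_rows <- filter(mpg, model == "a4")
a4_rows
```

Note the double equals sign for testing equality, and the quotation marks around the text value.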


  1. Find the 10 districts with the highest remain vote percentages and the 10 districts with the lowest. What do you notice about the regions?

You can use the arrange() command to arrange the rows of the data frame in order of the remain variable. Here is an example.
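A sketch of arrange() on the mpg data set from the tidyverse, ordering by the hwy column (highway miles per gallon). The names lowest and highest are made up for this illustration, and head() is only one of several ways to keep the first few rows.

```r
library(tidyverse)

# Rows in increasing order of hwy; keep the first 10.
lowest <- mpg %>%
  arrange(hwy) %>%
  head(n = 10)
lowest

# desc() reverses the order, so the largest values come first.
highest <- mpg %>%
  arrange(desc(hwy)) %>%
  head(n = 10)
highest
```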


  1. Produce three scatter plots, each with the remain vote percentage on the \(y\)-axis and one of the unemployed, level4 and medianage variables on the \(x\)-axis. For each plot:

    • change the axis labels to make them more informative;
    • display a linear trend;
    • use different colours of points to represent different regions;
    • add a caption to your plot, which includes the value of Pearson’s correlation coefficient between the variables, and states what conclusion you would draw from the plot.
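The four requirements above can be sketched on the mpg data set from the tidyverse as follows. This is one possible approach rather than a model answer: it assumes that a single overall trend line is wanted, and the variable choices, label text and the names r and p are made up for the illustration.

```r
library(tidyverse)

# Pearson's correlation coefficient (the default method for cor()).
r <- cor(mpg$displ, mpg$hwy)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # colour is mapped inside geom_point() only, so that geom_smooth()
  # fits one trend line to all the points rather than one per group.
  geom_point(aes(colour = class)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway fuel economy (miles per gallon)",
    colour = "Type of car",
    caption = paste0("Pearson's correlation coefficient: ", round(r, 2))
  )
p
```

Your own caption should also state the conclusion you draw from the plot; you can add this to the text passed to paste0().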

  1. Obtain the mean remain vote percentage in each Region, and arrange the regions in order of this mean.
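Grouped means of this kind are commonly computed with group_by() and summarise(). The sketch below uses the mpg data set from the tidyverse; the names class_means and mean_hwy are made up for the illustration.

```r
library(tidyverse)

class_means <- mpg %>%
  group_by(class) %>%                  # one group per type of car
  summarise(mean_hwy = mean(hwy)) %>%  # one row per group
  arrange(mean_hwy)                    # smallest mean first
class_means
```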

  1. Produce a web page that presents your solutions.
    • Click on the arrow next to Knit, and choose the output type (HTML for a web page).
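The Knit button calls render() from the rmarkdown package, so the same result can be obtained from the console. The sketch below assumes that Practical1.Rmd is in the current working directory; the name rmd_file is made up for the illustration.

```r
library(rmarkdown)

# Assumption: Practical1.Rmd is in the current working directory.
rmd_file <- "Practical1.Rmd"

if (file.exists(rmd_file)) {
  # By default the HTML file is written next to the .Rmd file.
  render(rmd_file, output_format = "html_document")
}
```

Knitting with the button runs the document in a fresh R session, so every package and object the document uses must be loaded or created by code within the document itself.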

If your document does not knit, check for these common problems.

  1. You have an install.packages() command somewhere in your R Markdown document: delete it. It’s best to run install.packages() commands in the console, as they only need to be run once.
  2. There is no command within your R Markdown document that loads the data (you imported the data some other way): add the import command to a code chunk.