Data handling, exploratory analysis, and reporting in R
2024-09-03
Section 1 Introduction
These notes are written for students on MAS61004 The Statistician’s Toolkit. Topics covered include
- working with R and RStudio;
- importing data, and getting data into a suitable format for analyses with R;
- making plots with ggplot2;
- writing reports with R Markdown;
- making web apps with shiny.
We do not cover R programming (e.g. writing your own functions); this is included in MAS61006 Bayesian Statistics and Computational Methods.
1.1 About these notes
These notes will get you started on various topics, but are not intended to cover everything you might need to know. There are lots of excellent free, online resources for learning R, and links to further reading will be given where appropriate. After studying these notes, you should be able to find things out quickly for yourself, if necessary.
You don’t need to know everything straight away! Try to get a basic understanding of how things work, and what sorts of things are possible, and then search/study the details when you start working on a particular project.
1.2 Books
Although it can be easy to search for and find help online, you have to know what you are looking for. I strongly recommend that you browse some of the following books (the graphics/data visualisation books in particular), to get a broader understanding of what you could do with your data.
All the following books can be read online for free. (I have hard copies of most of these, which I prefer to study. I do not recommend buying books just for this module, but if a particular book looks useful to you more widely, I would recommend buying a hard copy.)
A good general reference book for MAS61004 and MAS6024 is
- Wickham, H. and Grolemund, G. (2017), R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.
For R Markdown, I recommend
- Xie, Y., Allaire, J. J. and Grolemund, G. (2019), R Markdown: The Definitive Guide. Chapman and Hall/CRC.
- Xie, Y., Dervieux, C. and Riederer, E. (2020), The R Markdown Cookbook. Chapman and Hall/CRC.
and, if you plan to use R Markdown for your dissertation,
- Xie, Y. (2017), bookdown: Authoring Books and Technical Documents with R Markdown. Chapman and Hall/CRC.
For graphics/data visualisation, I recommend
- Chang, W. (2018), R Graphics Cookbook (2nd edition). O’Reilly Media, Inc.
- Wilke, C. O. (2019), Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media, Inc.
- Healy, K. (2019), Data Visualization: A Practical Introduction. Princeton University Press.
Healy (2019) is a beautiful-looking book (thereby proving his point!) and covers ggplot2. Wilke (2019) purposefully doesn’t include any code, but the discussion and advice are excellent. (He has made all the R code used to produce the book available online.)
1.3 Acknowledgements
The bookdown package by Yihui Xie has been invaluable for producing these notes. The content of this course depends on the work of the R Core team, RStudio and numerous package developers, who have all made their work available for free; if I didn’t appreciate and enjoy using all these tools, I would not be teaching this course! I will cite authors as I go along, and a reference list is given at the end. Many thanks also to Allison Horst for generously sharing her artwork.