STAT0023 Week 1
Course overview and R revision
Richard Chandler and Ioanna Manolopoulou
Course Overview: R and SAS
Software used in the course
Course covers two different statistical ‘packages’: R and SAS.
R is a ‘free software environment for statistical computing and
graphics’ (see https://www.r-project.org/).
Modern environment based on the S programming language
Many standard statistical procedures are implemented directly
Programming language provides complete flexibility in defining
new procedures, customising and enhancing graphics etc.
Currently over 15,000 additional specialist packages available
from the Comprehensive R Archive Network (CRAN).
Increasingly widely used in many sectors, both in research and
SAS (‘Statistical Analysis System’) is an older, commercial
product dating back to the 1970s:
For many years the industry standard in sectors including
pharmaceuticals and insurance, currently the dominant
commercial software worldwide in “advanced analytics”.
Has both command language and ‘point and click’ interface.
R and SAS in context
Many other statistical / analytical packages exist: Matlab,
Minitab, SPSS, Statistica, Systat, . . .
But once you’ve learned R and SAS, many of these others
should feel familiar to you
As well as experience with statistical software packages, this
course provides you with generic programming skills that are
valued by many employers
Programming concepts more readily illustrated using R, hence
60% of the course uses R and the remaining 40% uses SAS.
Commands versus ‘point and click’
We will use both R and SAS by typing commands / programs,
rather than using a graphical interface (‘point and click’).
Some advantages of this are:
The commands to perform a particular analysis can be saved in
a file (known as a script), which can be edited later and / or
You can include documentation and comments within a script.
With a script you can see whether a mistake has been made,
‘Point and Click’ rapidly becomes tedious (and prone to error)
when repeating similar tasks.
Note: we cannot teach you every single command that you
will need: the course aims to give you the confidence to find
appropriate new commands for yourself.
R already introduced in STAT0004 and STAT0006
Used in conjunction with RStudio for easy organisation of files
The ‘Useful information and resources’ section of the
STAT0023 Moodle page contains links to:
Summaries of STAT0004 material for students wishing to refresh
their memories (see “Useful books and online resources” link)
R and RStudio home pages, for students wishing to install the
software on their own computers (see “Obtaining R and SAS for
The ‘Preparation for the course’ section of the STAT0023
Moodle page contains a “quick-start” Introduction to R,
summarising what you’re expected to know at the start of this course.
R revision: an example script
See analysis of Galapagos island biodiversity data on
STAT0023 Moodle page (script Workshop1_Galapagos.r).
Use of comments, to ensure code is clear and readable
Use of <- to assign result of an operation to an object
Reading data from file using read.table()
Use of a data frame (species.data) to store collection of
variables (all numeric or integer here but could also include
character, logical or factor variables)
Some ways to find information about an R object e.g. using
names(), str(), summary(), class()
Use of [ ] to extract parts of an object according to a logical condition
Use of $ to work with named components of an object.
Plotting different types of R object, with control over labels and
Saving graphics to files in different formats (PDF, JPEG, PNG,
. . .)
Etc. (more in the Week 1 self-study materials)
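The features listed above can be sketched in a few lines. This is only an illustration and not the contents of Workshop1_Galapagos.r: the column names (Island, Area, Species) and all values are made up, and the data are read from an inline string rather than a file so that the example is self-contained. The object names species.data and big.island follow those used elsewhere in these notes.

```r
# Toy data standing in for a data file (all names and values are hypothetical)
raw.text <- "Island Area Species
A 25.1 58
B 1.2 31
C 572.3 280
D 0.2 25
E 129.5 237"

# read.table() normally takes a file name; text= is used here so that
# the example runs without any external file
species.data <- read.table(text = raw.text, header = TRUE)

# Finding out about an object
names(species.data)    # column names
str(species.data)      # structure: variable types and first few values
summary(species.data)  # numerical summaries of each column
class(species.data)    # "data.frame"

# $ extracts a named component; the comparison gives a logical vector
big.island <- species.data$Area > 100

# [ ] with a logical condition keeps only the rows where it is TRUE
species.data[big.island, ]

# [ ] with a numeric index picks out rows by position
species.data[2, ]
```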
R revision: other things you should know about
Vectors, matrices and arrays: including different types (numeric,
character, logical etc.) and extracting subsets using [ ]: either
with a logical condition (e.g. species.data[big.island,]) or
with numeric expressions (e.g. species.data[16,]).
Simple statistical procedures: summary statistics (mean(),
var(), sd(), median(), min(), max(), range(), quantile(),
table() etc.), simple test procedures and confidence interval
calculations (t.test() for means in one or two groups,
var.test() for F-test for variances in two groups, chisq.test()
and fisher.test() for testing association in contingency tables)
Simple graphics: scatterplots (plot() and pairs()), boxplots
(boxplot()), histograms (hist()), bar charts (barplot()) and
density plots (density()).
Using the help system: ? for help about a specific command, ??
to search the help system.
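A short sketch of several of these items, using small made-up vectors and counts (none of the numbers come from the course data). It is meant as a memory aid rather than a recommended analysis.

```r
# A numeric vector and a logical vector of the same length (hypothetical values)
x <- c(4.1, 5.3, 2.8, 6.0, 3.7, 5.1)
is.large <- x > 4

x[is.large]   # subset with a logical condition
x[c(1, 3)]    # subset with numeric positions

# A matrix: elements are indexed as [row, column]
m <- matrix(1:6, nrow = 2)
m[2, ]        # second row
m[, 3]        # third column

# Summary statistics
mean(x); median(x); sd(x); range(x)
quantile(x, probs = c(0.25, 0.75))

# Two-sample comparisons using a second hypothetical sample
y <- c(3.9, 4.4, 2.5, 3.1, 4.8, 3.3)
t.test(x, y)     # Welch two-sample t-test for a difference in means
var.test(x, y)   # F-test for equality of variances

# Tests of association in a 2 x 2 contingency table (hypothetical counts)
counts <- matrix(c(12, 5, 7, 14), nrow = 2)
chisq.test(counts)
fisher.test(counts)

# Help: ?t.test opens the help page for t.test();
# ??"contingency table" searches the help system
```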
R revision: exploratory analysis and graphics
Aims of an exploratory analysis
1 To gain a preliminary understanding of structure in a dataset
2 To look for possible outliers or data quality problems
3 To suggest some initial assumptions (e.g. normality of residuals,
constant variance) that may be reasonable as a starting point
in subsequent modelling and analysis
Achieving the aims
Summary statistics, tables etc. can be helpful but well-designed
graphics can often save a lot of work later on
Example: look at the Fisher-Anderson iris data (script
Workshop1_Iris.r on the STAT0023 Moodle page)
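As a rough idea of what a first look at these data might involve, the sketch below uses the iris data frame that is built into R. It is not the contents of Workshop1_Iris.r, and the particular choice of summaries and plots is an assumption made for illustration.

```r
# iris ships with R: 150 flowers, four measurements (in cm) and a Species factor
str(iris)
summary(iris)
table(iris$Species)   # number of flowers of each species

# Mean petal length within each species
tapply(iris$Petal.Length, iris$Species, mean)

# Scatterplot matrix of the four measurements,
# with a different colour and symbol for each species
species.code <- as.integer(iris$Species)
pairs(iris[, 1:4], col = species.code, pch = species.code)

# Boxplots by species can reveal outliers and unequal spread
boxplot(Petal.Length ~ Species, data = iris,
        ylab = "Petal length (cm)")
```

Plots such as these address all three aims at once: they show structure (separation between species), flag unusual observations, and give a first impression of whether assumptions such as constant variance look plausible.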
Using graphics effectively
Key principle: message should be clear “at a glance”
Labelling: title, axis labels (including units of measurement),
legend where necessary, and choose a text size that ensures labels are legible
Scaling: subject to other constraints, choose scales so that data fill
up as much of plotting region as possible
‘Other constraints’ might include use of common scales to aid
comparison of two sets of data
Colours / symbols / line types: consider possible loss of quality
when photocopying / transporting to PowerPoint / etc. and never
rely exclusively on colour (NB some people are colour-blind:
red-green is particularly problematic)
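One possible way to apply these principles in base R graphics is sketched below, again using the built-in iris data. The file name, colours, and text sizes are arbitrary choices made for illustration, and the plot is written to R's temporary directory so that nothing is left in the working directory.

```r
# Symbols and colours both vary by species, so the groups remain
# distinguishable in greyscale or for colour-blind readers
species.code <- as.integer(iris$Species)
group.cols <- c("black", "blue", "darkorange")

# Hypothetical output file in the session's temporary directory
plot.file <- file.path(tempdir(), "iris_petals.pdf")

pdf(plot.file, width = 6, height = 5)    # open a PDF graphics device
plot(iris$Petal.Length, iris$Petal.Width,
     pch = species.code, col = group.cols[species.code],
     main = "Iris petal dimensions by species",
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     cex.lab = 1.2, cex.main = 1.3)      # slightly larger text for legibility
legend("topleft", legend = levels(iris$Species),
       pch = 1:3, col = group.cols)
dev.off()                                # close the device to finish the file
```

By default plot() chooses axis limits from the range of the data, so the points fill the plotting region. To put two plots on a common scale for comparison, the limits can be set explicitly with xlim and ylim.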
This week’s live workshop
Aim to (re)familiarise yourself with the basics of R
Work through the Galapagos and Iris example scripts
Ensure that you understand everything — there will be helpers
in the sessions, use them if necessary!
Also get used to using the R help system
Use Moodle quizzes to check that you have understood the
material and get ahead with the assessments