STAT0023 Week 1

Course overview and R revision

Richard Chandler and Ioanna Manolopoulou

Course Overview: R and SAS

Software used in the course

Course covers two different statistical ‘packages’: R and SAS.

R is a ‘free software environment for statistical computing and

graphics’ (see https://www.r-project.org/).

Modern environment based on the S programming language

Many standard statistical procedures are implemented directly

Programming language provides complete flexibility in defining

new procedures, customising and enhancing graphics etc.

Currently over 15,000 additional specialist packages available

from the Comprehensive R Archive Network (CRAN).

Increasingly widely used in many sectors, both in research and

industry.

SAS (‘Statistical Analysis System’) is an older, commercial

product dating back to the 1970s:

For many years the industry standard in sectors including

pharmaceuticals and insurance, currently the dominant

commercial software worldwide in “advanced analytics”.

Has both a command language and a ‘point and click’ interface.

R and SAS in context

Many other statistical / analytical packages exist: Matlab,

Minitab, SPSS, Statistica, Systat, . . .

But once you’ve learned R and SAS, many of these others

should feel familiar to you

As well as experience with statistical software packages, this

course provides you with generic programming skills that are

valued by many employers

Programming concepts more readily illustrated using R, hence

60% of the course uses R and the remaining 40% uses SAS.

Commands versus ‘point and click’

We will use both R and SAS by typing commands / programs,

rather than using a graphical interface (‘point and click’).

Some advantages of this are:

The commands to perform a particular analysis can be saved in

a file (known as a script), which can be edited later and / or

easily re-run.

You can include documentation and comments within a script.

With a script you can see whether a mistake has been made,

and where.

‘Point and Click’ rapidly becomes tedious (and prone to error)

when repeating similar tasks.

Note: we cannot teach you every single command that you

will need: the course aims to give you the confidence to find

appropriate new commands for yourself.

R revision

Using R

R already introduced in STAT0004 and STAT0006

Used in conjunction with RStudio for easy organisation of files

etc.

The ‘Useful information and resources’ section of the

STAT0023 Moodle page contains links to:

Summaries of STAT0004 material for students wishing to refresh

their memories (see “Useful books and online resources” link)

R and RStudio home pages, for students wishing to install the

software on their own computers (see “Obtaining R and SAS for

home use”).

The ‘Preparation for the course’ section of the STAT0023

Moodle page contains a “quick-start” Introduction to R,

summarising what you’re expected to know at the start of this

course.

R revision: an example script

See analysis of Galapagos island biodiversity data on

STAT0023 Moodle page (script Workshop1_Galapagos.r).

Script illustrates:

Use of comments, to ensure code is clear and readable

Use of <- to assign result of an operation to an object

Reading data from file using read.table()

Use of a data frame (species.data) to store collection of

variables (all numeric or integer here but could also include

character, logical or factor variables)

Some ways to find information about an R object e.g. using

names(), str(), summary(), class()

Use of [] to extract parts of an object according to a logical

condition (big.island)

Use of $ to work with named components of an object.

Plotting different types of R object, with control over labels and

formatting.

Saving graphics to files in different formats (PDF, JPEG, PNG,

. . .)

Etc. (more in the Week 1 self-study materials)
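The points above can be sketched in a few lines of R. This is not the Workshop1_Galapagos.r script itself: the four islands, the column names (Island, Area, Species) and the threshold of 20 are made up purely for illustration. Only the object names species.data and big.island are taken from the slides.

```r
# Stand-in data: NOT the real Galapagos data, just four made-up islands
demo.file <- tempfile(fileext = ".dat")
writeLines(c("Island Area Species",
             "A 0.5 12",
             "B 25.0 58",
             "C 572.3 280",
             "D 1.8 25"), demo.file)

# Read a whitespace-separated file whose first line holds the column names;
# <- assigns the result to an object called species.data
species.data <- read.table(demo.file, header = TRUE)

# Some ways to find out about an object
class(species.data)     # what kind of object is it?
names(species.data)     # names of its components (here, columns)
str(species.data)       # compact description of its structure
summary(species.data)   # summary of each column

# $ extracts a named component; the comparison yields a logical vector
big.island <- species.data$Area > 20   # hypothetical threshold

# [] extracts the rows for which the logical condition is TRUE
species.data[big.island, ]
species.data$Species[big.island]
```

Because every step is in a script, the whole analysis can be re-run after editing the threshold or swapping in a different data file.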

R revision: other things you should know about

Vectors, matrices and arrays: including different types (numeric,

character, logical etc.) and extracting subsets using [] — either

with a logical condition (e.g. species.data[big.island,]) or

with numeric expressions (e.g. species.data[16,]).

Simple statistical procedures: summary statistics (mean(),

var(), sd(), median(), min(), max(), range(), quantile(),

table() etc.), simple test procedures and confidence interval

calculations (t.test() for means in one or two groups,

var.test() for F-test for variances in two groups, chisq.test()

and fisher.test() for testing association in contingency tables)

Simple graphics: scatterplots (plot() and pairs()), boxplots

(boxplot()), histograms (hist()), bar charts (barplot()) and

density plots (plot() applied to the result of density()).

Using the help system: ? for help about a specific command, ??

to search the help system.
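A minimal sketch of several of the items above, assuming R's built-in iris data frame as convenient example data. The object names (x, big, m, tt, vt, ct and so on) are chosen here for illustration and do not come from the course scripts.

```r
# Vectors of different types, and subsetting with []
x <- c(4.1, 5.3, 2.8, 6.0, 3.9)   # numeric vector
big <- x > 4                      # logical vector
x[big]                            # logical condition: 4.1 5.3 6.0
x[c(1, 3)]                        # numeric indices: first and third elements

# A matrix is filled column by column unless told otherwise
m <- matrix(1:6, nrow = 2)
m[2, ]                            # second row: 2 4 6

# Summary statistics
mean(x); median(x); sd(x); range(x)
quantile(x, c(0.25, 0.75))

# Two-group comparisons on sepal length for two iris species
setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
tt <- t.test(setosa, versicolor)     # Welch two-sample t-test by default
vt <- var.test(setosa, versicolor)   # F-test comparing the two variances
tt$conf.int                          # confidence interval for difference in means

# Association in a contingency table: species versus "long sepal"
long.sepal <- iris$Sepal.Length > median(iris$Sepal.Length)
tab <- table(iris$Species, long.sepal)
ct <- chisq.test(tab)

# Help system (only sensible in an interactive session)
if (interactive()) {
  ?t.test
  ??"contingency table"
}
```

The test functions return list-like objects, so components such as tt$p.value or tt$conf.int can be extracted with $ like any other named component.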

R revision: exploratory analysis and graphics

Aims of an exploratory analysis

1. To gain a preliminary understanding of structure in a dataset

2. To look for possible outliers or data quality problems

3. To suggest some initial assumptions (e.g. normality of residuals,

constant variance) that may be reasonable as a starting point

in subsequent modelling and analysis


Achieving the aims

Summary statistics, tables etc. can be helpful but well-designed

graphics can often save a lot of work later on

Example: look at the Fisher-Anderson iris data (script

Workshop1_Iris.r on the STAT0023 Moodle page)
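As a rough illustration of a first exploratory pass (not a reproduction of Workshop1_Iris.r), the sketch below assumes that R's built-in iris data frame can stand in for the data used in the workshop script. The object name group.means is introduced here for illustration.

```r
data(iris)                 # built-in copy of the iris measurements

# Aim 1: preliminary understanding of structure
str(iris)                  # 150 observations, 4 measurements plus Species
summary(iris)
table(iris$Species)        # number of flowers per species
group.means <- aggregate(. ~ Species, data = iris, FUN = mean)
group.means

# Aim 2: possible outliers or data quality problems
colSums(is.na(iris))       # missing values in each column
boxplot(Sepal.Width ~ Species, data = iris,
        xlab = "Species", ylab = "Sepal width (cm)",
        main = "Sepal width by species")

# Aims 1 and 3: a scatterplot matrix shows relationships, group
# structure and differences in spread in a single display
pairs(iris[, 1:4],
      pch = as.integer(iris$Species),   # a different symbol per species
      main = "Iris measurements (cm)")
```

The tables confirm the basic structure, but the scatterplot matrix shows the separation between species far more quickly than the numbers do.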

Using graphics effectively

Key principle: message should be clear “at a glance”

Some guidelines:

Labelling: title, axis labels (including units of measurement),

legend where necessary — and choose text size that ensures labels

are legible.

Scaling: subject to other constraints, choose scales so that data fill

up as much of plotting region as possible

‘Other constraints’ might include use of common scales to aid

comparison of two sets of data

Colours / symbols / line types: consider possible loss of quality

when photocopying / transporting to PowerPoint / etc. and never

rely exclusively on colour (NB some people are colour-blind:

red-green is particularly problematic)
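One way these guidelines might be applied, again assuming the built-in iris data. The colour choices, text sizes, file name and object names (species.code, species.cols, plot.file) are illustrative assumptions rather than course recommendations.

```r
species.code <- as.integer(iris$Species)          # 1, 2, 3 for the three species
species.cols <- c("black", "blue", "darkorange")  # avoids a red-green contrast

# Save the plot as a PDF in R's temporary directory
plot.file <- file.path(tempdir(), "iris_petals.pdf")
pdf(plot.file, width = 6, height = 5)

plot(iris$Petal.Length, iris$Petal.Width,
     pch = species.code,                 # symbol AND colour distinguish groups,
     col = species.cols[species.code],   # so the plot survives greyscale copying
     xlab = "Petal length (cm)",         # axis labels include units
     ylab = "Petal width (cm)",
     main = "Iris petal dimensions by species",
     cex.lab = 1.2, cex.main = 1.3)      # slightly larger text for legibility

legend("topleft", legend = levels(iris$Species),
       pch = 1:3, col = species.cols, title = "Species")

dev.off()                                # close the file so it is written
```

By default plot() chooses axis limits that just cover the data, so the points fill the plotting region. Setting xlim and ylim by hand is mainly useful when several plots need common scales.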

This week’s live workshop

Aim to (re)familiarise yourself with the basics of R

Work through the Galapagos and Iris example scripts

Ensure that you understand everything — there will be helpers

in the sessions, use them if necessary!

Also get used to using the R help system

Use Moodle quizzes to check that you have understood the

material and get ahead with the assessments
