STAM4000

Quantitative Methods

Week 9

Correlation and simple linear regression


STAM4000 students are expected to know how to INTERPRET EXCEL output, but are NOT expected to know how to create EXCEL output.

In this class, we will learn about the following:

• Correlation measures the strength and direction of a linear association between

two quantitative variables.

• Simple linear regression is used to model the relationship between two variables, and the model may be used to predict future values.
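The idea of correlation as a measure of strength and direction can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the data values are made up for this example and do not come from the course files, and in this class the value would normally be read from EXCEL output rather than computed by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data (an assumption, not course data)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)
print(round(r, 3))
```

The sign of r gives the direction (positive or negative) and a value close to 1 or -1 indicates a strong linear association. The sketch does not handle the case where one variable is constant, because r is undefined there.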

Kaplan Business School (KBS), Australia 1


COMMONWEALTH OF AUSTRALIA

Copyright Regulations 1969

WARNING

This material has been reproduced and communicated to you by or on behalf of Kaplan

Business School pursuant to Part VB of the Copyright Act 1968 (the Act).

The material in this communication may be subject to copyright under the Act. Any further

reproduction or communication of this material by you may be the subject of copyright

protection under the Act.

Do not remove this notice.


Week 9 Correlation and simple linear regression: Learning Outcomes

#1 Examine the relationship between two quantitative variables

#2 Differentiate between correlation and causation

#3 Model with simple linear regression

#4 Create and assess reliability of forecasts


By the end of this class, students will be able to:

• Create and describe a scatterplot between two quantitative variables.

• Understand the difference between correlation and causation.

• Create and interpret sections of simple linear regression Excel output.

• Use a regression equation to predict values and assess the reliability of those predictions.


Why does this matter?

If there is an association between two quantitative variables, we can model the relationship to predict future values.


Our problem objective is to analyse the relationship between numerical variables; regression analysis is the tool we

will study in this class.

Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of one or more other variables (the independent variables).

Dependent variable: denoted Y

Independent variables: denoted X1, X2, …, Xk
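To make the idea of predicting Y from a single X concrete, here is a minimal sketch that fits a straight line by least squares and uses it for a prediction. Least squares is the usual fitting method for simple linear regression, but it is an assumption here because the method is not spelled out at this point in the notes. The names `fit_line` and `predict` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Predicted value of the dependent variable Y at a given X."""
    return b0 + b1 * x


# Made-up data lying exactly on the line y = 3 + 2x (an assumption for illustration)
xs = [0, 1, 2, 3]
ys = [3, 5, 7, 9]
b0, b1 = fit_line(xs, ys)
y_hat = predict(b0, b1, 4)
print(b0, b1, y_hat)
```

Real data will not sit exactly on a line, so the fitted line then describes the general trend rather than every point.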


#1 Examine the relationship

between two quantitative variables


When we first examine the relationship between two quantitative or numerical variables (numbers with

units), we should create the visualisation of a scatterplot.

The scatterplot will show us how the variables relate to each other:

• direction

• shape or form

• strength

• unusual features


Describe the relationship between two quantitative variables

We have two quantitative variables and we want to understand the relationship, so we draw a picture.

We use cartesian axes, also called X, Y axes, to create a scatterplot of two quantitative variables: the Y variable against, or

versus, the X variable.

Scatterplots are a tool for representing the relationship between two variables. They are useful when thinking about

constructing a mathematical model of a data set, since they provide an insight to the type of model we may need.

X is the “independent” or “explanatory” variable or “predictor” variable.

Y is the “dependent” or “response” variable.

Before creating a scatterplot, it is best to decide which variable is responding to the other.

The Cartesian axes have four quadrants:

• The X and Y axes intersect at the point (0, 0), called the origin.

• The X axis is the horizontal axis; negative x values are to the left of the origin and positive x values are to the right of the origin.

• The Y axis is the vertical axis; negative y values are beneath the origin and positive y values are above the origin.

• Each point on the scatterplot is a coordinate (x, y).

Note:

• The scatterplot should be titled using the names of the variables.

• A scatterplot title is usually of the form: “Scatterplot of the named Y variable versus, or against, the named X variable”.

• In business, we mainly have positive numbers and deal with the top right quadrant of the cartesian axes.

• Axes should also be labelled with the name of the variable and the corresponding units of the variable in brackets.
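The titling and labelling conventions in the notes above can be sketched as two small helper functions. The helper names and the data values are hypothetical, chosen only for illustration; the resulting strings could be passed to whichever charting tool is used.

```python
def scatterplot_title(y_name, x_name):
    """Title of the form 'Scatterplot of <Y variable> against <X variable>'."""
    return f"Scatterplot of {y_name} against {x_name}"


def axis_label(name, units=None):
    """Axis label with the variable name and its units in brackets."""
    return f"{name} ({units})" if units else name


# Made-up values (an assumption): X listed first, Y second
distance_km = [2, 10, 25]
rent_per_week = [1200, 800, 450]

# Each point on the scatterplot is a coordinate (x, y)
points = list(zip(distance_km, rent_per_week))

title = scatterplot_title("rent", "distance")
x_label = axis_label("Distance", "km")
y_label = axis_label("Rent", "$/wk")
print(title, x_label, y_label, points[0])
```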


#1 Make a picture

Scatterplots are a tool for representing the relationship

between two quantitative variables, (numbers with units).

X is the “independent” or “explanatory” or “predictor” variable

Y is the “dependent” or “response” variable

Before creating a scatterplot, it is best to decide which

variable is responding to the other.

Note: in business, we usually have positive numbers and deal

with the top right quadrant of the X, Y axes.

[Figure: Scatterplot of Y against X, shown on the full four-quadrant X, Y axes and on the top right quadrant only. Axes: X (units), Y (units).]

We say that the Y variable is RESPONDING to values of the X

variable.


#1

Here are two examples of scatterplots representing the

relationship between two quantitative variables.

[Figure: two scatterplots. Linear model: Scatterplot of exam score against hours of study; axes: Hours of study, X and Exam score (%), Y. Non-linear model: Scatterplot of exam score against cups of coffee before a test; axes: Cups of coffee before a test, X and Exam score (%), Y.]


The first scatterplot:

X = Hours of study

Y = Exam score (%)

This scatterplot has points that seem to be following a straight line – the general form or shape of the

scatterplot is linear.

The points do not need to be exactly all on the same line – the general shape of a line is enough to

suggest that there is a linear relationship between these two quantitative variables of exam score and

hours of study.

Here, as the hours of study increase, the exam score tends to increase, following an upward direction. This suggests that the exam score responds positively to hours of study.

The second scatterplot:

X = Cups of coffee before a test

Y = Exam score (%)

This scatterplot has points that seem to be following a curve.

Patterns that are not straight lines are called non-linear.

At the lower end of cups of coffee, the exam scores increase and peak at about 5 cups of coffee.

However, after 5 cups of coffee, the exam scores start to decrease, suggesting that more than 5 cups of coffee is associated with lower (downward) test scores.

Here, exam scores respond both positively and negatively to the number of cups of coffee.


The 1st scatterplot displays how exam scores, Y, RESPOND to hours of study, X, by the student.

LINEAR MODEL: where we can “visualise” a “linear pattern” between the Y and the X variable.

In a linear model, we have a CONSTANT “SLOPE” or CONSTANT “RATE OF CHANGE”.

The 2nd scatterplot displays how exam scores, Y, RESPOND to the number of cups of coffee before the exam, X.

A curved relationship is also called a NON-LINEAR relationship.

In a NON-LINEAR relationship, the slope or rate of change varies. In this chart, the slope is:

• positive or upward

• zero

• negative or downward
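The difference between a constant and a varying rate of change can be checked numerically. The sketch below computes the slope between consecutive points for a hypothetical straight line and a hypothetical curve; both formulas are invented for illustration and are not fitted to the exam data shown in the charts.

```python
def slopes(xs, ys):
    """Rate of change (rise over run) between each pair of consecutive points."""
    pairs = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]


xs = [0, 1, 2, 3, 4, 5]
# Hypothetical linear relationship: y = 10 + 5x
linear = [10 + 5 * x for x in xs]
# Hypothetical curved relationship that rises, flattens, then falls
curved = [20 + 10 * x - 2 * x ** 2 for x in xs]

linear_slopes = slopes(xs, linear)
curved_slopes = slopes(xs, curved)
print(linear_slopes)
print(curved_slopes)
```

For the straight line every slope is the same, while for the curve the slopes move from positive, through zero, to negative.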


How do we create a scatterplot in EXCEL?

For the data used in this example, go to the EXCEL file named STAM4000 Week 10 Excel.xls and the sheet named “Rent”.

Note: If using EXCEL to create the scatterplot, check that the X and Y variables are

displayed as you would like.

When highlighting the columns, it is best to highlight the X variable first and the Y variable second.

For the first scatterplot:

• Highlight the columns labelled “Distance” and “Rent”.

• Go to the INSERT tab.

• Select the Scatterplot icon.

• Highlight the chart.

• Use the “+” button to add/remove items of the scatterplot.

For the second scatterplot:

• Highlight the columns labelled “Bedrooms” and “Rent”.

• Go to the INSERT tab.

• Select the Scatterplot icon.

• Highlight the chart.

• Use the “+” button to add/remove items of the scatterplot.


#1


Example

A random sample of thirty rental properties was collected and values for the following variables were recorded: weekly rent ($/wk), distance from the city centre (km), number of bedrooms, number of bathrooms and age of the property (years).

EXCEL was used to create the following scatterplots.

How could we describe the relationship between rent and distance?

How could we describe the relationship between weekly rent and number of bedrooms?

[Figure: two scatterplots. Scatterplot of rent, Y against distance, X; axes: Distance (km), X and Rent ($/wk), Y; one point carries the data label 2, 1200. Scatterplot of rent, Y against bedrooms, X; axes: Bedrooms, X and Rent ($/wk), Y.]


Before creating a scatterplot, it is best to decide which variable is responding to

the other.

If we want to do a scatterplot of rent with distance:

which variable do we believe is RESPONDING to the other?

1st scatterplot here: rent is RESPONDING to distance from the city centre.


Describing a scatterplot

We assess the scatterplot, discussing the following:

Direction: what is the general direction of the scatterplot?

• Positive also called upward sloping

• Negative also called downward sloping

Form: what is the general shape of the scatterplot?

• Do the points form a linear shape?

• Do the points form a curved shape?

Strength: how tightly clustered are the points giving that form?

• Strong: the points are tightly clustered to create that form

• Moderate: the points are reasonably clustered to create that form

• Weak: the points are very loosely clustered to create that form

Unusual features: is there anything unusual about the scatterplot?

• Are there any outliers (unusual points)?

• Are there any irregularities with the shape?
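When the form is linear, the direction and strength parts of this checklist are often summarised with the correlation coefficient r. The sketch below turns a value of r into a short description. The function name and the cut-offs of 0.8 and 0.5 are assumptions for illustration only; the notes judge strength visually and do not set numeric thresholds.

```python
def describe_linear_association(r, strong=0.8, moderate=0.5):
    """Describe direction and strength of a linear association from r.

    The cut-offs `strong` and `moderate` are illustrative assumptions,
    not values taken from the course notes.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")

    # Direction comes from the sign of r
    if r > 0:
        direction = "positive (upward sloping)"
    elif r < 0:
        direction = "negative (downward sloping)"
    else:
        direction = "no linear direction"

    # Strength comes from how close |r| is to 1
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"

    return f"{strength}, {direction}"


print(describe_linear_association(-0.9))
print(describe_linear_association(0.6))
```

A description like this only makes sense after the scatterplot has been checked for a linear form and for outliers, since r can be misleading for curved patterns or unusual points.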


#1 Example


#2 Differentiate between correlation and causation


Correlation ≠
