STAT6110/STAT3110

Statistical Inference

Topic 1: Probability and random samples

Nan Zou


Nan Zou (Topic 1) STAT6110/STAT3110 Statistical Inference 1 / 58

Contact Details

Lecturer: Nan Zou

- Location: Room 706, Level 7, 12 Wally's Walk

- Email: [email protected]

- Consultation: on Zoom

  - Time: Tue & Thu 9:45-10:45

  - Zoom Link: https://macquarie.zoom.us/j/4942865292?pwd=ai81cWQ1dWFQOTgxR1A3eWdHZDUzZz09

  - Zoom Meeting Room ID: 494 286 5292

  - Password: 621990

Tutor: TBA

- Email: [email protected]


Unit Outline

Topic 1: Probability and random samples

Topic 2: Large sample probability concepts

Topic 3: Estimation concepts

Topic 4: Likelihood

Topic 5: Estimation methods

Topic 6: Hypothesis testing concepts

Topic 7: Hypothesis testing methods

Topic 8: Bayesian inference


Population and Sample

Population → Sample → Inferences based on the sample, which we extrapolate back to the population

e.g. all adults in a population of interest → e.g. 300 adults chosen at random → e.g. "at least one-third of adults have high cholesterol"


Statistical inference

This unit is about the theory behind Statistical Inference

Statistical inference is the science of drawing conclusions on the basis

of numerical information that is subject to randomness

The core principle is that information about a population can be obtained using a "representative" sample from that population

A "representative" sample requires that the sample has been taken at random from the population

To model variability in random samples we use probability models

This means we need probability concepts to study statistical inference


Probability and random samples

We usually interpret probability to be the long-run frequency with

which an event occurs in repeated trials

We can then model random variation in our sample using the

probabilistic variation in repeated samples from the population

This leads to the Frequentist approach to statistical inference, which

is the most common approach and will be our main focus in this unit

There is also another approach called Bayesian statistical inference,

which is based on a different interpretation of probability (we will do

one lecture on this later in the unit)


Relative frequency

Consider N "samples" taken in identical fashion from a population of interest

Consider an event of interest that could possibly occur in each of these samples

Let fN be the number of samples where the event occurred

Then fN/N is called the relative frequency with which the event occurred

The probability of the event is then the limit of this relative frequency

probability = lim_{N→∞} fN/N
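The limiting relative frequency can be seen numerically in a short simulation (an illustrative sketch, not from the slides; the die-rolling event and the random seed are our own choices):

```python
import random

random.seed(1)

def relative_frequency(n_trials: int) -> float:
    """Relative frequency f_N / N of the event 'die shows 6' in N rolls."""
    hits = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
    return hits / n_trials

# The relative frequency settles towards the probability 1/6 as N grows
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```
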


Topic 1 Outline: Probability and random samples

Populations and random samples

Probability and relative frequency

Probability and set theory

Probability axioms

Random variables and probability distributions

Joint probability distributions

Independence

Common probability distributions including the normal distribution

Sampling variation and statistical inference


Set theory

A rigorous description of probability theory uses concepts from set

theory

A set is a collection of objects

An element of a set is a member of this collection

If ω is an element of a set Ω we write ω ∈ Ω

A is a subset of a set Ω, written A ⊂ Ω, if ω ∈ A implies ω ∈ Ω


Example

Suppose our sample consists of two individuals for whom we record

whether or not a particular infection is present or absent

Denote presence or absence of the infection by 1 and 0, respectively

One possible outcome is that both individuals have the infection, denoted by (1, 1)

The sample space is the set of all possible outcomes, that is, all possible pairs of infection statuses for the two individuals

Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}

The event "there is exactly one infected individual in the sample" is denoted by the subset of the sample space {(0, 1), (1, 0)}


Outcomes, sample spaces and events

The term outcome, e.g., (1, 1), refers to a given realisation of this sampling process

The set of all possible outcomes is referred to as the sample space; e.g., Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}

A subset of outcomes in the sample space is called an event; e.g. {(0, 1), (1, 0)}


Set operations

Denote a union as A ∪ B, which means ω ∈ A ∪ B when ω ∈ A or ω ∈ B.

Denote an intersection as A ∩ B, which means ω ∈ A ∩ B when ω ∈ A and ω ∈ B.

Denote a complement of A as Ac (or Ā), so that ω ∈ Ac means that ω ∈ Ω but ω ∉ A.
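These operations map directly onto Python's built-in set type; the events below are taken from the two-individual infection example:

```python
# Sample space and two events from the infection example
omega = {(0, 0), (0, 1), (1, 0), (1, 1)}
A = {(0, 1), (1, 0)}   # exactly one individual infected
B = {(1, 0), (1, 1)}   # first individual infected

print(A | B)        # union A ∪ B
print(A & B)        # intersection A ∩ B
print(omega - A)    # complement Ac relative to Ω
print(A <= omega)   # subset test A ⊂ Ω
```
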


Example (cont.)

The event "either 1 or 2 individuals in the sample are infected" corresponds to the event union

{(0, 1), (1, 0)} ∪ {(1, 1)} = {(0, 1), (1, 0), (1, 1)}

The event "both 1 and 2 individuals in the sample are infected" corresponds to the event intersection

{(0, 1), (1, 0)} ∩ {(1, 1)} = ∅


Probability and sets

Since events are defined mathematically as sets, we can use set

operations to construct new events from existing events

The new event E1 ∪ E2 is interpreted as the event that either E1 or E2 or both occur; e.g. {(0, 1), (1, 0)} ∪ {(1, 1)}

Consider two events E1 and E2; then the new event E1 ∩ E2 is interpreted as the event that both E1 and E2 occur; e.g., {(0, 1), (1, 0)} ∩ {(1, 1)}

The empty set ∅ is interpreted as an impossible event

If E1 ∩ E2 = ∅ then E1 and E2 are called mutually exclusive events, with the interpretation that the two events cannot both occur

The new event E1c is interpreted as the event that E1 does not occur; e.g. {(0, 0)}c


Valid probabilities

Consider an event E that is a subset of the sample space Ω

Probability is a function of events, or a function of subsets of the

sample space

Then Pr(E) denotes the probability that event E will occur

The function "Pr" is allowed to be any function of subsets of the sample space that satisfies certain requirements that make it a valid probability

Any valid probability must satisfy the following intuitively natural

requirements, called axioms


Axioms of probability

1 The probability of any event E is a number between 0 and 1 inclusive.

That is,

0 ≤ Pr(E) ≤ 1

2 The probability of an event with certainty is 1 and the probability of

an impossible event is 0. That is,

Pr(Ω) = 1 and Pr(∅) = 0

3 If two events E1 and E2 are mutually exclusive, so they cannot both

occur, the probability that either event occurs is the sum of their

respective probabilities. That is,

if A ∩ B = ∅ then Pr(A ∪ B) = Pr(A) + Pr(B)


Probability properties

Many properties follow from the probability axioms. For example:

1 If A ⊂ B, then Pr(A) ≤ Pr(B)

2 Pr(Ac) = 1 − Pr(A)

3 Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

These types of properties can be illustrated using a Venn diagram similar

to those on slide 10 (see also tutorial)
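These properties can also be checked numerically. A minimal sketch using probability assignment 1 from the running example (atom probabilities as in Table 1):

```python
from math import isclose

# Probabilities of the individual outcomes under assignment 1
atom = {(0, 0): 0.9025, (0, 1): 0.0475, (1, 0): 0.0475, (1, 1): 0.0025}

def pr(event):
    """Probability of an event (a set of outcomes) as the sum of its atoms."""
    return sum(atom[w] for w in event)

omega = set(atom)
A = {(0, 1), (1, 0)}   # exactly one infected
B = {(1, 0), (1, 1)}   # first individual infected

# Property 2: Pr(Ac) = 1 - Pr(A)
assert isclose(pr(omega - A), 1 - pr(A))
# Property 3: Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
assert isclose(pr(A | B), pr(A) + pr(B) - pr(A & B))
print("properties 2 and 3 hold for assignment 1")
```
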


Example (cont.) – 3 probability assignments

Event                    probability 1   probability 2   probability 3
∅                        0               0               0
{(0,0)}                  0.9025          0.3025          0.3000
{(0,1)}                  0.0475          0.2475          0.3000
{(1,0)}                  0.0475          0.2475          0.3000
{(1,1)}                  0.0025          0.2025          0.3000
{(0,0),(0,1)}            0.9500          0.5500          0.6000
{(0,0),(1,0)}            0.9500          0.5500          0.6000
{(0,0),(1,1)}            0.9050          0.5050          0.6000
{(0,1),(1,0)}            0.0950          0.4950          0.6000
{(0,1),(1,1)}            0.0500          0.4500          0.6000
{(1,0),(1,1)}            0.0500          0.4500          0.6000
{(0,0),(0,1),(1,0)}      0.9975          0.7975          0.9000
{(0,0),(0,1),(1,1)}      0.9525          0.7525          0.9000
{(0,0),(1,0),(1,1)}      0.9525          0.7525          0.9000
{(0,1),(1,0),(1,1)}      0.0975          0.6975          0.9000
Ω                        1               1               1

Table 1


Example (cont.)

The probability axioms are only satisfied for probability assignments 1 and 2. Probability assignment 3 is invalid because

Pr{(0,0), (0,1), (1,0), (1,1)} = Pr(Ω) = 1
≠ 1.2 = Pr{(0,0)} + Pr{(0,1)} + Pr{(1,0)} + Pr{(1,1)}

Consider event E1 "exactly one individual is infected" and event E2 "the first individual is infected"

E1 = {(0,1), (1,0)}   E2 = {(1,0), (1,1)}

Notice in case 1,

Pr(E1 ∪ E2) = Pr({(0,1), (1,0), (1,1)}) = 0.0975
= 0.0950 + 0.0500 − 0.0475
= Pr({(0,1), (1,0)}) + Pr({(1,0), (1,1)}) − Pr({(1,0)})
= Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)

So property 3 in the Probability Properties slide holds in case 1.


Random variables

A random variable is a function of outcomes in the sample space

I the number of infected people is a random variable

A random variable that can take on only a discrete set of values is referred to as a discrete random variable

I the number of infected people is discrete

A random variable that can take on a continuum of values is referred

to as a continuous random variable

I the cholesterol level of a randomly sampled individual is continuous


Random variables and probabilities

Statements about a random variable taking on a particular value or

having a value in a particular range are events

For a random variable X and a given number x, statements such as

X = x and X ≤ x are events

We can therefore assign probabilities Pr(X = x) and Pr(X ≤ x) to

such events

A general convention is that random variables are denoted by

upper-case letters, while the values that they can take on are

denoted by lower-case letters

This distinction will be important in subsequent lectures


Probability distributions

The probability distribution for a random variable is a rule for

assigning a probability to any event stating that the random variable

takes on a specific value or lies in a specific range

There are various ways to specify the probability distribution of a

random variable

We will use 3 functions for specifying the probability distribution of a

random variable

1 Cumulative distribution function (or simply called distribution function)

2 Probability function (only for discrete variables)

3 Probability density function (only for continuous variables)


Cumulative distribution function

The cumulative distribution function of a random variable X is a

function FX (x) such that for any value x,

FX(x) = Pr(X ≤ x)

FX(x) specifies the probability that a random variable will fall into any given range, since

Pr(l < X ≤ u) = Pr(X ≤ u) − Pr(X ≤ l) = FX(u) − FX(l)

Any valid cumulative distribution function must therefore satisfy the following three properties:

(i) lim_{x→∞} FX(x) = 1   (ii) lim_{x→−∞} FX(x) = 0
(iii) FX(x1) ≥ FX(x2), where x1 ≥ x2.
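As a concrete illustration (our own example, not from the slides), the exponential distribution has the closed-form CDF F(x) = 1 − e^(−λx), so an interval probability follows directly from FX(u) − FX(l):

```python
from math import exp

def exp_cdf(x: float, lam: float = 1.0) -> float:
    """CDF of the exponential(λ) distribution: 1 - exp(-λx) for x > 0."""
    return 1.0 - exp(-lam * x) if x > 0 else 0.0

l, u = 0.5, 2.0
prob = exp_cdf(u) - exp_cdf(l)   # Pr(l < X ≤ u) = e^-0.5 - e^-2
print(prob)
```
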


Probability function

For a discrete random variable X, the probability function is a

function that gives the probability that the random variable will equal

any specific value

The probability function is

fX(x) = Pr(X = x)

fX(x) specifies the probability that a discrete random variable falls into any given range, for example

Pr(X ∈ {1, 2, 3}) = Pr(X = 1) + Pr(X = 2) + Pr(X = 3) = fX(1) + fX(2) + fX(3)

For any discrete random variable, Σ_x fX(x) = 1, where the summation is taken over all possible values that X can take on


Probability density function

For any continuous random variable X and any x, Pr(X = x) = 0;

hence here Pr(X = x) is not very informative.

For a continuous random variable X, the probability density

function is the derivative of the cumulative distribution function

fX(x) = d/dx FX(x)

fX(x) specifies the probability that a continuous random variable will fall into any given range, since

Pr(l ≤ X ≤ u) = Pr(l < X ≤ u) = FX(u) − FX(l) = ∫_{l}^{u} fX(x) dx

fX(x) must therefore always integrate to 1 over (−∞, ∞) (why?)


Attributes of probability distributions: Expectation

Based on the probability distribution, we can design various attributes

to summarise the way the random variable behaves

The expectation, or mean, of a random variable is the average value

that the random variable takes on

For discrete random variables the expectation is

E(X) = Σ_x x fX(x)

where the summation is over all possible values of X

For continuous random variables the expectation is given by

E(X) = ∫_{−∞}^{∞} x fX(x) dx


Attributes of probability distributions: Expectation (cont.)

For a discrete random variable X and a function g, the expectation of g(X) has the property

E(g(X)) = Σ_x g(x) fX(x)

where the summation is over all possible values of X

For a continuous random variable X and a function g, the expectation of g(X) has the property

E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

Since the sum or integral of a linear function yields a linear function of the sum or integral, expectations possess an important linearity property, namely, for constants c0 and c1

E(c0 + c1X) = c0 + c1E(X)


Attributes of probability distributions: Variance

The variance of a random variable is a measure of the degree of

variation that a random variable exhibits

For both continuous and discrete random variables, variance is defined as

Var(X) = E[(X − E(X))²]

For both continuous and discrete random variables, variance has the property

Var(X) = E(X²) − (E(X))²

Unlike expectations, the linearity property does not hold for variances, but is replaced by the equally important property

Var(c0 + c1X) = c1² Var(X)


Attributes of probability distributions: Percentiles

Another important attribute are percentiles

For α ∈ (0, 1), the α-percentile of a probability distribution is the point below which 100α% of the distribution falls

The α-percentile of a probability distribution with cumulative distribution function FX(x) is the point pα that satisfies

FX(pα) = α

For example, the 0.5 percentile, called the median, is the point below

which half of the probability distribution lies

The 0.25 and 0.75 percentiles, called quartiles, specify the points

below which one-quarter and three-quarters of the distribution lies

Other percentiles of a probability distribution will also be of interest,

particularly when we come to discuss confidence intervals in

subsequent topics.


Example (cont.)

Define a random variable T to be the number infected in the sample

of 2 people

T is a discrete random variable since its possible values are 0, 1 and 2

The table gives the value of T for each outcome in the sample space

The table also gives the probability distribution of T under the

probability assignment 1 discussed earlier

t   Event T = t       fT(t)    FT(t)
0   {(0,0)}           0.9025   0.9025
1   {(0,1),(1,0)}     0.0950   0.9975
2   {(1,1)}           0.0025   1

E(T) = 0 × 0.9025 + 1 × 0.0950 + 2 × 0.0025 = 0.1

Var(T) = (0² × 0.9025 + 1² × 0.0950 + 2² × 0.0025) − 0.1² = 0.095
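The same calculation can be written directly from the probability function of T (values as in the table for assignment 1):

```python
from math import isclose

# Probability function of T under assignment 1
f_T = {0: 0.9025, 1: 0.0950, 2: 0.0025}

mean = sum(t * p for t, p in f_T.items())              # E(T)
var = sum(t**2 * p for t, p in f_T.items()) - mean**2  # E(T^2) - E(T)^2

print(mean, var)   # ≈ 0.1 and ≈ 0.095, matching the hand calculation
assert isclose(mean, 0.1) and isclose(var, 0.095)
```
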


Conditional probability

The probability of an event might change once we know that some other event has occurred; this means the event depends on the other event

For two events E1 and E2, the conditional probability that E1 occurs given that E2 has occurred is denoted Pr(E1|E2) and is defined as

Pr(E1|E2) = Pr(E1 ∩ E2) / Pr(E2)

This is defined only for events E2 that are not impossible, so that Pr(E2) ≠ 0 in the denominator

It does not make sense for us to condition on the occurrence of an

impossible event


Independence

A property that applies to both events and random variables

Using the definition of conditional probability, two events E1 and E2

are independent events if

Pr(E1|E2) = Pr(E1)

The occurrence of the event E2 does not affect the probability of

occurrence of the event E1 (and vice versa)

We can re-express this definition by saying that E1 and E2 are

independent events if they satisfy the multiplicative property

Pr(E1 ∩ E2) = Pr(E1) Pr(E2)


Independent random variables

Statistical inference makes more use of the concept of independence

when applied to random variables

Consider two random variables X1 and X2, with cumulative

distribution functions F1(x1) and F2(x2)

X1 and X2 are said to be independent random variables if

Pr(X1 ≤ x1 | X2 ≤ x2) = Pr(X1 ≤ x1) = F1(x1)

Pr(X2 ≤ x2 | X1 ≤ x1) = Pr(X2 ≤ x2) = F2(x2)

where x1 and x2 are in the range of possible values of X1 and X2

Knowing the value of one random variable does not affect the

probability distribution of the other


Independent random variables (cont.)

Like independence of events, independence of random variables can

be defined using the multiplicative property

Pr({X1 ≤ x1} ∩ {X2 ≤ x2}) = Pr(X1 ≤ x1) Pr(X2 ≤ x2) = F1(x1) F2(x2)

We can see from this form that independence of random variables is

defined in terms of independence of the two events X1 ≤ x1 and

X2 ≤ x2


Joint probability distributions

The above discussion introduces us to the concept of the joint

probability distribution of two random variables

It generalises the definition of a probability distribution for a single random variable to distributions for two or more random variables

The joint probability distribution of two random variables is a rule

for assigning probabilities to any event stating that the two random

variables simultaneously take on specific values or lie in specific ranges

Like the probability distribution of a single random variable, the joint

probability distribution can be characterised by various functions


Joint cumulative distribution function

The first such function is a generalisation of the cumulative

distribution function

Consider the shorthand notation

Pr(X1 ≤ x1, X2 ≤ x2) ≡ Pr({X1 ≤ x1} ∩ {X2 ≤ x2})

Then the joint cumulative distribution function of two random

variables X1 and X2 is the function of two variables

FX1,X2(x1, x2) = Pr(X1 ≤ x1, X2 ≤ x2)

So independence of two random variables is equivalent to their joint

cumulative distribution function factoring into the product of their

individual cumulative distribution functions


Joint probability function

The joint probability function of two discrete random variables X1

and X2 is the function of two variables

fX1,X2(x1, x2) = Pr(X1 = x1, X2 = x2)

The multiplicative property for independence of two discrete random variables can equivalently be expressed in terms of their joint probability function

That is, two discrete random variables X1 and X2 are independent if

fX1,X2(x1, x2) = f1(x1) f2(x2)

where f1(x1) and f2(x2) are the probability functions of X1 and X2


Joint probability density function

The joint probability density function of two continuous random

variables X1 and X2 is the function of two variables

fX1,X2(x1, x2) = ∂²/(∂x1 ∂x2) FX1,X2(x1, x2)

where the symbol ∂ denotes partial differentiation of a multivariable function, rather than the symbol d used in univariable differentiation

The joint probability density function specifies the probability that the two continuous random variables will simultaneously fall into any two given ranges through the relationship

Pr(l1 ≤ X1 ≤ u1, l2 ≤ X2 ≤ u2) = ∫_{l1}^{u1} ∫_{l2}^{u2} fX1,X2(x1, x2) dx2 dx1


Correlation and covariance

The covariance of X and Y is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

We say that X and Y are uncorrelated when Cov(X, Y) = 0, i.e. when

E(XY) = E(X)E(Y)

Being uncorrelated random variables is a weaker property than being independent random variables

Independent implies uncorrelated but not vice versa

Covariance is a generalisation of variance

Cov(X, X) = Var(X)


Correlation and covariance (cont.)

A measure of the extent to which two random variables depart from

being uncorrelated is the correlation

Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))

Correlation is scaled such that it always lies between -1 and 1, with 0

corresponding to being uncorrelated

It is important in studying the linear relationship between two

variables, with the extremes of -1 and 1 corresponding to a perfect

negative and positive linear relationship, respectively

Although being uncorrelated implies that there is no linear

relationship between two variables, it does not preclude that some

other relationship exists. This is another reason why independence is

a stronger property than being uncorrelated


Correlation example

Suppose (X, Y) can be either

(2, 2) with 10% probability,
(−1, 1) with 40% probability,
(1, −1) with 40% probability,
(−2, −2) with 10% probability.

The random variables X and Y are certainly dependent, since if we know what one of them is, we can figure out what the other one is too.

[Figure: scatter of the four support points (2, 2), (−1, 1), (1, −1), (−2, −2) in the (X, Y) plane]


Correlation example (cont.)

On the other hand, E[XY], E[X] and E[Y] are all zero; for instance,

E[XY] = 10% × 2 × 2 + 40% × (−1) × 1 + 40% × 1 × (−1) + 10% × (−2) × (−2)
      = 0.4 − 0.4 − 0.4 + 0.4
      = 0,

so the correlation between X and Y is zero

X and Y are uncorrelated but not independent
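A direct computation over the four support points confirms this (a sketch of the check, using the probabilities stated above):

```python
from math import isclose

# Joint pmf of (X, Y) from the example
pmf = {(2, 2): 0.10, (-1, 1): 0.40, (1, -1): 0.40, (-2, -2): 0.10}

e_x = sum(x * p for (x, y), p in pmf.items())
e_y = sum(y * p for (x, y), p in pmf.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov = e_xy - e_x * e_y   # Cov(X, Y) = E(XY) - E(X)E(Y)

print(e_x, e_y, cov)     # all zero (up to floating-point rounding)
```
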


Independent random samples

The main use of the concept of independence in this unit is for

modelling a random sample from a population

We will often use a collection of n random variables to represent n

observations in a random sample and assume that these observations

are independent

For a random sample, independence means that one observation does

not affect the probability distribution of another observation

n random variables X = (X1, …, Xn) are (mutually) independent if their joint cumulative distribution function factors into the product of their n individual cumulative distribution functions, or likewise for the joint density or probability functions

FX(x) = Pr(X1 ≤ x1, …, Xn ≤ xn) = ∏_{i=1}^{n} Fi(xi)      fX(x) = ∏_{i=1}^{n} fi(xi)


Independence example

Random variable T0 is 1 if only one individual is infected and 0

otherwise

Random variable T1 is 1 if the first individual is infected and 0

otherwise

Random variable T2 is 1 if the second individual is infected and 0

otherwise

Consider events T0 = 1, T1 = 1, T2 = 1, denoted as E0, E1, E2

E0 = {(0,1), (1,0)} and Pr(E0) = 0.095 based on Table 1

Likewise we have E1 = {(1,0), (1,1)} and Pr(E1) = 0.05, as well as E2 = {(0,1), (1,1)} and Pr(E2) = 0.05

Conditional probability

Pr(E1|E0) = Pr(E1 ∩ E0) / Pr(E0) = Pr({(1, 0)}) / 0.095 = 0.0475 / 0.095 = 0.5


Independence example (cont.)

Thus, given we know exactly one person is infected, it is equally likely

to be individual 1 or 2

E0 and E1 are not independent events since Pr(E1|E0) ≠ Pr(E1)

Knowledge that there is one infected individual provides information

about whether individual 1 is infected

On the other hand, T1 = 1 and T2 = 1 are independent events

Pr(E1 ∩ E2) = Pr({(1, 1)}) = 0.0025 = 0.05 × 0.05 = Pr(E1) Pr(E2)

The same process can be followed for any other value of the random variables T1 and T2 to show that

Pr(T1 = t1, T2 = t2) = Pr(T1 = t1) Pr(T2 = t2),   t1 = 0, 1,   t2 = 0, 1

That is, the random variables T1 and T2 are independent random

variables
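This exhaustive check over all four value pairs is easy to script (outcome probabilities from assignment 1):

```python
from math import isclose

# Outcome probabilities under assignment 1; keys are (first, second) statuses
atom = {(0, 0): 0.9025, (0, 1): 0.0475, (1, 0): 0.0475, (1, 1): 0.0025}

def pr_t1(t1: int) -> float:
    """Marginal Pr(T1 = t1), T1 being the first individual's status."""
    return sum(p for (a, b), p in atom.items() if a == t1)

def pr_t2(t2: int) -> float:
    """Marginal Pr(T2 = t2), T2 being the second individual's status."""
    return sum(p for (a, b), p in atom.items() if b == t2)

# Factorisation holds for every (t1, t2) pair, so T1 and T2 are independent
for (t1, t2), joint in atom.items():
    assert isclose(joint, pr_t1(t1) * pr_t2(t2))
print("T1 and T2 are independent under assignment 1")
```
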


Common probability distributions

Probability distributions commonly used in statistical inference are

based on a simple and flexible function for fX(x) or FX(x)

In subsequent lectures we will use many common probability

distributions

All of these are summarised in the accompanying document "Common Probability Distributions" (which will be reviewed in the lecture)

Common discrete distributions include: binomial, Poisson, geometric,

negative binomial and hypergeometric distributions

Common continuous distributions include: normal, exponential,

gamma, uniform, beta, t, χ2 and F distributions


Normal distribution

The most important distribution for statistical inference

In large samples it unifies many statistical inference tools

The large sample concepts will be considered in Topics 2 and 3

For now we will simply review some of the key features

Consider a continuous random variable X with

µ = E(X) and σ² = Var(X)

X has a normal distribution, written

X ∼ N(µ, σ²),

if the probability density function of X has the form

fX(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)),   x ∈ (−∞, ∞)


Standard normal distribution

The cumulative distribution function FX(x) has no convenient closed form and needs to be calculated numerically

This is done using a special case, called the standard normal distribution, which is the N(0, 1) distribution

Let the standard normal cumulative distribution function be

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp(−u²/2) du

Then the cumulative distribution function associated with any other normal distribution is

FX(x) = Φ((x − µ)/σ)

The α-percentile of the standard normal distribution is zα, where

Φ(zα) = α, or zα = Φ⁻¹(α)
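In practice Φ is computed numerically; in Python the identity Φ(x) = (1 + erf(x/√2))/2 gives it from the standard library (a sketch; z ≈ 1.959964 is the familiar 97.5th standard normal percentile):

```python
from math import erf, sqrt

def phi(x: float) -> float:
    """Standard normal CDF Φ(x) via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) = Φ((x - µ)/σ) for X ~ N(µ, σ²)."""
    return phi((x - mu) / sigma)

print(phi(0.0))                     # 0.5 by symmetry
print(phi(1.959964))                # ≈ 0.975
print(normal_cdf(12.0, 10.0, 2.0))  # same as phi(1.0)
```
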


Standard normal distribution – percentiles

[Figure: standard normal density on (−4, 4), with peak 0.4 at x = 0; and a general N(µ, σ²) density with peak 1/(σ√(2π)) at x = µ, with the points µ − 2σ and µ + 2σ marked]


Bivariate normal distribution

The bivariate normal distribution is a joint probability distribution

Consider two normally distributed random variables X and Y with

Corr(X, Y) = ρ

We call µ the mean vector and Σ the variance-covariance matrix

µ = (µX, µY)^T   and   Σ = ( σX²     ρσXσY
                             ρσXσY   σY²   )

Then X and Y have a bivariate normal distribution, written

X ∼ N2(µ, Σ),   where X = (X, Y)^T,

if their joint probability density function is of the form

fX,Y(x, y) = (1/(2πσXσY√(1 − ρ²))) exp(−½ (x − µ)^T Σ⁻¹ (x − µ))

Multivariate normal distribution

Generalisation of the normal distribution, giving the joint distribution

of a k × 1 vector of random variables X = (X1, …, Xk)^T

The joint probability density function is

fX(x) = (2π)^(−k/2) det(Σ)^(−1/2) exp(−½ (x − µ)^T Σ⁻¹ (x − µ)),   x ∈ ℝᵏ

µ = (µ1, …, µk)^T is called the mean vector

The k × k matrix Σ is called the variance-covariance matrix and must be a non-negative definite matrix

Its main use in this unit is as the distribution of estimators in large samples; more on this in later topics


Inference example

We will now consider how to use a probability model for the sampling

variation in a simple introductory example

Example: Assessment of disease prevalence in a population

- We are interested in the proportion of a population that has a particular disease, called θ

- We sample n individuals at random from the population

- We observe the number of individuals who have the disease

- We assume our sample is truly random and not biased, i.e. we have not systematically over- or under-sampled diseased individuals

- How would we use the sample to make inferences about θ?


Inference about the population

The population prevalence θ is considered to be a fixed constant

Our goal is to use the sample to estimate this unknown constant and

also to place some appropriate uncertainty limits around our estimate

The starting point is the natural estimate of the unknown population

prevalence, that is, by the observed proportion in our sample

By using the observed sample prevalence to make inferences about

the disease prevalence in the population, we are extrapolating from

the sample to the population

The reason why such sampling and extrapolation is necessary is that

we can’t assess the entire population


Sampling variation

How much do we "trust" the observed sample prevalence as an estimate of the population prevalence?

The answer depends on the sampling variation

Sampling variability reflects the extent to which the sample prevalence tends to vary from sample to sample

If our sample included n = 1000 individuals we would "trust" the observed sample prevalence more than if our sample included n = 100 individuals

Consider a plot of repeated samples with different sample sizes


[Figure 1 here: dot plot of sample prevalence (%), roughly 5-25%, for each study, grouped by sample size 100 vs 1000]

Figure 1: Results from 10 prevalence studies with sample size

100, and 10 prevalence studies with sample size 1000.
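A simulation in the spirit of Figure 1 (our own sketch; the true prevalence θ = 0.15 and the seed are assumptions, not values from the slides) shows the reduced spread at n = 1000:

```python
import random

random.seed(2)
theta = 0.15   # assumed true population prevalence

def sample_prevalence(n: int) -> float:
    """Observed prevalence in one random sample of n individuals."""
    return sum(1 for _ in range(n) if random.random() < theta) / n

for n in (100, 1000):
    prevs = [round(sample_prevalence(n), 3) for _ in range(10)]
    print(f"n={n}: {prevs}")   # spread is visibly tighter for n=1000
```
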


Probability model

In order to quantify our "trust" in the sample prevalence, we need some way of describing its variability

This can be done using a probability model

In this example the binomial distribution provides a natural model for the way the sampling has been carried out, assuming:

- n is fixed, not random

- individuals are sampled independently

We then have a probability model for the observed number of diseased individuals X and the sample prevalence

P = X / n


Binomial model

Pr(X = x) = (n!/((n − x)! x!)) θ^x (1 − θ)^(n−x),   x = 0, …, n

or

Pr(P = p) = (n!/((n − pn)! (pn)!)) θ^(pn) (1 − θ)^(n−pn),   pn = 0, …, n

We can use this distribution to quantify our "trust" in the sample prevalence as an indication of the population prevalence, particularly using the distribution's mean and variance

We can also use this model to calculate a confidence interval, which is an important summary of our "trust" in the sample

We will come back to this in Topic 3, after discussing the large

sample normal approximation to the binomial distribution and some

key estimation concepts
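The binomial pmf and its mean nθ and variance nθ(1 − θ) can be checked directly (n = 100 and θ = 0.1 are illustrative values of our own, not from the slides):

```python
from math import comb

def binom_pmf(x: int, n: int, theta: float) -> float:
    """Pr(X = x) = C(n, x) θ^x (1 - θ)^(n - x)."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

n, theta = 100, 0.1
probs = [binom_pmf(x, n, theta) for x in range(n + 1)]

total = sum(probs)                                          # pmf sums to 1
mean = sum(x * p for x, p in enumerate(probs))              # equals nθ
var = sum(x**2 * p for x, p in enumerate(probs)) - mean**2  # equals nθ(1 - θ)

print(total, mean, var)
```
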

