Statistics for Engineers: Probability & Inference

ENGG 2780A / ESTR 2020 is a course on Statistics for Engineers, focusing on probability theory, statistical inference, and various statistical methods including Bayesian and classical statistics. The course covers topics such as random variables, hypothesis testing, confidence intervals, and the Central Limit Theorem, with a structured schedule for lectures and exams. Students will learn to apply statistical concepts to real-world data and problems, emphasizing the relationship between theory and practice.


ENGG 2780A / ESTR 2020: Statistics for Engineers

Spring 2025

L1: (Probability and) Statistics

Sinno Jialin Pan


Outline

Probability vs. statistics

Statistical inference problems
• Bayesian inference
• Classical inference

Review of probability
What is Probability

Probability is a mathematical language for quantifying uncertainty.

Examples of uncertain quantities: the number of heads out of 100 flips, the number of rainy days per month, the range of an hourly stock price. Each is modeled as a random variable using probability theory.
Probability Theory

• Number of heads out of 100 flips: x ~ Binomial(100, p), e.g. p = 1/2
• Number of rainy days per month: x ~ Poisson(λ), e.g. λ = 5
• Hourly stock price: x ~ Normal(μ, σ²)
Probability Theory (cont.)

Assume the probability distribution (the data generating process) is known:
• A family of distributions
• The parameter(s) of the distribution

Then we can compute P(x = k) or P(k₁ ≤ x ≤ k₂).

Independence: P(xᵢ | xⱼ) = P(xᵢ)
Conditional independence: P(xᵢ | y, xⱼ) = P(xᵢ | y)
Bayes' rule: P(θ | x) = P(x | θ) P(θ) / P(x)
The Central Dogma of Statistics

Data = independent samples. We have samples of observed data, but we don't know the underlying distribution.
Statistics

• Observations of heads for 100 flips: x ~ Binomial(100, p)
• Historical records of the number of rainy days for the past few years: x ~ Poisson(λ)
• Historical hourly prices of the stock for the past few months: x ~ Normal(μ, σ²)
Probability vs. Statistics

Probability theory: data generating process → observed data.
Statistical inference: observed data → data generating process (with tools such as the Central Limit Theorem).

"Theory without Practice is empty; but Practice without Theory is blind" – Immanuel Kant
Descriptive statistics vs. Inferential statistics

Descriptive statistics: use numbers to summarize and describe data. They do not involve generalization beyond the data at hand.

Example: 2021 Report on Annual Earnings and Hours Survey (from [Link])
Statistical inference tasks

Observed flips: HTTHTTHTTT, etc.
Estimation: x ~ Binomial(10, p); estimate the parameter θ = p.

Classical statistics: parameters are considered deterministic quantities that happen to be unknown. Point estimation of θ from observed data.

Bayesian statistics: parameters are considered random variables with prior distributions f_Θ(θ) or p_Θ(θ). With observed data, apply Bayes' rule to obtain the posterior f_{Θ|X}(θ|x) or p_{Θ|X}(θ|x):

P(θ | x) = P(x | θ) P(θ) / P(x)
Statistical inference tasks

Observed flips: HTTHTTHTTT
Hypothesis testing: is the coin biased or fair?

Statistical inference tasks

Observed flips: HTTHTTHTTT
Confidence interval estimation: e.g., a 95% confidence interval for p.
Schedule

Week     Date    Lecture  Topic
Week 1   Jan 6   L1       Probability vs Statistics
Week 2   Jan 13  L2       Bayesian statistics
Week 3   Jan 20  L2 & L3  Prediction, estimation, & hypothesis testing
Week 4   Jan 27  L3       Prediction, estimation, & hypothesis testing
Week 5   Feb 3   -        Lunar New Year Vacation (no class)
Week 6   Feb 10  L4       Sampling statistics
Week 7   Feb 17  L5       Classical point estimation
Week 8   Feb 24  -        Midterm Exam (during lecture)
Week 9   Mar 3   -        Reading Week (no class)
Week 10  Mar 10  L6       Confidence interval I
Week 11  Mar 17  L7       Confidence interval II
Week 12  Mar 24  L8       Hypothesis test
Week 13  Mar 31  L9       Composite hypothesis test
Week 14  Apr 7   L10      Comparing populations
Review of Probability

Random variables: quantify outcomes of random (non-deterministic) events.

Discrete random variables: distributions are defined by a Probability Mass Function (PMF),

P(X = xᵢ) = p(xᵢ), i = 1, …, k, with Σᵢ₌₁ᵏ p(xᵢ) = 1

Continuous random variables: distributions are defined by a Probability Density Function (PDF) f(x),

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx, with ∫₋∞^∞ f(x) dx = 1

Note: P(X = x) ≠ f(x); in fact P(X = x) = 0 for any single point x. For small δ,

P(x ≤ X ≤ x + δ) = ∫ₓ^{x+δ} f(x) dx ≈ f(x) δ
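These PDF properties can be checked numerically. A minimal sketch in Python (the trapezoidal integrator and the interval choices are my own, not from the slides):

```python
import math

def std_normal_pdf(x):
    """PDF of the standard normal N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, n=100_000):
    """Simple trapezoidal rule for integrating f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# Total probability is 1 (the tails beyond +/-8 are negligible)
print(integrate(std_normal_pdf, -8, 8))        # ≈ 1.0

# P(a <= X <= b) is the area under the PDF: P(-1 <= X <= 1) ≈ 0.6827
print(integrate(std_normal_pdf, -1, 1))

# For a small delta, P(x <= X <= x + delta) ≈ f(x) * delta
x, delta = 0.5, 1e-4
print(integrate(std_normal_pdf, x, x + delta, 100), std_normal_pdf(x) * delta)
```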
Binomial random variables

A Bernoulli random variable Xᵢ with parameter p:

Xᵢ         0       1
PMF(Xᵢ)    1 − p   p

The sum of n independent Bernoulli(p) variables is Binomial:

X₁ + … + Xₙ = X ~ Binomial(n, p)

PMF(k, n, p) = P(X = k; n, p) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ

where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient.
How likely is it to get 2 heads in 3 coin flips if the probability of heads is p?

H ~ Binomial(3, p):  P(H = 2) = C(3, 2) p² (1 − p) = 3 p² (1 − p)

• p = 0.5:  3 × 0.5² × 0.5 = 0.375
• p = 0.7:  3 × 0.7² × 0.3 = 0.441
• p = 1:    3 × 1² × 0 = 0
How likely is it to get 200 heads in 300 coin flips if the probability of heads is p?

H ~ Binomial(300, p):  P(H = 200) = C(300, 200) p²⁰⁰ (1 − p)¹⁰⁰

• p = 0.5:  ≈ 2 × 10⁻⁹
• p = 0.7:  ≈ 0.022
• p = 1:    0
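Both worked examples can be reproduced exactly from the binomial PMF. A quick Python check using only the standard library (`math.comb` gives the binomial coefficient C(n, k)):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 2 heads in 3 flips
print(binom_pmf(2, 3, 0.5))   # 0.375
print(binom_pmf(2, 3, 0.7))   # ≈ 0.441
print(binom_pmf(2, 3, 1.0))   # 0.0

# 200 heads in 300 flips
print(binom_pmf(200, 300, 0.5))   # ≈ 2e-9
print(binom_pmf(200, 300, 0.7))   # ≈ 0.022
```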
Normal random variables

Standard normal: z ~ 𝒩(0, 1). Standardization: z = (x − μ)/σ, equivalently x = σz + μ.

PDF of 𝒩(μ, σ²):  f(x) = (1 / (σ√(2π))) e^{−(x−μ)² / (2σ²)}
PDF of 𝒩(0, 1):   f(x) = (1 / √(2π)) e^{−x²/2}

Cumulative Distribution Function (CDF) of 𝒩(0, 1):

Φ(t) = (1 / √(2π)) ∫₋∞ᵗ e^{−x²/2} dx = P(X ≤ t)
Mean and variance

Distribution           𝔼[x]   Var[x]
x ~ Bernoulli(p)       p      p(1 − p)
x ~ Binomial(n, p)     np     np(1 − p)
x ~ 𝒩(μ, σ²)           μ      σ²
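These formulas can be sanity-checked by simulation. A small sketch (the sample sizes and seed are arbitrary choices of mine, not from the slides):

```python
import random

random.seed(0)  # make the run reproducible

n, p, trials = 100, 0.3, 20_000

# A Binomial(n, p) draw is the sum of n independent Bernoulli(p) draws
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

print(mean)   # close to n*p = 30
print(var)    # close to n*p*(1-p) = 21
```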
The Central Limit Theorem

Let X₁, …, Xₙ be independent with the same PMF/PDF, with 𝔼[Xᵢ] = μ and Var[Xᵢ] = σ² > 0, and let X = Σᵢ₌₁ⁿ Xᵢ.

For every t (positive or negative), the CDF of the standardized sum converges to Φ:

lim_{n→∞} P(X ≤ 𝔼[X] + t √Var[X]) = Φ(t)

Equivalently, letting Z = (X − 𝔼[X]) / √Var[X]:

lim_{n→∞} P(Z ≤ t) = Φ(t)

So X can be approximated by 𝒩(𝔼[X], Var[X]) = 𝒩(nμ, nσ²).
Use the CLT to estimate the probability of at least 200 heads in 300 coin flips if the probability of heads is p.

H ~ Binomial(300, p),  μ = 300p,  σ = √(300 p (1 − p))

• p = 0.5:  μ = 150, σ ≈ 8.66
  P(H ≥ 200) ≈ 𝒩(x ≥ 200; μ, σ²) ≈ 𝒩(z ≥ (200 − 150)/8.66 = 5.77; 0, 1) ≈ 0

• p = 0.7:  μ = 210, σ ≈ 7.94
  P(H ≥ 200) ≈ 𝒩(x ≥ 200; μ, σ²) ≈ 𝒩(z ≥ (200 − 210)/7.94 = −1.26; 0, 1) = Φ(1.26) ≈ 0.896
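The same CLT approximation can be sketched in Python, using `math.erf` to evaluate Φ via the identity Φ(t) = (1 + erf(t/√2)) / 2 (a sketch of the slide's calculation; no continuity correction is applied, matching the slide):

```python
import math

def Phi(t):
    """Standard normal CDF, via Phi(t) = (1 + erf(t / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def clt_at_least(k, n, p):
    """CLT approximation of P(H >= k) for H ~ Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return 1 - Phi((k - mu) / sigma)

print(clt_at_least(200, 300, 0.5))   # z ≈ 5.77, so ≈ 0
print(clt_at_least(200, 300, 0.7))   # z ≈ -1.26, so Φ(1.26) ≈ 0.896
```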
Bayesian statistical inference

1. Assign prior probabilities to parameters
2. Observe data
3. Update probabilities via Bayes' rule:

P_{Θ|X}(θ|x) = P_Θ(θ) P_{X|Θ}(x|θ) / P_X(x) = P_Θ(θ) P_{X|Θ}(x|θ) / Σ_{θ'} P_Θ(θ') P_{X|Θ}(x|θ')
Bayes' rule (four versions)

In each version the posterior equals prior × likelihood divided by the normalizing constant Z(x):

• Discrete Θ, discrete X:
  p_{Θ|X}(θ|x) = p_Θ(θ) p_{X|Θ}(x|θ) / Σ_{θ'} p_Θ(θ') p_{X|Θ}(x|θ')

• Discrete Θ, continuous X:
  p_{Θ|X}(θ|x) = p_Θ(θ) f_{X|Θ}(x|θ) / Σ_{θ'} p_Θ(θ') f_{X|Θ}(x|θ')

• Continuous Θ, discrete X:
  f_{Θ|X}(θ|x) = f_Θ(θ) p_{X|Θ}(x|θ) / ∫ f_Θ(θ') p_{X|Θ}(x|θ') dθ'

• Continuous Θ, continuous X:
  f_{Θ|X}(θ|x) = f_Θ(θ) f_{X|Θ}(x|θ) / ∫ f_Θ(θ') f_{X|Θ}(x|θ') dθ'

Note: these 4 versions are obtained by replacing each factor by the PMF p for discrete variables and by the PDF f for continuous variables.
Why is the denominator a constant w.r.t. θ?

E.g.,

∫ f_Θ(θ') f_{X|Θ}(x|θ') dθ' = ∫ f_{X,Θ}(x, θ') dθ' = f_X(x), also denoted Z(x)

θ is marginalized out, so Z(x) only depends on the observed data x: it is a constant w.r.t. θ.
A coin might be one of the following types:

Type     θ = 1 (faces H, T)   θ = 2 (faces H, H)   θ = 3 (faces T, T)
Prior    90%                  5%                   5%

You flip a H (call it H₁). How do you adjust your beliefs (priors)?

P(θ=1 | H₁) = P(H₁ | θ=1) P(θ=1) / Z(H₁) = 0.5 × 0.9 / Z(H₁) = 0.45 / Z(H₁)
P(θ=2 | H₁) = P(H₁ | θ=2) P(θ=2) / Z(H₁) = 1 × 0.05 / Z(H₁) = 0.05 / Z(H₁)
P(θ=3 | H₁) = 0

Z(H₁) = 0.45 + 0.05 + 0 = 0.5

P(θ=1 | H₁) = 0.9,  P(θ=2 | H₁) = 0.1,  P(θ=3 | H₁) = 0
Bayes' rule variant with extra conditioning: P(a | b, c) = P(b | a, c) P(a | c) / P(b | c), where Z(b, c) = P(b | c).

Adjusted priors:

Type     θ = 1 | H₁ (faces H, T)   θ = 2 | H₁ (faces H, H)   θ = 3 | H₁ (faces T, T)
Prior    90%                       10%                       0%

You flip another H (call it H₂). How do you readjust? Note that H₁ and H₂ are independent given a specific coin.

P(θ=1 | H₂ H₁) = P(H₂ | θ=1, H₁) P(θ=1 | H₁) / Z(H₂, H₁) = 0.5 × 0.9 / Z(H₂, H₁) = 0.45 / Z(H₂, H₁)
P(θ=2 | H₂ H₁) = P(H₂ | θ=2, H₁) P(θ=2 | H₁) / Z(H₂, H₁) = 1 × 0.1 / Z(H₂, H₁) = 0.1 / Z(H₂, H₁)
P(θ=3 | H₂ H₁) = 0

Z(H₂, H₁) = 0.45 + 0.1 + 0 = 0.55

P(θ=1 | H₂ H₁) = 0.45 / 0.55 ≈ 0.82,  P(θ=2 | H₂ H₁) ≈ 0.18,  P(θ=3 | H₂ H₁) = 0
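The two updates above can be carried out mechanically. A minimal Python sketch of the same computation (the dictionary representation is my own choice, not from the slides):

```python
# Likelihood of heads for each coin type:
# theta=1 has faces H/T, theta=2 has H/H, theta=3 has T/T
p_heads = {1: 0.5, 2: 1.0, 3: 0.0}
prior = {1: 0.90, 2: 0.05, 3: 0.05}

def update(belief, heads):
    """One Bayes-rule update of the belief after observing a flip."""
    unnorm = {t: belief[t] * (p_heads[t] if heads else 1 - p_heads[t])
              for t in belief}
    Z = sum(unnorm.values())                 # normalizing constant Z(x)
    return {t: v / Z for t, v in unnorm.items()}

post1 = update(prior, heads=True)            # after the first H
print(post1)                                 # {1: ≈0.9, 2: ≈0.1, 3: 0.0}

post2 = update(post1, heads=True)            # after the second H
print(post2)                                 # {1: ≈0.82, 2: ≈0.18, 3: 0.0}
```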
Bayes' rule for multiple random variables

Using continuous variables as an example:

f_{Θ|X₁,…,Xₙ}(θ | x₁, …, xₙ) = f_{X₁,…,Xₙ|Θ}(x₁, …, xₙ | θ) f_Θ(θ) / Z(x₁, …, xₙ)
  ∝ f_{X₁,…,Xₙ|Θ}(x₁, …, xₙ | θ) f_Θ(θ)
  = f_{X₁|Θ}(x₁ | θ) ⋯ f_{Xₙ|Θ}(xₙ | θ) f_Θ(θ)   if X₁, …, Xₙ are independent given Θ
Some commonly used prior distributions

Beta(α, β) = θ^{α−1} (1 − θ)^{β−1} / B(α, β)  for θ ∈ [0, 1]

where B(α, β) = Γ(α)Γ(β) / Γ(α + β),  Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx,  and Γ(α) = (α − 1)! for positive integer α.

The Beta distribution is widely used to model the (prior) distribution of a random variable whose range is [0, 1], where α and β are (hyper)parameters.

As ∫₀¹ Beta(α, β) dθ = 1, we have

∫₀¹ θ^{α−1} (1 − θ)^{β−1} dθ = B(α, β)
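This normalization identity can be verified numerically. A sketch using `math.gamma` (the values α = 2, β = 3 are arbitrary test inputs of mine):

```python
import math

def B(a, b):
    """Beta function: B(a, b) = Γ(a)Γ(b) / Γ(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(theta, a, b):
    """Beta(a, b) density on (0, 1)."""
    return theta**(a - 1) * (1 - theta)**(b - 1) / B(a, b)

# Midpoint-rule check that the density integrates to 1 over [0, 1]
n = 100_000
total = sum(beta_pdf((i + 0.5) / n, 2, 3) for i in range(n)) / n
print(total)                              # ≈ 1.0

# For positive integers, Γ(a) = (a - 1)!
print(math.gamma(5), math.factorial(4))   # 24.0 24
```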
Some commonly used prior distributions (cont.)

Gamma(α, β) = (β^α / Γ(α)) θ^{α−1} e^{−βθ}  for θ > 0,  and 0 for θ ≤ 0

where Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx, and Γ(α) = (α − 1)! for positive integer α.

The Gamma distribution is widely used to model the (prior) distribution of a non-negative random variable, where α and β are (hyper)parameters.

As ∫₋∞^∞ Gamma(α, β) dθ = 1, we have

∫₋∞^∞ (β^α / Γ(α)) θ^{α−1} e^{−βθ} dθ = 1,  i.e.  ∫₀^∞ θ^{α−1} e^{−βθ} dθ = Γ(α) / β^α
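Similarly, the final Gamma identity can be checked numerically (truncating the upper limit at 40, where the integrand is negligible; α = 3, β = 2 are arbitrary test values of mine):

```python
import math

alpha, beta = 3, 2.0

# Midpoint-rule approximation of the integral of theta^(alpha-1) e^(-beta*theta)
n, upper = 200_000, 40.0
h = upper / n
integral = h * sum(((i + 0.5) * h) ** (alpha - 1) * math.exp(-beta * (i + 0.5) * h)
                   for i in range(n))

print(integral)                          # ≈ Γ(3) / 2³ = 2/8 = 0.25
print(math.gamma(alpha) / beta**alpha)   # 0.25
```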
