What was the most important thing you learned during this class?
What important question remains unanswered for you?
coins <- sample(c(-1,1), 100, replace=TRUE)
plot(1:length(coins), cumsum(coins), type='l')
abline(h=0)
cumsum(coins)[length(coins)]
## [1] 0
samples <- rep(NA, 1000)
for(i in seq_along(samples)) {
  coins <- sample(c(-1,1), 100, replace=TRUE)
  samples[i] <- cumsum(coins)[length(coins)]
}
head(samples, n = 15)
## [1] 10 6 14 -6 -12 28 -12 -6 -4 14 0 -4 12 -14 14
hist(samples)
(m.sam <- mean(samples))
## [1] 0.58
(s.sam <- sd(samples))
## [1] 9.726161
within1sd <- samples[samples >= m.sam - s.sam & samples <= m.sam + s.sam]
length(within1sd) / length(samples)
## [1] 0.693
within2sd <- samples[samples >= m.sam - 2 * s.sam & samples <= m.sam + 2 * s.sam]
length(within2sd) / length(samples)
## [1] 0.965
within3sd <- samples[samples >= m.sam - 3 * s.sam & samples <= m.sam + 3 * s.sam]
length(within3sd) / length(samples)
## [1] 0.999
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
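As a quick sanity check, the density formula above can be written out by hand and compared with R's built-in `dnorm`. This is only a sketch; the helper name `normal_pdf` is made up for illustration and is not part of any package.

```r
# Hand-written normal density (hypothetical helper, for illustration only)
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-4, 4, length = 200)

# TRUE if the hand-written formula agrees with dnorm() up to numerical tolerance
all.equal(normal_pdf(x), dnorm(x, mean = 0, sd = 1))
```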
x <- seq(-4,4,length=200)
y <- dnorm(x,mean=0, sd=1)
plot(x, y, type = "l", lwd = 2, xlim = c(-3.5,3.5), ylab='', xlab='z-score', yaxt='n')
pnorm(15, mean=mean(samples), sd=sd(samples))
## [1] 0.9309096
1 - pnorm(15, mean=mean(samples), sd=sd(samples))
## [1] 0.06909043
SAT scores are distributed nearly normally with mean 1500 and standard deviation 300. ACT scores are distributed nearly normally with mean 21 and standard deviation 5. A college admissions officer wants to determine which of the two applicants scored better on their standardized test with respect to the other test takers: Pam, who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?
$$Z = \frac{\text{observation} - \text{mean}}{\text{SD}}$$
Converting Pam and Jim's scores to z-scores:
$$Z_{Pam} = \frac{1800 - 1500}{300} = 1$$
$$Z_{Jim} = \frac{24 - 21}{5} = 0.6$$
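The same comparison can be sketched in R. The variable names `z_pam` and `z_jim` are chosen here for illustration; `pnorm` then converts each z-score to a percentile under the normal model, which suggests Pam scored better relative to other test takers (roughly the 84th versus the 73rd percentile).

```r
# z-score: how many standard deviations an observation lies from the mean
z_pam <- (1800 - 1500) / 300   # SAT: mean 1500, SD 300
z_jim <- (24 - 21) / 5         # ACT: mean 21, SD 5
c(Pam = z_pam, Jim = z_jim)

# Proportion of test takers scoring below each applicant (normal model)
pnorm(c(Pam = z_pam, Jim = z_jim))
```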
Some problems [1]:
This example looks at the relationship between NZ dollar exchange rate and trade weighted index.
DATA606::shiny_demo('DualScales', package='DATA606')
My advice:
1 http://blog.revolutionanalytics.com/2016/08/dual-axis-time-series.html
2 http://ellisp.github.io/blog/2016/08/18/dualaxes
SAT scores are distributed nearly normally with mean 1500 and standard deviation 300.
68% of students score between 1200 and 1800 on the SAT.
95% of students score between 900 and 2100 on the SAT.
99.7% of students score between 600 and 2400 on the SAT.
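These three percentages can be checked against the normal model with `pnorm`. The sketch below uses the SAT mean and standard deviation given above; the names `mu`, `sigma`, and `within_k_sd` are illustrative.

```r
mu <- 1500
sigma <- 300

# Probability of falling within k standard deviations of the mean
within_k_sd <- sapply(1:3, function(k) {
  pnorm(mu + k * sigma, mean = mu, sd = sigma) -
    pnorm(mu - k * sigma, mean = mu, sd = sigma)
})
round(within_k_sd, 4)
## [1] 0.6827 0.9545 0.9973
```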
To use the 68-95-99.7 rule, we must verify the normality assumption. We will also want to do this later when we talk about various (parametric) models. Consider a sample of 100 male heights (in inches).
The histogram looks normal, but we can overlay a normal curve to help evaluate it.
DATA606::qqnormsim(heights)
A random variable X has a Bernoulli distribution with parameter p if
$$P(X=1) = p \quad \text{and} \quad P(X=0) = 1-p \quad \text{for } 0 < p < 1$$
Dr. Smith wants to repeat Milgram's experiments, but she only wants to sample people until she finds someone who will not inflict a severe shock. What is the probability that she stops after the first person?
$$P(\text{1st person refuses}) = 0.35$$
the third person?
$$P(\text{1st and 2nd shock, 3rd refuses}) = \underbrace{0.65}_{S} \times \underbrace{0.65}_{S} \times \underbrace{0.35}_{R} = 0.65^2 \times 0.35 \approx 0.15$$
the tenth person?
The geometric distribution describes the waiting time until a success for independent and identically distributed (iid) Bernoulli random variables.
Geometric probabilities
If p represents probability of success, (1−p) represents probability of failure, and n represents number of independent trials
$$P(\text{success on the } n\text{th trial}) = (1-p)^{n-1} p$$
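The formula above can be applied to Dr. Smith's stopping after the first, third, and tenth person. The sketch below computes it directly and with R's `dgeom`; note that `dgeom` counts the number of failures before the first success, so the trial number has to be shifted by one.

```r
p <- 0.35          # probability that a person refuses
n <- c(1, 3, 10)   # trial on which the first refusal occurs

# Direct use of (1 - p)^(n - 1) * p
by_formula <- (1 - p)^(n - 1) * p
by_formula         # roughly 0.35, 0.148, 0.00725

# dgeom() is parameterised by the number of failures, i.e. n - 1
by_dgeom <- dgeom(n - 1, prob = p)
all.equal(by_formula, by_dgeom)
```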
How many people is Dr. Smith expected to test before finding the first one that refuses to administer the shock?
The expected value, or the mean, of a geometric distribution is defined as $\frac{1}{p}$.
$$\mu = \frac{1}{p} = \frac{1}{0.35} = 2.86$$
She is expected to test 2.86 people before finding the first one that refuses to administer the shock.
But how can she test a non-whole number of people?
$$\mu = \frac{1}{p} \qquad \sigma = \sqrt{\frac{1-p}{p^2}}$$
Going back to Dr. Smith’s experiment:
$$\sigma = \sqrt{\frac{1-p}{p^2}} = \sqrt{\frac{1-0.35}{0.35^2}} = 2.3$$
Dr. Smith is expected to test 2.86 people before finding the first one that refuses to administer the shock, give or take 2.3 people.
These values only make sense in the context of repeating the experiment many, many times.
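To illustrate what "repeating the experiment many times" looks like, here is a minimal simulation sketch. The seed and the number of repetitions are arbitrary choices, and `n_tested` is an illustrative name. Since `rgeom` returns the number of failures before the first success, 1 is added to count the person who refuses.

```r
set.seed(606)   # arbitrary seed so the run is reproducible
p <- 0.35

# Number of people tested in each of 100,000 simulated experiments
n_tested <- rgeom(100000, prob = p) + 1

# Simulated mean and standard deviation
c(mean = mean(n_tested), sd = sd(n_tested))

# Theoretical values for comparison
c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
```

Every simulated experiment tests a whole number of people; only the long-run average is fractional.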
Suppose we randomly select four individuals to participate in this experiment. What is the probability that exactly 1 of them will refuse to administer the shock?
Let’s call these people Allen (A), Brittany (B), Caroline (C), and Damian (D). Each one of the four scenarios below will satisfy the condition of “exactly 1 of them refuses to administer the shock”:
The probability of exactly 1 of 4 people refusing to administer the shock is the sum of all of these probabilities.
0.0961 + 0.0961 + 0.0961 + 0.0961 = 4 × 0.0961 = 0.3844
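The sum above can be reproduced by brute force. The sketch below enumerates all 16 possible outcomes for the four people (R = refuses, S = shocks), computes the probability of each, and adds up those with exactly one refusal. The object names are illustrative.

```r
p <- 0.35   # probability that a person refuses

# All 2^4 outcomes for Allen, Brittany, Caroline, and Damian
outcomes <- expand.grid(A = c("R", "S"), B = c("R", "S"),
                        C = c("R", "S"), D = c("R", "S"),
                        stringsAsFactors = FALSE)

# Number of refusals in each outcome, and the outcome's probability
n_refuse <- rowSums(outcomes == "R")
prob <- p^n_refuse * (1 - p)^(4 - n_refuse)

sum(n_refuse == 1)         # number of scenarios with exactly one refusal: 4
prob[n_refuse == 1]        # each about 0.0961
sum(prob[n_refuse == 1])   # 0.384475
```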
The question from the prior slide asked for the probability of a given number of successes, k, in a given number of trials, n (k = 1 success in n = 4 trials), and we calculated this probability as
# of scenarios × P(single scenario)
Number of scenarios: there is a less tedious way to figure this out, we’ll get to that shortly...
$$P(\text{single scenario}) = p^k (1-p)^{(n-k)}$$
The binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p.
The choose function is useful for calculating the number of ways to choose k successes in n trials.
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
For example:
$$\binom{9}{2} = \frac{9!}{2!(9-2)!} = \frac{9 \times 8 \times 7!}{2 \times 1 \times 7!} = \frac{72}{2} = 36$$
choose(9,2)
## [1] 36
If p represents probability of success, (1 − p) represents probability of failure, n represents number of independent trials, and k represents number of successes
$$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{(n-k)}$$
n <- 4
p <- 0.35
barplot(dbinom(0:n, n, p), names.arg=0:n)
dbinom(1, 4, p)
## [1] 0.384475
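The `dbinom` result can be cross-checked against the binomial formula written out with `choose`. This is a small sketch; `by_hand` is an illustrative name.

```r
n <- 4      # number of trials
k <- 1      # number of successes (refusals)
p <- 0.35   # probability of success

# (# of scenarios) x P(single scenario)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
by_hand
## [1] 0.384475

all.equal(by_hand, dbinom(k, n, p))
```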
Complete the one minute paper: https://forms.gle/p9xcKcTbGiyYSz368
What was the most important thing you learned during this class?
What important question remains unanswered for you?