Project Presentations
Please take a look at the sign-up sheet even if you already signed up. We had to move the two Thursday evening time slots to Monday evening.
You can sign up for a presentation slot on this Google Sheet: https://docs.google.com/spreadsheets/d/1xcWRZuM4OO0ePR7rqO43tuXVGLQnc0asK1tODdE-qfo/edit?usp=sharing
Example project slides located here: https://github.com/jbryer/DATA606Spring2023/tree/main/Project
What was the most important thing you learned during this class?
What important question remains unanswered for you?
NYS publishes data for each school in the state. We will look at the grade 8 math scores for 2012 and 2013. 2013 was the first year the tests were aligned with the Common Core Standards. There was a lot of press about how the passing rates for most schools dropped. Two questions we wish to answer:
load('../course_data/NYSReportCard-Grade7Math.Rda')names(reportCard)
## [1] "BEDSCODE" "School" "NumTested2012" "Mean2012" ## [5] "Pass2012" "Charter" "GradeSubject" "County" ## [9] "BOCES" "NumTested2013" "Mean2013" "Pass2013"
reportCard
Data FrameBEDSCODE | School | NumTested2012 | Mean2012 | Pass2012 | Charter | GradeSubject | County | BOCES | NumTested2013 | Mean2013 | Pass2013 |
---|---|---|---|---|---|---|---|---|---|---|---|
010100010020 | NORTH ALBANY ACADEMY | 47 | 649 | 13 | false | Grade 7 Math | Albany | BOCES ALBANY-SCHOH-SCHENECTADY-SARAT | 45 | 268 | 0 |
010100010030 | WILLIAM S HACKETT MIDDLE SCHOOL | 212 | 652 | 30 | false | Grade 7 Math | Albany | BOCES ALBANY-SCHOH-SCHENECTADY-SARAT | 250 | 279 | 9 |
010100010045 | STEPHEN AND HARRIET MYERS MIDDLE SCHOOL | 262 | 670 | 50 | false | Grade 7 Math | Albany | BOCES ALBANY-SCHOH-SCHENECTADY-SARAT | 256 | 284 | 8 |
summary(reportCard$Pass2012)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.00 46.00 65.00 61.73 80.00 100.00
summary(reportCard$Pass2013)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.00 7.00 20.00 22.83 33.00 99.00
melted <- melt(reportCard[,c('Pass2012', 'Pass2013')])ggplot(melted, aes(x=value)) + geom_histogram() + facet_wrap(~ variable, ncol=1)
Since the distribution of the 2013 passing rates is skewed, we can log transfor that variable to get a more reasonably normal distribution.
reportCard$LogPass2013 <- log(reportCard$Pass2013 + 1)ggplot(reportCard, aes(x=LogPass2013)) + geom_histogram(binwidth=0.5)
ggplot(reportCard, aes(x=Pass2012, y=Pass2013, color=Charter)) + geom_point(alpha=0.5) + coord_equal() + ylim(c(0,100)) + xlim(c(0,100))
ggplot(reportCard, aes(x=Pass2012, y=LogPass2013, color=Charter)) + geom_point(alpha=0.5) + xlim(c(0,100)) + ylim(c(0, log(101)))
cor.test(reportCard$Pass2012, reportCard$Pass2013)
## ## Pearson's product-moment correlation## ## data: reportCard$Pass2012 and reportCard$Pass2013## t = 47.166, df = 1360, p-value < 2.2e-16## alternative hypothesis: true correlation is not equal to 0## 95 percent confidence interval:## 0.7667526 0.8071276## sample estimates:## cor ## 0.7877848
cor.test(reportCard$Pass2012, reportCard$LogPass2013)
## ## Pearson's product-moment correlation## ## data: reportCard$Pass2012 and reportCard$LogPass2013## t = 56.499, df = 1360, p-value < 2.2e-16## alternative hypothesis: true correlation is not equal to 0## 95 percent confidence interval:## 0.8207912 0.8525925## sample estimates:## cor ## 0.8373991
lm.out <- lm(Pass2013 ~ Pass2012, data=reportCard)summary(lm.out)
## ## Call:## lm(formula = Pass2013 ~ Pass2012, data = reportCard)## ## Residuals:## Min 1Q Median 3Q Max ## -35.484 -6.878 -0.478 5.965 51.675 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -16.68965 0.89378 -18.67 <2e-16 ***## Pass2012 0.64014 0.01357 47.17 <2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Residual standard error: 11.49 on 1360 degrees of freedom## Multiple R-squared: 0.6206, Adjusted R-squared: 0.6203 ## F-statistic: 2225 on 1 and 1360 DF, p-value: < 2.2e-16
lm.log.out <- lm(LogPass2013 ~ Pass2012, data=reportCard)summary(lm.log.out)
## ## Call:## lm(formula = LogPass2013 ~ Pass2012, data = reportCard)## ## Residuals:## Min 1Q Median 3Q Max ## -3.3880 -0.2531 0.0776 0.3461 2.7368 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.307692 0.046030 6.685 3.37e-11 ***## Pass2012 0.039491 0.000699 56.499 < 2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Residual standard error: 0.5915 on 1360 degrees of freedom## Multiple R-squared: 0.7012, Adjusted R-squared: 0.701 ## F-statistic: 3192 on 1 and 1360 DF, p-value: < 2.2e-16
Yes! Whether we log tranform the data or not, the correlations are statistically significant with regression models with R2 creater than 62%.
To answer the second question, whether the drops were different for public and charter schools, we'll look at the residuals.
reportCard$residuals <- resid(lm.out)reportCard$residualsLog <- resid(lm.log.out)
ggplot(reportCard, aes(x=residuals, color=Charter)) + geom_density()
ggplot(reportCard, aes(x=residualsLog, color=Charter)) + geom_density()
H0: There is no difference in the residuals between charter and public schools.
HA: There is a difference in the residuals between charter and public schools.
t.test(residuals ~ Charter, data=reportCard)
## ## Welch Two Sample t-test## ## data: residuals by Charter## t = 6.5751, df = 77.633, p-value = 5.091e-09## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0## 95 percent confidence interval:## 6.411064 11.980002## sample estimates:## mean in group FALSE mean in group TRUE ## 0.479356 -8.716177
t.test(residualsLog ~ Charter, data=reportCard)
## ## Welch Two Sample t-test## ## data: residualsLog by Charter## t = 4.7957, df = 74.136, p-value = 8.161e-06## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0## 95 percent confidence interval:## 0.2642811 0.6399761## sample estimates:## mean in group FALSE mean in group TRUE ## 0.02356911 -0.42855946
It is possible to fit quatric models fairly easily in R, say of the following form:
y=b1x2+b2x+b0
quad.out <- lm(Pass2013 ~ I(Pass2012^2) + Pass2012, data=reportCard)summary(quad.out)$r.squared
## [1] 0.7065206
summary(lm.out)$r.squared
## [1] 0.6206049
summary(quad.out)
## ## Call:## lm(formula = Pass2013 ~ I(Pass2012^2) + Pass2012, data = reportCard)## ## Residuals:## Min 1Q Median 3Q Max ## -46.258 -4.906 -0.507 5.430 43.509 ## ## Coefficients:## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 7.0466153 1.4263773 4.940 8.77e-07 ***## I(Pass2012^2) 0.0092937 0.0004659 19.946 < 2e-16 ***## Pass2012 -0.3972481 0.0533631 -7.444 1.72e-13 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Residual standard error: 10.11 on 1359 degrees of freedom## Multiple R-squared: 0.7065, Adjusted R-squared: 0.7061 ## F-statistic: 1636 on 2 and 1359 DF, p-value: < 2.2e-16
ggplot(reportCard, aes(x=Pass2012, y=Pass2013)) + geom_point(alpha=0.2) + geom_smooth(method='lm', formula=y ~ x, size=2, se=FALSE) + geom_smooth(method='lm', formula=y ~ I(x^2) + x, size=2, se=FALSE) + coord_equal() + ylim(c(0,100)) + xlim(c(0,100))
reportCard$resid_linear <- resid(lm.out)ggplot(reportCard, aes(x = resid_linear)) + geom_histogram()reportCard$resid_quad <- resid(quad.out)ggplot(reportCard, aes(x = resid_quad)) + geom_histogram()ggplot(reportCard) + geom_density(aes(x = resid_linear), color = 'blue') + geom_density(aes(x = resid_quad), color = 'maroon')sum(reportCard$resid_linear^2) /sum(reportCard$resid_quad^2)
cube.out <- lm(Pass2013 ~ I(Pass2012^3) + I(Pass2012^2) + Pass2012, data=reportCard)summary(cube.out)$r.squared
## [1] 0.7168206
ggplot(reportCard, aes(x=Pass2012, y=Pass2013)) + geom_point(alpha=0.2) + geom_smooth(method='lm', formula=y~poly(x,2,raw=TRUE), size=2, se=FALSE) + geom_smooth(method='loess', formula = y ~ x, size=2, se=FALSE, color = 'purple') + coord_equal() + ylim(c(0,100)) + xlim(c(0,100))
library('VisualStats')library('ShinyDemo')shiny_demo('loess', package = 'VisualStats')
See this site for more info: https://jbryer.github.io/VisualStats/articles/loess.html
shiny::runGitHub('NYSchools','jbryer',subdir='NYSReportCard')
See also the Github repository for more information: https://github.com/jbryer/NYSchools
Complete the one minute paper: https://forms.gle/p9xcKcTbGiyYSz368
Project Presentations
Please take a look at the sign-up sheet even if you already signed up. We had to move the two Thursday evening time slots to Monday evening.
You can sign up for a presentation slot on this Google Sheet: https://docs.google.com/spreadsheets/d/1xcWRZuM4OO0ePR7rqO43tuXVGLQnc0asK1tODdE-qfo/edit?usp=sharing
Example project slides located here: https://github.com/jbryer/DATA606Spring2023/tree/main/Project
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |