Summarizing Data Part 2

DATA 606 - Statistics & Probability for Data Analytics

Jason Bryer, Ph.D. and Angela Lui, Ph.D.

February 15, 2023

One Minute Paper Results

What was the most important thing you learned during this class?

What important question remains unanswered for you?

Grammar of Graphics

Data Visualizations with ggplot2

  • ggplot2 is an R package that provides an alternative to base R graphics, based on Wilkinson’s (2005) Grammar of Graphics.

  • ggplot2 is, in general, more flexible for creating "prettier" and more complex plots.

  • It works by creating layers of different types of objects/geometries (e.g., bars, points, lines, polygons). ggplot2 has at least three ways of creating plots:

    1. qplot
    2. ggplot(...) + geom_XXX(...) + ...
    3. ggplot(...) + layer(...)
  • We will focus only on the second.

Parts of a ggplot2 Statement

  • Data and aesthetic mappings
    ggplot(myDataFrame, aes(x=x, y=y))

  • Layers
    geom_point(), geom_histogram()

  • Facets
    facet_wrap(~ cut), facet_grid(~ cut)

  • Scales
    scale_y_log10()

  • Other options
    ggtitle('my title'), ylim(c(0, 10000)), xlab('x-axis label')
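To see how these parts fit together in a single statement, here is a minimal sketch. It assumes the `diamonds` data that ships with ggplot2 (the `cut` facet above suggests it, but that is our guess), and the title and axis labels are placeholders. `ylim()` is left out on purpose: adding it after `scale_y_log10()` would replace the log10 y scale with a plain continuous one.

```r
library(ggplot2)

# Data + aesthetic mappings: diamonds ships with ggplot2
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  # Layer: one point per diamond, semi-transparent to reveal overplotting
  geom_point(alpha = 0.1) +
  # Facet: one panel per level of cut
  facet_wrap(~ cut) +
  # Scale: log10 y axis for the right-skewed prices
  scale_y_log10() +
  # Other options: title and axis labels
  ggtitle('Price by carat and cut') +
  xlab('Carat') +
  ylab('Price (USD, log10 scale)')
```

Typing `p` at the console draws the plot; because each part is a separate `+` term, any one of them can be swapped or dropped without touching the others.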

Lots of geoms

library(ggplot2)
grep('^geom_', ls('package:ggplot2'), value = TRUE)
## [1] "geom_abline" "geom_area" "geom_bar"
## [4] "geom_bin_2d" "geom_bin2d" "geom_blank"
## [7] "geom_boxplot" "geom_col" "geom_contour"
## [10] "geom_contour_filled" "geom_count" "geom_crossbar"
## [13] "geom_curve" "geom_density" "geom_density_2d"
## [16] "geom_density_2d_filled" "geom_density2d" "geom_density2d_filled"
## [19] "geom_dotplot" "geom_errorbar" "geom_errorbarh"
## [22] "geom_freqpoly" "geom_function" "geom_hex"
## [25] "geom_histogram" "geom_hline" "geom_jitter"
## [28] "geom_label" "geom_line" "geom_linerange"
## [31] "geom_map" "geom_path" "geom_point"
## [34] "geom_pointrange" "geom_polygon" "geom_qq"
## [37] "geom_qq_line" "geom_quantile" "geom_raster"
## [40] "geom_rect" "geom_ribbon" "geom_rug"
## [43] "geom_segment" "geom_sf" "geom_sf_label"
## [46] "geom_sf_text" "geom_smooth" "geom_spoke"
## [49] "geom_step" "geom_text" "geom_tile"
## [52] "geom_violin" "geom_vline"

Data Visualization Cheat Sheet

Scatterplot

ggplot(legosets, aes(x=pieces, y=US_retailPrice)) + geom_point()

Scatterplot (cont.)

ggplot(legosets, aes(x=pieces, y=US_retailPrice, color=availability)) + geom_point()

Scatterplot (cont.)

ggplot(legosets, aes(x=pieces, y=US_retailPrice, size=minifigs, color=availability)) + geom_point()

Scatterplot (cont.)

ggplot(legosets, aes(x=pieces, y=US_retailPrice, size=minifigs)) + geom_point() + facet_wrap(~ availability)

Boxplots

ggplot(legosets, aes(x='Lego', y=US_retailPrice)) + geom_boxplot()

Boxplots (cont.)

ggplot(legosets, aes(x=availability, y=US_retailPrice)) + geom_boxplot()

Boxplots (cont.)

ggplot(legosets, aes(x=availability, y=US_retailPrice)) + geom_boxplot() + coord_flip()
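A boxplot summarizes a distribution with a handful of numbers. Per the ggplot2 documentation, `geom_boxplot()` draws the median, the first and third quartiles (the hinges), and whiskers that extend to the most extreme observations within 1.5 times the IQR of the hinges; anything beyond that is drawn as an individual outlier point. The base R sketch below reproduces that arithmetic on a small, made-up vector of prices (not the `legosets` data), using R's default `quantile()` type 7, which we assume matches what `stat_boxplot()` computes.

```r
# Hypothetical retail prices (USD); 30 is a deliberate outlier
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Lower hinge, median, upper hinge (R's default quantile type 7)
q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Fences sit 1.5 * IQR beyond the hinges
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers stop at the most extreme points inside the fences
inside <- x[x >= lower_fence & x <= upper_fence]
whiskers <- range(inside)

# Points outside the fences are plotted individually
outliers <- x[x < lower_fence | x > upper_fence]

print(q)         # 4 6 8
print(whiskers)  # 2 9
print(outliers)  # 30
```

Base R's `boxplot.stats()` uses Tukey's hinges via `fivenum()` instead, so on some data its box can differ slightly from the one ggplot2 draws.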

Histograms

ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram()

Histograms (cont.)

ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram() + scale_x_log10()

Histograms (cont.)

ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram() + facet_wrap(~ availability)

Density Plots

ggplot(legosets, aes(x = US_retailPrice, color = availability)) + geom_density()

ggplot2 aesthetics
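The scatterplot slides map variables to aesthetics such as `color` and `size` inside `aes()`. It is easy to confuse *mapping* an aesthetic (inside `aes()`, so it varies with the data) with *setting* it (a fixed argument to the geom). A minimal sketch using the built-in `mtcars` data (our choice, not from the slides):

```r
library(ggplot2)

# Mapping: color is tied to a variable, so each cylinder count gets its own color
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Setting: a constant outside aes() applies to every point
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = 'steelblue', size = 3)

# Pitfall: a constant inside aes() is treated as a one-level variable,
# so the points get the default palette color (and a legend), not steelblue
p_pitfall <- ggplot(mtcars, aes(x = wt, y = mpg, color = 'steelblue')) +
  geom_point()

# layer_data() returns the values ggplot2 will actually draw
length(unique(layer_data(p_mapped)$colour))  # 3
unique(layer_data(p_set)$colour)             # "steelblue"
```

The same behavior explains the single-boxplot trick `aes(x='Lego')`: the constant becomes one category on the x axis.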

Likert Scales

Likert scales are a type of questionnaire item where respondents are asked to rate statements on a scale, usually with four to seven levels (e.g., strongly disagree to strongly agree).

library(likert)   # likert() and the pisaitems example data
library(reshape)  # rename(x, c(old = "new"))
data(pisaitems)
# Keep only the ST24Q items (attitudes toward reading)
items24 <- pisaitems[, substr(names(pisaitems), 1, 5) == 'ST24Q']
# Replace the item codes with the question wording
items24 <- rename(items24, c(
    ST24Q01 = "I read only if I have to.",
    ST24Q02 = "Reading is one of my favorite hobbies.",
    ST24Q03 = "I like talking about books with other people.",
    ST24Q04 = "I find it hard to finish books.",
    ST24Q05 = "I feel happy if I receive a book as a present.",
    ST24Q06 = "For me, reading is a waste of time.",
    ST24Q07 = "I enjoy going to a bookstore or a library.",
    ST24Q08 = "I read only to get information that I need.",
    ST24Q09 = "I cannot sit still and read for more than a few minutes.",
    ST24Q10 = "I like to express my opinions about books I have read.",
    ST24Q11 = "I like to exchange books with my friends."))

likert R Package

l24 <- likert(items24)
summary(l24)
## Item low neutral
## 10 I like to express my opinions about books I have read. 41.07516 0
## 5 I feel happy if I receive a book as a present. 46.93475 0
## 8 I read only to get information that I need. 50.39874 0
## 7 I enjoy going to a bookstore or a library. 51.21231 0
## 3 I like talking about books with other people. 54.99129 0
## 11 I like to exchange books with my friends. 55.54115 0
## 2 Reading is one of my favorite hobbies. 56.64470 0
## 1 I read only if I have to. 58.72868 0
## 4 I find it hard to finish books. 65.35125 0
## 9 I cannot sit still and read for more than a few minutes. 76.24524 0
## 6 For me, reading is a waste of time. 82.88729 0
## high mean sd
## 10 58.92484 2.604913 0.9009968
## 5 53.06525 2.466751 0.9446590
## 8 49.60126 2.484616 0.9089688
## 7 48.78769 2.428508 0.9164136
## 3 45.00871 2.328049 0.9090326
## 11 44.45885 2.343193 0.9609234
## 2 43.35530 2.344530 0.9277495
## 1 41.27132 2.291811 0.9369023
## 4 34.64875 2.178299 0.8991628
## 9 23.75476 1.974736 0.8793028
## 6 17.11271 1.810093 0.8611554
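In this summary, `low`, `neutral`, and `high` are the percentages of respondents below, at, and above the middle of the scale, and `mean`/`sd` are computed from the integer codes of the levels. The ST24Q items have four levels, so there is no middle category, which is why `neutral` is 0 for every item. The base R sketch below reproduces that logic by hand on a made-up item; it reflects our reading of what `summary()` reports, not the likert package's own code.

```r
# Hypothetical responses to one 4-level item (not the PISA data)
lvls <- c('Strongly disagree', 'Disagree', 'Agree', 'Strongly agree')
resp <- factor(c('Agree', 'Disagree', 'Strongly agree', 'Agree',
                 'Strongly disagree', 'Agree', 'Disagree', 'Strongly agree'),
               levels = lvls)

# Percent of responses at each level
pct <- 100 * prop.table(table(resp))

# Levels are coded 1..4; the midpoint of a 4-level scale is 2.5
codes <- seq_along(lvls)
center <- (length(lvls) + 1) / 2

low     <- sum(pct[codes < center])   # Strongly disagree + Disagree
neutral <- sum(pct[codes == center])  # no level sits exactly at 2.5
high    <- sum(pct[codes > center])   # Agree + Strongly agree

# Mean and SD of the integer codes
item_mean <- mean(as.integer(resp))
item_sd   <- sd(as.integer(resp))

print(c(low = low, neutral = neutral, high = high, mean = item_mean))
# low 37.5, neutral 0, high 62.5, mean 2.75
```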

likert Plots

plot(l24)

likert Plots

plot(l24, type='heat')

likert Plots

plot(l24, type='density')

Pie Charts

There is only one pie chart in OpenIntro Statistics (Diez, Barr, & Çetinkaya-Rundel, 2015, p. 48). Consider the following three pie charts, which represent preferences for five different colors. Is there a difference between the three pie charts? This is probably difficult to answer.

Source: https://en.wikipedia.org/wiki/Pie_chart.

Just say NO to pie charts!

"There is no data that can be displayed in a pie chart that cannot better be displayed in some other type of chart"

John Tukey
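Tukey's point is easiest to see by redrawing pie-chart data as bars: differences between categories become differences in length along a common baseline instead of differences in angle. A minimal ggplot2 sketch, using made-up percentages for five colors across three hypothetical charts (these are not the numbers behind the Wikipedia figure):

```r
library(ggplot2)

# Made-up preference percentages; each chart sums to 100
prefs <- data.frame(
  chart   = rep(c('Chart 1', 'Chart 2', 'Chart 3'), each = 5),
  color   = rep(c('A', 'B', 'C', 'D', 'E'), times = 3),
  percent = c(19, 21, 20, 22, 18,
              24, 17, 20, 19, 20,
              20, 20, 21, 18, 21)
)

# geom_col() uses the y values directly as bar heights; facet_wrap()
# puts the three charts side by side on a shared y axis
p <- ggplot(prefs, aes(x = color, y = percent)) +
  geom_col() +
  facet_wrap(~ chart) +
  ylab('Percent preferring')
```

Printing `p` draws the three panels; because the y axis is shared, a 24% versus 17% gap reads at a glance, which is hard to judge from pie slices.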

Additional Resources

For data wrangling:

For data visualization:

One Minute Paper

Complete the one minute paper: https://forms.gle/p9xcKcTbGiyYSz368

  1. What was the most important thing you learned during this class?

  2. What important question remains unanswered for you?
