Introduction to DATA 606

Statistics & Probability for Data Analytics

Jason Bryer, Ph.D. and Angela Lui, Ph.D.

Spring 2023

1 / 21

Agenda

  • About your instructors
  • Syllabus
  • Class meetups
  • Course Schedule
  • Assignments (how you will be graded)
    • Participation
    • Labs
    • Data Project
    • Exams
  • Software
    • The DATA606 R Package
    • Using R Markdown
3 / 21

A little about Jason...

  • Assistant Professor at CUNY in Data Science and Information Systems
  • Principal Investigator for a Department of Education Grant (part of their FIPSE First in the World program) to develop a Diagnostic Assessment and Achievement of College Skills (www.DAACS.net)
  • Authored over a dozen R packages including:
  • Specialize in propensity score methods. Three new methods/R packages developed include:
4 / 21

Also a Father...

5 / 21

Runner...

6 / 21

And photographer.

7 / 21

A little about Angela...

Angela Lui


NYU
Hunter
UAlbany
Rutgers
CUNY SPS
DAACS
8 / 21

Teaching Experience

  • Introduction to Statistics in Social Sciences

  • Special Issues in Testing

  • Evaluation

  • Motivation in Education

  • Introduction to the Psychological Processing of Schooling

  • Educational Psychology in Adolescent Development

9 / 21

Homeowner

10 / 21

11 / 21

Syllabus

Syllabus and course materials are here: https://spring2023.data606.net

The site is built using the blogdown R package and hosted on GitHub. Each page of the site has an "Improve this page" link at the bottom right; use it to start a pull request on GitHub.

We will use Blackboard primarily for submitting assignments. Please submit:

  • A PDF or link to the built HTML (e.g. RPubs, GitHub)

PDFs are preferred for the homework as there is some LaTeX formatting in the R Markdown files. The tinytex R package helps with installing LaTeX, but you can also install LaTeX using MiKTeX (for Windows) or BasicTeX (for Mac). See this page for more information: https://spring2023.data606.net/course-overview/software/

12 / 21

Meetups

We will have meetups on Wednesday evenings at 8:00pm.

Meetups will be recorded and made available the next day on the course website.

Though attending live is not strictly required, we expect everyone to watch the lectures during the week. I use the class meetups to convey important information and announcements. Very often I will cover some topics not in the textbook. Students who attend the meetups tend to do well on the assignments.

One Minute Papers - Complete the one minute paper after each meetup (whether you attend live or watch the recording). It should take approximately one to two minutes to complete. This allows me to 1) verify you have attended or watched the meetup and 2) get feedback about what you learned and what you may still be unclear about.

Please note: Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including those enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the "chat" feature, which allows students to type questions and comments live.

13 / 21

Schedule

Start                       | End                       | Topic
Wednesday, January 25, 2023 | Sunday, February 05, 2023 | Chapter 1 - Intro to Data, R, and RStudio
Monday, February 06, 2023   | Sunday, February 19, 2023 | Chapter 2 - Summarizing Data
Monday, February 20, 2023   | Sunday, February 26, 2023 | Chapter 3 - Probability
Monday, February 27, 2023   | Sunday, March 05, 2023    | Chapter 4 - Distributions
Monday, March 06, 2023      | Sunday, March 12, 2023    | Chapter 5 - Foundation for Inference
Wednesday, March 15, 2023   | Sunday, March 19, 2023    | Midterm
Monday, March 13, 2023      | Sunday, March 19, 2023    | Chapter 6 - Inference for Categorical Data
Monday, March 20, 2023      | Sunday, March 26, 2023    | Chapter 7 - Inference for Numerical Data
Monday, March 27, 2023      | Sunday, April 23, 2023    | Chapter 8 - Linear Regression
Monday, April 24, 2023      | Sunday, May 07, 2023      | Chapter 9 - Multiple & Logistic Regression
Monday, May 08, 2023        | Tuesday, May 16, 2023     | Intro to Bayesian Analysis
Wednesday, May 17, 2023     | Sunday, May 21, 2023      | Final Exam
14 / 21

Textbooks

Diez, D.M., Barr, C.D., & Çetinkaya-Rundel, M. (2019). OpenIntro Statistics (4th Ed).

This will be our primary textbook for most of the semester. Our goal is to cover all the chapters.

Open Intro Statistics

Navarro, D. (2018, version 0.6). Learning Statistics with R

This textbook has a chapter on Bayesian analysis that we will use at the end of the semester.

Learning Statistics with R

15 / 21

Assignments

  • Participation (10%)
  • Labs (35%)
    • Labs are designed to introduce you to doing statistics with R.
    • Answer the questions in the main text as well as the "On Your Own" section.
  • Data Project (30%)
    • This allows you to analyze a dataset of your choosing. Projects will be shared with the class. This provides an opportunity for everyone to see different approaches to analyzing different datasets.
  • Exams
    • Midterm (10%)
    • Final exam (15%)
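
As a rough illustration of how these weights combine into a course grade, here is a minimal R sketch. The weights come from the list above; the component scores are made-up values on a 0-100 scale, and the actual grading procedure may differ in details such as rounding or late penalties.

```r
# Course weights taken from the Assignments list (they sum to 100%).
weights <- c(participation = 0.10, labs = 0.35, project = 0.30,
             midterm = 0.10, final = 0.15)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical component scores on a 0-100 scale (illustrative only).
scores <- c(participation = 100, labs = 90, project = 85,
            midterm = 80, final = 88)

# Weighted average: multiply each score by its weight, then sum.
course_grade <- sum(weights * scores[names(weights)])
course_grade  # 88.2
```

Indexing `scores` by `names(weights)` keeps each score paired with its weight even if the two vectors are written in a different order.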
16 / 21

Communication

17 / 21

Software

This is an applied statistics course so we will make extensive use of the R statistical programming language. You have two options for using R in this course:

You will also need LaTeX installed in order to create PDFs. The tinytex R package helps with this process:

install.packages('tinytex')
tinytex::install_tinytex()
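
One way to check whether R can see a LaTeX installation afterwards is to look for the pdflatex executable on the search path. This is only a sketch using base R's Sys.which(); it assumes pdflatex is the engine in use, and you may need to restart R after installing before the path is picked up.

```r
# Look up the pdflatex executable on the system PATH.
# Sys.which() returns an empty string when the program is not found.
latex_path <- Sys.which("pdflatex")

if (nzchar(latex_path)) {
  message("Found pdflatex at: ", latex_path)
} else {
  message("pdflatex was not found; try restarting R or reinstalling LaTeX.")
}
```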
18 / 21

DATA 606 Package

The DATA606 R package contains many data sets and functions we will use throughout the semester. It also has a startLab function that will copy each of the labs to your current working directory. Use the following commands to install the package (only necessary once per R installation):

install.packages('remotes')  # provides install_github()
remotes::install_github('jbryer/DATA606')

To start the first lab...

DATA606::startLab('Lab1')

This will copy the R Markdown file and any supporting files to your current working directory. Use the "Knit" button in RStudio to build a PDF of the document.
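
If you prefer the console to the "Knit" button, the document can also be built with rmarkdown::render(). The sketch below assumes the copied file is named Lab1.Rmd, which is a guess; check your working directory for the actual file name. Building a PDF this way also requires a working LaTeX installation.

```r
# Hypothetical file name: check your working directory for the actual
# .Rmd file that startLab() copied.
lab_file <- "Lab1.Rmd"

if (file.exists(lab_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Build a PDF from the R Markdown file (requires LaTeX).
  rmarkdown::render(lab_file, output_format = "pdf_document")
} else {
  message("Could not find ", lab_file,
          " (or the rmarkdown package is not installed).")
}
```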

19 / 21

Next steps...

Before Monday (January 30th):

Then:

  • Start Lab 1 (due February 5th)
20 / 21