Summarizing and graphing data

01/30/2020

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under Section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Characteristics of data
    • Frequency distributions
    • Histograms
    • Other kinds of good and bad plots

Lead and IQ example

Frequency table of IQ scores for the sample data set (Table 2-2 from the textbook).

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • We have data on children with either low or high lead exposure, whose IQ scores were measured.
  • If we want to know whether lead exposure has a significant effect on IQ, the table doesn’t tell us much.
  • Indeed, the table is very difficult to interpret, so we would like to develop tools to summarize and characterize the data.
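As a minimal sketch of what "summarizing" the scores can look like, the Python below groups IQ scores into classes of width 10 and counts how many scores fall in each class. The scores are hypothetical values invented for illustration (they are not the textbook's data), and the class width of 10 and the helper name `frequency_table` are assumptions.

```python
from collections import Counter

def frequency_table(scores, width=10):
    """Hypothetical helper: count scores per class [lower, lower + width - 1]."""
    # Map each score to the lower limit of its class, e.g. 87 -> 80 when width=10
    lowers = [(s // width) * width for s in scores]
    counts = Counter(lowers)  # Counter returns 0 for classes with no scores
    # Include empty classes between the smallest and largest so gaps stay visible
    return [(lo, lo + width - 1, counts[lo])
            for lo in range(min(lowers), max(lowers) + width, width)]

# Hypothetical IQ scores, invented for this example only
scores = [72, 85, 91, 78, 102, 95, 88, 110, 67, 84]
for lo, hi, n in frequency_table(scores):
    print(f"{lo}-{hi}: {n}")
```

Ten raw numbers become six rows, and the shape (most scores in the 80s, fewer at the extremes) is already easier to see than in the raw list.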

Characteristics of data

Diagram of the percent of outcomes contained within each standard deviation of the mean for a standard normal distribution.

Courtesy of M. W. Toews CC via Wikimedia Commons.
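The percentages in this figure follow from the standard normal distribution: the probability that a value lies within k standard deviations of the mean is erf(k/√2). A short check using only the standard library's `math.erf` (the function name `within_k_sd` is our own):

```python
import math

def within_k_sd(k):
    """Probability that a standard normal value Z satisfies |Z| < k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * within_k_sd(k):.1f}%")
```

This prints 68.3%, 95.4%, and 99.7%, the familiar 68-95-99.7 rule for bell-shaped data.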

  • We often use visual tools to understand samples and to simplify their analysis.
  • We characterize data by the features it exhibits; some of the most important are:
    1. Center: A representative value that indicates where the middle of the data set is located.
    2. Variation: A measure of the amount that the data values vary.
    3. Distribution: The nature or shape of the spread of the data over the range of values (such as bell-shaped).
    4. Outliers: Sample values that lie very far away from the vast majority of the other sample values.
    5. Time: Any change in the characteristics of the data over time.
  • Understanding each of these features is essential for distinguishing different types of behavior and for deciding when a given kind of analysis is appropriate.
  • For example, the sample average is sensitive to outliers (extremely large or small values), which can pull the sample average away from most of the data.
  • If we only consider the mean (average) without looking at other features, we will get an incomplete or misleading story from the data.
  • Our next topic will be how to use visual tools to understand and analyze the data.
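To make the point about outliers concrete, the sketch below uses Python's standard `statistics` module on a small sample whose values are hypothetical, invented only for illustration. Appending one extreme value moves the mean substantially, while the median (another measure of center) does not change, and the sample standard deviation (a measure of variation) grows sharply.

```python
import statistics

# Hypothetical sample values, invented for illustration
sample = [10, 12, 11, 13, 12]
with_outlier = sample + [60]  # the same sample with one extreme value appended

for name, data in (("original", sample), ("with outlier", with_outlier)):
    # Center: mean and median; variation: sample standard deviation
    print(
        f"{name}: mean={statistics.mean(data):.2f}, "
        f"median={statistics.median(data):.2f}, "
        f"stdev={statistics.stdev(data):.2f}"
    )
```

The mean rises from 11.60 to about 19.67, while the median stays at 12.00, which is why looking only at the mean can give a misleading picture of where most of the data lie.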

Frequency distributions