Statistics -- The Science of Data

01/21/2020

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Getting to know the class

  • For the next 2 minutes, interview someone sitting near you in class.

  • We will have small discussions throughout the course and you should get to know your neighbors.

  • In the following poll, you should be able to report their major and how many years they have studied.

Outline

  • The following topics will be covered in this lecture:

    • What is the subject of “statistics” and why do we study it?
    • What do we mean by “critical thinking” in statistics?
    • What is the key vocabulary in statistics?
    • What is the process of a statistical investigation?

Introduction

  • Broadly, statistics consists of methods for:
    1. collecting;
    2. summarizing;
    3. analyzing; and
    4. interpreting data.
  • We are in a period of history where statistical methods are a driving force in society.
    • When we use internet devices, data is constantly collected in some form and is used to make business decisions.
      • Advertisements are often targeted to our interests based upon the data that is collected.
    • Polls and surveys are frequently used to understand and predict public opinions.
      • We frequently hear about public opinion polls regarding upcoming elections for local and national offices.
    • As voters, we are presented with statistical data to inform our decisions on government policies, including economic forecasts and projections based on different choices.
      • We often must decide if a public project will have benefits that outweigh the costs in, e.g., tax revenue.
  • Critical statistical thinking has become a basic literacy in how we interact with information in society.

An example

Histogram of voluntary computer virus poll.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • In the figure, America OnLine users were asked to respond to the question “Have you ever been hit by a computer virus?”
  • At a glance, it appears that about two to three times as many people have had a computer virus at some point versus those who haven’t.
  • However, this poll and the description of the data has several major flaws:
    1. The vertical axis doesn’t start at zero, which makes the ratio look much greater than it actually is.
    2. The poll collected voluntary responses. Typically, people respond to voluntary polls only if they already have very strong feelings about the topic.
    3. If we want to draw a conclusion about all computer users, this sample may not be representative… Who actually uses America OnLine anymore?
  • Understanding data collection, data interpretation and data presentation, and how these impact the decisions based on them, is what we call “critical statistical thinking”.
  • One of the primary goals of this course is for everyone to be able to use critical statistical thinking by the end.
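
  • The effect of a truncated vertical axis (flaw 1) can be checked with a short calculation. The Python sketch below is only an illustration: the counts and the axis starting value are hypothetical and are not read from the figure, and bar_height_ratio is a helper name introduced here.

```python
def bar_height_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights when the vertical axis begins at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


# Hypothetical counts, chosen only for illustration (not the poll's real values).
yes_count = 60.0
no_count = 40.0

# With the axis starting at zero, bar heights match the real ratio.
true_ratio = bar_height_ratio(yes_count, no_count)

# With the axis starting at a hypothetical value of 30, the bars exaggerate it.
apparent_ratio = bar_height_ratio(yes_count, no_count, axis_start=30.0)

print("ratio with axis starting at 0: ", true_ratio)
print("ratio with axis starting at 30:", apparent_ratio)
```

  • With these made-up numbers, the "yes" bar is drawn three times as tall as the "no" bar even though the real ratio is only one and a half to one.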

Statistical Vocabulary

  • Discuss: can you describe what the word data might mean in statistics? Please provide an example of data.

    • Data – data are collections of observations, such as experimental measurements, survey responses, etc…
    • One example is the collection of all responses to an opinion poll of UNR students.
    • Another example is the collection of all temperature measurements from a weather balloon, taken at one-second intervals.
  • Discuss: can you describe what the word population might mean in statistics? Please provide an example of a population.

    • Population – the complete collection of all measurements (or possible-to-measure data points) being considered.
    • In the example of the opinion poll, the population is every UNR student, regardless of whether they answered.
    • In the example of the weather balloon, the population is the temperature of the entire atmosphere at all times, even if we only measure certain locations at discrete times.
  • Discuss: can you describe what the word sample might mean in statistics? Please provide an example of a sample.

    • Sample – a sample is a subcollection of members selected from a population.
    • In the example of the opinion poll, the sample is the collection of UNR students who actually responded.
    • In the example of the weather balloon, the sample is the collection of temperature measurements at locations and times we have recorded.
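
  • The relationship between data, population and sample can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the population size, the 60% agreement rate, the sample size and the helper name share are all hypothetical, and the students are drawn at random rather than taken from a real poll.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def share(answers):
    """Fraction of answers equal to 1 ("agree")."""
    return sum(answers) / len(answers)


# Hypothetical population: one answer (1 = agree, 0 = disagree) for every student.
population = [1 if rng.random() < 0.6 else 0 for _ in range(5000)]

# Sample: the subcollection of students whose answers we actually recorded.
sample = rng.sample(population, k=200)

population_share = share(population)
sample_share = share(sample)

print("population size:", len(population), "share agreeing:", population_share)
print("sample size:    ", len(sample), "share agreeing:", sample_share)
```

  • In practice we only ever see the sample; the population share is known here only because the population was simulated.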

The process of statistical thinking

  • Statistical thinking or a statistical inquiry has a natural progression.
    • Statistics always relies on some kind of data, which must be collected somehow.
    • Polls must be administered, weather balloons need to be released, etc…
  • The steps of statistical thinking can be loosely summarized as:
    1. Prepare;
    2. Analyze; and
    3. Conclude.
  • Each of these steps includes several elements which we will discuss in the following.

Prepare

  • In preparing a statistical study, we should consider the following:
    1. Context
      • What does the data mean?
      • What was the purpose of the study / data collection?
      • Is this data appropriate for our question of interest?

    2. Source of the data
      • Is the data from a source with a special interest?
      • Would there be pressure to obtain results that are favorable to the source?
      • E.g., health studies from the tobacco industry that denied the link between smoking and lung cancer.
    3. Sampling Method
      • Was the data collected in a way that was unbiased?
      • Are there reasons why the sample wouldn't reflect the population?
      • For example, are the answers all voluntary?
      • Do all participants self-select or are the participants selected methodically?
      • Are there reasons why certain segments of the population wouldn't respond to the poll?
      • In an experiment, was the measurement instrument used appropriate in this context?
      • Is there missing data or are there errors in the data?
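
  • The questions about voluntary answers and self-selection can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real poll: the population, the 50% "yes" rate and the response rates in RESPONSE_RATE are hypothetical, chosen so that people who would answer "yes" are more motivated to respond.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible


def share_yes(answers):
    """Fraction of answers equal to 1 ("yes")."""
    return sum(answers) / len(answers)


# Hypothetical population: 1 means the person would answer "yes", 0 means "no".
population = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]

# Assumed chance of volunteering an answer, by what the answer would be.
RESPONSE_RATE = {1: 0.6, 0: 0.1}

# Voluntary poll: each person decides for themselves whether to respond.
voluntary = [ans for ans in population if rng.random() < RESPONSE_RATE[ans]]

# Methodical selection: a simple random sample chosen by the pollster.
methodical = rng.sample(population, k=500)

true_share = share_yes(population)
voluntary_share = share_yes(voluntary)
methodical_share = share_yes(methodical)

print("population share of yes:       ", true_share)
print("voluntary-response share of yes:", voluntary_share)
print("random-sample share of yes:     ", methodical_share)
```

  • Under these assumptions the voluntary poll overstates the share of "yes" answers, while the random sample stays close to the population value.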

Analyze

  • After preparing the data, we use mathematical techniques to analyze the data.
  • Luckily, in these times computing power is cheap and very little of the analysis is done by hand.
  • In homework assignments, we will use StatCrunch to:
    1. Graph the data.
    2. Explore the data.
      • Look at the “shape” of the data, e.g., outliers, many observations of the same value or a few very extreme values.