01/23/2020
The following topics will be covered in this lecture:
After preparing the data, we use mathematical techniques to analyze the data.
We should remember that there are many common errors in the process of data analysis that can dramatically affect the conclusions of a statistical inquiry.
A non-exhaustive list includes:
Percentages should be treated carefully – we will refresh on the appropriate usage of percentages and their meanings:
Percent – comes from Latin, meaning per 100 or “per centum”.
Percentage of: To find a percentage of an amount, we drop the % symbol and divide the percentage value by 100, then multiply by the amount. For example:
\[ 6\%\text{ of } 1200 \text{ responses } = \frac{6}{100} \times 1200 = 72 \]
\[ 0.25 \times 100\% = 25\% \]
\[ \frac{3}{4} = 0.75 \rightarrow .75 \times 100\% = 75\% \]
\[ \frac{85\%}{100\%} = 0.85 \]
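The conversions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names `percent_of`, `to_percent` and `to_decimal` are our own and do not come from any library.

```python
def percent_of(percent, amount):
    """Drop the % sign, divide by 100, then multiply by the amount."""
    return percent * amount / 100


def to_percent(value):
    """Convert a decimal (or a fraction written as a decimal) to percent units."""
    return value * 100


def to_decimal(percent):
    """Convert a percentage to its decimal form by dividing by 100."""
    return percent / 100


print(percent_of(6, 1200))  # 6% of 1200 responses
print(to_percent(0.25))     # decimal to percent
print(to_percent(3 / 4))    # fraction to percent
print(to_decimal(85))       # percent to decimal
```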
We must be more careful when we discuss the notion of percent change. Language about percent change is often used in misleading or incorrect ways, and it takes more care to analyze these statements.
Percent change – suppose our starting value is \( X_1 \) and this changes over time to the value \( X_2 \). We assume that \( X_1 \) and \( X_2 \) refer to physical quantities, so that \( X_1 , X_2 \geq 0 \), and \( X = 0 \) refers to none of some physical quantity.
Let's suppose that \( X_2 \geq X_1 \), so that the change is an increase. In this case, we can compute the percent increase as the difference of \( X_2 \) and \( X_1 \), relative to the original value \( X_1 \), converted to percent units:
\[ \text{percent increase} = \frac{X_2 - X_1}{X_1} \times 100 \% \]
Discuss with a neighbor: if \( X_2 \geq X_1 \), what possible values can a percent increase take? Particularly, what are the smallest and largest values?
\[ \frac{X_2 - X_1}{X_1} \times 100\% = \frac{ 0 }{X_1} \times 100\% = 0\% \]
\[ \frac{11 - 1}{1} \times 100\% = \frac{ 10 }{1} \times 100\% = 1000\%. \]
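As a minimal sketch, the percent increase formula can be written as a small Python function. The name `percent_increase` and the input checks are our own assumptions; in particular, we require \( X_1 > 0 \) so that the division is defined.

```python
def percent_increase(x1, x2):
    """Percent increase from the starting value x1 to the later value x2."""
    if x1 <= 0:
        raise ValueError("the starting value x1 must be positive")
    if x2 < x1:
        raise ValueError("x2 must be at least x1 for an increase")
    return (x2 - x1) / x1 * 100


print(percent_increase(5, 5))   # no change gives the smallest value
print(percent_increase(1, 11))  # the example above
print(percent_increase(1, 101)) # larger x2 gives an even larger percent increase
```

Since \( X_2 \) can be made as large as we like, the percent increase has no largest value.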
Percent change – suppose our starting value is \( X_1 \) and this changes over time to the value \( X_2 \). We assume that \( X_1 \) and \( X_2 \) refer to quantities, so that \( X_1 , X_2 \geq 0 \), and \( X = 0 \) refers to none of some quantity.
However, let's suppose that \( X_2 \leq X_1 \) so that the change is a decrease.
In this case, we can compute the percent decrease as the difference of \( X_1 \) and \( X_2 \), relative to the original value \( X_1 \), converted to percent units:
\[ \text{percent decrease} = \frac{X_1 - X_2}{X_1} \times 100 \% \]
Discuss with a neighbor: if \( X_2 \leq X_1 \), what possible values can a percent decrease take? Particularly, what are the smallest and largest values?
\[ \frac{X_1 - X_2}{X_1} \times 100\% = \frac{ 0 }{X_1} \times 100\% = 0\% \]
\[ \frac{X_1 - 0}{X_1} \times 100\% = \frac{X_1}{X_1} \times 100\% = 100\%. \]
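The two extreme cases can be checked with a short Python sketch of the percent decrease formula. The function name `percent_decrease` and the starting value of 80 are hypothetical choices made only for illustration.

```python
def percent_decrease(x1, x2):
    """Percent decrease from the starting value x1 to the later value x2."""
    if x1 <= 0:
        raise ValueError("the starting value x1 must be positive")
    if not 0 <= x2 <= x1:
        raise ValueError("x2 must lie between 0 and x1 for a decrease")
    return (x1 - x2) / x1 * 100


print(percent_decrease(80, 80))  # nothing lost: smallest possible decrease
print(percent_decrease(80, 60))  # an intermediate case
print(percent_decrease(80, 0))   # everything lost: largest possible decrease
```

Because \( X_2 \) cannot fall below zero, the function rejects any input that would give a decrease of more than 100%.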
A Gallup poll of 1018 adults reported that 39% believe in evolution. We notice that using the rules of percentages, we find that accordingly:
\[ .39 \times 1018 = 397.02 \]
respondents said they believed in evolution.
Discuss with a neighbor: can this number be correct? Did we make a mistake?
\[ \frac{397}{1018} \approx 0.3899804, \text{ so the true percent is } \approx 0.3899804 \times 100\% \approx 38.99804\%; \]
\[ \frac{398}{1018} \approx 0.3909627, \text{ so the true percent is } \approx 0.3909627 \times 100\% \approx 39.09627\%; \]
This is a relatively minor way in which stating \( 39\% \) of respondents was misleading – the reported value was rounded, so we cannot tell from it whether exactly 397 or 398 participants gave this response.
The difference of 397 or 398 responses isn't really important in this context. However, many times percentages are used in much more misleading ways.
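The check above can be reproduced with a short Python sketch. It computes the count implied by the reported percentage and then the exact percentage for the whole numbers on either side of it. The function name `neighbouring_counts` is our own.

```python
import math


def neighbouring_counts(n, reported_percent):
    """Return (count, exact percent) for the whole numbers around the implied count."""
    implied = reported_percent / 100 * n  # usually not a whole number
    counts = sorted({math.floor(implied), math.ceil(implied)})
    return [(k, k / n * 100) for k in counts]


for count, exact in neighbouring_counts(1018, 39):
    print(count, round(exact, 5))
```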
An ad for Big Skinny wallets included the statement that one of their wallets “reduces your filled wallet size by 50%–200%.”
“Do you support the development of atomic weapons that could kill millions of innocent people?”
It was reported that 20 readers responded and that 87% said “no,” while 13% said “yes.”
Discuss with a neighbor: can you identify four major issues with this survey?
Here are four examples:
\[ 0.87 \times 20 = 17.4 \]
respondents said “no,” while
\[ 0.13 \times 20 = 2.6 \]
respondents said “yes.”
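One way to see that these reported percentages cannot be exact is to list every percentage that a whole number of respondents out of 20 could produce. The sketch below does this in Python; the function names `implied_count` and `achievable_percentages` are our own.

```python
def implied_count(n, percent):
    """Number of respondents implied by a reported percentage of n responses."""
    return percent * n / 100


def achievable_percentages(n):
    """Every percentage that a whole number of respondents out of n can produce."""
    return [100 * k / n for k in range(n + 1)]


print(implied_count(20, 87), implied_count(20, 13))  # neither is a whole number
possible = achievable_percentages(20)
print(possible)                        # multiples of 5% only
print(87 in possible, 13 in possible)  # neither reported value is achievable
```

With only 20 respondents, each person accounts for 5 percentage points, so values such as 87% and 13% cannot arise from whole numbers of responses.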
We have begun to consider the differences between a sample and population. Discuss with a neighbor: what is the difference between a sample and a population?
We will introduce two new definitions related to samples and populations:
Discuss with a neighbor: suppose we want to find out the average age of students at UNR.
In many cases, such as if we studied the average age of all people living in the USA, we don't know and have no effective way to compute the parameter exactly.
Parameters are usually much more uncertain for this reason, and we generally must estimate parameters using statistical methodology.
If we use good methodology, we can provide good estimates of the population parameters, with estimates of how certain or uncertain we are about the value.
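To make the idea of estimating a parameter concrete, here is a small simulated sketch in Python. Everything in it is an assumption made for illustration: the population of 20,000 ages is synthetic (not real UNR or USA data), the sample size of 100 is arbitrary, and we use the standard term "statistic" for the value computed from the sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic population of ages; in practice we rarely have the full list.
population = [random.randint(17, 60) for _ in range(20000)]
parameter = statistics.mean(population)  # the population mean

# A simple random sample of 100 individuals, drawn without replacement.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)      # the sample mean, our estimate

print("population mean (parameter):", round(parameter, 2))
print("sample mean (statistic):    ", round(statistic, 2))
```

Here we can compare the estimate with the parameter only because we generated the whole population ourselves.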
We will make an important distinction between data which we call quantitative and qualitative data.
Discuss with a neighbor: what is qualitative data? Can you provide an example of a piece of qualitative data? How is this distinguished from quantitative data?
Quantitative data most often carries some additional descriptors.
For example, quantitative data often carries a unit of measurement, which we should include in our analysis and discussion.
We can also distinguish whether the quantitative data is in continuous or discrete units.
Discuss with a neighbor: can you give an example of a continuous unit of measurement? Can you give an example of a discrete unit of measurement? What is the difference?
Nominal level of measurement – this data consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high) and mathematical operations have no meaning.
Discuss with a neighbor: can you give two examples of nominal level data, that is, data without a natural order or meaningful mathematical operations?
Ordinal level of measurement – this data can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.
A simple example of ordinal level data can be seen in food – suppose we have salsa labeled as “MILD”, “MEDIUM” and “SPICY”.
Discuss with a neighbor: using the definition above, why are these labels of ordinal level of measurement?
Rankings are ordinal leveled data.
Suppose we are asked to rank our first four favorite musicians – there is a natural order here between 1st, 2nd, 3rd and 4th place.
However, if we subtract \( 4\text{th} - 3\text{rd} \), does this mean we get first place?
If we take the average of the rankings, \[ \frac{1 + 2 + 3 + 4}{4} = 2.5 \] does this average have any meaning?
There aren't meaningful mathematical operations to be performed.
Interval level of measurement – this data can be arranged in order, and differences between data values can be found and are meaningful.
Temperature in degrees Celsius is a basic example of an interval measurement.
We can meaningfully order \( 10^\circ > -3^\circ \), and take the difference \( -3^\circ - 10^\circ \).
However, the value \( 0^\circ \) is arbitrary because it doesn't correspond to the physical quantity of heat – that is \( 0^\circ \) doesn't mean the absence of heat in the Celsius scale (as opposed to e.g., Kelvin scale).
For this reason, ratios don't have a consistent meaning in the Celsius (or Fahrenheit) scale.
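A short Python sketch illustrates the point with two hypothetical temperatures, \( 20^\circ \) and \( 10^\circ \) Celsius, converted with the standard formulas to Fahrenheit and Kelvin. The helper names are our own.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    return c + 273.15


warm_c, cool_c = 20.0, 10.0  # hypothetical example temperatures

ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# The same two temperatures give a different "ratio" on every scale.
print(ratio_c, ratio_f, round(ratio_k, 4))

# Differences behave consistently: 10 Celsius degrees is always 18 Fahrenheit degrees.
print(warm_c - cool_c, celsius_to_fahrenheit(warm_c) - celsius_to_fahrenheit(cool_c))
```

Saying that \( 20^\circ \) is "twice as hot" as \( 10^\circ \) therefore depends entirely on where the scale places its zero.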
Ratio level of measurement – this data can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present).
The presence of the natural zero corresponding to none of the quantity solves the issues we saw with Celsius – for data at this level, differences and ratios are both meaningful.
From an earlier example, the size of the wallet measured in \( \text{cm}^3 \) is data at the ratio level of measurement.
A wallet of size \( 0\,\text{cm}^3 \) corresponds to no wallet at all, which is why a percent decrease of 100% or greater was nonsense.
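We can see the problem numerically with a small Python sketch. The original wallet size of \( 100\,\text{cm}^3 \) is a made-up figure used only for illustration, and `size_after_decrease` is our own helper name.

```python
def size_after_decrease(original_cm3, percent_decrease):
    """Size remaining after reducing original_cm3 by the given percent."""
    return original_cm3 * (1 - percent_decrease / 100)


original = 100.0  # hypothetical filled wallet size in cubic centimeters

for percent in (50, 100, 200):
    print(percent, size_after_decrease(original, percent))
```

A 50% decrease leaves half of the wallet, a 100% decrease leaves nothing, and a 200% decrease would require a negative size, which has no physical meaning.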
Discuss with a neighbor: can you identify data that is at the ratio level? Identify what is the natural zero corresponding to none of the quantity.