Introduction to statistical concepts part III

01/28/2020

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

The following topics will be covered in this lecture:
- Observational studies versus experiments
- Methods of sampling
- Types of observational studies
- Types of experimental design

Methods of sampling

We have begun to consider the differences between a sample and population.
The motivation of sampling is to represent the population with a smaller collection of data points, the sample.
However, if the data are not collected in a methodical way, the sample may grossly misrepresent the population.
- Loosely speaking, this is what is meant by bias in a sample.
We will now consider in detail how we can methodically choose samples, to reduce the effects of bias.

Observational studies versus experiments

We typically have data that can be categorized into one of the following two types:
- Observational studies – in an observational study, we observe and measure specific characteristics, but we don’t attempt to modify the subjects being studied.
- Experiments – in an experiment, we apply some treatment and then proceed to observe its effects on the subjects. (Subjects in experiments are called experimental units.)
The differences between the two types of data can be easily seen considering a clinical trial.
- Suppose we want to determine the effectiveness of a hair growth drug in a clinical trial.
- One group will be given a treatment of the new drug.
- A control group will be given a placebo.
- We will try to measure the effect of the treatment by the difference between the treatment and control groups.
An observational version of this study might take the following form:
- Ten years after the hair growth drug is released, we examine rates of adult baldness, without having modified any of the subjects.
- By examining whether the population trends have changed, i.e., whether rates of adult baldness have been reduced, we will try to conclude whether the drug has had an effect.
A major difference between the two types of studies is that observational studies do not have a way to control for non-measured variables that may have an effect on the study.
- A variable that has an effect on the outcome but is not part of the study is called a “lurking” or “latent” variable.

Observational studies versus experiments

Discuss with a neighbor: suppose a poll is given to UNR students about the quality of UNR food services. Is this poll an observational study or an experiment?
- This is an observational study because the data is collected without modifying the behavior or applying a treatment to the subjects.
Discuss with a neighbor: suppose UNR wants to examine student satisfaction with possible menu changes in its food services. One group of students is given the current menu, while another group is given a new menu, and the satisfaction of each student is recorded for a month. Is this study an observational study or an experiment?
- This is an experiment because there is a group of students given the treatment of a new menu, while the control group uses the same menu.
- By examining the differences between the treatment and the control, we can try to measure the effect of the new menu.

Random Sampling

We have seen that voluntary sampling is flawed because it leads certain groups (with strong feelings about the questions) to be highly represented while other groups (who may not care) have limited representation.
Put another way, one group has a higher probability of responding than other groups.
This is the motivation for random sampling…
- we can try to make certain that the probability of getting any group of responses is the same as any other group.
Simple random sample: a simple random sample of \( n \) subjects is selected in such a way that every possible sample of the same size \( n \) has the same chance of being chosen.
- Suppose we are taking a poll of UNR students and we will have a sample size of 1000.
- This means every possible combination of 1000 UNR students must be equally likely to be selected based on our sampling method.
- E.g., we can randomly select students to give interview responses based on their student ID numbers.
- Note: a simple random sample is often called a random sample, but strictly speaking, a random sample has the weaker requirement that all members of the population have the same chance of being selected.
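
As a minimal sketch of the student ID example, the following draws a simple random sample in Python. The population of 20,000 ID numbers and the seed are assumptions made up for illustration; `random.sample` draws without replacement, so every subset of the requested size is equally likely.

```python
import random

# Hypothetical population: student ID numbers 1..20000 (made up for illustration).
student_ids = list(range(1, 20001))

rng = random.Random(2020)  # fixed seed so the draw is reproducible

# random.sample draws without replacement; every subset of size 1000
# is equally likely to be the result.
sample = rng.sample(student_ids, k=1000)

# Both counts are 1000: the sample has 1000 distinct students.
print(len(sample), len(set(sample)))
```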

Systematic sampling

Diagram of every 3rd individual in a population being sampled.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Systematic sampling: in systematic sampling, we select some starting point and then select every kth (such as every 50th) element in the population.
For example, a grocery store decides to give a poll to every 3rd customer who enters in a day.
This can be effective when the order in which people enter doesn’t hide some pattern – we have no reason to believe that the customers skipped by the rule (the 1st, 2nd, 4th, 5th, etc.) differ in a systematic way from those in the sampled sequence.
This is different from a simple random sample because not every group of 4 out of 12 people has the same probability of being selected as the group consisting of the 3rd, 6th, 9th and 12th people.
For instance, the 1st, 2nd, 3rd and 4th entrants to the grocery store have zero probability to become a sample based on this rule.
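
The contrast with a simple random sample can be checked by enumeration. The sketch below assumes 12 customers and allows any of the first three customers as the starting point (the grocery store rule corresponds to starting at the 3rd); it lists every sample the "every 3rd customer" rule can produce and compares that with the number of groups of 4 a simple random sample could produce.

```python
from itertools import combinations

population = list(range(1, 13))   # customers numbered 1..12 in order of arrival
k = 3                             # sample every 3rd customer

# Each possible starting point yields exactly one systematic sample.
systematic_samples = {tuple(population[start::k]) for start in range(k)}

# A simple random sample of size 4 could be any of these groups.
all_groups_of_4 = list(combinations(population, 4))

print(sorted(systematic_samples))
print(len(all_groups_of_4))
```

Only 3 of the 495 possible groups of four can ever occur under the systematic rule, and the group (1, 2, 3, 4) is not among them.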

Stratified sampling

Diagram of random sampling within political parties.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Stratified sampling: we subdivide the population into at least two different subgroups (or strata).
Groups are chosen such that subjects within the same subgroup share the same characteristics (such as political party).
We then draw a sample from each subgroup (or stratum).
This technique works best when all members of the population belong to one and only one of the strata.

This allows us to keep the sample balanced with respect to the groups, since we can randomly sample each stratum in proportion to its percentage of the population.
In the political party example, we need to make sure every member of the population is in one and only one group.
In a case like this, it can make sense to have groups such as “Democrat”, “Republican”, “Third Party” and “Unaffiliated” so that we fulfill this requirement.
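
A rough sketch of proportional stratified sampling for the political party example follows. The stratum sizes are invented for illustration, and `stratified_sample` is a hypothetical helper, not a library function. In general, rounding the per-stratum counts may leave the total slightly off the target, which this sketch does not correct.

```python
import random

rng = random.Random(0)

# Hypothetical population of 1000 people labeled by stratum (counts are made up).
strata_sizes = {"Democrat": 350, "Republican": 330, "Third Party": 70, "Unaffiliated": 250}
population = {
    name: [f"{name}-{i}" for i in range(size)]
    for name, size in strata_sizes.items()
}

def stratified_sample(population, total_n, rng):
    """Draw a simple random sample within each stratum, sized proportionally."""
    pop_total = sum(len(members) for members in population.values())
    sample = {}
    for name, members in population.items():
        n_stratum = round(total_n * len(members) / pop_total)
        sample[name] = rng.sample(members, n_stratum)
    return sample

sample = stratified_sample(population, total_n=100, rng=rng)
print({name: len(drawn) for name, drawn in sample.items()})
```

With these made-up sizes, a sample of 100 contains 35, 33, 7 and 25 members of the four strata, mirroring their shares of the population.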

Cluster sampling

Diagram of every student within certain class sections being sampled -- selection of the classes is random.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Cluster sampling: we first divide the population area into sections (or clusters).
Then we randomly select some of those clusters and choose all the members from those selected clusters.
This type of sampling works well when the members within each cluster are heterogeneous, while the clusters themselves are relatively homogeneous with respect to each other.
That is, each cluster should be a small-scale representation of the entire population.
In this example we randomly select classes at UNR. Each class section is a cluster.
We obtain responses from all students in the randomly selected classes.

This method of sampling would work in the class section example if we selected classes from a list of general requirements.
In this case, students in the classes would be from all majors and thus heterogeneous.
However, because students are required to take all of the classes in the list, the clusters are homogeneous with respect to other clusters.
The example from Triola is a case where this wouldn’t work well, unless all majors were required to take Architecture, Art History, Biology, Zoology, etc…
Otherwise, e.g., the clusters of Art History and Zoology would not be homogeneous with respect to each other.
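
The two steps of cluster sampling can be sketched as follows. The 40 class sections of 25 students each are hypothetical, chosen only to make the example concrete.

```python
import random

rng = random.Random(1)

# Hypothetical class sections (clusters), each listing its enrolled students.
sections = {
    f"Section {s}": [f"student-{s}-{i}" for i in range(25)]
    for s in range(1, 41)
}

# Step 1: randomly choose which clusters to survey.
chosen_sections = rng.sample(sorted(sections), k=4)

# Step 2: take every member of each chosen cluster.
sample = [student for name in chosen_sections for student in sections[name]]

print(chosen_sections, len(sample))
```

Note that the randomness enters only in the choice of sections; once a section is chosen, all of its students are included.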

Multistage sampling methods

Often for complex, heterogeneous populations like that of the United States, a multistage design is implemented to get a small sample that can reflect the highly complex population.
For example, the U.S. government’s unemployment statistics are based on surveyed households.
It would be very difficult to visit each member of a simple random sample, because individual households are spread all over the country – there is a practical limitation of where sampling can be concentrated.
The U.S. Census Bureau and the Bureau of Labor Statistics instead collaborate to conduct a survey called the Current Population Survey. This is sampled in the following steps:

The entire United States is partitioned into 2025 different regions called primary sampling units (PSUs) including metropolitan areas, large counties, or combinations of smaller counties. These 2025 PSUs are then grouped into 824 different strata.
In each of the 824 different strata, one PSU is selected in a way such that the probability of selection is proportional to the size of the population in each primary sampling unit.
In each of the 824 selected PSUs, the households are broken up into census enumeration districts of about 300 households – enumeration districts are then randomly selected.
Finally, about 4 households in each enumeration district are selected randomly.

This technique actually utilizes stratified, cluster and random sampling at different stages in the process.
This process ensures that the sample will reflect the spatially heterogeneous population while keeping it practical to send interviewers to fewer, concentrated locations.
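
To illustrate only the second step above, selecting one PSU per stratum with probability proportional to population size, here is a small sketch. The three PSUs and their populations are made up, and `pick_psu` is a hypothetical helper; this is not the Census Bureau's actual procedure.

```python
import random

rng = random.Random(42)

# Hypothetical stratum containing three PSUs with made-up populations.
psu_population = {"PSU A": 500_000, "PSU B": 300_000, "PSU C": 200_000}

def pick_psu(psu_population, rng):
    """Select one PSU with probability proportional to its population."""
    names = list(psu_population)
    weights = [psu_population[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Repeating the draw many times shows the selection frequencies
# tracking the population shares (0.5, 0.3, 0.2).
draws = [pick_psu(psu_population, rng) for _ in range(10_000)]
shares = {name: draws.count(name) / len(draws) for name in psu_population}
print(shares)
```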

Examples of sampling methods

Discuss with a neighbor: what sampling method is being used in each of the following examples?
- Twitter poll – In a Pew Research Center poll, 1007 adults were called after their telephone numbers were randomly generated by a computer, and 85% of the respondents were able to correctly identify what Twitter is.
- This is random sampling of users with telephone numbers, where everyone with a telephone is equally likely to be selected.
- However, we are assuming that the probability of someone answering their phone is equal across variables of interest.
- Ecology – When collecting data from different sample locations in a lake, a researcher uses the “line transect method” by stretching a rope across the lake and collecting samples at every interval of 5 meters.
- This is systematic sampling of the lake, taking observations at fixed intervals.
- Dictionary – Suppose we are studying the average number of letters in words in the English language. We collect sample data by randomly selecting 20 different pages from a printed version of the English dictionary and then counting the total number of letters for words defined on each of those pages.
- This is cluster sampling, because we select clusters for which the sub-population within each cluster is heterogeneous (words within a given page have many different lengths) while the clusters are homogeneous with respect to each other (each page is similar enough to every other page).
- Public sentiment poll – the City of Reno designs a sample of Reno adult residents in which all participants are broken into age groups: 18 - 25, 26 - 35, 36 - 45, 46 - 55, 56 and older. Individuals are selected randomly from these age groups, with total number of participants from each group selected proportionally to the percent of adult residents of this age in Reno.
- This is stratified sampling because every adult resident in Reno can be placed within one and only one group, and we balance the sample by random, proportional draws within each stratum.

Types of observational studies

Diagram of observational studies: retrospective studies have observations from the past; cross-sectional studies have observations from one time-instance; prospective studies have observations taken at future time points.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Retrospective (or case-control) study: data are collected from a past time period by going back in time (through examination of records, interviews, and so on).

Good example: studying reported cases of the outbreak and spread of a rare disease after the disease has been brought under control.

Cross-sectional study: data are observed, measured, and collected at one point in time, not over a period of time.

Good example: A poll is given once to decide on the name of a new building on campus.

Prospective (or longitudinal or cohort) study: data are collected in the future from groups that share common factors (such groups are called cohorts).

Good example: a group of current smokers and non-smokers are selected and measurements of their health are taken for the next 10 years.

Examples of observational studies

Discuss with a neighbor: what kind of observational study is being discussed in the example?
- Nurses’ Health Study – the Nurses’ Health Study was started in 1976 with 121,700 female registered nurses who were between the ages of 30 and 55. The subjects were surveyed in 1976 and every two years thereafter. The study is ongoing.
- This is a prospective, longitudinal or cohort study, following the cohort of the nurses with observations after the beginning of the study.
- Smoking Study – researchers from the National Institutes of Health want to determine the current rates of smoking among adult males and adult females. They conduct a survey of 500 adults of each gender.
- This is a cross-sectional study because it only uses observations from a single time point.
- Drinking and Driving Study – in order to study the seriousness of drinking and driving, a researcher obtains records from past car crashes. Drivers are partitioned into a group that had no alcohol consumption and another group that did have evidence of alcohol consumption at the time of the crash.
- This is a retrospective study using observations over a series of past events.

Designs of experiments

Diagram of clinical trial in which women alone are in the treatment group and men alone are in the control group.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Experimental studies have the benefit of applying a treatment to distinguish its effect from a placebo, observing differences with the control group.

However, there are many ways experiments can be made meaningless from poor design.

For example, suppose that we are studying the effect of the (hypothetical) hair growth drug from earlier.
Suppose that the treatment group consists entirely of women.
Suppose that the control group consists entirely of men.
This would be a very poorly designed experiment:

If there is reason to believe that physiological differences would change the results of a treatment, the control group and the treatment group should be balanced across this variable.

There is a long history of biased medical studies in the 20th century that have practiced poor experimental design in a similar way.
Indeed, many studies were performed on male subjects alone, yet conclusions were generalized to the entire population, including women.

Designs of experiments – continued

Rather than try to exhaustively list ways experiments can go wrong, we will introduce elements of good experimental design.
Three essential elements of good design are the following:

Randomization;
Replication; and
Blinding.

We will consider these elements in the context of a historical example, the Salk Vaccine Experiment.
In 1954, a large-scale experiment was designed to test the effectiveness of the Salk vaccine in preventing polio.
Treatment group – 200,745 children were given a treatment consisting of Salk vaccine injections.
Control group – 201,229 children were injected with a placebo that contained no drug.
The children being injected did not know whether they were getting the Salk vaccine or the placebo.
Children were assigned to the treatment or placebo group through a process of random selection.
Treatment group – 33 later developed paralytic polio.
Control group – 115 later developed paralytic polio.
Elements of good experimental design helped to determine whether it would be surprising for the smaller number of cases in the treatment group to be due to random chance.
We will discuss each of these elements in the following.
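
Because the two groups are nearly the same size, the raw counts are already comparable, but it can help to express them as rates. The short calculation below uses only the group sizes and case counts quoted above; the per-100,000 scaling is just a convenient choice of units.

```python
# Counts as quoted for the 1954 Salk vaccine experiment.
treatment_n, treatment_cases = 200_745, 33
control_n, control_cases = 201_229, 115

def rate_per_100k(cases, n):
    """Cases of paralytic polio per 100,000 children in a group."""
    return 100_000 * cases / n

treatment_rate = rate_per_100k(treatment_cases, treatment_n)
control_rate = rate_per_100k(control_cases, control_n)

print(f"treatment: {treatment_rate:.1f} per 100,000")
print(f"control:   {control_rate:.1f} per 100,000")
print(f"ratio:     {control_rate / treatment_rate:.2f}")
```

The rate in the control group is roughly three and a half times the rate in the treatment group (about 57 versus about 16 cases per 100,000 children).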

Randomization

The 401,974 children in the Salk vaccine experiment were assigned to the Salk vaccine treatment group or the placebo group via a process of random selection equivalent to flipping a coin.
You can encode whether someone is in the treatment or control group in a binary way:
- Treatment / 1 / H
- Control / 0 / T
Randomly drawing “Treatment” or “Control” with equal probability is equivalent to flipping a fair coin.
The logic behind randomization is to use chance as a way to create two groups that are similar.
With a large enough sample for both treatment and control groups, this can be very effective when it is difficult to balance factors like age, gender, height, weight, etc… across groups.
Chance is being utilized to balance the many population factors across the control and treatment groups.
- This makes it so that each sub-sample better reflects the full population.
Randomization can lead to unbalanced samples;
- this happens typically when very small sample sizes are involved.
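
A small simulation can illustrate how coin-flip assignment balances a factor such as age, and how the balance tends to be worse for small samples. The ages here are randomly generated, hypothetical values, and `mean_age_gap` is a helper written only for this sketch.

```python
import random
from statistics import mean

rng = random.Random(7)

def mean_age_gap(n_subjects, rng):
    """Assign subjects by fair coin flip and return |mean age difference|."""
    ages = [rng.randint(18, 80) for _ in range(n_subjects)]  # hypothetical ages
    flips = [rng.randint(0, 1) for _ in range(n_subjects)]   # 1 = treatment, 0 = control
    treatment = [a for a, f in zip(ages, flips) if f == 1]
    control = [a for a, f in zip(ages, flips) if f == 0]
    if not treatment or not control:
        return float("nan")  # possible for tiny n: everyone landed in one group
    return abs(mean(treatment) - mean(control))

for n in (10, 100, 10_000):
    print(f"n = {n:>6}: gap in mean age = {mean_age_gap(n, rng):.2f} years")
```

Any single run is random, but across repeated runs the gap in mean age between the two groups is typically much smaller for the largest sample than for the smallest.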

Replication

Replication – the repetition of an experiment on more than one subject.
Good experiments have enough subjects to recognize differences resulting from different treatments.
- The treatment might have different results based on individual factors that can be quite varied.
- However, if we look at a large enough sample it increases the chance of recognizing different treatment effects, and understanding what effects are likely or not.
A large sample is not necessarily a good sample in itself.
Although it is important to have a sample that is sufficiently large, it is even more important to have a sample in which subjects have been chosen in some appropriate way.
- A large but unbalanced sample will still lead to biased conclusions.
We must both:
- use a sample size that is large enough to see the true nature of any effects, and
- obtain the sample using an appropriate method like randomness to balance the treatment and control groups.

Blinding

Blinding – this is when the subject doesn’t know whether they are receiving a treatment or are in the control group.
Blinding enables us to determine whether the treatment effect is significantly different from a placebo effect.
- The placebo effect occurs when an untreated subject reports an improvement in symptoms.
- The improvement can be real or imagined.
Blinding minimizes the placebo effect or allows investigators to account for it.
The polio experiment was double-blind, which means that blinding occurred at two levels:

The children being injected didn’t know whether they were getting the Salk vaccine or a placebo, and
the doctors who gave the injections and evaluated the results did not know either.

Codes were used so that the researchers could objectively evaluate the effectiveness of the Salk vaccine.

Controlling for variables

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Good experimental design limits the effects of variables that are not measured directly in the experiment.
Ideally, for every clinical trial we would need twins in the treatment and control groups with identical medical histories so that we could distinguish between the effect of the treatment and the placebo.
In a realistic clinical trial, the best we can do is to have balanced populations in the treatment and control, so that we can understand the trends (estimate parameters) in the population.
Confounding variables – occur in an experiment when the investigators are not able to distinguish among the effects of different factors.
If we take the hypothetical hair growth drug with the:

control group with men alone; and
treatment group with women alone;

we could not distinguish whether the differences in outcomes between the treatment and control groups were due to either:

the treatment itself;
or due to significant differences between the control and treatment sub-samples.

This is another reason why it is bad experimental design to have unbalanced groups.

Completely randomized design

Diagram of clinical trial in which women and men are randomly assigned to the treatment and control groups.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

With respect to our hypothetical example, randomly assigning participants to treatment and control groups balances the populations across factors.
If the sample size is small and the population is complex, this may still fail to give a representative sample of the population.
When we have sufficiently many random participants in each group, we can be more assured that the experimental design controls for non-measured variables.
For example, if this trial lasts several years, there are diverse factors in the subjects that might have an effect which we cannot measure.
This might include exercise patterns, diets, consumption of tobacco, alcohol, etc…
With a large enough random sample, we can have some assurance that the control and treatment groups will be balanced over a wide variety of possible factors.

Randomized block design

Diagram of clinical trial in which women and men are placed into blocks and subsequently randomly assigned to the treatment and control groups.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Randomized block design follows the same idea as stratified sampling in forming distinct strata or groups for the population.
This can be important to implement when there are certain sub-groups in the population for which it is important to have proportional representation in the samples.

This is especially the case if we believe that the effect of the treatment might differ within each stratum.
I.e., the stratum that each subject belongs to is itself a variable of interest.

If we believe that the experimental drug might have different effects in men and women, we can define these as strata.
Then within each stratum, we randomly assign participants to the treatment or the placebo.
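
One way to sketch a randomized block assignment is shown below. The participant labels and block sizes are made up, `randomized_block_assignment` is a hypothetical helper, and splitting each block exactly in half is an assumption of this sketch rather than a requirement of the design.

```python
import random

rng = random.Random(3)

# Hypothetical participants labeled by block (sex); the counts are made up.
participants = [(f"W{i}", "women") for i in range(20)] + [(f"M{i}", "men") for i in range(30)]

def randomized_block_assignment(participants, rng):
    """Within each block, shuffle and give half treatment and half placebo."""
    blocks = {}
    for subject, block in participants:
        blocks.setdefault(block, []).append(subject)
    assignment = {}
    for block, subjects in blocks.items():
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = (block, "treatment")
        for subject in shuffled[half:]:
            assignment[subject] = (block, "placebo")
    return assignment

assignment = randomized_block_assignment(participants, rng)
counts = {}
for block, arm in assignment.values():
    counts[(block, arm)] = counts.get((block, arm), 0) + 1
print(counts)
```

Each block ends up evenly split (10 and 10 women, 15 and 15 men), so neither group can consist of one sex alone, while who receives the treatment within a block is still decided by chance.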

Matched pairs design

Diagram of clinical trial in which subjects are selected and matched with other subjects in the treatment and control group.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

Matched pair design follows the idea of having twins in experiments.
In this design, similar subjects (for example twins) are matched between the treatment and control groups.
This helps to distinguish between the treatment effect and placebo effect.
In some experiments, this can take the form of before/after measurements.
However, if there isn’t enough heterogeneity between the matched pairs, we will still not represent the full population.
Finding sufficiently many closely matched pairs to represent the population makes this approach difficult.
Rigorously controlled design – this refers to when in the above, we try to carefully match the treatment and control groups with respect to variables of interest in the experiment.
Controlling for variables like this is extremely difficult in addition to the above, where we must find matched pairs and isolate all variables of interest.
This is not often possible, and randomization (if done well) can balance populations against unforeseen confounding variables.

Sampling errors

Imagine now that we are flipping a fair coin.
Suppose we take a sample of 4 flips and get 3 heads and 1 tails.
In our sample it may appear that there is a \( 75\% \) probability of getting heads and a \( 25\% \) probability of getting tails.
- These sample statistics don't match the population parameters of \( 50\% \) for both outcomes.
If we flipped the coin 100 more times, our sample statistics would tend to move closer to the true population parameter.
The random discrepancy between the sample statistic and the population parameter is known as sampling error.
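
Sampling error in the coin example can be seen in a quick simulation. The seed and the particular sample sizes are arbitrary choices for this sketch, and the printed values will differ with a different seed.

```python
import random

rng = random.Random(11)

def proportion_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times; return the share of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

for n in (4, 104, 10_000):
    p_hat = proportion_heads(n, rng)
    print(f"n = {n:>6}: sample proportion = {p_hat:.3f}, sampling error = {p_hat - 0.5:+.3f}")
```

The sampling error is the difference between the sample proportion and the population parameter of 0.5; it does not vanish for any finite sample, but it tends to shrink as the number of flips grows.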
A nonsampling error is the result of human error, including:

wrong data entries;
computing errors;
questions with biased wording;
false data provided by respondents;
forming biased conclusions;
or applying statistical methods that are not appropriate for the circumstances.

A nonrandom sampling error is the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample.