🃏 Permutation Test for Two Means
Introduction
We saw from the diagram created by Allen Downey that there is only one test! We will now use this philosophy to develop a technique that allows us to mechanize several Statistical Models in that way, with nearly identical code.
We will use two packages in R, mosaic
and the relatively new infer
package, to develop our intuition for what are called permutation
based statistical tests.
Hypothesis Testing using Permutation
From Reference #1:
Hypothesis testing can be thought of as a 4-step process:
State the null and alternative hypotheses.
Compute a test statistic.
Determine the p-value.
Draw a conclusion.
In a traditional introductory statistics course, once this general framework has been mastered, the main work is in applying the correct formula to compute the standard test statistics in step 2 and using a table or computer to determine the p-value based on the known (usually approximate) theoretical distribution of the test statistic under the null hypothesis.
In a simulation-based approach, steps 2 and 3 change. In Step 2, it is no longer required that the test statistic be normalized to conform with a known, named distribution. Instead, natural test statistics, like the difference between two sample means \(y1 − y2\) can be used.
In Step 3, we use randomization to approximate the sampling distribution of the test statistic. Our lady tasting tea example demonstrates how this can be done from first principles. More typically, we will use randomization to create new simulated data sets ( “Parallel Worlds”) that are like our original data in some ways, but make the null hypothesis true. For each simulated data set, we calculate our test statistic, just as we did for the original sample. Together, this collection of test statistics computed from the simulated samples constitute our randomization distribution.
When creating a randomization distribution, we will attempt to satisfy 3 guiding principles.
Be consistent with the null hypothesis. We need to simulate a world in which the null hypothesis is true. If we don’t do this, we won’t be testing our null hypothesis.
Use the data in the original sample. The original data should shed light on some aspects of the distribution that are not determined by null hypothesis. For example, a null hypothesis about a mean doesn’t tell us about the shape of the population distribution, but the data give us some indication.
Reflect the way the original data were collected.
From Chihara and Hesterberg:
This is the core idea of statistical significance or classical hypothesis testing – to calculate how often pure random chance would give an effect as large as that observed in the data, in the absence of any real effect. If that probability is small enough, we conclude that the data provide convincing evidence of a real effect.
Permutations tests using mosaic::shuffle()
The mosaic
package provides the shuffle()
function as a synonym for
sample()
. When used without additional arguments, this will permute
its first argument.
shuffle(1:10)
## [1] 8 9 3 2 4 1 6 7 5 10
Applying shuffle() to an explanatory variable in a model allows us to test the null hypothesis that the explanatory variable has, in fact, no explanatory power. This idea can be used to test
- the equivalence of two or more means,
- the equivalence of two or more proportions,
- whether a regression parameter is 0.
We will now see examples of each of these models using Permutations.
Testing for Two or More Means
Case Study-1: Hot Wings Orders vs Gender (From Chihara and Hesterberg)
A student conducted a study of hot wings and beer consumption at a Bar. She asked patrons at the bar to record their consumption of hot wings and beer over the course of several hours. She wanted to know if people who ate more hot wings would then drink more beer. In addition, she investigated whether or not gender had an impact on hot wings or beer consumption. Is the mean order value related to the gender of the person who is ordering?
##
## categorical variables:
## name class levels n missing distribution
## 1 Gender factor 2 30 0 F (50%), M (50%)
##
## quantitative variables:
## name class min Q1 median Q3 max mean sd n missing
## 1 ID integer 1 8.25 15.5 22.75 30 15.50000 8.803408 30 0
## 2 Hotwings integer 4 8.00 12.5 15.50 21 11.93333 4.784554 30 0
## 3 Beer integer 0 24.00 30.0 36.00 48 26.20000 11.842064 30 0
Let us calculate the observed difference in Hotwings
consumption
between Males and Females ( Gender
): (using the mosaic
package)
## F M
## 9.333333 14.533333
## diffmean
## 5.2
The observed difference in mean consumption of Hotwings between Males and Females is 5.2. There is also a “visible” difference in medians as seen from the pair of box plots above.
Could this have occurred by chance? Here is our formulation of the Hypotheses:
$$ H_0: _M = _F\
H_a: _M _F $$
Note that we have a two-sided test: we want to check for differences in mean order value, either way. So we perform a Permutation Test to check: we create a null distribution of the differences in mean by a shuffle
operation on gender
:
## diffmean
## 1 0.9333333
## 2 1.8666667
## 3 -2.2666667
## 4 2.0000000
## 5 2.8000000
## 6 -2.8000000
## prop_TRUE
## 0.001998002
The \(\color{red}{red\ line}\) shows the actual measured mean difference in Hot Wings consumption. The probability that our Permutation distribution is able to equal or exceed that number is \(0.001998002\) and we have to reject the Null Hypothesis that the means are identical.
Matched Pairs: Results from a diving championship.
Sometimes the data is collected on the same set of individual categories, e.g. scores by sport persons in two separate tournaments, or sales of identical items in two separate locations of a chain store. Here we have swimming records across a Semi-Final and a Final:
## Name Country Semifinal Final
## 1 CHEONG Jun Hoong Malaysia 325.50 397.50
## 2 SI Yajie China 382.80 396.00
## 3 REN Qian China 367.50 391.95
## 4 KIM Mi Rae North Korea 346.00 385.55
## 5 WU Melissa Australia 318.70 370.20
## 6 KIM Kuk Hyang North Korea 360.85 360.00
##
## categorical variables:
## name class levels n missing
## 1 Name factor 12 12 0
## 2 Country factor 8 12 0
## distribution
## 1 SI Yajie (8.3%) ...
## 2 Canada (16.7%), China (16.7%) ...
##
## quantitative variables:
## name class min Q1 median Q3 max mean sd n
## 1 Semifinal numeric 313.70 322.2000 325.625 356.575 382.8 338.500 22.94946 12
## 2 Final numeric 283.35 318.5875 358.925 387.150 397.5 350.475 40.02204 12
## missing
## 1 0
## 2 0
The data is made up of paired observations per swimmer. So we need
to take the difference between the two swim records for each swimmer
and then shuffle the differences to either polarity. Another way to look
at this is to shuffle the records between Semifinal
and Final
on a
per Swimmer basis. There are 12 swimmers and therefore 12 paired records.
In order to ensure that the records are paired
, we use the argument only.2=FALSE
in the diffmean
function:
## Name Country Semifinal Final
## 1 CHEONG Jun Hoong Malaysia 325.50 397.50
## 2 SI Yajie China 382.80 396.00
## 3 REN Qian China 367.50 391.95
## 4 KIM Mi Rae North Korea 346.00 385.55
## 5 WU Melissa Australia 318.70 370.20
## 6 KIM Kuk Hyang North Korea 360.85 360.00
## 7 ITAHASHI Minami Japan 313.70 357.85
## 8 BENFEITO Meaghan Canada 355.15 331.40
## 9 PAMG Pandelela Malaysia 322.75 322.40
## 10 CHAMANDY Olivia Canada 320.55 307.15
## 11 PARRATTO Jessica USA 322.75 302.35
## 12 MURILLO URREA Carolina Colombia 325.75 283.35
## 318.7-313.7 320.55-318.7 322.75-320.55 325.5-322.75 325.75-325.5
## 12.350 -63.050 5.225 85.125 -114.150
## 346-325.75 355.15-346 360.85-355.15 367.5-360.85 382.8-367.5
## 102.200 -54.150 28.600 31.950 4.050
## [1] 11.975
How would we formulate our Hypothesis?
$$
H_0: {semifinal} = {final}\
H_a: {semifinal} {final}\
$$
## [1] 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1
## mean
## 1 -9.583333
## 2 -17.816667
## 3 -1.791667
## 4 -12.883333
## 5 17.191667
## 6 8.050000
## prop_TRUE
## 0.1294487
Hmm…so by generating 100000 shufflings of score differences, with polarities, it does appear that we can not only obtain the current observed difference but even surpass it frequently. So it does seem that there is no difference in means between SemiFinal and Final swimming scores.
Walmart vs Target
Is there a difference in the price of Groceries sold by the two
retailers Target and Walmart? The data set Groceries
contains a sample
of grocery items and their prices advertised on their respective web
sites on one specific day.
- Inspect the data set, then explain why this is an example of matched pairs data.
- Compute summary statistics of the prices for each store.
- Conduct a permutation test to determine whether or not there is a difference in the mean prices.
- Create a
histogrambar-chart of the difference in prices. What is unusual about Quaker Oats Life cereal? - Redo the hypothesis test without this observation. Do you reach the same conclusion?
## Product Size Target Walmart Units UnitType
## 1 Kellogg NutriGrain Bars 8 bars 2.50 2.78 8 bars
## 2 Quaker Oats Life Cereal Original 18oz 3.19 6.01 18 oz
## 3 General Mills Lucky Charms 11.50oz 3.19 2.98 11 oz
## 4 Quaker Oats Old Fashioned 18oz 2.82 2.68 18 oz
## 5 Nabisco Oreo Cookies 14.3oz 2.99 2.98 14 oz
## 6 Nabisco Chips Ahoy 13oz 2.64 1.98 13 oz
##
## categorical variables:
## name class levels n missing
## 1 Product character 30 30 0
## 2 Size character 24 30 0
## 3 Units character 16 30 0
## 4 UnitType character 3 30 0
## distribution
## 1 Annie's Macaroni & Cheese (3.3%) ...
## 2 18oz (10%), 12oz (6.7%) ...
## 3 10 (10%), 15 (10%), 16 (10%) ...
## 4 oz (93.3%), bars (3.3%) ...
##
## quantitative variables:
## name class min Q1 median Q3 max mean sd n missing
## 1 Target numeric 0.99 1.8275 2.545 3.140 7.99 2.762333 1.582128 30 0
## 2 Walmart numeric 1.00 1.7600 2.340 2.955 6.98 2.705667 1.560211 30 0
We see that the comparison is to be made between two prices for the
same product, and hence this is one more example of paired data
, as
in Case Study #4. Let us plot the prices for the products:
We see that the price difference between Walmart and Target prices is
highest for the Product
named Quaker Oats Life Cereal Original
. Let
us check the mean difference in prices:
## 1-0.99 1.22-1 1.42-1.22 1.49-1.42 1.59-1.49 1.62-1.59 1.79-1.62 1.94-1.79
## -0.580000 0.170000 0.210000 -0.100000 0.190000 0.070000 0.180000 0.160000
## 1.99-1.94 2.12-1.99 2.39-2.12 2.5-2.39 2.59-2.5 2.64-2.59 2.79-2.64 2.82-2.79
## 0.090000 0.010000 0.200000 0.600000 -0.200000 -0.600000 0.660000 0.040000
## 2.99-2.82 3.19-2.99 3.49-3.19 3.99-3.49 4.79-3.99 7.19-4.79 7.99-7.19
## 0.220000 1.263333 -1.183333 -0.480000 2.290000 2.190000 0.000000
## [1] -0.05666667
Let us perform the pair-wise permutation test on prices, by shuffling the two store names:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
## [26] -1 -1 -1 -1 -1
## mean
## 1 0.05600000
## 2 0.19133333
## 3 -0.10600000
## 4 0.11133333
## 5 -0.00200000
## 6 -0.01533333
## [1] 0
Does not seem to be any significant difference in prices…
Suppose we knock off the Quaker Cereal data item…
## [1] 2
## Product Size Target
## 1 Kellogg NutriGrain Bars 8 bars 2.50
## 3 General Mills Lucky Charms 11.50oz 3.19
## 4 Quaker Oats Old Fashioned 18oz 2.82
## 5 Nabisco Oreo Cookies 14.3oz 2.99
## 6 Nabisco Chips Ahoy 13oz 2.64
## 7 Doritos Nacho Cheese Chips 10oz 3.99
## 8 Cheez-it Original Baked 21oz 4.79
## 9 Swiss Miss Hot Chocolate 10 count 1.49
## 10 Tazo Chai Classic Latte Black Tea 32 oz 3.49
## 11 Annie's Macaroni & Cheese 6oz 1.79
## 12 Rice A Roni Chicken 6.9oz 1.00
## 13 Zatarain's Jambalaya Rice Mix 8oz 1.62
## 14 SPAM Original Lunch Meat 12oz 2.79
## 15 Campbell's Chicken Noodle Soup 10.75oz 0.99
## 16 Dinty Moore Hearty Meals Beef Stew 15oz 1.99
## 17 Hormel Chili with Beans 15oz 1.94
## 18 Dole Pineapple Chunks 20 oz 1.59
## 19 Skippy Creamy Peanut Butter 16.3oz 2.59
## 20 Smucker's Strawberry Preserve 18oz 2.99
## 21 Heinz Tomato Ketchup 32oz 2.99
## 22 Near East Couscous Toasted Pine Nuts mix 5.6oz 2.12
## 23 Barilla Angel Hair Pasta 16oz 1.42
## 24 Betty Crocker Super Moist Chocolate Fudge Cake Mix 15.25oz 1.22
## 25 Kraft Jet-Puffed Marshmllows 16oz 1.99
## 26 Dunkin' Donuts Original Blend Medium Roast Ground Coffee 12oz 7.19
## 27 Dove Promises Milk Chocolate 8.87oz 3.19
## 28 Skittles 41oz 7.99
## 29 Vlasic Kosher Dill Pickle Spears 24oz 2.39
## 30 Vlasic Old Fashioned Sauerkraut 32oz 1.99
## Walmart Units UnitType
## 1 2.78 8 bars
## 3 2.98 11 oz
## 4 2.68 18 oz
## 5 2.98 14 oz
## 6 1.98 13 oz
## 7 2.50 10 oz
## 8 4.79 21 oz
## 9 1.28 10 count
## 10 2.98 32 oz
## 11 1.72 6 oz
## 12 1.00 6 oz
## 13 1.54 8 oz
## 14 2.64 12 oz
## 15 1.58 10 oz
## 16 1.98 15 oz
## 17 1.88 15 oz
## 18 1.47 20 oz
## 19 2.58 16 oz
## 20 2.84 18 oz
## 21 2.88 32 oz
## 22 1.98 5 oz
## 23 1.38 16 oz
## 24 1.17 15 oz
## 25 1.96 16 oz
## 26 6.98 12 oz
## 27 3.50 8 oz
## 28 6.98 41 oz
## 29 2.18 24 oz
## 30 1.97 32 oz
## [1] -0.1558621
## mean
## 1 -0.04551724
## 2 -0.07103448
## 3 0.10344828
## 4 0.10137931
## 5 0.02689655
## 6 -0.10344828
## [1] 0.01605
Conclusion
TBD
References
- Randall Pruim, Nicholas J. HortonDaniel T. Kaplan, Start Teaching with R