Although I am a college professor, a great deal of my freelance writing involves working with the Common Core State Standards. Most of that time, especially at the beginning, was spent trying to decipher exactly which skills the Common Core is after and how best to assess or address them.

A particularly difficult group of standards to interpret is the domain “Making Inferences and Justifying Conclusions.” These standards focus on helping students develop a deep intuition for statistical thinking. For example, a question like “A coin landed on tails 65 times out of 100: is this enough to make us question whether it is fair?” would fall under this domain. All of these standards require some really deep thinking on the part of students.
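The R sketch below illustrates the kind of simulation this domain has in mind. It flips a fair coin 100 times, repeats that 10,000 times, and counts how often 65 or more tails turn up. The number of repetitions and the seed are arbitrary choices made for this sketch.

```r
# Simulate 100 flips of a fair coin, repeated many times, and record
# the number of tails in each set of 100 flips
set.seed(2024)   # arbitrary seed so the sketch is reproducible
n_sets <- 10000  # assumed number of simulated sets of 100 flips
tails_per_set <- replicate(
  n_sets,
  sum(sample(c("heads", "tails"), size = 100, replace = TRUE) == "tails")
)

# Proportion of simulated sets with at least 65 tails
prop_65_or_more <- mean(tails_per_set >= 65)
print(prop_65_or_more)
```

A fair coin rarely produces 65 or more tails in 100 flips. The exact binomial probability, `1 - pbinom(64, 100, 0.5)`, is about 0.002. That rarity is what would make us question whether the coin is fair.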

## HSS.IC.B.6

This standard states that students should be able to:

Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.

Many online resources interpret this as meaning that students should be able to use tools such as a two-sample t-test to compare two populations. Personally, I think this misses the mark for this entire domain of standards. At this level, we are not expecting high school students to apply hypothesis tests or confidence interval calculations formally. Instead, we want them to start thinking about the meaning behind these procedures before they see them presented formally at the college level or in an AP Statistics course. These ideas will give students a much better understanding of the p-value, and of the whole process of hypothesis testing, once those topics are introduced.

## An Example

Let’s use a typical question aligned to this standard as a starting point for discussion. The data for this question and the resulting histogram were all generated in R (see the bottom of the post for the code).

Suppose that two researchers want to determine whether high school students who are offered encouraging remarks complete a difficult task faster, on average, than those who aren’t. To test this, they select two random samples of 25 high school students each. The first group is asked to work on a difficult puzzle and is offered no feedback while working. The second group is asked to do the same but is also given encouraging comments such as “you almost got it” or “that’s a good idea” while working. For the first group (no encouragement), the mean time to complete the puzzle was 28.1 minutes with a standard deviation of 6.7 minutes. For the second group (encouragement), the mean time was 27.2 minutes with a standard deviation of 5.5 minutes.

To test the significance of this result, the researchers used a computer to randomly reassign the 50 individual times to the two groups and then compute the new difference in means (first group minus second group). They repeated this process 1,000 times and plotted all of the resulting differences on the plot below.

The question might then ask students to determine whether the observed difference between the means is statistically significant. Alternatively, it might ask them to explain whether this result should lead the researchers to believe that students who receive encouragement complete the task faster. Both are critical-thinking questions that go beyond applying a formula.

Using the graph, we would hope students see that the observed difference of 28.1 − 27.2 = 0.9 minutes falls within a range of values that occurs frequently when the groups are assigned randomly. It is not a rare difference; it came up a lot in the simulation. Therefore, the experiment’s results are not statistically significant, since they could be due to chance alone. Through resampling, students can see how the results might behave if the differences *were* due to chance alone (as they are, by construction, in the simulation).
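The individual times behind the example are not given, so the minimal R sketch below makes an assumption. It builds hypothetical groups of 25 that have exactly the reported means and standard deviations. The helper `make_group` is an illustrative name, not part of the original code. The sketch then shuffles the pooled times 1,000 times and reports the proportion of shuffled differences that are at least as large as the observed 0.9 minutes.

```r
# Hypothetical helper: generate n values whose sample mean and standard
# deviation are exactly m and s (the raw times are not available)
make_group <- function(n, m, s) {
  z <- rnorm(n)
  m + s * (z - mean(z)) / sd(z)
}

set.seed(42)  # arbitrary seed for reproducibility
no_encourage <- make_group(25, 28.1, 6.7)
encourage    <- make_group(25, 27.2, 5.5)
observed_diff <- mean(no_encourage) - mean(encourage)  # 0.9 minutes

# Shuffle the 50 pooled times into two new groups of 25, 1,000 times,
# recording the difference in group means each time
pooled <- c(no_encourage, encourage)
shuffled_diffs <- replicate(1000, {
  shuffled <- sample(pooled)
  mean(shuffled[1:25]) - mean(shuffled[26:50])
})

# Proportion of shuffled differences at least as large as the observed one
# (one-sided, matching the question "are encouraged students faster?")
prop_as_extreme <- mean(shuffled_diffs >= observed_diff)
print(prop_as_extreme)
```

This proportion plays the role of a one-sided p-value. For data like these it comes out at roughly 0.3, far from small, which says that a 0.9-minute gap is an ordinary outcome of random assignment alone.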

As you can see, this type of question has students think indirectly about a p-value and its implications, without formally introducing the idea. Certainly they could run a two-sample t-test or something similar, but that would be robotic compared to the critical thinking that the Common Core writers hoped students would develop. The ultimate goal is to have students use computer or even physical simulations (such as a special deck of cards, or flipping coins) to understand uncertainty. As mentioned earlier, the aim is also to develop an intuition for statistical thinking in general.
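For comparison, the formal route can be sketched from the summary statistics alone. The sketch below computes a Welch two-sample t statistic by hand, because base R’s `t.test()` needs the raw times, which the example does not provide. It then gets a one-sided p-value with `pt()`. This is shown only to illustrate that the formula and the simulation tell the same story. It is not something students at this level are expected to do.

```r
# Summary statistics from the example (no encouragement vs. encouragement)
n1 <- 25; mean1 <- 28.1; sd1 <- 6.7
n2 <- 25; mean2 <- 27.2; sd2 <- 5.5

# Welch two-sample t statistic
se <- sqrt(sd1^2 / n1 + sd2^2 / n2)
t_stat <- (mean1 - mean2) / se

# Welch-Satterthwaite degrees of freedom
welch_df <- (sd1^2 / n1 + sd2^2 / n2)^2 /
  ((sd1^2 / n1)^2 / (n1 - 1) + (sd2^2 / n2)^2 / (n2 - 1))

# One-sided p-value for "encouraged students are faster on average"
p_value <- pt(t_stat, welch_df, lower.tail = FALSE)
print(c(t = t_stat, df = welch_df, p = p_value))
```

With t ≈ 0.52 on about 46 degrees of freedom, the one-sided p-value is about 0.30. This agrees with the conclusion drawn from the simulation: the 0.9-minute difference is not statistically significant.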

If you are still trying to wrap your mind around this standard, you might find these related articles interesting. “Why Resampling is Better than Hypothesis Tests and Confidence Intervals” comments on a similar high school standard in New Zealand. “Resampling Statistics” is an overview of techniques from East Carolina University.

#### R Code Used for This Example

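```r
# Create the data for the two groups.
# The sample means and standard deviations reported in the example
# were calculated from these simulated groups.
no_encourage <- rnorm(25, 28.6, 7.1)
encourage <- rnorm(25, 27.1, 6.4)

# Combine both groups into one pool of 50 times
group <- c(encourage, no_encourage)

# Initialize a vector to hold the 1,000 resampled differences
diffs <- numeric(1000)

# Resample: shuffle the pooled times, split them into two new groups
# of 25, and record the difference in their means
for (i in 1:1000) {
  randomized <- sample(group)
  new_no_encourage <- randomized[1:25]
  new_encourage <- randomized[26:50]
  diffs[i] <- mean(new_no_encourage) - mean(new_encourage)
}

# Plot the distribution of the resampled differences
hist(diffs, main = "Resampled differences in mean time",
     xlab = "Difference in means (minutes)")
```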