The Common Core Standards are educational standards for K-12 education, used primarily, though not exclusively, in the United States. The Standards, which have been adopted by most states, cover English Language Arts and Mathematics. One content area within the latter is statistics and probability, at both the middle school and high school levels. This post briefly introduces some concepts in the high school statistics and probability standards.
The Common Core Context
On the Common Core website, the high school standards are presented as consisting of four topic areas plus a list of mathematical practices. Those practices (e.g., “reason abstractly and quantitatively,” “attend to precision”) will hopefully develop during the process of working with the four topic areas. This post does not focus on that list of mathematical practices.
The website offers PDF documents presenting the Common Core Standards, including one document for the mathematics standards. The following paragraphs cite pages within that document, which is referred to here as simply the Mathematics Standards (MS).
Following two introductory pages (MS pp. 79-80), the Mathematics Standards provide several pages (MS pp. 81-83) that elaborate somewhat on the four topic areas mentioned above. Those topic areas are Interpreting Categorical and Quantitative Data, Making Inferences and Justifying Conclusions, Conditional Probability and the Rules of Probability, and Using Probability to Make Decisions.
Under each of those four topic headings, the Standards name two or three general objectives. For instance, the first general objective listed under the first topic area (Interpreting Categorical and Quantitative Data) is to “summarize, represent, and interpret data on a single count or measurement variable.” And then, under each of those general objectives, the Standards identify specific desired abilities. So, for example, within that general objective of summarizing, representing, and interpreting data, the first specific ability is to “Represent data with plots on the real number line (dot plots, histograms, and box plots).”
According to the FAQs, the purpose of such specifications is to provide “clear goals for student learning,” in the form of “high standards that are consistent across states . . . aligned to the expectations in college and careers.” Among other things, it is anticipated that such specifications will facilitate development of textbooks “aligned to the Standards.” This would be helpful. To use the example just cited, there are many questions that can arise in the process of learning about histograms and box plots. It would not be possible to anticipate such details in a summary statement of objectives. It is thus likely that teachers who orient their teaching around the Standards will vary in their depth of treatment. Some may settle for brief exposure to the specified topics (e.g., “Which of these is a histogram?”), while others drill their students in multiple histogram construction scenarios. Perhaps the eventual dominance of certain widely recognized textbooks will help to reduce variability in treatment.
Not surprisingly, the Standards have numerous critics. This post does not attempt to identify and evaluate statistics-specific criticisms. What does seem appropriate, however, is to recognize that individual teachers will still have to interpret and apply their own concepts of what should be included within a given topic, as in the histogram example just noted.
Such judgments will depend upon, among other things, the amount of time available. On that matter, one might consult the Common Core document characterized as Appendix A: Designing High School Mathematics Courses Based on the Common Core State Standards. That document elaborates upon four model course pathways. For example, the Accelerated Traditional Pathway suggests that histograms are an appropriate topic for Unit 3 (Descriptive Statistics) within 8th Grade Algebra I (MS p. 103). That is, adoption of a course pathway may tend to dictate the amount of time that one can devote to histograms. Here, again, the development and adoption of textbooks keyed to specific pathways may be helpful. A search suggests that educators are struggling with this need in various ways. For example, one site indicates that certain people have been working in or with the Utah State Office of Education to design their own Standards-aligned math e-book. At this writing, a search of Utah’s database led to a list of ten different textbooks, most priced above $65, but apparently the e-book project was still in development.
At any rate, at least within the Appendix A pathways, statistics will apparently tend not to be a course unto itself, but will rather be integrated into algebra or other courses. So while descriptive statistics appears in the 8th Grade Algebra I example just cited, inference might not come up until High School Algebra II (MS p. 14). In other words, even when the textbook situation has matured, there will apparently not be many Common Core-aligned statistics textbooks; there will just be statistics chapters within algebra and other math texts.
It is probably unnecessary to offer, here, a textual summary of the objectives and abilities listed in the Common Core Standards for statistics and probability; the Standards themselves are fairly brief and straightforward, and there are endless materials in books and online, for those who want depth. The remainder of this post thus seeks only to provide some introductory impressions pertaining to the four topic areas listed above, supplemented with a few content illustrations.
The first question is, what is statistics? Wikipedia says it is “the collection, organization, analysis, interpretation, and presentation of data.” The Merriam-Webster Dictionary says, similarly, it is “a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.” Given what everybody already knows – that it involves piles of numbers – I propose that statistics is the study of what has happened, and what is going to happen. (Note that this is very different from the question of what is or should be covered in a course on statistics.)
Of course, statistics is a technical matter; but for the vast majority of students, its value is not in learning how to do numerical calculations. Indeed, as I note in another manuscript (p. 8), it has been suggested that many of the most common technical calculations are obsolete and should not be taught. Research has indicated, further, that students are quite unlikely to retain expertise in the details of statistical computation. What is more important, and seems more likely to stick, is numeracy, or mathematical (including statistical) literacy. In other words, to refine the previous paragraph, the study of introductory statistics is a process of learning how to use numbers and other tools to define certain terms that are helpful in determining what has happened, and what is going to happen.
An example may make that clearer. I can define or explain the mean (commonly called the average) using words: sum up this, count that, divide one by the other, etc. But people may find it easier to work through a numerical example (e.g., 3 + 4 + 5 = 12; 12 ÷ 3 = 4) and to remember the process that way. So it is possible, in effect, to define the mean by using numbers rather than words.
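The worked example above can also be expressed as a few lines of code, which some readers may find just as memorable as the verbal definition. This is only an illustrative sketch; the function name is mine, not part of the Standards:

```python
# Computing the mean by summing and dividing, mirroring the
# worked example in the text (3 + 4 + 5 = 12; 12 / 3 = 4).
def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

print(mean([3, 4, 5]))  # 4.0
```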
Much of statistics can be boiled down to three large concepts: description, inference, and probability. Contrary to what many textbooks may seem to say, description is not distinct from inference or probability. Rather, these three large concepts overlap and interact. As detailed in the manuscript just cited, descriptive tools pervade all aspects of statistics; statistics cannot proceed without them. Statistics is descriptive; in some cases, it is also inferential. Inference is the process of drawing upon a limited amount of information to make an educated guess about phenomena that are too large, distant, or otherwise inaccessible to study directly. Probability is just the calculation of what people ordinarily call the odds, or chances, or likelihood that something will happen.
Here, again, an example may help. I can’t study what everyone at my job thinks about the shirt I wore today. An attempt to do so would run into the kinds of problems that often arise in the study of research methods: there may be too many employees to ask within the time available; some may give a false answer in order to avoid offending me; and so forth. As a practical matter, I may just have to ask a couple of people and then draw inferences from what they say. So if I ask three people, and they all tell me that rhinestones are not appropriate on a shirt worn to my job at a funeral parlor, I will have to try to decide whether their answers are representative of the views of all employees of this funeral parlor, or of funeral parlor employees nationwide.
To illustrate the role of probability, suppose I ask 15 people whether they ate cereal for breakfast this morning, and all 15 say they did. Now, what are the chances that everyone I ask would have eaten cereal for breakfast on any given day? That seems very unlikely. So I might suspect that I have an unusual group here. Their answers about breakfast foods may not be very representative of the general public. In other words, my inferences about the public, and my description of the public, will depend upon probabilities – on calculations of what is most likely. Of course, probability can also be used for more abstract mathematical problems, such as the calculation of the odds of getting heads four times in a row, in four flips of a coin.
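Both probability calculations in that paragraph rest on multiplying independent probabilities. A small sketch makes the arithmetic concrete; the 50% cereal figure is purely a hypothetical assumption for illustration:

```python
# Probability of four heads in four fair coin flips: the flips are
# independent, so the individual probabilities multiply.
p_four_heads = 0.5 ** 4   # 0.0625, i.e. 1 in 16

# If (hypothetically) each person has a 50% chance of having eaten
# cereal this morning, the chance that all 15 respondents did is:
p_all_cereal = 0.5 ** 15  # about 0.00003, i.e. 1 in 32,768

print(p_four_heads, p_all_cereal)
```

The second number shows why the all-cereal result should arouse suspicion: under an everyday assumption, it would be a very rare event.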
Those three large concepts – description, inference, and probability – capture much of what appears in the four Common Core areas named above. We use them to describe what has happened, and to predict what is going to happen. For instance, we can look at a line graph showing the teen unemployment rate in recent years:
This graph is a classic example of descriptive statistics. It tells us what happened. We can use it to support inferences about what is probable in the future. But the future in this graph is not too clear. Teen unemployment may be dropping after 2011, or it may just be pausing before climbing again. For purposes of prediction, it would be better if we could connect unemployment with some other factor. It may seem obvious that teen unemployment rose as a direct result of the Great Recession, beginning in late 2007, but it has also been suggested that the real culprit is the minimum wage:
Graphs like these, and other descriptive materials, appear very frequently, in all sorts of materials. An educated student will need to be fluent in such materials to understand much of what appears in the daily news.
Unlike description, inference tends to be useful only in certain kinds of situations – but it is extremely useful in those situations. It often begins with the identification of a representative sample. Sampling is necessary because it is often impossible to do a census – to ask absolutely everyone, that is, what they eat for breakfast or whether they enjoy their jobs. Researchers go to great lengths to find good samples. Of course, a sample consisting of just three people will be much less reliable than a sample of 200: all three could, by chance, have strange views or otherwise be unrepresentative of the larger population, but a group of 200 would probably contain a large number of typical people.
In inferential statistics, a core concept is the “standard deviation.” As noted above, a statistics course might be thought of as an introduction to the definition of certain terms, and this is one of the most important. The standard deviation is essentially a unit of measurement. It is somewhat like the “carlength.” Drivers are often told that they should allow one carlength of space, between their vehicle and the one ahead, for each 10 MPH of speed. So if they are going 30 MPH, they should be at least three carlengths behind the vehicle they are following. How much is a carlength? We don’t reduce it to a certain number of feet because nobody is going to go out on the highway and measure the distance in feet. Drivers need a quick rule of thumb, and it is relatively easy to apply this one: they can just look at the length of the car in front of them, and decide how many of those lengths they are allowing.
The standard deviation is not as easy to estimate as the carlength, but with today’s calculators and software it is pretty straightforward. As with the carlength, its value is that it can be used for comparisons. Here’s an example. Suppose that one group of students in a high school takes the ACT exam, to determine their suitability for college, and achieves an average score of 25, and another group takes the SAT and averages 1300. Which group did better? You can’t tell; the ACT and SAT scoring systems are like apples and oranges. But now suppose you have more information. Suppose that, overall, the thousands of people who take the ACT achieve an average score of 20, with a standard deviation of 5; and suppose the thousands who take the SAT have an average of 1000 and a standard deviation of 200. Now you can see that the ACT group’s average score of 25 is exactly one standard deviation above the mean of all ACT takers nationwide, whereas the SAT group’s average score of 1300 is 1.5 standard deviations above the mean of all SAT takers. The SAT group did better.
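The comparison above amounts to converting each group’s average into standard-deviation units – what statisticians call a z-score. A minimal sketch, using only the figures given in the example:

```python
# Express a value in standard-deviation units relative to a
# reference mean: this is the z-score.
def z_score(value, mean, sd):
    return (value - mean) / sd

act_z = z_score(25, 20, 5)        # 1.0 standard deviation above the mean
sat_z = z_score(1300, 1000, 200)  # 1.5 standard deviations above the mean

print(act_z, sat_z)  # the SAT group's z-score is higher
```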
The great thing about such comparisons is that, as with the carlength, you don’t have to go out and measure feet. You don’t have to know anything about the SAT or ACT other than the mean and standard deviation. You can tell immediately that one group did better than the other. That’s valuable in all sorts of research. If somebody tells you that, on average, half of American adults eat cereal for breakfast, with a standard deviation of 2.5, then you can calculate immediately that the situation described above – 15 people asked, all 15 eating cereal – is three standard deviations above the expected mean of 7.5. And then, using a simple table found at the back of any statistics textbook, you can see that the probability of being three standard deviations above the mean is about one in a thousand. If it turns out that those 15 people are all eating the same cereal, you can tell your client, the cereal company, that they have finally achieved their winning cereal formula. People at that company are going to get rich.
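The table lookup can also be reproduced in software. Assuming the count is roughly normally distributed (which is what the textbook table presumes), the upper-tail probability for a standard normal variable can be computed from the complementary error function; the mean of 7.5 and standard deviation of 2.5 are the figures given above:

```python
import math

# Z-score for the cereal scenario: 15 observed, against an expected
# mean of 7.5 with a standard deviation of 2.5.
z = (15 - 7.5) / 2.5  # 3.0

# Upper-tail probability P(Z >= z) for a standard normal variable,
# computed via the complementary error function.
p = 0.5 * math.erfc(z / math.sqrt(2))

print(p)  # roughly 0.00135 -- on the order of one in a thousand
```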
This simple ability to identify significant changes underlies all sorts of research. A formula that makes infections heal one standard deviation faster than average; a shock-absorbing material that reduces the number of fatalities in head-on collisions by half a standard deviation – a relatively simple calculation can help you decide whether such a result is just a fluke or whether, instead, you have found something remarkable. And you can do it without knowing anything about infections or auto crashes. Inferential statistics is just one part of modern research, but it is a powerful part.
I hope these brief remarks have sketched out an understandable sense of the description, probability, and inference components of the Common Core Standards. To emphasize, there are countless tutorials and other materials available online, for those desiring more depth in these and other concepts cited in those Standards.