
The ECLS measured the students’ academic performance and gathered typical survey information about each child: his race, gender, family structure, socioeconomic status, the level of his parents’ education, and so on. But the study went well beyond these basics. It also included interviews with the students’ parents (and teachers and school administrators), posing a long list of questions more intimate than those in the typical government interview: whether the parents spanked their children, and how often; whether they took them to libraries or museums; how much television the children watched.

The result is an incredibly rich set of data—which, if the right questions are asked of it, tells some surprising stories.

How can this type of data be made to tell a reliable story? By subjecting it to the economist’s favorite trick: regression analysis. No, regression analysis is not some forgotten form of psychiatric treatment. It is a powerful—if limited—tool that uses statistical techniques to identify otherwise elusive correlations.

Correlation is nothing more than a statistical term that indicates whether two variables move together. It tends to be cold outside when it snows; those two factors are positively correlated. Sunshine and rain, meanwhile, are negatively correlated. Easy enough—as long as there are only a couple of variables. But with a couple of hundred variables, things get harder. Regression analysis is the tool that enables an economist to sort out these huge piles of data. It does so by artificially holding constant every variable except the two he wishes to focus on, and then showing how those two co-vary.
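To make "moving together" concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the most common way of putting a single number on it. The temperature, snowfall, sunshine, and rainfall figures are invented for illustration. They are not drawn from any real weather record or from the ECLS. A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance of xs and ys scaled by both spreads."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical daily readings, invented for this illustration only.
cold = [0, 2, 5, 8, 10]   # degrees below freezing
snow = [0, 1, 3, 6, 7]    # snowfall, cm
sun = [10, 8, 6, 3, 1]    # hours of sunshine
rain = [0, 2, 5, 9, 12]   # rainfall, mm

print(f"cold vs snow: {pearson(cold, snow):+.2f}")
print(f"sun vs rain:  {pearson(sun, rain):+.2f}")
```

With these made-up numbers, coldness and snowfall come out positively correlated and sunshine and rainfall negatively correlated, matching the two examples in the passage.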

In a perfect world, an economist could run a controlled experiment just like a physicist or a biologist does: setting up two samples, randomly manipulating one of them, and measuring the effect. But an economist rarely has the luxury of such pure experimentation. (That’s why the school-choice lottery in Chicago was such a happy accident.) What an economist typically has is a data set with a great many variables, none of them randomly generated, some related and others not. From this jumble, he must determine which factors are correlated and which are not.

In the case of the ECLS data, it might help to think of regression analysis as performing the following task: converting each of those twenty thousand schoolchildren into a sort of circuit board with an identical number of switches. Each switch represents a single category of the child’s data: his first-grade math score, his third-grade math score, his first-grade reading score, his third-grade reading score, his mother’s education level, his father’s income, the number of books in his home, the relative affluence of his neighborhood, and so on.

Now a researcher is able to tease some insights from this very complicated set of data. He can line up all the children who share many characteristics—all the circuit boards that have their switches flipped the same direction—and then pinpoint the single characteristic they don’t share. This is how he isolates the true impact of that single switch on the sprawling circuit board. This is how the effect of that switch—and, eventually, of every switch—becomes manifest.
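The circuit-board picture can be sketched in a few lines of Python. Strictly speaking, this is exact matching (also called stratification) rather than regression. A regression reaches a comparable answer by fitting a formula instead of literally sorting children into identical groups, but the intuition of lining up matching circuit boards is the same. The switch names (`high_income`, `many_books`), the scores, and the helper name `matched_effect` are all hypothetical, invented only for this illustration.

```python
from collections import defaultdict
from statistics import mean


def matched_effect(children, switch, outcome="score"):
    """Weighted average, over groups of children whose other switches all match,
    of the outcome gap between switch-on and switch-off children."""
    groups = defaultdict(lambda: {True: [], False: []})
    for child in children:
        # Group key: every switch except the one under study (and the outcome).
        key = tuple(sorted((k, v) for k, v in child.items() if k not in (switch, outcome)))
        groups[key][child[switch]].append(child[outcome])

    total, weight = 0.0, 0
    for group in groups.values():
        on, off = group[True], group[False]
        if on and off:  # a group is comparable only if both settings occur in it
            n = len(on) + len(off)
            total += (mean(on) - mean(off)) * n
            weight += n
    if weight == 0:
        raise ValueError("no matched group contains both settings of the switch")
    return total / weight


# Hypothetical children: two switches plus a test score.
children = [
    {"high_income": True,  "many_books": True,  "score": 60},
    {"high_income": True,  "many_books": False, "score": 58},
    {"high_income": False, "many_books": True,  "score": 50},
    {"high_income": False, "many_books": False, "score": 48},
    {"high_income": True,  "many_books": True,  "score": 62},
    {"high_income": False, "many_books": False, "score": 46},
]

# Raw comparison: ignores every other switch.
naive_gap = (mean(c["score"] for c in children if c["many_books"])
             - mean(c["score"] for c in children if not c["many_books"]))
matched_gap = matched_effect(children, "many_books")
print(f"raw gap:     {naive_gap:.2f}")
print(f"matched gap: {matched_gap:.2f}")
```

In this toy data set the raw gap is more than twice the matched gap, because high-income children are overrepresented among the children with many books.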

Let’s say that we want to ask the ECLS data a fundamental question about parenting and education: does having a lot of books in your home lead your child to do well in school? Regression analysis can’t quite answer that question, but it can answer a subtly different one: does a child with a lot of books in his home tend to do better than a child with no books? The difference between the first and second questions is the difference between causality (question 1) and correlation (question 2). A regression analysis can demonstrate correlation, but it doesn’t prove cause. After all, there are several ways in which two variables can be correlated. X can cause Y; Y can cause X; or it may be that some other factor is causing both X and Y. A regression alone can’t tell you whether it snows because it’s cold, whether it’s cold because it snows, or if the two just happen to go together.

The ECLS data do show, for instance, that a child with a lot of books in his home tends to test higher than a child with no books. So those factors are correlated, and that’s nice to know. But higher test scores are correlated with many other factors as well. If you simply measure children with a lot of books against children with no books, the answer may not be very meaningful. Perhaps the number of books in a child’s home merely indicates how much money his parents make. What we really want to do is measure two children who are alike in every way except one (in this case, the number of books in their homes) and see if that one factor makes a difference in their school performance.
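The books-versus-income worry can be made concrete with a small Python simulation. In the invented data below, parental income drives both the number of books and the test score, and books have no effect of their own by construction. A simple regression of scores on books still finds a positive slope. To hold income constant, the sketch uses the Frisch-Waugh-Lovell result: the coefficient on books in a regression that includes both books and income equals the slope obtained after stripping the linear influence of income out of both books and scores. The numbers and the helper names (`slope`, `residuals`) are hypothetical, chosen only for this illustration.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept): cov(x, y) / var(x)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data: income drives BOTH books and scores; by construction,
# books have no effect of their own on scores.
income = [2, 3, 4, 5, 6, 7, 8, 9]           # tens of thousands of dollars
book_noise = [3, -2, 1, -4, 2, 0, -1, 1]    # book counts not explained by income
books = [10 * i + e for i, e in zip(income, book_noise)]
scores = [40 + 3 * i for i in income]

# No controls: books appear to raise scores.
naive = slope(books, scores)
# Income held constant (Frisch-Waugh-Lovell): regress the leftover variation
# in scores on the leftover variation in books, after removing income from both.
controlled = slope(residuals(books, income), residuals(scores, income))

print(f"books coefficient, no controls:        {naive:.3f}")
print(f"books coefficient, income held fixed:  {controlled:.3f}")
```

Run as is, the first coefficient is positive while the second is essentially zero: in this made-up world, books look important only because they stand in for income.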

It should be said that regression analysis is more art than science. (In this regard, it has a great deal in common with parenting itself.) But a skilled practitioner can use it to tell how meaningful a correlation is—and maybe even tell whether that correlation does indicate a causal relationship.

So what does an analysis of the ECLS data tell us about schoolchildren’s performance? A number of things. The first one concerns the black-white test score gap.

It has long been observed that black children, even before they set foot in a classroom, underperform their white counterparts. Moreover, black children didn’t measure up even when researchers controlled for a wide array of variables. (To control for a variable is essentially to eliminate its influence, much as one golfer uses a handicap against another. In the case of an academic study such as the ECLS, a researcher might control for any number of disadvantages that one student might carry when measured against the average student.) But this new data set tells a different story. Once just a few variables are controlled for (including the income and education level of the child’s parents and the mother’s age at the birth of her first child), the gap between black and white children is virtually eliminated at the time the children enter school.

This is an encouraging finding on two fronts. It means that young black children have continued to make gains relative to their white counterparts. It also means that whatever gap remains can be linked to a handful of readily identifiable factors. The data reveal that black children who perform poorly in school do so not because they are black but because they tend to come from low-income, low-education households. A typical black child and white child from the same socioeconomic background, however, have the same abilities in math and reading upon entering kindergarten.

Great news, right? Well, not so fast. First of all, because the average black child is more likely to come from a low-income, low-education household, the gap is very real: on average, black children still are scoring worse. Worse yet, even when the parents’ income and education are controlled for, the black-white gap reappears within just two years of a child’s entering school. By the end of first grade, a black child is underperforming a statistically equivalent white child. And the gap steadily grows over the second and third grades.

Why does this happen? That’s a hard, complicated question. But one answer may lie in the fact that the school attended by the typical black child is not the same school attended by the typical white child, and the typical black child goes to a school that is simply . . . bad. Even fifty years after Brown v. Board, many American schools are virtually segregated. The ECLS project surveyed roughly one thousand schools, taking samples of twenty children from each. In 35 percent of those schools, not a single black child was included in the sample. The typical white child in the ECLS study attends a school that is only 6 percent black; the typical black child, meanwhile, attends a school that is about 60 percent black.