This session presents three interesting and well-written papers that address different approaches to using statistical models to take a census. Indeed, they represent a trend that increasingly redefines how people think about census taking. The concept of a census as a one-by-one count of the population is increasingly being replaced by the concept of a census as a measure of the population size available for all groups and areas. What distinguishes a census from a survey is increasingly not a matter of methodology (sampling vs. a count), but rather the level of detail for which the results are available (small groups, local areas, multiple cross tabulations). However, within this paradigm, the three papers take very different approaches.
Two of the papers, at least in their titles, discuss the issues surrounding a "one number census." This raises the issue of just what constitutes a one-number census. Obviously, it means one set of numbers, and perhaps one set of official numbers. One must ask: when Jean Talon took the first census in North America, he presumably produced only one set of official numbers. Was this the first "One Number Census"?
I will save the Abbott et al. paper till last, and begin with the Crouse paper.
Crouse's title is "Evaluating a One Number Approach to the Agricultural Census." The proposal discussed in his paper represents a major change in the methods used to produce the Census of Agriculture. It is not entirely clear to me from reading the paper, but it seems that the proposal is to take the Department of Agriculture's official estimates program and the initial census results and, in a process that is less than clear, produce a new set of official estimates. These new estimates would then be combined with the census totals to produce a new "official" census. Crouse discusses how the census results might be calibrated to these new "official estimates." While this is an important problem, it is necessarily secondary to the issue of how the official estimates are prepared. It will be very important to document and specify these methods as the proposal is developed.
Within the scope of his principal focus, calibration, Crouse does a good job of laying out his issues and research. My only suggestion here is that, perhaps, he gives too much importance to convergence and not enough to other criteria. One of the coauthors of the Abbott paper, Ray Chambers, has written a very interesting paper on "Intelligent Calibration" which is available in the proceedings from the ISI meetings in Helsinki this past summer.
The Dumais et al. paper discusses another new approach to census taking, the rolling or continuous census. I say new not because the idea is new; Kish was advocating this idea years ago. Now the French have really begun to implement it. Of course, in the United States, the American Community Survey (ACS) has implemented several important aspects of the idea. Let me turn, then, to the differences between the French and the American designs.
The most important difference is that the French have replaced their periodic census with their continuous census. Because of constitutional requirements, the U.S. has not been able to replace the decennial census with the ACS. Because of this difference, the French approach emphasizes measuring the count, with the content somewhat secondary. The ACS focuses on the content. It is designed to replace the detailed sample questions from the census, the so-called long form. Getting the count right will still be the purpose of the Decennial Census.
In terms of methods, the French can rely on good address lists maintained for other purposes, while much of the effort of the ACS comes from constructing and maintaining address lists. Also, the French have access to good local administrative records. The ACS will be benchmarked to population controls that are based in part on the Decennial Census and in part on administrative records. However, these records are at fairly aggregate levels of geography, and not local like the French system.
Both countries will survey and measure big places yearly. Focusing on the count, the French will use a simple expansion estimator for the large towns and cities. The U.S., as always, will use a complex estimator with multiple levels of survey controls.
For both systems, small places are defined remarkably similarly: places of about ten thousand people for France and fifteen thousand for the U.S. The French will visit these places once every five years and then model and project the estimates for the intercensal and postcensal periods. The paper does a good job in explaining this approach. The U.S. will visit places yearly, and then sum or roll up the annual results into a rolling average.
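To make the U.S. "roll up" concrete, here is a minimal sketch of a rolling average over annual estimates. The five-year window, the equal weighting of years, and the figures are assumptions chosen for illustration; none of them is taken from the ACS design or from the paper.

```python
from statistics import fmean

def rolling_average(annual_estimates, window=5):
    """Average the most recent `window` annual estimates, year by year.

    The window length and equal weighting are illustrative assumptions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for end in range(window, len(annual_estimates) + 1):
        averages.append(fmean(annual_estimates[end - window:end]))
    return averages

# Hypothetical annual estimates for one small place (six survey years).
annual = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
rolling = rolling_average(annual, window=5)
print(rolling)  # one average for years 1-5, one for years 2-6
```

Each published figure under this sketch describes a multi-year period rather than a single year, which is the main interpretive difference from a model-based projection to a given date.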
Finally, the absolute superiority of the French approach is shown by the fact that they will take the summer off and go on vacation. Data collection is suspended for July and August. In the ACS, data collection will go on all year.
I will turn now to the Abbott paper. Again, this is a very interesting and well-written description of a well thought-through new approach to census taking. Again, what attracts my interest is the differences and similarities with the U.S. approach. At a certain level, both countries are taking a similar approach to measuring the population. First a complete enumeration is attempted. Then a coverage measurement survey measures and corrects for the coverage error from the first attempt. Both countries conduct a survey, followed by computer and then clerical matching. The dual system estimator (DSE) is used to estimate the population at higher levels, and modeling is used to improve estimates at smaller levels.
Remarkably, given the differences in population size, both countries will be conducting a coverage measurement survey (CMS) of about 300,000 housing units. Now, we have all been taught that the required sample size is largely independent of the population or universe size. However, this never works out in practice, because the bigger the population, the more demand for estimates for subgroups. Both the U.S. and the U.K. are meeting this demand by using statistical models to predict the undercount for small groups and small areas. The British use a target sample cluster size of about 15 housing units, versus the U.S.'s 30 housing units. But in both countries, the actual cluster sizes vary widely, perhaps somewhat more so in the U.K. So the range of actual cluster sizes may not be so different.
There are important differences. Most remarkable to me is the timing. The British CMS begins three or four weeks after the census reference date. This minimizes the number of people who might have moved in between. In the U.S., the CMS largely begins some fifteen weeks after Census Day. The proper treatment of movers is a major design issue in the U.S.
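For readers less familiar with the DSE, the following is a minimal sketch of its simplest (Lincoln-Petersen) form: the census count times the survey count, divided by the number of people matched between the two. The function name and all figures are hypothetical, and the sketch ignores survey weights, erroneous enumerations, and the modeling that either country actually layers on top.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Simplest DSE: N_hat = census_count * survey_count / matched_count."""
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    return census_count * survey_count / matched_count

# Hypothetical figures for one estimation area (not from either paper).
census_count = 900   # people enumerated by the census
survey_count = 100   # people found by the coverage measurement survey
matched_count = 90   # survey people who were also found in the census

estimate = dual_system_estimate(census_count, survey_count, matched_count)
coverage_rate = matched_count / survey_count  # estimated census coverage
print(estimate, coverage_rate)
```

The ratio of matched to surveyed people estimates the share of the population the census captured; dividing the census count by that share gives the estimate of the total.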
The British use team interviewing in the CMS. Indeed, they seem to depend on this approach for quality assurance. The British plan no after-matching field work. There is a great trust that the initial CMS interview was done right. In the U.S., we have a detailed CMS interview quality control program. Further, many problematic cases are sent for re-interview after matching. For example, all cases where the CMS and the census interviewed different households for the same housing unit are sent to follow-up.
Related to this is the British decision not to explicitly measure the level of erroneous inclusions in the census, that is, the gross overcount. The U.S. spends much time and money measuring the rate of erroneous enumeration. This is the role of the enumeration or E sample. In the U.S., in calculating the DSE we subtract from the census count as "not in the census" not just duplicate or fictitious enumerations, but also people counted in the wrong place. "Wrong place" can, in the U.S., be because the person moved, or their housing unit was assigned by the census to the wrong geography. The U.S. also subtracts out cases with so little census information as to make both matching and follow-up impossible. This can occur either from respondent or enumeration error or sloppiness or because of data capture problems. Now, I am not saying that the level of erroneous inclusions is as high in the U.K. as in the U.S. However, they may wish to think more about how they are handling some of these "messy" cases.
Although both countries use the DSE, the approach is somewhat different. In the U.S., we use a post-stratified DSE to produce census correction factors for large estimation domains. In 1990, for example, we had 357 domains for the U.S. The British are using a cluster level DSE together with a regression model to produce estimates. The details of these methods were not the focus of the Abbott paper. However, they seem quite interesting, and we in the U.S. will want to learn more about their approach.
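A small numerical sketch may show why the erroneous enumerations matter. It assumes a simplified form of the calculation in which the census count is first scaled by the share of E-sample records judged correct and then divided by the match rate from the coverage survey. The function, its parameter names, and all figures are hypothetical; the production U.S. estimator has further terms not shown here.

```python
def dse_with_e_sample(census_count, e_sample_total, e_sample_correct,
                      p_sample_total, p_sample_matched):
    """Simplified DSE that removes erroneous enumerations before inflating.

    correct enumerations ~= census_count * e_sample_correct / e_sample_total
    estimate             ~= correct enumerations * p_sample_total / p_sample_matched
    """
    if e_sample_total <= 0 or p_sample_matched <= 0:
        raise ValueError("sample totals and matches must be positive")
    correct = census_count * e_sample_correct / e_sample_total
    return correct * p_sample_total / p_sample_matched

# Hypothetical figures: 5% of census records are erroneous and the
# coverage survey matches 95% of its people to the census.
with_correction = dse_with_e_sample(1000, 200, 190, 200, 190)

# Same data, but treating every census record as correct.
without_correction = dse_with_e_sample(1000, 200, 200, 200, 190)
print(with_correction, without_correction)
```

In this made-up example, ignoring the erroneous enumerations inflates the estimate by roughly five percent, which is the kind of effect the "messy" cases could produce if their rate is not negligible.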
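As a rough illustration of the post-stratified approach, the sketch below computes a correction factor for each estimation domain as the DSE divided by the census count, then applies those factors to the census counts of a smaller area. Applying domain-level factors to a local area in this way is an assumption on my part for illustration, and the domain names and figures are invented.

```python
def correction_factors(domains):
    """Map each domain to DSE / census count.

    `domains` maps a domain name to a (census_count, dse) pair.
    """
    return {name: dse / census for name, (census, dse) in domains.items()}

def adjusted_area_total(area_counts, factors):
    """Apply each domain's factor to that domain's census count in the area."""
    return sum(count * factors[name] for name, count in area_counts.items())

# Hypothetical estimation domains (post-strata) with census counts and DSEs.
domains = {"domain_a": (1000, 1020), "domain_b": (500, 550)}
factors = correction_factors(domains)

# Hypothetical local area, with its census count broken out by domain.
area_counts = {"domain_a": 100, "domain_b": 50}
area_total = adjusted_area_total(area_counts, factors)
print(factors, area_total)
```

The design choice here is that the undercount rate is assumed constant within a domain, so every local area inherits the factor of the domains its residents fall into.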
A final and important difference is that the British plan a complete person and household adjustment. In the U.S., the adjustments for most tables will be only for individuals. This is an easier problem in the U.K., because they are correcting only for gross omissions. Thus, it is possible to measure the characteristics of the observed missed people. In the U.S., we must "net out" whole-household and within-household omissions with whole-household and within-household erroneous inclusions. This is done through the DSE with, of course, the adjustment for the "unobserved fourth cell." How the net estimated undercount divides into within and whole-household errors is less than clear. In the U.S., adjustments for household and housing units are only planned for later tabulations of the content sample (long form) data. To simplify somewhat, we will control these sample estimates, including household and housing unit estimates, to the adjusted totals for population and housing.
In summary, we have three interesting and well written papers about three different approaches to census taking. The overall message is clear: that worldwide, modern statistical methods are becoming an integral part of what we mean by a census. Our thanks to the authors, to the organizer and to the chairman.
1. The views expressed are attributable to the author and do not necessarily reflect those of the Census Bureau.