DISCUSSION OF SESSION ON IMPROVING SMALL-AREA ESTIMATES
Michael L. Cohen
My remarks follow this general outline. I will first discuss each of the three papers in turn. For each paper, I will raise a series of questions or remarks that relate to the method suggested or its evaluation. I will then provide more general points about the paper. Then, after discussing all three papers, I will conclude with some broader comments on small-area estimation.
The first paper is "Using Administrative Records for Small-Area Estimation in the American Community Survey," by Chand and Alexander. I have the following remarks and questions:
1. The current model is cross-sectional. The paper mentions the intent to examine the incorporation of time series models in the near future. This might provide some real benefit in variance reduction. It might be especially important if the ACS sample size is cut in the future. In his presentation, Alexander mentioned that time series of administrative records are often difficult to use due to relatively common changes in the associated programs. This would clearly make time series modeling less beneficial, so maybe time series modeling should not be a high priority, depending on programmatic changes in the years of interest.
2. The current model assumes homogeneous variances for the random effects. There was some investigation of this assumption through formation of standardized residuals, which seemed to have a standard normal distribution. However, the tests that were used to show this were not described, and, in addition, graphical displays that are excellent for this sort of evaluation should be examined. Also, one might examine, e.g., whether the means of absolute standardized residuals, within categories defined by, say, large versus small tracts, are close to each other (accounting for sampling variance).
3. Given the importance of the fit of the regression model used, it would be interesting to see some regression statistics, especially R-squared and t-statistics, and also the results of regression diagnostics, to evaluate the linearity assumption, and to examine the error distribution and the impact of outliers.
4. The error measures were only calculated for randomly chosen tracts. It was not clear to me why the error measures could not be calculated for all tracts.
5. The authors examined controlling the estimates to direct estimates for larger areas. It might be useful for this purpose to carve up the United States into larger areas that are relatively homogeneous, an idea suggested by John Tukey in the adjustment context. One might make use of a cluster analysis to help form these areas. Unfortunately, this might be a lot of work given the level of aggregation at which one would have to work.
6. Four empirical Bayes methods were examined in this paper, and the results for these four related methods were nearly identical. What advantages and disadvantages does the literature report for these four methods? Which ones were expected to outperform the others in this application and why?
7. The impressive variance reductions were estimated based on somewhat asymptotic arguments. Is there any chance that these estimated reductions are therefore optimistic?
8. Finally, does the investigation of the benefits of controlling the estimates to direct estimates for higher levels of geographic aggregation indicate some interest in looking at a shares model? Has any work been done in that direction?
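As an illustration of the check suggested in remark 2 above, the following is a minimal Python sketch, not the authors' procedure. It computes the mean absolute standardized residual, with a standard error, separately for small and large tracts. The function name `mean_abs_by_group`, the size cutoff of 4,000, and the synthetic residuals are all assumptions made for illustration; real standardized residuals from the fitted model would replace them. Under a standard normal distribution the expected absolute value is sqrt(2/pi), about 0.80, so group means far from that value, or far from each other relative to their standard errors, would cast doubt on the homogeneous-variance assumption.

```python
import math
import random
import statistics


def mean_abs_by_group(std_resid, tract_size, cutoff):
    """Mean |standardized residual| and its standard error, by tract size group."""
    groups = {"small": [], "large": []}
    for r, n in zip(std_resid, tract_size):
        groups["large" if n >= cutoff else "small"].append(abs(r))
    summary = {}
    for name, vals in groups.items():
        mean = statistics.fmean(vals)
        se = statistics.stdev(vals) / math.sqrt(len(vals))
        summary[name] = (mean, se)
    return summary


# Synthetic illustration only: residuals drawn with the same variance in every tract.
rng = random.Random(0)
tract_size = [rng.randint(500, 8000) for _ in range(10000)]
std_resid = [rng.gauss(0.0, 1.0) for _ in tract_size]

summary = mean_abs_by_group(std_resid, tract_size, cutoff=4000)

# Difference between the groups, scaled by its sampling standard error.
diff = summary["large"][0] - summary["small"][0]
se_diff = math.hypot(summary["large"][1], summary["small"][1])
z_score = diff / se_diff
```

A large absolute `z_score` would suggest that the random-effect variance differs between small and large tracts. A normal quantile plot of the same residuals would be the natural graphical companion to this numerical summary.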
Generally, I would like to make the following comments:
1. This is excellent, careful work on a hard and important problem.
2. I believe that there should be more attempts at evaluation of these techniques that make use of simulations, possibly the use of artificially generated data sets, similar to what was used to evaluate adjustment methods in the 1980s.
3. Provided one is careful about the dynamics of the associated federal programs, the incorporation of time series techniques might yet provide some advantages.
The second paper is "Large Numbers of Estimates for Small Areas," by Schirm, Zaslavsky, and Czajka. I have the following remarks and questions:
1. The improvement seen seems to have two sources: (a) better estimation of the frequency of appearance of the household types in a state, and (b) better estimation of the mean response for each type of household. It would be interesting to know which source of improvement is more important.
2. There is a need for some simulation studies to get a sense of what the improvement is likely to be given various assumptions about the size of higher-order interactions in the contingency table model underlying their method.
3. It is unclear, especially given how many n's there are, as to how the parameters are estimated. Are there any convergence problems?
4. Given that this is an "automated" procedure for providing estimates for many problems, it would be useful to compare the estimates not only against the direct estimates, but also against the best empirical Bayes estimate that one can come up with for an individual problem. This would help someone decide whether to invest the additional resources for key individual estimates, using this technique for the remaining problems.
Generally, I would like to make the following comments:
1. It is very exciting to see an entirely new and obviously extremely promising approach to a very hard problem. This work is wonderfully innovative.
2. This is an excellent example of getting the most possible out of a data set.
3. The authors need to start evaluating the method using simulations and real data.
The third paper is "Small Domain Estimation of Employment Using CES and ES202 Data," by Harter and Wolter. I have the following remarks and questions:
1. Why did Estimate 8 (raking of the unbiased estimator) not perform better? Was the marginal information of so little use?
2. It is a little interesting that variance component methods for estimating i did not perform better. As the authors know, a non-constant i would allow trading-off the direct and model-based estimates depending on the relative variances of the two estimates, which depends on sample size. Some of the direct estimates must be better than others, and i points these out.
3. Again, it would be helpful to see some regression statistics and regression diagnostics to get a feeling for the fit of the regression model. This is a crucial aspect of the performance of small-area estimates.
4. The paper mentioned that there was a correlation of at least 0.9 between the CES and the ES202 values. These correlations represent a situation where the same respondents, presumably, are responding to similar questions in a slightly time-lagged situation. Is the fact that the correlations are not closer to 1.00 a matter primarily of the time difference? This similarity is probably why Estimate 7 was examined. Possibly some detective work could be conducted that would help understand why the correlations aren't higher, and might suggest alternative models.
5. Following up on point 4, if the time lag is the most important factor in the difference, possibly a non-trivial time series model on ES202 data, for example, exponential smoothing, might be useful for addressing the time lag. I would be very surprised if this idea worked, but it is simple to look at.
6. To what extent was the simulation dominated by poor performance in a few areas? If it was, what is special about those areas?
7. The authors might consider looking at separate models of important subsets, such as urban and rural, or economic growth and decline, to see if the fit can be improved. What other covariate information is available? Would ACS be of some help?
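Remark 2 above concerns a weight that varies by area and trades off the direct and model-based estimates according to their relative variances. A minimal sketch of one standard textbook form of such a weight follows; it is not necessarily the form used by Harter and Wolter, and it does not reproduce their notation. The names `shrinkage_weight` and `composite_estimate` and all numerical inputs are hypothetical. The sketch assumes the between-area model variance and each area's sampling variance are known, which in practice they are not.

```python
def shrinkage_weight(model_var, sampling_var):
    """Weight on the direct estimate: near 1 when the direct estimate is precise."""
    return model_var / (model_var + sampling_var)


def composite_estimate(direct, model_based, model_var, sampling_var):
    """Weighted average of the direct and model-based estimates for one area."""
    w = shrinkage_weight(model_var, sampling_var)
    return w * direct + (1.0 - w) * model_based


# Hypothetical areas sharing a model variance of 4.0 but with different sampling variances.
well_sampled = shrinkage_weight(model_var=4.0, sampling_var=1.0)     # 0.8
poorly_sampled = shrinkage_weight(model_var=4.0, sampling_var=16.0)  # 0.2

# Equal variances give equal weight to the two sources.
estimate = composite_estimate(direct=100.0, model_based=90.0,
                              model_var=4.0, sampling_var=4.0)       # 95.0
```

The well-sampled area keeps most of the weight on its direct estimate, while the poorly sampled area leans on the model. A constant weight cannot make this distinction, which is why the weak showing of the variance component methods invites a closer look.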
Generally, I would like to make the following comments:
1. This is extremely careful, comprehensive work.
2. This represents a very promising approach to a very hard problem.
Finally, I would like to make some general comments on small-area estimation motivated by these three papers.
1. Small-area estimation is an area of increasing complexity and importance. There is a greater need for small-area information to address a wide variety of policy needs. There are many sources of information that might be useful, including the census, ACS, regular household surveys, and national and state administrative records. Statisticians interested in getting involved in this area need to understand generalized linear modeling, random effects models, survey sampling, and time series analysis. There are very hard issues, such as at what level to carry out the modeling. Clearly, this is a rich area in which much progress will be made and in which the areas of application will quickly grow.
2. Acknowledging that variances can often be substantially reduced through the use of these methods, it also needs to be understood that there are limits to the improvement given the covariate information available. It is crucial to have very predictive models for this work. Statisticians should not over-promise what levels of aggregation can be addressed.
3. I get the impression that the goodness of fit of the indirect estimates is generally over-represented and that therefore the direct estimates get too little weight in these methods. I believe model misspecification is not fully taken into consideration when these models are used in a predictive framework. Is there some way of incorporating this in these estimates?
4. To what extent are we missing local idiosyncrasies in these models? How can we address that? Clearly administrative records can pick much of this up, and some sample is usually taken in each area, but unmodeled features are likely to remain.
5. We need to make all possible use of opportunities for external validation. Indirect assessments of performance are fine but we also need to compare our estimates to "true" values.
6. External validation should also be supplemented with simulation studies, where the truth can be controlled. There needs to be greater use of artificially generated data sets to understand the properties of these estimates.
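To make the suggestion of artificially generated data sets concrete, here is a minimal simulation sketch in Python. Every choice in it is an assumption made for illustration: the linear model with intercept 2.0 and slope 0.5, the three sampling variances, the numbers of areas and replications, and the name `simulate_mse`. Because the truth is generated by the program, the mean squared error of the direct and composite estimates can be computed exactly rather than inferred indirectly.

```python
import random
import statistics


def simulate_mse(n_areas=200, n_reps=200, model_var=1.0, seed=1):
    """Return (MSE of direct estimates, MSE of composite estimates) against known truth."""
    rng = random.Random(seed)
    # Hypothetical design: sampling variance differs across areas.
    sampling_var = [rng.choice([0.5, 1.0, 4.0]) for _ in range(n_areas)]
    covariate = [rng.uniform(0.0, 10.0) for _ in range(n_areas)]
    err_direct, err_composite = [], []
    for _ in range(n_reps):
        for x, d in zip(covariate, sampling_var):
            # Regression prediction; coefficients are treated as known here.
            synthetic = 2.0 + 0.5 * x
            truth = synthetic + rng.gauss(0.0, model_var ** 0.5)
            direct = truth + rng.gauss(0.0, d ** 0.5)
            w = model_var / (model_var + d)
            composite = w * direct + (1.0 - w) * synthetic
            err_direct.append((direct - truth) ** 2)
            err_composite.append((composite - truth) ** 2)
    return statistics.fmean(err_direct), statistics.fmean(err_composite)


mse_direct, mse_composite = simulate_mse()
```

This setup flatters the composite estimator, since the model is correctly specified and the variances are known. The more informative experiments would generate the truth from a model other than the one being fitted, for example with an omitted covariate or unequal random-effect variances, to see how much of the apparent gain survives misspecification.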