Federal Committee on Statistical Methodology
Office of Management and Budget

 

  Statistical Policy Working Paper 20 - Seminar on Quality of Federal Data - Part 1 of 3


 

 


 



               Statistical Policy

                Working Paper 20



 



 



 



     Seminar on Quality of Federal Data



                   Part 1 of 3



 



Federal Committee on Statistical Methodology



 



 



            Statistical Policy Office



  Office of Information and Regulatory Affairs



          Office of Management and Budget



 



                   March 1991



 



   MEMBERS OF THE FEDERAL COMMITTEE ON



          STATISTICAL METHODOLOGY



 



              (February 1991)



 



           Maria E. Gonzalez, Chair



        Office of Management and Budget



 



 



Yvonne M. Bishop                    Daniel Kasprzyk
Energy Information Administration   Bureau of the Census

Warren L. Buckler                   Daniel Melnick
Social Security Administration      National Science Foundation

Charles E. Caudill                  Robert P. Parker
National Agricultural               Bureau of Economic Analysis
  Statistics Service

Cynthia Z.F. Clark                  David A. Pierce
National Agricultural               Federal Reserve Board
  Statistics Service

Zahava D. Doering                   Thomas J. Plewes
Smithsonian Institution             Bureau of Labor Statistics

Robert M. Groves                    Wesley L. Schaible
Bureau of the Census                Bureau of Labor Statistics

Roger A. Herriot                    Fritz J. Scheuren
National Center for                 Internal Revenue Service
  Education Statistics

C. Terry Ireland                    Monroe G. Sirken
National Computer Security          National Center for
  Center                              Health Statistics

Charles D. Jones                    Robert D. Tortora
Bureau of the Census                Bureau of the Census



 



                            PREFACE



 



In 1975, the Office of Management and Budget (OMB) organized the



Federal Committee on Statistical Methodology.  Comprised of



individuals selected by OMB for their expertise and interest in



statistical methods, the committee has during the past 15 years



determined areas that merit investigation and discussion, and



overseen the work of subcommittees organized to study particular



issues.  Since 1978, 19 Statistical Policy Working Papers have been



published under the auspices of the Committee.



 



On May 23-24, 1990, the Council of Professional Associations on



Federal Statistics (COPAFS) hosted a "Seminar on the Quality of



Federal Data."  Developed to capitalize on work undertaken during



the past dozen years by the Federal Committee on Statistical



Methodology and its subcommittees, the seminar focused on a variety



of topics that have been explored thus far in the Statistical



Policy Working Paper series.  The subjects covered at the seminar



included:



 



   Survey Quality Profiles



   Paradigm Shifts Using Administrative Records



   Survey Coverage Evaluation



   Telephone Data Collection



   Data Editing



   Computer Assisted Statistical Surveys



   Quality in Business Surveys



   Cognitive Laboratories



   Employer Reporting Unit Match Study



   Approaches to Developing Questionnaires



   Statistical Disclosure-Avoidance



   Federal Longitudinal Surveys



 



Each of these topics was presented in a two-hour session that



featured formal papers and discussion, followed by informal



dialogue among all speakers and attendees.



 



Statistical Policy Working Paper 20, published in three parts,



presents the proceedings of the "Seminar on the Quality of Federal



Data." In addition to providing the papers and formal discussions



from each of the twelve sessions, this working paper includes



Robert M. Groves' keynote address, "Towards Quality in a Working



Paper Series on Quality," and comments by Stephen E. Fienberg,



Margaret E. Martin, and Hermann Habermann at the closing session,



"Towards an Agenda for the Future."



 



We are indebted to all of our colleagues who assisted in organizing



the seminar, and to the many individuals who not only presented



papers and discussions but also prepared these materials for



publication.   A special thanks is due to Terry Ireland and his



staff for their work in assembling this working paper.



 



                      Table of Contents



 



                      Wednesday, May 23, 1990



 



 



                               Part 1



 



 



                     KEYNOTE ADDRESS



 



 



TOWARDS QUALITY IN A WORKING PAPER SERIES ON QUALITY. . . . . . .3



   Robert M. Groves,  The University of Michigan and U. S.



   Bureau of the Census



 



 



 



          Session 1 - SURVEY QUALITY PROFILES



 



 



 



THE SIPP QUALITY PROFILE. . . . . . . . . . . . . . . . . . . . 19



   Thomas B. Jabine, Statistical Consultant



 



INITIAL REPORT ON THE QUALITY OF AGRICULTURAL SURVEY PROGRAM . .29



    George A. Hanuschak, National Agricultural Statistics



    Service



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . 40



    Barbara A. Bailar, American Statistical Association



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . .46



    Nancy A. Mathiowetz, U. S. Bureau of the Census



 



 



 



Session 2 - PARADIGM SHIFTS USING ADMINISTRATIVE



                             RECORDS



 



 



 



PARADIGM SHIFTS: ADMINISTRATIVE RECORDS AND CENSUS-TAKING. . . .53



    Fritz Scheuren, Internal Revenue Service



 



AN ADMINISTRATIVE RECORD PARADIGM: A CANADIAN EXPERIENCE . . . .66



    John Leyes, Statistics Canada



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . . 77



    Gerald Gates, U.S. Bureau of the Census



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . . 83



    Edward J. Spar, Market Statistics



 



 



 



      Session 3 - SURVEY COVERAGE EVALUATION



 



 



 



 



CONTROL, MEASUREMENT, AND IMPROVEMENT OF SURVEY COVERAGE . . . .87



    Gary M. Shapiro, U. S. Bureau of the Census; Raymond R.



    Bosecker, National Agricultural Statistics Service



 



QUALITY OF SURVEY FRAMES  . . . . . . . . . . . . . . . . . . .100



    Judith T. Lessler, Research Triangle Institute



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . .108



    Fritz Scheuren, Internal Revenue Service



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 114



    Joseph Waksberg, Westat, Inc.



 



 



 



       Session 4 - TELEPHONE DATA COLLECTION



 



 



 



QUALITY IMPROVEMENT IN TELEPHONE SURVEYS . . . . . . . . . . . 123



    Leyla Mohadjer, David Morganstein, Westat, Inc.



 



COMPUTER ASSISTED SURVEY TECHNOLOGIES IN GOVERNMENT:



    AN OVERVIEW  . . . . . . . . . . . . . . . . . . . . . .  137



    Marc Tosiano, National Agricultural Statistics Service



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . .155



    William L. Nicholls II, U. S. Bureau of the Census



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . .161



    James T. Massey, National Center for Health Statistics



 



 



 



 



 



 



 



 






 



                                 Part 2



 



 



 



 



                   Session 5 - DATA EDITING



 



 



 



 



OVERVIEW OF DATA EDITING IN FEDERAL STATISTICAL AGENCIES. . . . 167



    David A. Pierce, Federal Reserve Board



 



EDITING SOFTWARE (An excerpt from Chapter IV of Working



    Paper 18)  . . . . . . . . . . . . . . . . . . . . . . . .173



    Mark Pierzchala, National Agricultural Statistics



    Service



 



RESEARCH ON EDITING. . . . . . . . . . . . . . . . . . . . . .  180



    Yahia Ahmed, Internal Revenue Service



 



DISCUSSION  . . . . . . . . . . .  . . . . . . . . . . . . . .  184



    Charles E. Caudill, National Agricultural Statistics



    Service



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . . 186



    Richard Bolstein, George Mason University



 



 



 



 



     Session 6 - COMPUTER ASSISTED STATISTICAL



                              SURVEYS



 



 



 



OVERVIEW OF COMPUTER ASSISTED SURVEY INFORMATION COLLECTION . . 191



    Richard L. Clayton, U. S. Bureau of Labor Statistics



 



A COMPARISON BETWEEN CATI AND CAPI. . . . . . . . . . . . . . . 197



    Martin Baum, National Center for Health Statistics



 



COMPUTER ASSISTED SELF INTERVIEWING . . . . . . . . . . . . . . 202



    Ralph Gillmann, Energy Information Administration



 



COMPUTER ASSISTED SELF INTERVIEWING: RIGS AND PEDRO,



    TWO EXAMPLES. . . . . . . . . . . . . . . . . . . . . . . 205



    Ann M. Ducca, Energy Information Administration



 



DATA  COLLECTION. . . . . . . . . . . . . . . . . . . . . . . . 209



    Cathy Mazur, National Agricultural Statistics Service



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . 212



     Robert N. Tinari, U. S. Bureau of the Census



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . .216



     David Morganstein, Westat, Inc.



 



 



 



                         Thursday, May 24, 1990



 



 



 



        Session 7 - QUALITY IN BUSINESS SURVEYS



 



 



 



IMPROVING ESTABLISHMENT SURVEYS AT THE BUREAU OF LABOR



     STATISTICS . . . . . . . . . . . . . . . . . . . . . . . 221



     Brian MacDonald, Alan R. Tupek, U. S. Bureau of Labor



     Statistics



 



A REVIEW OF NONSAMPLING ERRORS IN FEDERAL ESTABLISHMENT



SURVEYS WITH SOME AGRIBUSINESS EXAMPLES. . . . . . . . . . . . 232



     Ron Fecso, National Agricultural Statistics Service



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . .243



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . .247



     Charles D. Cowan, Opinion Research Corporation



 



 



           Session 8 - COGNITIVE LABORATORIES



 



 



 



THE BUREAU OF LABOR STATISTICS COLLECTION PROCEDURES



RESEARCH LABORATORY: ACCOMPLISHMENTS AND FUTURE DIRECTIONS. . .253



     Cathryn S. Dippo, Douglas Herrmann, U. S. Bureau of Labor



     Statistics



 



THE ROLE OF A COGNITIVE LABORATORY IN A STATISTICAL AGENCY. . .268



     Monroe G. Sirken, National Center for Health Statistics



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . .278



     Elizabeth Martin, U. S. Bureau of the Census



 



DISCUSSION  . . . . . . . . . . . . . . . . . . . . . . . . . .281



     Murray Aborn, National Science Foundation (retired)



 



                                 Part 3


     Session 9 - EMPLOYER REPORTING UNIT MATCH
                                 STUDY


INTERAGENCY AGREEMENTS FOR MICRODATA ACCESS:
     THE ERUMS EXPERIENCE. . . . . . . . . . . . . . . . . . . 291
     Thomas B. Petska, Internal Revenue Service; Lois
     Alexander, Social Security Administration

SAMPLE SELECTION AND MATCHING PROCEDURES USED IN ERUMS . . . . 301
     John Pinkos, Kenneth LeVasseur, Marlene Einstein,
     U. S. Bureau of Labor Statistics; Joel Packman, Social
     Security Administration

RESULTS, FINDINGS, AND RECOMMENDATIONS OF THE ERUMS PROJECT. . 309
     Vern Renshaw, Bureau of Economic Analysis; Tom Jabine,
     Statistical Consultant

DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 318
     W. Joel Richardson, Charles A. Waite, U. S. Bureau of the
     Census

DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 324
     Thomas J. Plewes, U. S. Bureau of Labor Statistics


        Session 10 - APPROACHES TO DEVELOPING
                          QUESTIONNAIRES


TOOLS FOR USE IN DEVELOPING QUESTIONS AND TESTING
     QUESTIONNAIRES. . . . . . . . . . . . . . . . . . . . . . 331
     Theresa J. DeMaio, U. S. Bureau of the Census

TECHNIQUES FOR EVALUATING THE QUESTIONNAIRE DRAFT. . . . . . . 340
     Deborah H. Bercini, National Center for Health Statistics

DESIGNING QUESTIONNAIRES FOR CATI IN A MIXED MODE
     ENVIRONMENT . . . . . . . . . . . . . . . . . . . . . . . 349
     Gemma Furno, U. S. Bureau of the Census

DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 360
     Carol C. House, National Agricultural Statistics Service


     Session 11 - STATISTICAL DISCLOSURE-AVOIDANCE


DISCLOSURE AVOIDANCE PRACTICES AT THE CENSUS BUREAU. . . . . . 367
     Brian Greenberg, U. S. Bureau of the Census

THE MICRODATA RELEASE PROGRAM OF THE NATIONAL CENTER
FOR HEALTH STATISTICS. . . . . . . . . . . . . . . . . . . . . 377
     Robert H. Mugge, National Center for Health Statistics
     (retired)

DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 385
     George T. Duncan, Carnegie Mellon University


     Session 12 - FEDERAL LONGITUDINAL SURVEYS


FEDERAL LONGITUDINAL SURVEYS . . . . . . . . . . . . . . . . . 393
     Daniel Kasprzyk, U. S. Bureau of the Census; Curtis
     Jacobs, U. S. Bureau of Labor Statistics

THE ADVANTAGES AND DISADVANTAGES OF LONGITUDINAL SURVEYS . . . 407
     Robert W. Pearson, Social Science Research Council

LONGITUDINAL ANALYSIS OF FEDERAL SURVEY DATA . . . . . . . . . 425
     Patricia Ruggles, Joint Economic Committee

DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 438
     Michael Brick, Westat, Inc.

DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . 447
     Marilyn E. Manser, U. S. Bureau of Labor Statistics


       TOWARDS AN AGENDA FOR THE FUTURE


Stephen E. Fienberg, Carnegie Mellon University. . . . . . . . 455

Margaret E. Martin . . . . . . . . . . . . . . . . . . . . . . 462

Hermann Habermann, Office of Management and Budget . . . . . . 465




               Part 1



           Keynote Address



TOWARDS QUALITY IN A WORKING PAPER



        SERIES ON QUALITY



 



    TOWARDS QUALITY IN A WORKING PAPER SERIES ON QUALITY



 



                       Robert M. Groves



                The University of Michigan and



                   U.S. Bureau of the Census



 



 



1.   Introduction.



 



   Although this meeting has the title of the "Seminar on the



Quality of Federal Data," its structure follows quite closely the



topics covered in the multi-paper series of Statistical Policy



Working Papers sponsored by the Office of Statistical Policy and



Standards.  There are, as of this date, 19 Statistical Policy



Working Papers written since the first in 1978.  That is about 1.6



per year over the 12 years of the series (see Figure 1).  They



range over a wide terrain, from issues of the topical focus of



surveys to a set of methodological and statistical issues affecting



survey quality.



 



   I am unaware of the processes that led to my being asked to



give the keynote address at this meeting. I must admit that I



speak to you today as someone who has a very biased opinion about



the OMB Statistical Policy Working Papers - I love almost all of



them; I like the idea that they exist and only recently, because of



my change of job sectors, have I appreciated their worth from



another perspective.  I have used them in graduate courses for



students in survey methods (they are fine introductions to



important design topics).  I have used them in my research work



(they are unique sources of documentation about what goes on in the



Federal Statistical System).  I recommend them to others calling



for consulting assistance.



 



   Although I speak as a friend, 45 minutes of praise from me



wouldn't act to improve this series and would run the risk of "head



inflation" for those who developed the papers.  Instead, I want to



be a constructive critic and will divide my remarks into several



categories:



 



 a. alternative goals of the OMB series



 



 b. the need for a structure to their topics



 



    I note that what follows are my personal views as a close



observer from afar of the system and a rookie member of the system.



 



 



 



 



 



 



 



 






 



 



 



                    [Figure 1: graphic not reproduced]



 



 






 



2. Alternative Perspectives on Goals of the Working Paper Series



 



2.1. OMB Series as Review of the State of Practice



 



    Some of the papers in the series address a topic that spans



many surveys of different populations (see Figure 2).  The papers



on coverage error and telephone data collection are examples of



this.  These kinds of papers are compact summaries of the state of



the art on a current issue facing all surveys.  They often describe



activities in both household surveys and those in economic surveys.



Many times they end with case studies of different surveys across



the Federal system and how they handle the particular issue at



hand.



 



                             Figure 2



 



 Alternative Perspectives on Goals of the Working Paper Series



 



 1.  OMB series as a review of the state of practice



 



 2.  OMB series as agency cross-fertilization



 



 3.  OMB series as a prod to new developments



 



    These kinds of papers are valuable to the extent that they have



great depth and wide breadth.  By that I mean they cover all the



sources of data quality and cover them in sufficient depth that



real learning is likely on the part of most readers.



 



    Let me first speak of breadth of topics.  I find it simplest



to array the topics of the papers along the components of



total survey error (see Figure 3).  It is unfair for me to present



this chart without some clarifying remarks about the missing cells.



First, missingness does not imply absence of any treatment of the



topics.  Indeed, on sampling error, for example, many of the



reports comment on the impact of design options on sampling



variance.  Second, this structure is only one of several that could be



applied to classify the 19 reports.  Considering the label of this



seminar, "Quality of Federal Data," however, I find it attractive to



use it here.



 



    Despite the weakness of any one classification scheme, let me



point out what I believe are weaknesses with the current status of



the series.  There is a distinct bias toward the household survey



domain to the detriment of the economic domain.  There is one paper



with the overarching title of "Quality in Establishment Surveys",



but the fact that it alone exists underscores the problem.  This is



a reflection of the smaller literature in the methodology and



evaluation of quality of economic surveys, but it is a status that



 






 



I hope will change in the future.  Why?  We have in the past too



quickly assumed the following premises about economic survey



measurement:



 



 a. establishment surveys are too diverse to yield themselves



    to common methodologies or standards.



 



 b. establishment surveys do not face questionnaire design



    issues like those of household surveys because the



    information gathered is factual in nature.



 



 c. establishment surveys have nonresponse properties that do



    not resemble those of household surveys.



 



    Each of these can be refuted with some observation of the



various establishment surveys now ongoing.  It is true that



establishment populations have large variation in size; that their



organizational structures are diverse; that their recordkeeping



practices are not standardized; that the ideal respondent for



different issues may vary across establishments.  All of this is



true, but should not lead to the extreme that there are no common



problems either across different establishment surveys or between



household and economic surveys.



 



   As the Boskin report has observed, economic survey data needs



improvement and the working paper series could be one vehicle of



focusing attention on specific needs in this area.



 



   The next most important omission, in my opinion, concerns the



issue of nonresponse.  I must admit here that the work of the



National Academy of Sciences Panel on Missing and Incomplete Data



offers a comprehensive review of current theory and practice.



Even so, the issue is vital to the unique inferential power of



probability samples and therefore cannot receive too much



attention.    Even the most basic issues remain unresolved:



relationships between response rates and nonresponse error;



relationships between likelihood of coverage and likelihood of



participation; cost/error evaluations of alternative methods of



improving response rates.  Mean square errors of survey estimators



stem from thousands of individual decisions to cooperate with the



survey request.  It behooves us to devote more energy to this and



the working paper series should do this.
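
   As an illustration of why the link between response rates and
nonresponse error is not a simple one, the following is a standard
textbook identity, not a formula taken from the working papers.  The
notation is assumed here for illustration only: N is the population
size, M the number of units that would not respond, \bar{Y}_r and
\bar{Y}_m the population means of respondents and nonrespondents,
\bar{Y} the full population mean, and \bar{y}_r the respondent sample
mean.  It assumes each unit is fixed as either a respondent or a
nonrespondent and that \bar{y}_r is unbiased for \bar{Y}_r.

```latex
% Mean square error splits into variance plus squared bias.
\mathrm{MSE}(\bar{y}_r) = \mathrm{Var}(\bar{y}_r)
                        + \left[\mathrm{Bias}(\bar{y}_r)\right]^2

% Nonresponse bias of the respondent mean, using
% \bar{Y} = \frac{(N-M)\,\bar{Y}_r + M\,\bar{Y}_m}{N}:
\mathrm{Bias}(\bar{y}_r) = \bar{Y}_r - \bar{Y}
                         = \frac{M}{N}\left(\bar{Y}_r - \bar{Y}_m\right)
```

   Under these assumptions the bias depends on the product of the
nonresponse rate M/N and the respondent-nonrespondent difference, so a
higher response rate by itself need not reduce the error if that
difference grows.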



 



   Third, the interviewer has largely been ignored.  It has been



ignored despite the fact that many Federal surveys use



interviewers to assist in the data collection, despite the fact



that evaluative procedures desperately need review and



reconceptualization, despite the fact that it is an area where both



statistics and social science perspectives work.  The attention to



the interviewer is even more important given the likely future in



which the traditional labor force of underemployed/overskilled



part-time homemakers will decline and computer technologies are likely



to transform the job.



 



   Fourth, although large portions of data collection in the



Federal Statistical System are by mail and self-administered



questionnaire, there is no focused treatment of the methodology in



the series.



 



   Fifth, a few comments specifically on error profiles.  When I



first read the CPS error profile 12 years ago, I had two reactions.



I was attracted to the literary form -- a compilation of quality



measures for the survey, combined with documentation of design



features.  I then felt and still believe that the structure of an



error profile is a valuable way to document leading components of



error in survey statistics (we should be grateful to Brooks and



Bailar as the mothers (or midwives) of the invention).  My second



reaction came after digesting the full report.  How little we as a



community seemed to know about the error properties of the CPS, the



largest ongoing and one of the most important ongoing Federal



household surveys.   Of the 80 pages of the report, for example,



only about 25 are devoted to the data collection operations, a



source of most of the errors in the process!  That combination of



reactions led me to the belief that I still have -- the error



profile, in the hands of intelligent program directors, can act as



an agenda setting document for quality improvement programs.



 



  Finally, there are no serious treatments of costs of data



collection - a topic I'll revisit in a few minutes.



 



   Let me now turn to issues of depth.  At their worst the



reports are catalogues -- they make great reading for someone



interested in buying an idea from those presented, but they don't



make thrilling reading for the uninitiated.  At the same time, they



often assume knowledge of various data series that is not possessed



by many outside experienced statistical system staff.  As a



corollary, some fail to cite relevant research literature outside



that produced within the statistical system.



 



   Some of these features may be a matter of choice of audience.



I have assumed that the desired audience consists of both Federal



Statistical System staff and researchers in related fields from



academia and commercial domains.  The government, academic, and



commercial research sectors have much to gain from learning about



each other's methods.  The paper series could be enhanced by seeking



input from the two other sectors.  At the very least, this might



entail a forced literature review within each paper; at a higher



intensity this might involve the subcommittee membership of those



outside the Federal system.  Even the input from outsiders may not be



sufficient.



 



 



 



 






 



                           Figure 3

        Topics of Statistical Policy Working Papers

Multiple Error Sources           3 - CPS Error Profile
                                 4 - Nonsampling Error Terms
                                13 - Federal Longitudinal Surveys
                                15 - Quality in Establishment Surveys

Coverage Error                  17 - Coverage Error

Nonresponse Error

Sampling Error

Measurement Error:
  Interviewer

Measurement Error:              10 - Developing Questionnaires
  Questionnaire

Measurement Error:
  Respondents

Measurement Error: Mode          6 - Uses of Administrative Records
  of Data Collection            12 - Telephone Data Collection
                                19 - Computer Assisted Surveys

Processing                       2 - Statistical Disclosure
                                 5 - Statistical Matching
                                11 - Industry Coding Systems
                                18 - Data Editing

Estimation                       7 - Time Series Revision


    Topics Not Classifiable Easily in Error/Quality Terms

Topical Focus                    1 - Statistics for Allocation of Funds
                                16 - Reporting in Employer Data Systems

Administration                   8 - Statistical Interagency Agreements
                                 9 - Contracting for Surveys

Other                           14 - Uses of Microcomputers


      Missing Topics of Statistical Policy Working Papers

Coverage Error                  Problems using households as
                                sampling frame elements

Nonresponse Error               Combining social science and
                                statistical models of participation

Sampling Error                  Statistical software for
                                estimation; generalized variance
                                models; alternative estimators for
                                public use files

Measurement Error:              Training; variance models;
  Interviewer                   reinterview programs; monitoring of
                                telephone interviewers

Measurement Error:              Developmental methods in cognitive
  Questionnaire                 laboratories; pretesting regimens;
                                imbedding experiments in surveys

Measurement Error: Mode         Mail and self-administered
  of Data Collection            surveys; mixed mode surveys

Processing                      Statistical quality control;
                                automated coding

Estimation                      Model-based estimation



 



 



 



 



 



 






 



2.2. OMB Series as Cross-Fertilization Among Federal Statistical



      Agencies



 



     In my fifteen years of working with Federal statistical



agencies from my academic base, I was consistently reminded of the



relative isolation of individual agencies from each other.  As most



people in this room know, it is not uncommon for very similar lines



of research and development to be pursued without much coordination



across agencies.  The arguments for this are that different



problems faced by the agencies demand different solutions.  The



arguments against are that functionally equivalent solutions are



often created by two different agencies at twice the cost.



 



     The working paper series has had, I believe, a beneficial



unanticipated effect in reducing interagency duplication.



First, the subcommittees consist of members from several different



agencies.  Second, the tasks of the subcommittees often involve



collecting information from many statistical agencies.  The members



thereby learn of work going on in agencies they normally don't



visit.  Third, recommendations of the papers often seek to apply



standards across agencies, and the committees are forced to face



the difficulty of system wide standards.



 



     This is laudable and necessary.  Is it sufficient? Clearly



not.  That is, working subcommittees of the Federal Committee on



Statistical Methodology are temporary, normally have an agenda



limited to the report, and do not generally follow up on logical



conclusions of the report.  Our dispersed statistical system, with



all the benefits that specialization offers, misses opportunities



to implement recommendations of these working papers.



 



2.3. OMB Series as a Prod to New Developments



 



     Several of the papers treat topics where only one or two



agencies are making major contributions and most others fall



behind.  For example, the Time Series Revision paper, the industry



coding paper, the paper on computer assisted surveys, all fall into



this category.



 



     If I can temporarily put on the hat of an OMB staff member,



this perspective seems to be the most central to the goals of the



group.  If reports like these can serve to improve the quality of



work ongoing in several agencies, investments by one agency might



quickly reap benefits in many agencies.



 



     Some of the reports are poised for such effects, but the



statistical system seems to miss more opportunities than necessary.



Interagency agreements can be forged to promote such technology



transfer.  That is, consultation or subcontracting can be obtained



within existing regulations.  However, this requires the target



agency to acknowledge the need for such upgrading.  Could OMB



 






 



facilitate this process?  I am too naive to know, but the existence



of a pool of funds at the OMB staff level to assure the spread of



innovation across agencies through detail of staff and other



mechanisms would be productive.



 



   Are there areas of innovation that can profit from



coordination?  Certainly.  The use of CATI/CAPI is one that comes



to mind quickly.  It is now an area in which separate expenditures



are being made by several agencies, where no standards have been



well-defined, where different solutions, with essentially the same



cost/benefit structure, may evolve across different agencies.



 



   The prod to new developments, however, demands that the papers



end with a series of recommendations.  The authors should stimulate



the readers, dare I say, challenge the readers, toward improving



current practice.   After the detailed investigation needed for



these reports, they are uniquely qualified to offer such



recommendations.  Only a minority of the reports end with such



recommendations.  This should be part of the charge to each



committee.



 



 



3. The Need for a Structure of the Working Paper Series



 



   As I age, I must admit that I find more appeal in structures



that guide our research and development in survey design and



implementation, as opposed to reacting to each new idea without an



explicit framework.  In the academic world major theories provide



that structure; they help to identify what are the important



questions; they guide the development of new ideas.  The



application of the word "theory" to social and economic data



production is rare.   We do work that is guided by statistical



theories, social science theories, organizational theories, and



computer science theories.  We are, however, basically on the



applied side of research and development.  We have a data



collection and estimation vehicle (e.g., a survey) which is used



for many substantive purposes.  We are interested in knowledge that



improves the vehicle and less interested in anything else.



 



   As I understand the Federal Committee on Statistical



Methodology, the topics for papers are essentially the fruit of



discussions of the committee members.  This is fine for assuring



interest in the paper series among subcommittee members, but fails



to assure coverage of important topics.  I have suggested a total



survey error structure above.  The reports should have both



measurement and reduction of error in mind.  The widely perceived



worth of sampling error as a criterion of evaluation of data owes



its existence largely to well accepted estimators of the error.  We



currently lack comparably well accepted measures for nonsampling



errors, but the report series could be used as a vehicle to



stimulate such measures.



 



   Finally, another way to structure the report series is around



major problems facing the Federal statistical system in the near



and far term (see Figure 4).  These, in my view, should form the



core attention of the working paper series.  The first I mention



may be the most controversial.  The statistical literature on



survey design is schizophrenic on costs.  On one hand, there exist



models which demonstrate that only through knowing cost components



can design optimization be achieved.  On the other hand, there is



little serious treatment of survey costs by statisticians or those



from other disciplines.
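
   One textbook example of such a model is cost-based optimal
allocation in stratified sampling, where the sample size in stratum
h is proportional to N_h * S_h / sqrt(c_h).  The Python sketch below
is illustrative only: the function name, the strata, and the cost
figures are hypothetical assumptions, not values from any Federal
survey, and the cost model (a fixed cost plus a per-unit cost in each
stratum) is the simplest one available.

```python
import math


def cost_optimal_allocation(strata, budget, fixed_cost=0.0):
    """Allocate a stratified sample to minimize variance for a fixed budget.

    strata: list of (N_h, S_h, c_h) tuples giving the stratum population size,
        the stratum standard deviation, and the per-unit data collection cost.
        All values here are hypothetical inputs supplied by the caller.
    budget: total funds available; fixed_cost is overhead not tied to sample size.

    Assumes the linear cost model C = fixed_cost + sum(c_h * n_h).
    Returns a list of (possibly fractional) sample sizes n_h.
    """
    variable_budget = budget - fixed_cost
    if variable_budget <= 0:
        raise ValueError("budget must exceed fixed_cost")
    # Denominator of the classical allocation formula: sum of N_h * S_h * sqrt(c_h).
    denom = sum(N * S * math.sqrt(c) for N, S, c in strata)
    return [variable_budget * (N * S / math.sqrt(c)) / denom for N, S, c in strata]


# Two hypothetical strata of equal size and variability; the second costs
# four times as much per interview, so it receives half the sample.
example_strata = [(1000, 10.0, 4.0), (1000, 10.0, 16.0)]
example_allocation = cost_optimal_allocation(example_strata, budget=6000.0)
```

The point of the sketch is the one made in the text: the allocation
cannot be computed at all unless the per-unit costs c_h are known.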



 



                                 Figure 4



 



      Likely Problems Facing Federal Data in the Near/Far Term



 



  1.  Identification of cost components associated with error-



      related design features



 



  2.  Integration of question changes motivated by cognitive



      research into ongoing surveys



 



  3.  Public cooperation with data collection requests and



      coverage of subpopulations on sampling frames



 



  4.  Development of mixed strategy designs, tailored to



      diverse subpopulations



 



  5.  Development of nonsampling error indicators;



      implementation of statistical quality control procedures



 



  6.  Training of statisticians and social scientists in survey



      research; recruitment/retention of trained staff



 



      The second issue has both a restrictive and more global



meaning.  First, the work ongoing in so-called cognitive



laboratories is seeking to identify principles that influence



measurement error in question-answer sequences.  The Federal



statistical system at the current time has no good mechanism for



the orderly introduction of change in questionnaires.  For the vast



majority of ongoing surveys, questionnaires remain static despite



evidence of improved alternative measures.  The value of unbroken



time series and the assumptions of canceling biases in over-time



comparisons are used to justify inactivity.  Americans have very



interesting reactions when they visit Cuba or see scenes of the



country.  They marvel at the maintenance of U.S. manufactured cars



in their original state from the 1950's.  They are at once proud of



the ongoing use of older vehicles and amused by the lack of



progress.  A U.S. auto manufacturer would quickly go out of



business if he were continuing to market 1950's designs.  Indeed,



the watchword in that industry is continued investment in change,



designing systems to permit ongoing change, making change part of



the design.  Survey researchers are driving 1950's vehicles in the



1990's.  What we dearly lack is the will to mount programs



of ongoing improvement in data series.



 



   The third likely issue of import is the role of voluntary



participation in surveys over the coming years.  Some countries in



Western Europe have experienced political shocks to response rates



(e.g., Sweden, West Germany).  Public debate about surveys in these



countries has led to lower cooperation with survey requests.  In



some cases documented effects on survey statistics exist.  That is,



the nonresponse error becomes visible to even the most naive reader



of statistics.  At that point, there was little the researchers



were prepared to do in terms of reaction of field interviewers or



construction of adjustment schemes.  We must acknowledge that



public cooperation is a fragile base on which the scaffolding of



inference lies.  To improve participation or to adjust inference in



the presence of lower participation, understanding of the decision



to participate must be obtained.  This is an issue that faces the



entire statistical system, indeed, the entire industry of



information collection.



 



    The fourth issue is not unrelated to the problems of



participation.  As the diversity of the U.S. population increases,



survey designs that tailor procedures to different subpopulations



grow.  Large portions of the population remain covered by



traditional frames, cooperative and competent to provide



information using cheap data collection methods.  Others fail to be



covered on traditional frames, have difficulty providing



information, and fear harmful consequences from their



participation.  The coming years are likely to find greater appeal



in mixed design strategies -- multiple frames, multiple data



collection modes, questionnaires tailored to subpopulations.  The



models exist in the survey design literature, but they need careful



attention.



 



    The final problem listed above concerns a crisis looming ahead



for the social measurement industry in this country.  Like all



endeavors that require quantitative literacy, social and economic



statistics are currently facing a shortage of qualified personnel.



If this were not bad enough, we also suffer from a worse problem --



the absence of ongoing training programs.  It's not merely that



students aren't entering the field; it's not clear how they can



within traditional academic programs.  Let's examine the problem.



Sampling statistics was well developed by the early 1950's; it is



not a "hot" area of development, attracting the best and brightest



of students.  Instead, a variety of analytic statistical



developments are more emergent.  Young Ph.D.'s labelling themselves



as sampling statisticians are unlikely to have an easy route to



tenure in an academic department.  Within the social sciences the



difficulties might be greater, with great pressure on students to



develop areas of expertise which are central to the dominant



paradigms in the discipline.  Survey methodology is not one of them



in any discipline.  There are two results of this: 1) a gross



inadequacy of training of new staff coming into the statistical



system in topics relevant to survey quality (this is not a



comment on their training as statisticians, psychologists, or



economists), and 2) a reduction in the number of academic



researchers devoted to the craft of social measurement.  There is



a clear conclusion here: the statistical system has to get serious



about training of staff it needs for the future.  This means



support of specialized graduate programs, focused continuing



education, onsite training and other similar mechanisms.



 



     The two types of structure - quality/cost components of data



series and problems facing the system - suggest two paper series,



one devoted to technical issues, another to administrative and



professional issues.



 



 



4. Other Comments, Not Elsewhere Classified



 



     I must admit confusion about the term, "working paper series."



In an academic setting this term is used to describe papers in the



process of being refined or papers not worthy of being refined.



People are sometimes "working" on them.  The better ones change



over time, they evolve to a better state.  This doesn't seem to fit



well with the OMB Working Paper Series.  Almost all remain in their



original state.



 



     I don't want to change the name of the series; I'd rather see



the series periodically updated.   Several of the papers were



valuable only for a short period of time (e.g., microcomputers;



telephone data collection).  Having a well-defined structure to the



series might define a set of ongoing updates of papers devoted to



individual topics.



 



     There is another connotation of "working" when attached to a



paper series.   That is, they are "working" toward quality



improvements in the statistical system.  I like this connotation.



But it implies two burdens not uniformly accepted:  a) a set of



recommendations at the end of reports, b) follow through by OMB or



individual agencies to implement change.   On this definition, I



think, the paper series has not achieved full success.



 



     Another problem with the series is the costs and benefits



assigned to authors of the reports.  Unlike my colleagues in



academia, statistical system staff rarely experience career-



enhancing effects of writing such papers.  There is the value of



education about other agencies, of "networking" with other members



of the statistical system, and of learning more about important



issues facing the system.  On the other hand, I've learned that



this is work essentially performed at nights and weekends by people



already very busy.  Now, night and weekend work is commonly very



productive and I have no problem with such a plan.  What I do



regret (and think it bad for the health of the system) is that such



work is given so little value by many of the home agencies.  OMB



might consider remedying this with some more formal recognition of



the writers of these reports.  At the very least, the authors of



the report might be given a more prominent position on the covers



of the papers.



 



   It strikes me that this seminar is an ideal forum for



generating discussion on the future of this series.  I recommend



several questions:



 



   Have the basic issues changed since the report?



        - because of the paper?



        - in spite of the paper?



 



   Is it time to redo the paper, to update it?



 



   Are there subtopics now of sufficient importance that they



   deserve separate treatment?



 



 



5. Personal note



 



   This working paper series consistently contains the name of



one person, from the first to the last - Maria Gonzalez.  The



Federal Statistical System often focuses its attention on data



series structures and organizations, not people, but the success of



any endeavor that spans decades depends on key people.  In this



paper series the key person is unambiguously Maria.  As those of



you who know her well can attest, she has been a rock of



rationality, courtesy, integrity, and absolute honesty in her work



on the Federal Committee on Statistical Methodology.  She alone can



succeed in pressing overworked federal statisticians to take on



projects for the benefit of the whole system.  Her near unique



ability to suggest ideas in a manner that allows the hearers to



believe they are their own ideas is a marvel.  Her perseverance



toward important goals of quality improvement and coordination has



made the working paper series and this conference possible.



 



 



 



 



 



 



 



 



 



              Session 1



     SURVEY QUALITY PROFILES



 



 



 



 



 



 



 



 



 



 



                   THE SIPP QUALITY PROFILE



 



                       Thomas B. Jabine



                    Statistical Consultant



 



 



A. Introduction



 



   The Survey of Income and Program Participation (SIPP) is a



longitudinal national household survey which has been conducted by



the U.S. Bureau of the Census since 1983, following several years



of developmental research.  The goal of the survey, which uses a



rotating panel design, is to provide policy makers with



comprehensive and accurate data about the levels and determinants



of the income of U.S. persons and households and about their



participation in a broad range of income transfer and welfare



programs.



 



   The SIPP quality profile summarizes current knowledge about



the sources and magnitude of errors in estimates based on SIPP.  An initial



version of a SIPP quality profile was issued in 1987 (U.S. Bureau



of the Census, 1987) and an updated and expanded version was



prepared in 1989 (U.S. Bureau of the Census, 1990).



 



   This paper describes the purposes of developing a quality



profile for a survey or other statistical program and the process



of preparing and updating a quality profile, using the SIPP Quality



Profile as an illustration.  The contents of the updated version



will be discussed briefly.  Those who wish to evaluate the quality



of SIPP data on specific topics or to develop an overall judgement



about the quality of SIPP data are referred to the latest version



of the SIPP Quality Profile and the other sources of information



that it identifies.



 



   Section B outlines the development of the quality profile



concept and identifies some publications of the last 4 decades that



could be regarded as forerunners of the current model.  Section C



explains the origin of the SIPP Quality Profile.  Section D



provides an overview of the updated version:  its intended



audiences, purposes, sources of information and structure.  The



contents are discussed briefly in section E.  In the concluding



section, I discuss the role of a quality profile in the broad



context of survey quality control and improvement.



 



 



 



 



 



B. Some Forerunners of the Quality Profile



 



    The theoretical foundation for a quality profile rests on



various models that have been developed for the measurement and



analysis of errors in surveys, especially the Census Bureau model,



which integrates components of sampling and nonsampling error and



the interactions between them (Hansen, Hurwitz and Bershad, 1959).



Dalenius (1974) formalized the concept of total survey design,



using the Census Bureau model to guide the allocation of resources



to minimize total error in a survey.



 



    Based on this foundation, there have been several broad



qualitative and quantitative reviews of the quality of data from



censuses and surveys, featuring direct and indirect data about the



various components of error.  Zarkovich (1966) published what was



perhaps the first systematic treatment of nonsampling errors in



surveys, with emphasis on procedures for their measurement and



control, and including numerous examples of specific information



about nonsampling errors from surveys and censuses in many



countries.    Bailar and Lanphier (1978), in a pilot test of



methodology for the evaluation of survey practices, reviewed the



quality-related design features of 36 U.S. surveys.  Their review



was not based on direct measures of errors, but the frequency with



which they found indirect evidence of low quality was high enough



to be disturbing and to suggest a need for greater attention to the



quality of survey designs and practices.



 



   A United Nations (1982) manual on Nonsampling Errors in



Household Surveys, prepared for use in developing countries,



systematically explores the different sources and types of



nonsampling error and provides illustrative data from numerous



household surveys throughout the world.  Statistical Policy Working



Paper 15 (Office of Management and Budget, 1988) performs a similar



function for Federally sponsored establishment surveys in this



country.



 



   Compilations of information about the quality of surveys have



two main audiences: survey designers/managers and users of survey



data.  To ensure that the latter have access to such information,



standards have been developed for the dissemination, in survey



publications, of information about errors.  An early example of



such standards was Census Bureau Technical Paper 32 (1974).  Today,



several Federal statistical agencies apply similar standards in



their publication programs.



 



   There have been some publications devoted entirely to the



quality of data on a specific topic in a census or survey.  An



early example was a detailed appraisal of the income data from the



1950 Census of Population (Conference on Research in Income and



Wealth, 1958).  The most immediate forerunner of the SIPP Quality



Profile was Statistical Policy Working Paper 3 (Brooks and Bailar,



1978), which provided an error profile for estimates of



unemployment from the Current Population Survey (CPS).  Jabine



(1987) provided a detailed analysis of the quality of data on



chronic conditions reported in the National Health Interview



Survey.



 



     There are two fairly evident differences between the CPS error



profile and the SIPP quality profile.  The most obvious is the



switch from "error" to "quality" as the defining adjective for the



profile's content.  While this may seem to be only a semantic



change, it reflects a feeling, undoubtedly shared by the authors of



the CPS error profile, that the goals of such a publication are



constructive.  The use of the term quality seems more in keeping



with today's emphasis on quality control and improvement in all



kinds of endeavors, including surveys.  The other basic difference



is that the SIPP quality profile covers the quality of estimates



for all of the topics included in SIPP, whereas the CPS error



profile covered only one of the many topics included in that



survey.



 



     Other U.S. statistical agencies are undertaking similar



although not identical efforts.  The Energy Information



Administration, for example, periodically publishes reports in a



series called An Assessment of the Quality of Selected EIA Data Series.



These reports rely largely on the technique of comparing data from



EIA surveys with more or less comparable data from other sources



and analyzing the differences that are observed.  Janet Norwood, in



a paper presented at the Census Bureau's Third Annual Research



Conference, stated that the Bureau of Labor Statistics was planning



to develop a comprehensive error profile for each of its surveys



(Norwood, 1987, pp. 217-218).



 



C. Origin of the SIPP Quality Profile



 



     The SIPP is a major longitudinal survey. The start of the



survey was preceded by several years of research and development,



an effort known as the Income Survey Development Program.  The



evolution of SIPP's complex survey design did not end when the



survey became operational late in 1983.  Methodological research



and evaluation studies have continued at a substantial pace and the



results of these studies, along with accumulated performance



statistics, feedback from users and adjustments made necessary by



reductions in funding, have led to significant changes in the



survey design and procedures.  Thus, SIPP is still in the early



stages of its evolution, in contrast to the Current Population



Survey which, although not immune to evaluation and improvement,



has reached a more mature and stable phase.



 



     In 1984 the Social Science Research Council and the Survey



Research Methods Section of the American Statistical Association,



with the encouragement and support of the Census Bureau,



established a Working Group on the Technical Aspects of SIPP to provide



advice to the Census Bureau on research priorities and the



translation of research findings into changes in the survey design



and procedures.  (The Social Science Research Council later



relinquished its sponsorship role.) An early recommendation of the



Working Group was that the Census Bureau prepare a compendium of



research results and other information about the quality of SIPP



data.  Members of the Working Group believed that a systematic



account of information about the different kinds of errors that



affect estimates from SIPP would be invaluable as a guide in



setting research priorities and applying the principles of total



survey design to SIPP.  Given the substantial amount of ongoing



research, they recommended that such a quality profile be updated



periodically, perhaps every two years.



 



    The Census Bureau accepted the Working Group recommendation



and produced the Quality Profile for the Survey of Income and



Program Participation (King, Petroni and Singh, 1987), early drafts of



which were reviewed by several members of the Working Group.  New



information continued to flow in at a rapid rate and toward the end



of 1988, Census decided that it was time to start work on an



update.  The updated version, published in mid-1990, was prepared



by the author of this paper with substantial assistance from Karen



King and Rita Petroni of the Census Bureau's Statistical Methods



Division.  Although the general structure of the two versions is



similar, the update contains much new material and some of the



earlier sections were significantly revised.  It also includes an



index.  The new version benefitted from reviews by several members



of the SIPP Working Group and Census staff.  Special thanks are due



to Daniel Kasprzyk and Rajendra Singh for their support of the



project.



 



 



D. Overview of Version 2



 



    The SIPP Quality Profile is intended to serve two main



audiences: "users of SIPP data and those who are responsible for



or have an interest in the SIPP design and methodology."  The



interests of these two groups are different.  Users want to know



how the errors associated with specific categories or classes of



data are likely to affect their analyses.  SIPP designers and



managers need to know the magnitude of errors associated with



specific design features, in order to control the quality of the



survey estimates and to guide the allocation of resources available



for their improvement.  Besides these two primary audiences, it was



expected that the publication would be of interest to persons



concerned with the design of longitudinal surveys other than SIPP



and to two special groups: the ASA/SRM Working Group and a Panel



to Evaluate the Survey of Income and Program Participation, convened by the



Committee on National Statistics at the request of the Census



Bureau.



 



 



 



 



    Information about the components of error that affect SIPP



data comes from four sources:



 



    o    Performance statistics, such as unit and item



    nonresponse rates and reports based on quality control



    procedures used in data collection and processing



    operations.



 



    o    Methodological experiments.   Both in the developmental



    period and since the start of survey operations, there



    have been numerous methodological experiments involving



    design features such as length of questionnaire,



    respondent rules, use of respondent incentives, increased



    use of telephone interviewing and methods of adjustment



    for nonresponse.



 



    o    Micro-evaluation studies. The outstanding example is the



    SIPP Record Check Study, in which individual survey



    responses to questions about program participation and



    benefits were compared with administrative data for each



    of several programs.



 



    o    Macro-evaluation studies.  There have been numerous



    comparisons of SIPP data with data on the same topics



    from other surveys, especially the Current Population



    Survey, and from program records.



 



    Assembling the relevant documentation was a challenge.  SIPP



has probably generated more methodological documentation than any



other survey that has been in existence for a similar length of



time.  The list of 161 references provided in the updated version



of the Quality Profile, which includes only those items that were



actually cited in the report, is nearly double the size of the list



included in the first version.  The most commonly used sources



were: the SIPP Working Paper series; the annual proceedings of the



Survey Research Methods, Social Statistics and Business and



Economic Statistics sections of the American Statistical



Association; the proceedings of the Census Bureau's Annual Research



Conferences; and internal Census Bureau memoranda.  The report



informs readers how to obtain copies of any of the internal



memoranda in which they are interested.



 



    Finding a suitable framework in which to present all of this



information about different components of error also presented a



challenge.  The traditional approach is to organize the material



according to the main phases of the survey: sample selection, data



collection, data processing and estimation.   The core of the



Quality Profile (Chapters 3 through 8) is, in fact, organized in



that manner, with one chapter devoted to sample selection, three to



data collection (covering data collection procedures, nonresponse



error and measurement error) and one each to data processing and



estimation.



 



 



    Two important topics did not fit neatly within this framework.



Chapter 9, Sampling Errors, covers the procedures used to estimate



sampling errors and the relationship between sampling errors and



sample size.   Chapter 10, one of the longer chapters, is called



"Evaluation of Estimates" and covers both comparisons of SIPP



estimates with data from other sources and indicators of errors of



undercoverage.   The remaining chapters, 1, 2 and 11, provide an



introduction, an overview of the survey and a summary,



respectively.



 



    The structure of the SIPP Quality Profile is similar to that



of its chief forerunner, the CPS Error Profile.  The main



differences are the division of the material on data collection



(called "Observational Design and Implementation" in the CPS Error



Profile) into three chapters, and the addition of the chapters on



sampling errors and evaluation of estimates.



 



    Our goal was to provide, insofar as available, quantitative



information about overall error and its components.  Hence, the



report includes 6 figures and 43 tables, a substantial increase



over the number included in the first version.  Space limitations



preclude inclusion of tables in this paper, but for those who may



be interested, the numbers of some key tables and figures from the



publication are given in the following section.



 



 



E. Summary of Findings



 



Major sources of error



 



   The SIPP Quality Profile does not contain any broad



conclusions about how successful SIPP has been so far in fulfilling



its goals.  Our goal was to provide enough information about the



quality of the survey data so that individuals and groups like the



Committee on National Statistics Panel to Evaluate SIPP could reach



their own conclusions.  The summary chapter does, however, identify



what stood out as the three main sources of error in SIPP



estimates: nonresponse, differential undercoverage and measurement



error.



 



   As in any longitudinal survey, unit nonresponse increases in



succeeding rounds (called "waves" in SIPP) of the survey.



Table 5.1 (not included with this paper, see the report) shows the



data available as of 1989 on unit nonresponse by wave for each



panel of the survey (households and individuals in each panel are



interviewed 8 or 9 times, at 4-month intervals).  The rates are



relatively low -- 4.9 to 7.6 percent -- for the first wave, but



increase to over 20 percent at the final wave of each panel.  This



relatively high attrition is due in part to the difficulty of



tracking households and individuals that move, as is required by



the SIPP design.  The characteristics associated with unit



nonresponse have been analyzed in detail, and these analyses have



guided the development of estimation procedures designed to



minimize the biases that result from differences between the



characteristics of respondents and nonrespondents.
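
    As a rough illustration of how wave-by-wave rates of the kind
shown in Table 5.1 can be tabulated, the Python sketch below computes
a cumulative unit nonresponse rate for each wave of a single panel.
The counts and the function name are hypothetical and are not taken
from the SIPP Quality Profile, and the rate used here (one minus the
share of originally eligible households still interviewed) is a
simplifying assumption rather than the Census Bureau's definition.

```python
def nonresponse_rates(eligible, interviewed_by_wave):
    """Return the cumulative unit nonresponse rate (percent) at each wave.

    eligible: number of eligible sample households, held fixed here for
        simplicity (an assumption; eligibility changes over a panel's life).
    interviewed_by_wave: number of households interviewed at each wave.
    """
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return [round(100.0 * (eligible - n) / eligible, 1) for n in interviewed_by_wave]


# Hypothetical counts for an eight-wave panel, NOT actual SIPP figures.
hypothetical_interviews = [18900, 18000, 17400, 16800, 16300, 15900, 15600, 15400]
rates = nonresponse_rates(20000, hypothetical_interviews)
```

With these invented counts the rate starts near 5 percent and passes
20 percent by the later waves, the same general shape described above.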



 



    Item nonresponse has been low for core items on labor force



activity, income recipiency and asset ownership.  It has been



somewhat higher for income amounts, especially self-employment



earnings and interest.  In the topical modules (questions not asked



in every wave), especially high nonresponse has occurred for



questions on asset amounts.



 



    Indicators of differential undercoverage in SIPP for



population subgroups defined by age, race and sex are shown in



Table 10.13 of the report.  The table shows the reciprocals of the



weights that are applied in order to make the simple unbiased



estimate for each subgroup agree with an independent estimate that



uses the Population Census count as a benchmark.  The group most



affected is young adult black males.  The ratios for black females



in the same age group are also quite low.  At least for the males,



the coverage ratios shown understate the amount of undercoverage,



because the ratios do not include any adjustment for census



undercoverage which is known to be above average for this



population subgroup.
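
    The relationship between a coverage ratio and the weight that
brings a subgroup estimate into agreement with its independent
control can be sketched in Python as follows.  The function names,
the subgroup totals, and the census undercount rate are all
hypothetical assumptions chosen for illustration; none of the numbers
come from Table 10.13.

```python
def coverage_ratio(survey_estimate, control_total, census_undercount_rate=0.0):
    """Ratio of the simple unbiased survey estimate to the independent control.

    census_undercount_rate: assumed share of the true population missed by the
        census-based control (hypothetical). The default of 0.0 matches ratios
        that make no adjustment for census undercoverage.
    """
    if survey_estimate <= 0 or control_total <= 0:
        raise ValueError("estimates must be positive")
    if not 0.0 <= census_undercount_rate < 1.0:
        raise ValueError("census_undercount_rate must be in [0, 1)")
    # Inflate the control to an estimate of the true population size.
    corrected_control = control_total / (1.0 - census_undercount_rate)
    return survey_estimate / corrected_control


def adjustment_weight(survey_estimate, control_total):
    """Weight that makes the subgroup estimate agree with the control."""
    return 1.0 / coverage_ratio(survey_estimate, control_total)


# Hypothetical subgroup: survey estimate of 800,000 against a control of 1,000,000.
unadjusted = coverage_ratio(800_000, 1_000_000)        # about 0.80
weight = adjustment_weight(800_000, 1_000_000)         # about 1.25
adjusted = coverage_ratio(800_000, 1_000_000, 0.05)    # about 0.76
```

The last line shows why unadjusted ratios understate undercoverage:
allowing for an assumed 5 percent census undercount lowers the
coverage ratio from about 0.80 to about 0.76.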



 



    Similar patterns of undercoverage have been observed in the



Current Population Survey and other national household surveys.



The second-stage ratio adjustments used for both cross-sectional



and longitudinal estimates to compensate for undercoverage are



believed to reduce both the sampling error and bias of the



estimates.  The effects of these adjustments on sampling errors can



be estimated, but little is known about their effects on biases



associated with undercoverage.



 



    Measurement error takes many forms, but perhaps its most



significant manifestation in SIPP has been the seam problem, i.e.,



a pronounced tendency for survey respondents to report month-to-



month changes for months in adjacent waves at substantially higher



rates than for adjacent months within a single wave.  Figure 6.1 in



the report provides a graphic illustration of the seam effect on



reports of changes in earnings.  Pronounced effects have been noted



for most income recipiency and amount variables.  Because of the



rotation group design used in SIPP, cross-sectional estimates of



transitions are not likely to be seriously distorted by this



pattern of reporting, but it can affect estimates of the covariance



structure and may have adverse effects on multivariate analyses



dealing with transitions or length of spells.
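
The seam effect can be illustrated with a small Python sketch that compares the rate of month-to-month status changes for month pairs that straddle two waves with the rate for month pairs inside one wave.  The sketch assumes a four-month reference period per wave; the respondent histories are invented and serve only to show the calculation.

```python
def transition_rates(histories, wave_length=4):
    """Return (within-wave rate, seam rate) of month-to-month status changes.

    histories: list of per-person monthly status lists (e.g. 1 = recipient).
    wave_length: number of reference months per wave (assumed here to be 4).
    """
    within_changes = within_pairs = 0
    seam_changes = seam_pairs = 0
    for statuses in histories:
        for m in range(1, len(statuses)):
            changed = statuses[m] != statuses[m - 1]
            if m % wave_length == 0:
                # Months m-1 and m were reported in different interviews.
                seam_pairs += 1
                seam_changes += changed
            else:
                within_pairs += 1
                within_changes += changed
    return within_changes / within_pairs, seam_changes / seam_pairs


# Hypothetical eight-month recipiency histories (two waves of four months).
histories = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # change reported at the seam
    [1, 1, 1, 1, 1, 1, 1, 1],  # no change
    [0, 0, 1, 1, 1, 1, 1, 1],  # change reported within the first wave
    [1, 1, 1, 1, 0, 0, 0, 0],  # change reported at the seam
]

within_rate, seam_rate = transition_rates(histories)
print(f"within-wave change rate: {within_rate:.3f}")
print(f"seam change rate:        {seam_rate:.3f}")
```

In the absence of a seam effect the two rates would be expected to be similar; a much higher seam rate is the pattern described above.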



 



     Table 6.6 in the report shows some early results from the SIPP



Record Check Study.  The sample sizes are small, and the table



shows results for only two of the four states included in the



study.   For the State of Wisconsin, significant levels of



underreporting were found for participation in two programs and



benefit amounts in one other program.  The full results from the



Record Check Study will provide the best direct information so far



available on levels of measurement error in SIPP and will be a



valuable resource for studying the sources and correlates of



response bias and response error variance.



 



Current research



 



   An active program of SIPP methodological and evaluation



research is continuing.  The main areas of research include:



 



o    The design of the questionnaires and the structure of the



   interviews.  Laboratory research is being conducted to



   study the cognitive aspects of SIPP interviews and how



   they relate to seam effects and other kinds of reporting



   errors.  Field experiments have been conducted to test



   the feasibility of providing feedback of prior wave



   information and encouraging greater use of records in



   interviews.



 



o    Interview mode.  An experiment with increased use of



   telephone interviewing is being evaluated to determine



   whether to adopt the procedures that were tested.  For



   the longer term the Census Bureau is arranging for the



   development of a prototype questionnaire for use in



   computer-assisted personal interviewing (CAPI), in order



   to evaluate the potential effectiveness of this



   collection mode in SIPP.



 



o    Estimation procedures. The broad goal for this area of



   investigation is to develop estimation procedures for



   SIPP that make effective use of auxiliary data available



   from both the Current Population Survey and



   administrative records.  An initial study of the



   feasibility of reducing variances by using IRS data as



   controls in the second-stage ratio estimation procedure



   showed considerable promise.



 



   Research in these and other aspects of the survey is



proceeding at a pace that suggests the desirability of preparing



updates of the SIPP Quality Profile on a regular basis.



 



   Areas of research that have been relatively untouched so far



include the effects of interviewer variance and the conditioning



effects of repeated interviews on response error.  For the latter,



the overlapping panel design used in SIPP offers the possibility of



comparing cross-sectional estimates for households and persons that



have been in the sample for varying lengths of time.  There is also



a need to update some of the earlier evaluation studies in order to



monitor the effects of design changes since the beginning of the



survey.  Much of the research reported in versions 1 and 2 of the



SIPP Quality Profile, including the Record Check Study, which is



the only source of direct information on the size of individual



reporting errors, is based on data from the 1984 panel.



 



F. Conclusions



 



    Judging from some comments by users of the initial version and



reviewers of the preliminary draft of the updated version of the



SIPP Quality Profile, the systematic compilation and publication of



information about the nature and sources of error in a major



continuing survey like SIPP, with periodic updates, is a worthwhile



undertaking.  A more definitive evaluation of utility will be



possible now that the updated version has been published and is



being widely distributed.  The author believes that the preparation



of quality profiles could be valuable in connection with efforts to



track and improve the quality of data from other major continuing



national surveys, such as the Current Population Survey, the



National Health Interview Survey, the National Crime Survey, the



Annual Survey of Manufactures and the Monthly Retail Trade Survey.



The technique is applicable to both household and establishment



surveys.



 



    Maintaining and improving the quality of survey data is a



never-ending job for survey designers and managers, and there is



room for a multiplicity of approaches.  Some Federal agencies are



making a strong commitment to the application, to survey



operations, of Deming's philosophy and techniques for total quality



management.  That approach implies not just measurement of errors



and identification of their sources, but modification of the survey



process as needed to eliminate or reduce the effects of significant



sources of error.  The other paper presented at this session



(Hanuschak, 1990) provides an example of this model of survey



quality management, with active participation and commitment to



quality improvement by key managers in the organization.  The same



commitment to the quality of data can be seen in the work of the



sponsors and participants in this Conference and they deserve our



thanks for it.



 



REFERENCES



 



Bailar, B. and Lanphier, M. (1978), Development of Survey Methods



to Assess Survey Practices, Washington DC: American Statistical



Association.



 



Brooks, C. and Bailar, B. (1978), An Error Profile: Employment as



Measured by the Current Population Survey, Statistical Policy



Working Paper 3, Office of Federal Statistical Policy and



Standards, U.S. Department of Commerce.



 



 






 



Conference on Research in Income and Wealth (1958), An Appraisal of



the 1950 Census Income Data, Studies in Income and Wealth, Vol.23,



National Bureau of Economic Research, Princeton:  Princeton



University Press.



 



Dalenius, T. (1974), Ends and Means of Total Survey Design,



Stockholm: University of Stockholm.



 



Energy Information Administration (1983), An Assessment of the



Quality of Principal Data Series of the Energy Information



Administration (first in a series of "state of the data" reports),



Publication DOE/EIA-0292(82).



 



Hansen, M., Hurwitz, W. and Bershad, M. (1959), "Measurement Errors



in Censuses and Surveys", Bulletin of the International Statistical



Institute, 38:359-374.



 



Jabine, T. (1987), Reporting Chronic Conditions in the National



Health Interview Survey:   A Review of Findings From Evaluation



Studies and Methodological Tests, Data From the National Health



Survey, Series 2, No. 105, National Center for Health Statistics.



 



Jabine, T., assisted by King, K. and Petroni, R. (1990), Survey of



Income and Program Participation: SIPP Quality Profile, Bureau of



the Census, U.S. Department of Commerce.



 



King, K., Petroni, R. and Singh, R. (1987), Quality Profile for the



Survey of Income and Program Participation, SIPP Working Paper No.



8708, Bureau of the Census, U.S. Department of Commerce.



 



Norwood, J. (1987), "What is Quality?" in Proceedings, Third



Annual Research Conference, Bureau of the Census, U.S. Department



of Commerce: 215-222.



 



Subcommittee on Measurement of Quality in Establishment Surveys



(1988), Quality in Establishment Surveys, Statistical Policy



Working Paper 15, Statistical Policy Office, U.S. Office of



Management and Budget.



 



United Nations (1982), Non-sampling Errors in Household Surveys:



Sources, Assessment and Control, UN Publication DP/UN/UBT-81-



041/2, National Household Survey Capability Programme.



 



U.S. Census Bureau (1974), Standards for Discussion and



Presentation of Errors in Data, Technical Paper 32, U.S. Department



of Commerce.



 



Zarkovich, S. (1966), Quality of Statistical Data, Rome: Food and



Agriculture Organization of the United Nations.



 



 



 



 






 



                INITIAL REPORT ON THE QUALITY OF



                   AGRICULTURAL SURVEY PROGRAM



 



                       George A. Hanuschak



            National Agricultural Statistics Service



 



 



I. Background and Introduction



 



    In December 1988, the National Agricultural Statistics Service



(NASS) formed a Survey Quality Team (SQT) for its Agricultural



Survey Program (ASP).  The ASP is a series of integrated multiple



sampling frame (area and list) based surveys throughout the



agricultural calendar year.  Some major items on the surveys are



planted and harvested crop acreages, hog, cattle and sheep



inventories, crop yields and production and on-farm grain storage.



There was a major survey redesign from individual MF surveys to an



integrated multiple frame survey program which was implemented over



several years (1984 - 1986).  The mission of the Survey Quality



Team is to identify and develop statistical process control (SPC)



methods for the management of the integrated Agricultural Survey



Program.  The SPC methods are based upon the fundamentals of total



quality management (TQM) techniques developed by W. Edwards Deming,



Joseph Juran, Philip Crosby and other well-known TQM developers in



the TQM and SPC literature.  However, since much of the literature



refers to "manufacturing" situations, it was adapted to fit the



government agricultural survey situation.  Several papers by Ron



Fecso developed the basic model of survey quality used by the SQT.



The first major milestone of the SQT was to be the development of



a baseline "state of the survey" quality report.



 



    The mission of the SQT is quite broad, challenging and



critically important to the Agency's long term goal of routinely



and continually improving survey quality.  The team and the Agency



also face this challenge in the light of severe budget pressure, in



general, on Federal Statistics programs.  However, the team feels



that TQM and SPC methods are quite powerful tools, when properly



applied, that can aid in measuring and improving survey quality



over time.



 



    One of the first lessons of total process control is to define



the major steps in the total process.  In the case of the ASP, one



needs to first define or identify the major steps or stages of the



ASP surveys.  The Survey Quality Team has identified the following



steps (Exhibit I) as the 22 major processes of the survey.



Unfortunately, each one of these survey stages or processes is



probably susceptible to some type of errors or biases.  The SQT



developed the following profile (Exhibit II) of 24 potential



sources of error or bias in the ASP.



 



    Like any good statistical organization, the Agency has tried



to minimize the probability of various nonsampling errors occurring



in the survey process.  Controls include training, survey manuals



and instructions, Agency Policy and Standards Memorandum, quality



control checks on enumeration, reinterview studies, etc. 



Controlling and measuring nonsampling errors for a complex survey



process will remain extremely challenging even with the best



efforts at statistical process control.  However, in the remainder



of this report, the SQT defines and demonstrates how to use



statistical process control and total quality management techniques



to reduce total survey error over time.



 



 



                   Exhibit I - Major Survey Stages



 



 Survey Clearance



 Area Sampling Frame



      (Construction, Maintenance and Sampling)



 List Sampling Frame



      (Construction, Maintenance and Sampling)



 Survey Specifications



 Design of Questionnaires



      (Design, Print and Distribution)



 Preparation of Manuals



     (Interviewers, Supervisory and Editing)



 Prepare Survey Software



     (Data Entry, Survey Coordinator, Edit, Analysis, Summary,



     Data Base, Mail and Maintenance System, Etc.)



 National/Regional Training Schools



 Survey Management - Headquarters and State Statistical Offices



     (Coordination of Procedures)



 Presurvey Coding/Handling/Processing by State Statistical Offices



 State Training Schools



 Data Collection



 Data Collection Quality Control



 Manual Data Review and Coding



 Data Entry and Validation



 Data Edit and Review



 Imputation, Analysis and Summarization



 State Statistical Office  Review of Survey Results



     (including submission of estimates)



 Headquarters Review and Release Preparation



 Post Survey Updating



    (Data Base and List Sampling Frame)



 Post Survey Evaluations



 Survey Research



 



 



 



 



 



 



 



 






 



 Exhibit II - Some Potential Sources of Total Survey Error



             in the Agricultural Survey Program



 



Undetected List Sampling Frame Duplication



 



List Sampling Frame (Old or Incorrect Control Data)



List - Undetected Reporting Duplication or other



   reporting/enumeration errors or bias



List Sources of Questionable Quality used for List Sampling Frame



   Build/Maintenance



Area Sampling Frame (Outdated Land Use Stratification)



List Sampling Frame (Any large operations not covered by the



   frame)



Area Sampling Frame (Outdated Sample Segment - Aerial



   Photography)



Different Farm Operation Description Questions



   on Different Questionnaire versions



Incorrect overlap/nonoverlap Determination



Incorrect Exception Report Handling (One Type of Survey Weighting



   Factor)



Incorrect Coding (List Adjustment Survey Weighting Factors,



   Completion/Imputation Codes, etc.)



Undetected Data Entry errors (pass all the way through the



   editing system)



Shift in Mix of Data Collection Modes (Telephone, Computer



   Assisted Telephone, Mail and Personal)



Shift in Mix of Respondents (Operator vs. Spouse vs. Other)



Incorrect Survey Master Records



Questionnaire Design (or Print) Errors



Unmeasured Major Changes in Survey or Estimation Procedures



   (Headquarters or State Statistical Offices)



Error in Known Zero Determination (Is Respondent Validly out of



   Business?)



Overediting/Underediting of Survey Data



Potential Bias in Manual or Machine "Imputation" Procedures



Lack of Formal Outlier Handling Procedures (Non Robust or Non



   Smooth Time Series Estimation)



Survey Processing Software



Shifts in Characteristics or Skill Level of Work Force



   {(Enumerators, Statisticians, Programmers, Support Staff)



   Experience in their current job, survey procedures



   knowledge, farm knowledge, statistics knowledge, technology



   skills, etc.}



Farmer or Respondent's level of understanding or grasping of



   survey reporting concepts and item definitions (Cognitive



   aspects).



 



 



 



 



 



 



 



 






 



II. The Components of Survey Quality



 



   When faced with the problem of measuring and improving the



quality of the ASP, one should consider the components of survey



quality.  Listing the components defines exactly what is meant by



the term "survey quality" and highlights specific sub-areas that



need to be explored.



 



    Figure 1 shows the components of survey quality.  It was



developed by the Nonsampling Errors Research Section in the Survey



Research Branch of NASS and adopted by the SQT.  There are four



major components related to survey quality: accuracy, resources,



timeliness, and relevance.



 



 



[Figure 1.  Components of survey quality.  Graphic not reproduced.]



 



 



   Accuracy is the component that first comes to mind when



thinking about survey quality.  NASS wants the survey indications



to be as accurate as possible.  Not only should the sampling errors



be small, but also the nonsampling errors should be minimized.  In



large-scale surveys the relative sampling errors can be smaller



than the relative size of the nonsampling errors.  Factors such as



undetected list sampling frame duplication, nonresponse,



questionnaire wording, mode of interview, change in respondent,



etc., can lead to substantial nonsampling errors.



 



   The second component of survey quality is resources.  Even if



a survey organization can control the sampling and nonsampling



errors, its ability to do so will be affected by the amount of



dollars that are available to spend on the survey.  The amount of



dollars has a direct impact on sample sizes, list frame quality,



pretesting, reinterview projects, editing programs, summary



programs, analysis, etc.  Also important is the amount and quality



of staff hours that can be devoted to a survey.  Staff hours are



affected by salaries, training, hiring practices, long-term career



development, and organizational climate; components that are also



greatly affected by the amount of dollars available.  Most people



quickly realize that the crucial problem is to take the fixed set



of available resources and use those resources in a way that



maximizes the survey quality.



 



    The third component is timeliness.  Of course, time could be



considered another element of resources -- like dollars and staff.



However, timeliness needs to be considered a component by itself



because timeliness is crucial in the survey process.  The impact



and usefulness of survey indications are greatly affected by



whether the survey data were collected one month or one year



earlier.  NASS has always stressed the need to collect data quickly



and to release estimates as close to the survey reference date as



possible.  Thus, the survey calendar -- which is used to time all



the steps of the survey -- is important to the survey quality.



 



    The final component is relevance.  Relevance is dependent on



the needs of the users of NASS statistics, and those needs change



from day to day.  It is useless for NASS to collect a high-quality



piece of information on farming if that piece of information has no



relevance for the users of NASS statistics -- that piece of



information simply becomes a product without a buyer.  NASS must



constantly assess the needs of people using its statistics to make



sure that the collected information is relevant.  The second aspect



of relevance is internal to NASS.  An example of internal relevance



is whether the Agency wants direct expansion (level) or ratio



(percent change) or both types of estimators out of the ASP.



 



 



III.  Accuracy of Survey Soybean Acreage Estimates



 



    NASS has an expert panel of Agency statisticians called the



Agricultural Statistics Board (ASB), which reviews all survey



indications (often multiple indications for any one item), and



administrative or check data (such as the amount of soybeans



crushed in processing plants) and adopts or sets the official



estimates to be published.



 



    Two concepts need to be defined - use and fitness.  The ASB's



use of the ASP indications was chosen as the primary "use" of the



ASP.  "Fitness" for use is evaluated by setting a standard for use



and measuring adherence to the standard.



 



    Ideally we would have standards for all the components of mean



squared error (MSE) for the various commodity indications and



administrative data used by the ASB.  This would provide the



ability to create statistically well defined composites of the data



for use as the Board estimate or forecast.  At this time we have



measures of the variance for most indications, but have only enough



information about MSE's to recognize the importance of developing



more extensive MSE measures.  This section will provide information



for Agency management to assess which areas are most in need of



further study or research and/or corrective action.
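
For reference, the standard decomposition behind this discussion can be written as follows.  The symbols are introduced here for illustration and do not appear in the original report: \hat{\theta} denotes a survey indication and \theta the true value it is meant to estimate.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta .
```

The variance term can generally be estimated from the sample itself, whereas the bias term requires information from outside the survey.  This is consistent with the situation described above, in which variance measures exist for most indications but fuller MSE measures do not.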



 



 The ASB's specific need is to have indications which serve as



a solid basis for the official numbers.  The following chart on



soybean planted acreage displays the degree to which the ASB has



found the ASP indications to be "fit for use."



 



 In reviewing the soybean planted acreage chart on ASB use you



will observe the following:



 



1. The Agricultural Statistics Board finds the area sampling frame



based June acreage estimate quite "fit for use."



 



2. The ASB does not find the integrated multiple frame based June



acreage estimate "fit for use."  It has an observed substantial



upward bias which also changed substantially in magnitude between



1987 and 1988 and stayed at the larger magnitude in 1989 and 1990.



Using Pareto analysis and an expert panel using TQM principles



applied to surveys, the SQT identified the major suspected causes



of the upward bias in the multiple frame based soybean acreage



estimate.  These suspected causes are:



 



 



 



[Chart: soybean planted acreage, ASB use of the ASP indications.  Graphic not reproduced.]



 



 



 



 






 



    1. Different Data Collection Methodologies



 



         The area frame based acreage estimate is based upon a



    sample of about 16,000 sample segments throughout the U.S.



    Data collection is done completely by personal interviews



    using an aerial photograph to locate each crop field and



    recorded on a questionnaire by the interviewer with the



    farmer's direct participation.  Crop acreage data is collected



    and edited field by field.  Farmers are probed to report waste



    acreage for each field.  There are also five specific



    questions related to defining land operated now, to which all



    the rest of the questions relate.



 



         On the integrated multiple frame survey, the majority of



    data collection is done by telephone (both conventional and



    computer assisted).  The crop acreage data is collected for the



    entire farm (not field by field).  Therefore farmers are



    probed for waste acreage only once, at best, when reporting



    crop acreage.  There is no photographic aid for the farmer to



    refer to.  There are only one or two questions on defining land



    operated now.



 



    2. Undetected List Sampling Frame Duplication



 



         There are sophisticated record linkage tools to identify



    and remove duplication on the list sampling frame.  However,



    due to constraints on clerical resources and on funding to call



    farmers to resolve differences, and to the use of multiple list



    sources, some duplication remains.   A special study was



    designed in 1989 to measure remaining duplication and the



    effect on the estimates.  The study showed that approximately



    10 percent of the acreage difference was due to obvious list



    frame duplication.



 



    3. No Formal Documented Outlier Handling Procedures



 



         While there are several good analysis tools to identify



    outliers, there is no formal procedure for handling them.  The



    area frame based acreage estimator is quite robust since the



    average expansion factor is about 200 and the segment size is



    640 acres, putting an upper bound on "influential



    observations".    For the list sample, expansion factors are



    considerably larger and farm size does not have much of an



    upper bound.  Thus it is much easier to get highly influential



    observations in the list sample.  Development of a formal



    robust estimator for the list sample is highly recommended.



 



 



 



 






 



   4.   Different Imputation Methodologies



 



         There are also different imputation methodologies.  All



   imputation for the area frame is done manually, from interviewers'



   observations or by statisticians.  In the case of crop acreage, if



   a farmer refuses, the interviewer can still observe most of the



   crop fields and the crop.  On the list sample, the imputation



   is a computerized algorithm that uses other reported survey



   data and list frame control data to impute for nonreported



   data cells.



 



   5. Undetected Reporting Errors



 



         Since the questionnaire design is different, the



   undetected reporting error structure may also be different.



   For example, the screening questions on land operated on the



   area side are more detailed than the list questionnaire and



   may do a more accurate job of screening out landlords who are



   not active farmers at survey time.  New farm programs may have



   also led to the formation of more complex farming operations,



   which may involve a different reporting error structure also.



 



   6. Different Ratio Type Information and Sample Designs



 



        On the area frame sample there is an 80 percent overlap



   from one year to the next.   On the list frame sample



   (independent from year to year) there is negligible overlap.



   Thus the area frame sample also provides a paired sample ratio



   estimator.
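
Returning to the third suspected cause, the bound on influential observations can be illustrated with a rough calculation.  The area frame figures (an average expansion factor of about 200 and a 640-acre segment) come from the text above; the list sample expansion factor and farm size are hypothetical values chosen only to show the contrast, and the function name is likewise illustrative.

```python
def max_contribution(expansion_factor, max_acres):
    """Largest amount one sample unit can add to a direct expansion total."""
    return expansion_factor * max_acres


# Area frame: figures quoted in the text.  Because 200 is an average
# expansion factor, the result is an approximate bound, not an exact one.
area_bound = max_contribution(200, 640)

# List sample: HYPOTHETICAL expansion factor and farm size, for contrast only.
list_example = max_contribution(1000, 5000)

print(f"Area frame segment, roughly at most: {area_bound:,} acres")
print(f"Hypothetical list sample farm:       {list_example:,} acres")
```

Under these assumptions a single area frame segment cannot move the expanded total by much more than about 128,000 acres, while a single large list sample farm with a large expansion factor can contribute many times that amount.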



 



   It is important to note that there have also been two rather



independent sources of data available to the ASB which also support



following the area frame level.  These are a Landsat satellite



based regression estimator (1980-1987), which for major soybean



states had variances no more than half those of the direct



expansion estimator and was also unbiased when compared to the ASB



estimates and the direct expansion.  The second source is the calculation of a



soybean balance sheet which the ASB uses as an evaluation tool.  A



balance sheet takes the carryover from one crop year to the next



and adds crop production to that and then subtracts crop



utilization including exports from it to get a current balance.



These balance sheets also support the area frame based crop acreage



level.  Thus the agency has attempted to verify the correct crop



acreage level using several methods and independent data sources.
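
The balance sheet identity described above can be sketched in a few lines of Python.  The function names and all quantities are hypothetical; the second function shows one way such an identity could be turned around to check a production figure, and is not a description of the ASB's actual procedure.

```python
def ending_balance(carryover, production, utilization):
    """Current balance = carryover + production - utilization.

    utilization is taken to include exports, as in the text.
    """
    return carryover + production - utilization


def implied_production(ending_stocks, carryover, utilization):
    """Production level consistent with observed stocks and utilization."""
    return ending_stocks - carryover + utilization


# Hypothetical figures, in millions of bushels.
carryover = 300.0
production = 1900.0
utilization = 1950.0

balance = ending_balance(carryover, production, utilization)
check = implied_production(balance, carryover, utilization)

print(f"ending balance:     {balance:.0f}")
print(f"implied production: {check:.0f}")
```

If a survey-based production figure differed markedly from the production implied by independently measured stocks and utilization, that would point to a problem in one of the inputs, such as the acreage level.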



 



   Even though there is an observed upward bias in the integrated



multiple frame estimator for soybean acreage there are reasons for



keeping it and reducing the bias.  These reasons are:



 



 






 



1.  Later crop season yield and production estimates are tied



     to the integrated multiple frame (IMF) approach.



 



 2.  State and sub-state level estimates from the IMF have



     much better precision than the corresponding area frame



     estimates.



 



 3.  Solving the bias problem associated with soybean acreage



     may well improve the entire IMF, which is a survey conducted 6 times



     a year with an average of 20-40 items (multivariate in



     nature).  The Survey Quality Team has performed similar



     analysis for on-farm grain storage, and cattle and hog



     inventories.  Some of the bias issues are item specific



     but others are associated with the total survey process



     or components of the survey process.



 



 4.  The IMF approach is substantially more cost efficient and



     involves less respondent burden than the area frame



     approach.



 



     Most important is that the Agency is taking actions on all of



these suspected causes in 1989 and 1990.  As previously mentioned



there is now an improved list frame duplication adjustment



procedure in place starting in June 1989.  There is a reinterview



research study being conducted in June 1990 to provide initial



measures of previously undetected reporting errors.  This study



will involve reinterviewing a subsample of the list sample



of farmers, recording the crop data field by field, asking the more



detailed land operated questions, and comparing the results.  There are



also research efforts underway to examine the imputation



methodologies and to look at an across year design for list frame



based estimators and evaluate several robust estimators.  In



addition the SQT has provided several quality measures to be



monitored on the resource, relevance, timeliness and accuracy



dimensions which should become operational in 1990-91.



 



     The Agency is also developing alternative "proxies" to the



true item values in addition to relying on the ASB process.  An



operational reinterview/reconciliation survey is being conducted in



six major grain producing states in December 1990.  There has also



been an extensive operational soybean yield validation survey (198?



- current) where farmers are asked to harvest specific fields and



take just that grain to a grain elevator to be weighed and



measured.



 



     These "proxies" to true values are important in a survey



evaluation program but are also complex and expensive to develop



and implement.



 



     As previously mentioned, earth resource satellite data



have also been used by the Agency to develop more precise and



accurate crop acreage estimates.



 






 



IV.  Summary



 



   It is the claim of the SQT that more consistent and timely



process improvements can take place by using the principles of



statistical process control and Total Quality Management.  More



formal survey quality measurement and monitoring mechanisms will



provide the Agency's management with more and critically important



information to manage the quality of the ASP.  Also, most of these



techniques will readily transfer to other survey programs in the



Agency such as Prices Paid and Received by Farmers, the Farm Costs



and Returns Survey, Objective Yield Surveys, Farm Labor Surveys,



and even to new programs such as Water Quality and Food Safety



Surveys, the National Animal Health Monitoring System and the



Monthly Yield Survey Program.



 



   There are several tools available for such a survey quality



management system.  First there are numerous charting techniques



such as bar and pie charts for resource information, Board



standardized indication graphs with standard errors, Gantt charts



to display project management and survey schedule information,



upper limit and lower limit control charts, multivariate control



charts, Ishikawa fishbone diagrams and Pareto charts and analysis.



Many of these were used in an earlier effort by the Nonsampling



Errors Research Section when a statistical process control study



was conducted on the Soybean Objective Yield Program.
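
As an illustration of the upper limit and lower limit control charts mentioned above, the following Python sketch derives limits from a baseline period and flags later observations that fall outside them.  The quality measure (a survey response rate, in percent), the data, and the function names are hypothetical.  Three-sigma limits based on the sample standard deviation are used as one common convention, not necessarily the one adopted by the SQT.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower limit, center line, upper limit) from baseline values."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - k * spread, center, center + k * spread


def out_of_control(values, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical response rates (percent) from six earlier surveys.
baseline = [82.0, 84.0, 83.0, 85.0, 81.0, 83.0]
lower, center, upper = control_limits(baseline)

# Hypothetical response rates from three later surveys.
recent = [84.0, 77.5, 83.5]
flagged = out_of_control(recent, lower, upper)

print(f"limits: {lower:.2f} to {upper:.2f}, center {center:.2f}")
print("flagged:", flagged)
```

A flagged point does not by itself identify a cause; it marks a survey for which the process should be examined.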



 



   Pareto analysis is one of the most powerful tools in quality



monitoring systems.  Pareto analysis ranks the potential errors in



a system from most serious to least serious.  The reasoning is that



in many systems and not just surveys, there are a "vital few" and



"trivial many" potential errors in the system.  Thus, the most



important beginning of evaluating the quality of a system is to



identify where it is most likely to break down or fail.  Once the



ranking of potential errors is accomplished, then it is recommended



to identify the allocation of resources for each potential error to



see if management is allocating resources in a fashion that will



truly minimize total survey error.  Many Pareto analyses have



demonstrated that the resource allocation was not in proper



alignment with the true error structure.
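
   The ranking step described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names, the 80 percent cutoff for the "vital few," and the error sizes are hypothetical and are not taken from the Agency's work.

```python
from typing import Dict, List, Tuple


def pareto_rank(error_sizes: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Rank error sources from most to least serious.

    Returns (source, share_of_total, cumulative_share) tuples.
    """
    total = sum(error_sizes.values())
    if total <= 0:
        raise ValueError("error sizes must sum to a positive number")
    ranked = sorted(error_sizes.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0.0
    for source, size in ranked:
        share = size / total
        cumulative += share
        rows.append((source, share, cumulative))
    return rows


def vital_few(rows: List[Tuple[str, float, float]], cutoff: float = 0.8) -> List[str]:
    """Smallest leading set of sources whose cumulative share reaches the cutoff."""
    selected = []
    for source, _share, cumulative in rows:
        selected.append(source)
        if cumulative >= cutoff:
            break
    return selected


# Hypothetical error contributions in arbitrary units, for illustration only.
example_errors = {
    "editing": 45.0,
    "nonresponse": 30.0,
    "sampling": 15.0,
    "keying": 6.0,
    "coding": 4.0,
}
rows = pareto_rank(example_errors)
for source, share, cumulative in rows:
    print(f"{source:12s} {share:6.1%} {cumulative:6.1%}")
print("vital few:", vital_few(rows))
```

   Setting the ranked shares beside the share of resources spent on each error source is then a simple way to check whether the allocation lines up with the error structure.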



 



  Thus, more information on the true total survey error



structure and appropriate resource allocations is being provided



to survey managers and administrators to form a basis for future



improvements in total survey quality.



 



  Considerable progress has been made by the Agency in



addressing quality issues in its integrated multiple frame



Agricultural Survey Program.   Many of the discoveries will



translate to improved quality on several other major Agency survey



programs as well.



 






 



References



 



Beller, N., "Error Profile for Multiple Frame Surveys," Statistical



Reporting Service, Research Report, 1979, Washington, DC.



 



Bosecker, R., "Integrated Agricultural Surveys,"  National



Agricultural Statistics Service, Research Report No. SSB-89-05,



Washington, DC, June 1989.



 



Fecso, R., "Survey Quality," Presented at the 2nd Quality Assurance



in Government Symposium, Washington, DC, May 1989.



 



Fecso, R., Pafford, B., Tremblay, T., Johnson, R., "Quality Profile



for Soybean Objective Yield Survey," National Agricultural



Statistics Service, Unpublished Case Study, Washington, DC, 1988.



 



 



 



 



 



 



 



 






 



                           DISCUSSION



 



                        Barbara A. Bailar



                American Statistical Association



 



I. What is a Quality Profile?



 



   The first quality profile was called an error profile and it



concerned the CPS employment statistics.  To be more positive,



error profiles have now become quality profiles.  The purpose is to



prepare a systematic and comprehensive account of survey



operations, listing the operations, the potential sources of



error, and how the error influences the uses of the survey



statistics.



 



   Quality profiles are still rare events.  When asked why there



are not more, survey producers have three main themes:



 



o    The staff resources that would go into producing a



   quality profile are too great and are in competition with



   other, more urgent needs.



 



o    Producing a report that tells about the errors in surveys



   would lead to less credibility in the statistics



   produced.



 



o    Admitting that there are errors is admitting that we



   haven't done our jobs well.



 



   In fact there are many benefits to producing quality



profiles.  Some of these are as follows:



 



o    to minimize total error, not just sampling error within



   given cost constraints



 



o    to force a thorough documentation of the survey process



 



o    to guide a user on the effects of possible errors and



   their impact on specific uses



 



o    to develop a sound quality control program



 



o    to use in training programs for new staff in either



   operations or research; and



 



o    to use as the foundation for a sound research and



   analysis program



 



   The development of a quality profile parallels the survey



process and would contain the following elements:



 






 



1.  Objectives and specifications of the survey



2.  Sampling design and implementation



3.  Observational design and implementation



4.  Data processing



5.  Estimation



6.  Analysis and publication



 



   Given this as my basic understanding, let me comment on the



quality profile for SIPP and the quality assessment of the



Agricultural Survey Program (ASP).



 



   The two reports have some differences and some similarities.



The SIPP profile summarizes what is known about sources and



magnitudes of errors of estimates and addresses accuracy.  The ASP



report is written from the point of view of total quality



management and uses many of the ideas of Deming, Juran, and Crosby.



This report considers resources, timeliness, and relevance as major



components of quality, along with accuracy.  The aims of the two



groups seem to be quite different.



 



   The two reports each identify the same groups as their targets



-- the users of the survey data outside the agency and producers of



the survey inside the agency.



 



   Another similarity is that both look at major phases of the



survey operation, something essential for a quality profile.



 



   A difference in the two reports was that the SIPP report



actually identified four main sources of information on nonsampling



errors:



 



   Performance data



   methodological experiments



   micro-evaluation studies



   macro-evaluation studies.



 



   The ASP report was more concerned with process and how quality



would be assessed.  In fact, the report stresses the need not to



identify too many sources of error because tracking everything down



might take too long.  Actually, I think the total quality



management movement urges groups to use brainstorming techniques to



identify all possible problems and then Pareto analysis to decide



where to concentrate one's efforts.



 



   Another similarity is that both reports left out major steps in



the survey process.  The SIPP report briefly listed the objectives



of the survey, but said nothing about the objectives being



conflicting.  Producing a survey to give both cross-sectional and



longitudinal data has been a new experience for the Census Bureau.



The two objectives do conflict, at least from the resource point of



view.  There were some references to different needs in imputation,



but the resource needs have probably had more impact on the survey.



 






 



The ASP report did not even list objectives of the survey as a



potential source of error.  Neither report really addressed the



effects of staff training or compared the kinds of training, length



of training, etc.  It is fairly well known that performance data



does not correlate well with interviewer performance on accuracy.



Training could make a difference, but almost nothing is known at



the present time.



 



   Let me move now to some separate comments on the two reports,



starting with the ASP report.  There was a large group of people



who worked on this survey quality team.  Many of them have done



excellent work in survey methodology, so I think we can expect



great things from this group.  The mission of the group is to



contribute to NASS's long term goal of routinely and continually



improving survey quality.



 



   The focus on quality at NASS has taken on the language of the



quality and productivity movement.  For example, they use a simple



definition of quality, "fitness for use."   This led them on a



search to decide what that meant and what objective criteria would



be.  Finally, they decided that they would measure it by comparison



with the Agricultural Statistics Board (ASB) estimate.  If the ASB



value is within plus or minus two standard errors of the survey



indication, then the survey indication is fit for use.  And, in



fact, they have five ratings:  ideal, acceptable, workable,



minimal, and out-of-control.
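
   The two-standard-error rule just described can be written as a short check.  This is a minimal sketch: the function name and the numbers are hypothetical, and the five ratings are not modeled because their cutoffs are not given here.

```python
def within_two_standard_errors(asb_estimate: float,
                               survey_indication: float,
                               standard_error: float,
                               k: float = 2.0) -> bool:
    """True if the ASB value lies within k standard errors of the survey indication."""
    if standard_error < 0:
        raise ValueError("standard error cannot be negative")
    return abs(asb_estimate - survey_indication) <= k * standard_error


# Hypothetical values: indication of 1,000 with a standard error of 20.
print(within_two_standard_errors(1030.0, 1000.0, 20.0))  # 30 is inside 2 * 20
print(within_two_standard_errors(1050.0, 1000.0, 20.0))  # 50 is outside 2 * 20
```

   Note that the check measures agreement with the Board value only; it says nothing by itself about how close either number is to the truth.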



 



   I find it hard to see why the Agricultural Statistics Board



estimate would be used as the standard.  In some cases, there are



long time series and other indicators that the ASB uses to make its



estimate.   However, for some surveys they have much less



information.  Perhaps NASS is pushing the ASB to use the survey



indicators or explain why they haven't.  Though the example given



in the paper about the integrated multiple frame based June acreage



estimate was interesting, there will not always be that kind of



other data available to compare with.



 



   There is nothing about a Board estimate that measures accuracy.



In some ways, it is as if the SIPP people looked at one of their



macro indicators and said that if SIPP didn't come within two



standard deviations of that estimate, then SIPP was not fit for



use.  At least, with a macro indicator, one might be able to



untangle why estimates differ; that may not be possible to do with



the ASB.



 



   Following Deming's principles, I think the careful



documentation of every survey for which millions of dollars are



spent and on which important decisions are based is important to



the profound understanding of which Deming speaks.  A quality



profile tells you what you know and what you don't know but



should.



 






 



   It was interesting to see that NASS also addressed resources,



timeliness, and relevance as major components of quality.  However,



it was not clear how criteria would be set or measurements taken.



The Gantt chart on the QAS was helpful in identifying time periods



and overlaps of one round of survey with the next but it did not



help individuals who have many surveys to work on identify



overlapping periods of high intensity.  The sentence "Too frequent



use of overtime to correct a process that is out of control usually



has a devastating effect on overall performance" raises questions.  What does out of



control mean?  How does it affect overall performance?  How do you



know these things unless you keep careful records on hours worked



on a survey, overtime, and have some measure of a downturn in



performance?



 



    NASS has several good ideas about looking at relevance,



timeliness, and resources as well as accuracy.  It is an ambitious



undertaking.  I have one word of caution in their drive to use



total quality management techniques to help them.  They focus on



several tools available for a survey quality management system



including charting methods.  I agree that these are useful tools.



But what has been most helpful in the manufacturing and service



industries where TQM is used is bringing in a team that has hands-



on knowledge of all the facets of the survey.  The team would



include data collectors from states, edit specifications people,



estimation people, those who set objectives.  The tools would be



something the team would be taught to use to help them.  They would



all need to learn basic concepts of variability.  Only when all



these people participate, do you get the profound knowledge that



you need to improve a system, not merely tamper with it.  As you



recall, tampering with a system does not take care of the major



changes needed to remove high variability due to special causes.



 



   Let me now move to the SIPP report.  This is a good report that



gets periodic updating.  There are areas not covered in the report,



probably because they did not seem as urgent as the areas covered.



However, I do believe that we will need to see a section on



objectives, meeting multiple objectives, defining concepts,



translating concepts into questions, and so forth.  At the other



end of the survey, something needs to be said about analysis and



publication.



 



   Though the Census Bureau does not use the language of total



quality management, I know that they have thought along those



lines.  Using some of the performance measure standards flies in



the face of everything Deming preaches.  I'm talking about



standards for response rates:



 



     Outstanding................ 97.5 - 100.0



     Commendable................ 95.5 -  97.4



     Fully successful........... 91.5 -  95.4



     Marginal................... 88.0 -  91.4



     Unsatisfactory............. 87.9 and less



 






 



    Instead of setting arbitrary standards for response rates and



production, the Bureau needs to get a deeper understanding of what



is possible in each type of area in which it does surveys.  For



example, response rates can be charted with upper and lower control



limits for PSU's in New York City.  Probably the response rates



there very seldom, if ever, meet the commendable level.  However,



they may be within normal variability for that area.  Only with



positive efforts at changing the system can the response rates be



raised.  This is partly what Dr. Deming thunders about -- blaming



the worker who may be doing the best he or she can when it is the



system at fault.  Again, this labelling of people's work does not



make the interviewer proud, and it is really tampering with the



system.
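
   A control chart of the kind suggested above might be sketched as follows.  The sketch assumes a conventional three-sigma chart for proportions (a p-chart), which is one common choice and not necessarily what the Bureau would use; the function names and the counts are hypothetical.

```python
import math
from typing import List, Tuple


def p_chart_limits(completed: List[int], eligible: List[int],
                   sigmas: float = 3.0) -> Tuple[float, List[Tuple[float, float]]]:
    """Return the center line and (lower, upper) control limits per period."""
    if len(completed) != len(eligible):
        raise ValueError("completed and eligible must have the same length")
    p_bar = sum(completed) / sum(eligible)  # pooled response rate
    limits = []
    for n in eligible:
        half_width = sigmas * math.sqrt(p_bar * (1.0 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits


def out_of_control(completed: List[int], eligible: List[int],
                   sigmas: float = 3.0) -> List[int]:
    """Indexes of periods whose response rate falls outside the control limits."""
    _p_bar, limits = p_chart_limits(completed, eligible, sigmas)
    flagged = []
    for i, (c, n, (lower, upper)) in enumerate(zip(completed, eligible, limits)):
        rate = c / n
        if rate < lower or rate > upper:
            flagged.append(i)
    return flagged


# Hypothetical counts for six interviewing periods in one area.
eligible_cases = [200, 200, 200, 200, 200, 200]
completed_cases = [176, 180, 172, 178, 150, 177]
center, limits = p_chart_limits(completed_cases, eligible_cases)
print(f"center line: {center:.3f}")
print("periods outside limits:", out_of_control(completed_cases, eligible_cases))
```

   With these made-up counts every period is below the 95.5 percent "commendable" threshold, yet only the fifth period falls outside the limits; the others are within normal variability for the area.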



 



   The report gave lots of interesting information on household,



person, and item response rates.  Some of the non-response rates on



asset data are such that it seems questionable that the survey is



the right vehicle for collecting the data.



 



   There is also emphasis on the seam problem, but this is nothing



new.   As I recall, it also showed up in the crime survey.  It



seems that certain biases are endemic to longitudinal surveys.  So



far the Bureau has been content to catalog the measured effect.  We



really need some creative thinking and some money to get some



experiments going to look at recall errors, the placement of



events in time, and the time in sample problems.  Though dependent



interviewing may yield more consistent results, they may be no more



accurate.  Before action is taken to fix a problem, there needs to



be a deeper understanding of why the problem exists.



 



   There was very little information available on the extent of



editing, what it does, why changes are made, and what we call



editing and what we call imputation.   Beller made some very



pertinent comments in his 1979 error profile for NASS surveys.



"The amount of editing on some questions resulted in changing the



level of cattle and calves by an amount two or three times greater



than the error caused by sampling.  This amount of editing is cause



for alarm in that it clearly shows a breakdown in the survey



process."    In both the NASS surveys and SIPP, we need to get a



better picture -- a profound understanding -- of what editing is



doing to the data.



 



   One last point on SIPP.  The only direct estimates of sampling



error were for the third quarter of 1983 using 1984 panel data



collected in wave one.  The survey at that time was based on the



1970 census.  It certainly seems time to recompute variances.



Besides having incorrect variances, it seems like gilding the lily



when the analysts are making actual and implied comparisons that



they multiply by 1.6 times the standard error.  The interpretations



and the comparisons could be quite far off.



 



 






 



 All in all, I enjoyed reading these papers.  I think the



documentation of SIPP is more complete but I think NASS is farther



along in trying to improve quality.  They do not want to document



only; their real goal is improvement.  I believe that is ultimately



the SIPP goal too, but no strategy has yet been set forward on how



to move in that direction.



 



 



 



 



 



 



 



 






 



                           Discussion



 



                      Nancy A. Mathiowetz



                  U. S. Bureau of the Census



 



   The data collected by Federal statistical agencies are used to



both shape federal policy and change the distribution of federal



expenditures; given the magnitude of the impact of these data, the



need for high quality goes without question.  In developing the



Quality Profiles, the agencies responsible for this work are to be



commended for continuing to move the discussion of error beyond



that of sampling error and into the realm of the measurement of



nonsampling error.  Although most agencies have for years provided



discussion of sampling error with release of their data and



research findings, we are just beginning to develop a standard of



reporting which includes a discussion of all of the components of



total survey error.



 



Sources of Nonsampling Error



 



   The sources of nonsampling error are many and include:



 



-    the design of the study (e.g. longitudinal vs. cross



   sectional; length of recall period);



 



-    the questionnaire, both the contents and the structure;



 



-    the interviewer;



 



-    the respondent; and



 



-    the post-survey processing, including coding and keying



   of data.



 



   Rather than reiterate issues raised in the Quality Profiles,



I would like to suggest some other topics of investigation within



these sources of nonsampling error.  My goal in doing so is not to



criticize the work presented here, but to provide some ideas on



where these Quality Profiles could be expanded.



 



Design



 



   With respect to design, we still know little about the effects



of longitudinal designs on the level of error and the error



variance structure of reports over time.  There has been research



to indicate that respondents suffer from "conditioning" effects,



that is, the changing of behavior or the reporting of behavior in



later interviews resulting from earlier interviews.   Some



conditioning may improve reporting in that the respondent knows



 






 



prior to the interview what the nature of the questions is;



conditioning may also result in a reduction in reporting since



respondents are now knowledgeable about the sequencing within an



interview.  In one study, the best predictor of error in reports of



functional status in the fourth round of interviewing is the length



of time it took to conduct the previous interviews.  The finding



suggests that conditioning effects may be reduced by something as



subtle as reducing the length of an earlier interview.  We need



further research to understand how conditioning impacts the



analysis of change over time and the structure of errors over time.



 



   Longitudinal designs may also be affected by changes in the



respondents, the interviewer, or even the interpretation and meaning



of critical concepts in the questions, if the panel has a long



life.  With the proliferation of more longitudinal data collection



efforts within the Federal Government, more research into what



questions are sensitive and which are resistant to conditioning



effects as well as which items are most affected by between



interview changes, is necessary.



 



 



Questionnaire



 



   In a lecture to the Society of Government Economists,



Janet Norwood stated that



 



   ...the quality of a statistical indicator is sometimes



   elusive and often difficult to define.  Effective



   measurement requires an underlying conceptual framework



   and careful identification of the phenomenon to be



   estimated....



 



   In the past 25 years, we have made great strides in



understanding how sensitive response distributions are to minor



changes in question wording.  The merging of literatures from



cognitive psychology, social linguistics, and social psychology



with survey methodology has presented us with new means for



attempting to reduce the levels of error associated with the



questionnaire.   What is now needed in the Federal statistical



system is a means for evaluating the various forms by which the



"same" information is collected and analyzed among various



agencies.    For example, in recent years, the proportion of



individuals lacking health insurance has been a critical issue.



The most widely cited data on insurance coverage comes from the



Current Population Survey, which asks whether each person in a



household was covered at any time during the preceding year.



Persons covered by any source at any time during the year are



counted as insured.  In 1987, the estimate for uninsured from the



March CPS was 17.6 percent.  Notice that this question asks whether



the person has been covered "at any time" during the previous year.



In contrast, questions from the 1980 National Medical Care



Utilization and Expenditure Survey (NMCUES) and the 1987 National Medical



 






 



Expenditure Survey (NMES), both designed as one-year panel surveys,



indicate that point-in-time estimates of the uninsured (at the time



the person was interviewed) are approximately 14 to 16 percent at



any one cross-section, but that estimates for all year uninsured



are approximately 9 percent.



 



   There is some conjecture that the response to the CPS may



reflect a respondent's status at the time of the interview rather



than in reference to any time in the previous year, due to the



similarity in the estimates from CPS and the cross-sectional



estimates from NMCUES and NMES.   From a policy perspective the



difference is critical -- whether to provide health insurance for



the chronically uninsured, approximately 21 million people, or



whether to provide insurance for all individuals ever uninsured,



which appears to be approximately 35 million people in a given



year.  Those attempting to address this issue would benefit from a



consistent definition of uninsured as well as a set of questions



which asks about a consistent time period.



 



Interviewer



 



   Response rates, hours per completed interview, and



item nonresponse rates, traditionally used as measures of



interviewer quality, only begin to capture the errors that are



potentially associated with the interviewer's task.  While each of



these measures provides us with information that we believe is



related to quality, we need to employ more measures that could be



used with respect to understanding error for individual questions.



How well do interviewers understand the concepts underlying the



questions they are asking?  Do they have sufficient training and



understanding to ask non-directive probes when necessary to obtain



an adequate answer?   The increased movement toward telephone



interviewing provides us with a means to routinely randomize



interviews across interviewers to obtain measures of interviewer



variance.  We spend millions of dollars in the training of



interviewers and yet know little about the most effective means for



training interviewers or determining their ability to conduct the



interview as trained.  The review of one or more interviews by a



supervisor provides some information, but if we believe that



training interviewers to read questions exactly as written is worth



the cost, we should be routinely evaluating the association between



the delivery of questions and the error associated with the



responses.
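
   One way to turn randomized assignments into a measure of interviewer variance is the intraclass correlation from a one-way analysis of variance.  The sketch below assumes that cases were assigned to interviewers at random and that every interviewer completed the same number of cases; the function name and the data are hypothetical.

```python
from typing import Dict, List


def interviewer_intraclass_correlation(
        responses_by_interviewer: Dict[str, List[float]]) -> float:
    """ANOVA estimate of the intraclass correlation for a balanced design.

    rho = (MSB - MSW) / (MSB + (m - 1) * MSW), where m is the number of
    cases per interviewer.
    """
    groups = list(responses_by_interviewer.values())
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two interviewers")
    m = len(groups[0])
    if m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need the same number (two or more) of cases per interviewer")

    grand_mean = sum(sum(g) for g in groups) / (k * m)
    group_means = [sum(g) / m for g in groups]

    ss_between = m * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_within = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (m - 1))

    denominator = ms_between + (m - 1) * ms_within
    if denominator == 0:
        raise ValueError("responses show no variation")
    return (ms_between - ms_within) / denominator


# Hypothetical responses to one question, three cases per interviewer.
example_responses = {
    "interviewer_A": [1.0, 2.0, 3.0],
    "interviewer_B": [4.0, 5.0, 6.0],
}
rho = interviewer_intraclass_correlation(example_responses)
print(f"estimated intraclass correlation: {rho:.3f}")
```

   A value near zero suggests that interviewers add little to the variability of the responses; larger values point to questions where the interviewer's delivery deserves a closer look.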



 



Editing and Coding



 



   As noted in the SIPP Quality Profile, much of the between wave



difference in industry and occupation appears to be a spurious



result of either data collection or data processing.  A similar



problem can be found in the coding of medical conditions and



 






 



surgical procedures based on household reported data.   Not only



coding, but also editing procedures, can contribute to the overall



level of error in estimates.  For example, Duncan and Mathiowetz



(1985), using microlevel validation data, found that trimming



estimates of change in income between two years, that is,



disbelieving levels of change beyond a certain level as reported by



household respondents, a procedure often done in editing data from



longitudinal surveys of income, resulted in biased estimates of



change and bias in the coefficients predicting income levels and



change.  Retrospective reports of income were more likely to be



correct for those individuals with a large proportional change than



for those with little or no change.  The finding suggests that



editing procedures should be conservative and based on empirically



derived principles.



 



  Whereas we have learned to be sensitive to question wording



with respect to understanding potential sources of bias, and in



doing so demand documentation concerning question wording and



study design, few, if any, studies provide information on effects



of editing and coding processes.  If consumers of the data are to



understand all aspects of total survey error, coding and editing



decisions need to be researched and documented.



 



 



Adjusting for Nonresponse



 



  For the most part, nonresponse adjustments are made using



demographic and segment information and little if any information



concerning the nature of the nonresponse is factored into the



adjustment.  There is a growing body of literature which suggests



that using information from call records, specifically separating



refusals from those you were unable to locate, in a nonresponse



adjustment may prove beneficial, since difficult to locate (but



eventually interviewed) sample individuals look similar to



sample individuals who cannot be located.
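
   An adjustment along these lines might be sketched as follows.  The sketch rests on assumptions that go beyond the text: the four status codes are hypothetical, the weight of cases that could not be located is carried only by respondents who were hard to locate, and the weight of refusals is then spread over all respondents.

```python
from typing import Dict, List

RESPONDENT_EASY = "respondent_easy"
RESPONDENT_HARD = "respondent_hard_to_locate"
REFUSAL = "refusal"
NOT_LOCATED = "not_located"


def adjust_weights(cases: List[Dict]) -> List[Dict]:
    """Return respondents with nonresponse-adjusted weights.

    Each case is a dict with a 'weight' and a 'status' (one of the codes above).
    """
    def total(status: str) -> float:
        return sum(c["weight"] for c in cases if c["status"] == status)

    w_easy, w_hard = total(RESPONDENT_EASY), total(RESPONDENT_HARD)
    w_refusal, w_not_located = total(REFUSAL), total(NOT_LOCATED)

    if w_easy + w_hard == 0:
        raise ValueError("no respondents to carry the nonrespondent weight")
    if w_not_located > 0 and w_hard == 0:
        raise ValueError("no hard-to-locate respondents to represent unlocated cases")

    # Step 1: hard-to-locate respondents stand in for the unlocated cases.
    hard_factor = (w_hard + w_not_located) / w_hard if w_hard > 0 else 1.0

    # Step 2: all respondents share the weight of the refusals.
    responding_weight = w_easy + w_hard + w_not_located
    refusal_factor = (responding_weight + w_refusal) / responding_weight

    adjusted = []
    for c in cases:
        if c["status"] == RESPONDENT_EASY:
            factor = refusal_factor
        elif c["status"] == RESPONDENT_HARD:
            factor = hard_factor * refusal_factor
        else:
            continue  # nonrespondents drop out after their weight is reassigned
        adjusted.append({**c, "weight": c["weight"] * factor})
    return adjusted


# Hypothetical sample of ten cases, each with a base weight of 10.
sample = (
    [{"weight": 10.0, "status": RESPONDENT_EASY} for _ in range(6)]
    + [{"weight": 10.0, "status": RESPONDENT_HARD} for _ in range(2)]
    + [{"weight": 10.0, "status": REFUSAL}]
    + [{"weight": 10.0, "status": NOT_LOCATED}]
)
adjusted_sample = adjust_weights(sample)
print(sum(c["weight"] for c in adjusted_sample))
```

   The adjusted weights add back to the full sample weight, but more of it now rests on the hard-to-locate respondents than a single overall adjustment would give them.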



 



   These comments are intended to extend the excellent work



presented in the Quality Profiles.  The profiles provide details on



the measurement of nonsampling error and the results of several



experiments to reduce these levels of error.  In addition, I hope



that as others consider producing quality profiles these profiles



are expanded to cover some of these other issues.



 



 



Reference



 



Duncan, G.J. and Mathiowetz, N.A. A Validation Study of Economic



Survey Data, Ann Arbor, MI: The Institute for Social Research,



1985.



 



 



 






 



 



 






 



            Session 2



 



    PARADIGM SHIFTS USING



    ADMINISTRATIVE RECORDS



 



 



 



 



 



 



 



 






 



 



 






 



 PARADIGM SHIFTS:  ADMINISTRATIVE RECORDS AND CENSUS-TAKING



 



                        Fritz Scheuren



                   Internal Revenue Service



 



 



  There is a lot in the news lately about problems with the 1990



decennial census in the United States.  Many opinions have already



been offered about what went wrong and what should be done.



Indeed, a paradigm shift may be needed in census-taking.



 



   This brief note talks about the possible role administrative



records might play in a new paradigm.  To get things started, the



word "paradigm" might deserve some elaboration: a paradigm is a way



of thinking and then doing; a pattern of belief and behavior; a way



of seeing reality and using that sense to accomplish something.



Paradigms are common -- the way we get to work would be a humble



example.  Conventional census-taking, under this definition, could



be characterized as a major scientific and technical paradigm.



 



   As long as our paradigms work well for us, we tend not to



change them.  Occasionally, however, paradigms break down and have



to be replaced; e.g., the bridge goes out and we need to find



another route to work.  As Kuhn pointed out in his seminal book on



the structure of scientific revolutions, paradigms break down in



science, as well (Kuhn, 1970).  Perhaps the most famous example of



this is the revolution in the thinking of astronomers that occurred



when the Ptolemaic earth-centered view of the universe was replaced



by the Copernican view of an earth that revolved, with the other



planets, around the sun.



 



   If we look at the problems the U.S. Census Bureau has



encountered with the 1990 decennial census, it can easily be argued



that one of the major barriers to overcoming these obstacles is the



conventional census-taking paradigm.  Kish, in a recent paper he



has written for Survey Methodology (1990), considers at length some



possible alternatives.  My objective here will be to focus on two



of those areas -- rolling censuses and administrative registers --



and to explore a new paradigm for the U.S. decennial census.



 



 



 



 



 



 






 



Conventional Census-Taking



 



    Conventional censuses, like those in Canada and the U.S.,



continue to do many things very well (e.g., Hammond, 1990).



Indeed, at present, we have no adequate substitute for them;



nonetheless, the need for at least some change seems compelling.



Rising costs are a big factor.  There have been many improvements



in census-taking in this century; still, in both Canada and the



U.S., total costs and even costs per person have risen



significantly:



 



o     The 1990 decennial census in the U.S. is budgeted at



    about $10 (U.S.) per person.   Even adjusting for



    inflation, this is a four-fold increase over what the per



    capita expenses were in 1960.  Item content differences



    between the two censuses are small and essentially not a



    factor in explaining the difference.  Both the 1960 and



    1990 Census, for example, asked only 7 population



    questions of everyone (U.S. Bureau of the Census, 1989).



    The Census long-form sample in 1960 contained 35



    questions and was to be completed by 25% of the



    population.   For 1990, the Census long-form sample was



    given to 16% of U.S. hou