Federal Committee on Statistical Methodology
Office of Management and Budget

 

  Statistical Policy Working Paper 20 - Seminar on Quality of Federal Data - Part 1 of 3


 

 


 



               Statistical Policy
                Working Paper 20

     Seminar on Quality of Federal Data
                   Part 1 of 3

Federal Committee on Statistical Methodology

            Statistical Policy Office
  Office of Information and Regulatory Affairs
          Office of Management and Budget

                   March 1991



 



   MEMBERS OF THE FEDERAL COMMITTEE ON
          STATISTICAL METHODOLOGY
              (February 1991)

           Maria E. Gonzalez, Chair
        Office of Management and Budget

Yvonne M. Bishop, Energy Information Administration
Warren L. Buckler, Social Security Administration
Charles E. Caudill, National Agricultural Statistics Service
Cynthia Z.F. Clark, National Agricultural Statistics Service
Zahava D. Doering, Smithsonian Institution
Robert M. Groves, Bureau of the Census
Roger A. Herriot, National Center for Education Statistics
C. Terry Ireland, National Computer Security Center
Charles D. Jones, Bureau of the Census
Daniel Kasprzyk, Bureau of the Census
Daniel Melnick, National Science Foundation
Robert P. Parker, Bureau of Economic Analysis
David A. Pierce, Federal Reserve Board
Thomas J. Plewes, Bureau of Labor Statistics
Wesley L. Schaible, Bureau of Labor Statistics
Fritz J. Scheuren, Internal Revenue Service
Monroe G. Sirken, National Center for Health Statistics
Robert D. Tortora, Bureau of the Census



 



                            PREFACE

In 1975, the Office of Management and Budget (OMB) organized the
Federal Committee on Statistical Methodology. Composed of
individuals selected by OMB for their expertise and interest in
statistical methods, the committee has, during the past 15 years,
identified areas that merit investigation and discussion and has
overseen the work of subcommittees organized to study particular
issues. Since 1978, 19 Statistical Policy Working Papers have been
published under the auspices of the Committee.

On May 23-24, 1990, the Council of Professional Associations on
Federal Statistics (COPAFS) hosted a "Seminar on the Quality of
Federal Data." Developed to capitalize on work undertaken during
the past dozen years by the Federal Committee on Statistical
Methodology and its subcommittees, the seminar focused on a variety
of topics that have been explored thus far in the Statistical
Policy Working Paper series. The subjects covered at the seminar
included:

   Survey Quality Profiles
   Paradigm Shifts Using Administrative Records
   Survey Coverage Evaluation
   Telephone Data Collection
   Data Editing
   Computer Assisted Statistical Surveys
   Quality in Business Surveys
   Cognitive Laboratories
   Employer Reporting Unit Match Study
   Approaches to Developing Questionnaires
   Statistical Disclosure-Avoidance
   Federal Longitudinal Surveys

Each of these topics was presented in a two-hour session that
featured formal papers and discussion, followed by informal
dialogue among all speakers and attendees.

Statistical Policy Working Paper 20, published in three parts,
presents the proceedings of the "Seminar on the Quality of Federal
Data." In addition to the papers and formal discussions from each
of the twelve sessions, this working paper includes Robert M.
Groves' keynote address, "Towards Quality in a Working Paper Series
on Quality," and comments by Stephen E. Fienberg, Margaret E.
Martin, and Hermann Habermann at the closing session, "Towards an
Agenda for the Future."

We are indebted to all of our colleagues who assisted in organizing
the seminar, and to the many individuals who not only presented
papers and discussions but also prepared these materials for
publication. Special thanks are due to Terry Ireland and his staff
for their work in assembling this working paper.



 



                      Table of Contents

                   Wednesday, May 23, 1990

                            Part 1

                       KEYNOTE ADDRESS

TOWARDS QUALITY IN A WORKING PAPER SERIES ON QUALITY
   Robert M. Groves, The University of Michigan and U.S. Bureau of
   the Census

           Session 1 - SURVEY QUALITY PROFILES

THE SIPP QUALITY PROFILE
   Thomas B. Jabine, Statistical Consultant

INITIAL REPORT ON THE QUALITY OF AGRICULTURAL SURVEY PROGRAM
   George A. Hanuschak, National Agricultural Statistics Service

DISCUSSION
   Barbara A. Bailar, American Statistical Association

DISCUSSION
   Nancy A. Mathiowetz, U.S. Bureau of the Census

   Session 2 - PARADIGM SHIFTS USING ADMINISTRATIVE RECORDS

PARADIGM SHIFTS: ADMINISTRATIVE RECORDS AND CENSUS-TAKING
   Fritz Scheuren, Internal Revenue Service

AN ADMINISTRATIVE RECORD PARADIGM: A CANADIAN EXPERIENCE
   John Leyes, Statistics Canada

DISCUSSION
   Gerald Gates, U.S. Bureau of the Census

DISCUSSION
   Edward J. Spar, Market Statistics

         Session 3 - SURVEY COVERAGE EVALUATION

CONTROL, MEASUREMENT, AND IMPROVEMENT OF SURVEY COVERAGE
   Gary M. Shapiro, U.S. Bureau of the Census; Raymond R. Bosecker,
   National Agricultural Statistics Service

QUALITY OF SURVEY FRAMES
   Judith T. Lessler, Research Triangle Institute

DISCUSSION
   Fritz Scheuren, Internal Revenue Service

DISCUSSION
   Joseph Waksberg, Westat, Inc.

         Session 4 - TELEPHONE DATA COLLECTION

QUALITY IMPROVEMENT IN TELEPHONE SURVEYS
   Leyla Mohadjer, David Morganstein, Westat, Inc.

COMPUTER ASSISTED SURVEY TECHNOLOGIES IN GOVERNMENT: AN OVERVIEW
   Marc Tosiano, National Agricultural Statistics Service

DISCUSSION
   William L. Nicholls II, U.S. Bureau of the Census

DISCUSSION
   James T. Massey, National Center for Health Statistics



 



                            Part 2

                 Session 5 - DATA EDITING

OVERVIEW OF DATA EDITING IN FEDERAL STATISTICAL AGENCIES
   David A. Pierce, Federal Reserve Board

EDITING SOFTWARE (An excerpt from Chapter IV of Working Paper 18)
   Mark Pierzchala, National Agricultural Statistics Service

RESEARCH ON EDITING
   Yahia Ahmed, Internal Revenue Service

DISCUSSION
   Charles E. Caudill, National Agricultural Statistics Service

DISCUSSION
   Richard Bolstein, George Mason University

     Session 6 - COMPUTER ASSISTED STATISTICAL SURVEYS

OVERVIEW OF COMPUTER ASSISTED SURVEY INFORMATION COLLECTION
   Richard L. Clayton, U.S. Bureau of Labor Statistics

A COMPARISON BETWEEN CATI AND CAPI
   Martin Baum, National Center for Health Statistics

COMPUTER ASSISTED SELF INTERVIEWING
   Ralph Gillmann, Energy Information Administration

COMPUTER ASSISTED SELF INTERVIEWING: RIGS AND PEDRO, TWO EXAMPLES
   Ann M. Ducca, Energy Information Administration

DATA COLLECTION
   Cathy Mazur, National Agricultural Statistics Service

DISCUSSION
   Robert N. Tinari, U.S. Bureau of the Census

DISCUSSION
   David Morganstein, Westat, Inc.

                   Thursday, May 24, 1990

        Session 7 - QUALITY IN BUSINESS SURVEYS

IMPROVING ESTABLISHMENT SURVEYS AT THE BUREAU OF LABOR STATISTICS
   Brian MacDonald, Alan R. Tupek, U.S. Bureau of Labor Statistics

A REVIEW OF NONSAMPLING ERRORS IN FEDERAL ESTABLISHMENT SURVEYS
WITH SOME AGRIBUSINESS EXAMPLES
   Ron Fecso, National Agricultural Statistics Service

DISCUSSION

DISCUSSION
   Charles D. Cowan, Opinion Research Corporation

           Session 8 - COGNITIVE LABORATORIES

THE BUREAU OF LABOR STATISTICS COLLECTION PROCEDURES RESEARCH
LABORATORY: ACCOMPLISHMENTS AND FUTURE DIRECTIONS
   Cathryn S. Dippo, Douglas Herrmann, U.S. Bureau of Labor
   Statistics

THE ROLE OF A COGNITIVE LABORATORY IN A STATISTICAL AGENCY
   Monroe G. Sirken, National Center for Health Statistics

DISCUSSION
   Elizabeth Martin, U.S. Bureau of the Census

DISCUSSION
   Murray Aborn, National Science Foundation (retired)



 



                            Part 3

      Session 9 - EMPLOYER REPORTING UNIT MATCH STUDY

INTERAGENCY AGREEMENTS FOR MICRODATA ACCESS: THE ERUMS EXPERIENCE
   Thomas B. Petska, Internal Revenue Service; Lois Alexander,
   Social Security Administration

SAMPLE SELECTION AND MATCHING PROCEDURES USED IN ERUMS
   John Pinkos, Kenneth LeVasseur, Marlene Einstein, U.S. Bureau of
   Labor Statistics; Joel Packman, Social Security Administration

RESULTS, FINDINGS, AND RECOMMENDATIONS OF THE ERUMS PROJECT
   Vern Renshaw, Bureau of Economic Analysis; Tom Jabine,
   Statistical Consultant

DISCUSSION
   W. Joel Richardson, Charles A. Waite, U.S. Bureau of the Census

DISCUSSION
   Thomas J. Plewes, U.S. Bureau of Labor Statistics

     Session 10 - APPROACHES TO DEVELOPING QUESTIONNAIRES

TOOLS FOR USE IN DEVELOPING QUESTIONS AND TESTING QUESTIONNAIRES
   Theresa J. DeMaio, U.S. Bureau of the Census

TECHNIQUES FOR EVALUATING THE QUESTIONNAIRE DRAFT
   Deborah H. Bercini, National Center for Health Statistics

DESIGNING QUESTIONNAIRES FOR CATI IN A MIXED MODE ENVIRONMENT
   Gemma Furno, U.S. Bureau of the Census

DISCUSSION
   Carol C. House, National Agricultural Statistics Service

      Session 11 - STATISTICAL DISCLOSURE-AVOIDANCE

DISCLOSURE AVOIDANCE PRACTICES AT THE CENSUS BUREAU
   Brian Greenberg, U.S. Bureau of the Census

THE MICRODATA RELEASE PROGRAM OF THE NATIONAL CENTER FOR HEALTH
STATISTICS
   Robert H. Mugge, National Center for Health Statistics (retired)

DISCUSSION
   George T. Duncan, Carnegie Mellon University

        Session 12 - FEDERAL LONGITUDINAL SURVEYS

FEDERAL LONGITUDINAL SURVEYS
   Daniel Kasprzyk, U.S. Bureau of the Census; Curtis Jacobs,
   U.S. Bureau of Labor Statistics

THE ADVANTAGES AND DISADVANTAGES OF LONGITUDINAL SURVEYS
   Robert W. Pearson, Social Science Research Council

LONGITUDINAL ANALYSIS OF FEDERAL SURVEY DATA
   Patricia Ruggles, Joint Economic Committee

DISCUSSION
   Michael Brick, Westat, Inc.

DISCUSSION
   Marilyn E. Manser, U.S. Bureau of Labor Statistics

            TOWARDS AN AGENDA FOR THE FUTURE

Stephen E. Fienberg, Carnegie Mellon University
Margaret E. Martin
Hermann Habermann, Office of Management and Budget



               Part 1

           Keynote Address

    TOWARDS QUALITY IN A WORKING PAPER SERIES ON QUALITY

                       Robert M. Groves
                The University of Michigan and
                   U.S. Bureau of the Census



 



 



1. Introduction



 



   Although this meeting has the title "Seminar on the Quality of
Federal Data," its structure closely follows the topics covered in
the Statistical Policy Working Paper series sponsored by the Office
of Statistical Policy and Standards. As of this date, 19 Statistical
Policy Working Papers have been written since the first in 1978,
about 1.6 per year over the 12 years of the series (see Figure 1).
They range over a wide terrain, from the topical focus of particular
surveys to methodological and statistical issues affecting survey
quality.



 



   I am unaware of the processes that led to my being asked to
give the keynote address at this meeting. I must admit that I speak
to you today as someone with a very biased opinion about the OMB
Statistical Policy Working Papers: I love almost all of them. I like
the idea that they exist, and only recently, because of my change
of job sectors, have I appreciated their worth from another
perspective. I have used them in graduate courses for students in
survey methods (they are fine introductions to important design
topics). I have used them in my research (they are unique sources
of documentation about what goes on in the Federal Statistical
System). I recommend them to others who call on me for consulting
assistance.



 



   Although I speak as a friend, 45 minutes of praise from me would
not improve this series and would risk "head inflation" among those
who developed the papers. Instead, I want to be a constructive
critic, and I will divide my remarks into several categories:



 



 a. alternative goals of the OMB series



 



 b. the need for a structure to their topics



 



    I note that what follows are my personal views, as a close
observer of the system from outside it and as a rookie member of it.



 



 



 



 



 



 



 



 






 



 



 



[Figure 1: graphic not reproduced in this text version]



 



 






 



2. Alternative Perspectives on Goals of the Working Paper Series



 



2.1. OMB Series as Review of the State of Practice



 



    Some of the papers in the series address a topic that spans
many surveys of different populations (see Figure 2). The papers on
coverage error and telephone data collection are examples. Papers of
this kind are compact summaries of the state of the art on a
current issue facing all surveys. They often describe activities in
both household and economic surveys, and they frequently end with
case studies showing how different surveys across the Federal
system handle the issue at hand.



 



                             Figure 2

 Alternative Perspectives on Goals of the Working Paper Series

 1.  OMB series as a review of the state of practice
 2.  OMB series as agency cross-fertilization
 3.  OMB series as a prod to new developments



 



    Papers of this kind are valuable to the extent that they have
both depth and breadth. By that I mean that they cover all the
sources of error affecting data quality, and cover them in enough
depth that most readers are likely to come away having really
learned something.



 



    Let me first speak of breadth of topics. I find it simplest to
array the topics of the papers along the components of total survey
error (see Figure 3). It would be unfair for me to present this
chart without some clarifying remarks about its empty cells. First,
an empty cell does not mean that the topic receives no treatment at
all. On sampling error, for example, many of the reports comment on
the impact of design options on sampling variance. Second, this
structure is only one of several that could be used to classify the
reports. Given the title of this seminar, "Quality of Federal Data,"
however, I find it an attractive one to use here.



 



    Granting the weaknesses of any single classification scheme, let
me point out what I believe are weaknesses in the current status of
the series. There is a distinct bias toward the household survey
domain, to the detriment of the economic domain. There is one paper
with the overarching title "Quality in Establishment Surveys," but
the fact that it stands alone underscores the problem. This reflects
the smaller literature on the methodology and quality evaluation of
economic surveys, but it is a situation I hope will change. Why? In
the past we have too quickly accepted the following premises about
economic survey measurement:



 



 a. establishment surveys are too diverse to lend themselves
    to common methodologies or standards.

 b. establishment surveys do not face questionnaire design
    issues like those of household surveys, because the
    information gathered is factual in nature.

 c. establishment surveys have nonresponse properties that do
    not resemble those of household surveys.



 



    Each of these premises can be refuted by observing the various
establishment surveys now under way. It is true that establishments
vary greatly in size; that their organizational structures are
diverse; that their recordkeeping practices are not standardized;
and that the ideal respondent for a given issue may vary across
establishments. All of this is true, but it should not lead to the
extreme conclusion that there are no common problems, either across
different establishment surveys or between household and economic
surveys.



 



   As the Boskin report has observed, economic survey data need
improvement, and the working paper series could be one vehicle for
focusing attention on specific needs in this area.



 



   The next most important omission, in my opinion, concerns the
issue of nonresponse. I must admit that the work of the National
Academy of Sciences Panel on Missing and Incomplete Data offers a
comprehensive review of current theory and practice. Nevertheless,
the issue is vital to the unique inferential power of probability
samples and therefore cannot receive too much attention. Even the
most basic questions remain unresolved: the relationship between
response rates and nonresponse error; the relationship between the
likelihood of coverage and the likelihood of participation; and the
cost/error evaluation of alternative methods of improving response
rates. The mean square errors of survey estimators reflect
thousands of individual decisions about whether to cooperate with
the survey request. It behooves us to devote more energy to this
problem, and the working paper series should do so.
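A minimal worked illustration of why response rates alone do not settle the question of nonresponse error uses the familiar deterministic view of nonresponse. The notation is assumed here for illustration and is not drawn from the working papers. A sample of n units contains r respondents and m = n - r nonrespondents, with means y_r and y_m. The quantity y_n is the mean the full sample would have produced. The second line follows algebraically from the first. The third is the standard decomposition of mean square error into variance and squared bias.

```latex
% Illustrative notation only (assumed, not from the working papers):
%   n = sampled units, r = respondents, m = n - r = nonrespondents
%   \bar{y}_n, \bar{y}_r, \bar{y}_m = full-sample, respondent, nonrespondent means
\begin{align*}
\bar{y}_n &= \frac{r\,\bar{y}_r + m\,\bar{y}_m}{n}
  && \text{(full-sample mean as a weighted average)}\\
\bar{y}_r - \bar{y}_n &= \frac{m}{n}\,\bigl(\bar{y}_r - \bar{y}_m\bigr)
  && \text{(nonresponse bias of the respondent mean)}\\
\operatorname{MSE}(\hat{\theta}) &= \operatorname{Var}(\hat{\theta})
  + \bigl[\operatorname{Bias}(\hat{\theta})\bigr]^2
  && \text{(variance plus squared bias)}
\end{align*}
```

Under this simplified model, raising the response rate (shrinking m/n) reduces the bias of the respondent mean only under one condition. The difference between respondents and nonrespondents must not grow enough, as harder-to-reach cases are brought in, to offset the smaller nonresponse rate. Whether it does is an empirical matter.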



 



   Third, the interviewer has largely been ignored. It has been
ignored despite the fact that many Federal surveys use interviewers
to assist in data collection, despite the fact that evaluative
procedures desperately need review and reconceptualization, and
despite the fact that it is an area where both statistical and
social science perspectives can contribute. Attention to the
interviewer is all the more important given a likely future in
which the traditional labor force of underemployed, overskilled
part-time homemakers will decline and computer technologies will
transform the job.



 



   Fourth, although large portions of data collection in the
Federal Statistical System are conducted by mail with
self-administered questionnaires, the series contains no focused
treatment of their methodology.



 



   Fifth, a few comments specifically on error profiles. When I
first read the CPS error profile 12 years ago, I had two reactions.
First, I was attracted to the literary form: a compilation of
quality measures for the survey, combined with documentation of its
design features. I felt then, and still believe, that the error
profile is a valuable way to document the leading components of
error in survey statistics (we should be grateful to Brooks and
Bailar, the mothers, or perhaps midwives, of the invention). My
second reaction came after digesting the full report: how little we
as a community seemed to know about the error properties of the
CPS, the largest and one of the most important ongoing Federal
household surveys. Of the 80 pages of the report, for example, only
about 25 are devoted to data collection operations, the source of
most of the errors in the process! That combination of reactions
led me to a belief I still hold: in the hands of intelligent program
directors, the error profile can serve as an agenda-setting
document for quality improvement programs.



 



  Finally, the series contains no serious treatment of the costs of
data collection, a topic I will revisit in a few minutes.



 



   Let me now turn to issues of depth. At their worst, the reports
are catalogues: they make great reading for someone shopping for an
idea among those presented, but not thrilling reading for the
uninitiated. At the same time, they often assume knowledge of
particular data series that few people outside the ranks of
experienced statistical system staff possess. Relatedly, some fail
to cite relevant research literature produced outside the
statistical system.



 



   Some of these features may reflect the choice of audience. I
have assumed that the intended audience consists of both Federal
Statistical System staff and researchers in related fields in the
academic and commercial sectors. The government, academic, and
commercial research sectors have much to gain from learning about
each other's methods. The paper series could be enhanced by seeking
input from the two other sectors. At a minimum, this might entail a
required literature review within each paper; at a higher level of
involvement, it might mean appointing people from outside the
Federal system to the subcommittees. Even input from outsiders,
however, may not be sufficient.



 



 



 



 






 



                           Figure 3

         Topics of Statistical Policy Working Papers

Multiple Error Sources           3 - CPS Error Profile
                                4 - Nonsampling Error Terms
                               13 - Federal Longitudinal Surveys
                               15 - Quality in Establishment Surveys

Coverage Error                 17 - Coverage Error

Nonresponse Error

Sampling Error

Measurement Error: Interviewer

Measurement Error:             10 - Developing Questionnaires
  Questionnaire

Measurement Error:
  Respondents

Measurement Error: Mode         6 - Uses of Administrative Records
  of Data Collection           12 - Telephone Data Collection
                               19 - Computer Assisted Surveys

Processing                      2 - Statistical Disclosure
                                5 - Statistical Matching
                               11 - Industry Coding Systems
                               18 - Data Editing

Estimation                      7 - Time Series Revision


    Topics Not Classifiable Easily in Error/Quality Terms

Topical focus                   1 - Statistics for Allocation of Funds
                               16 - Reporting in Employer Data Systems

Administration                  8 - Statistical Interagency Agreements
                                9 - Contracting for Surveys

Other                          14 - Uses of Microcomputers


      Missing Topics of Statistical Policy Working Papers

Coverage Error                 Problems using households as sampling
                               frame elements

Nonresponse Error              Combining social science and
                               statistical models of participation

Sampling Error                 Statistical software for estimation;
                               generalized variance models;
                               alternative estimators for public use
                               files

Measurement Error:             Training; variance models; reinterview
  Interviewer                  programs; monitoring of telephone
                               interviewers

Measurement Error:             Developmental methods in cognitive
  Questionnaire                laboratories; pretesting regimens;
                               embedding experiments in surveys

Measurement Error: Mode        Mail and self-administered surveys;
  of Data Collection           mixed mode surveys

Processing                     Statistical quality control; automated
                               coding

Estimation                     Model-based estimation



 



2.2. OMB Series as Cross-Fertilization Among Federal Statistical Agencies



 



     In my fifteen years of working with Federal statistical
agencies from my academic base, I was consistently reminded of how
isolated individual agencies are from one another. As most people
in this room know, it is not uncommon for very similar lines of
research and development to be pursued without much coordination
across agencies. The argument for this is that the different
problems faced by the agencies demand different solutions. The
argument against is that functionally equivalent solutions are
often created by two different agencies at twice the cost.



 



     The working paper series has had, I believe, a beneficial but
unanticipated effect in reducing interagency duplication. First, the
subcommittees consist of members from several different agencies.
Second, the tasks of the subcommittees often involve collecting
information from many statistical agencies, so the members learn of
work going on in agencies they normally don't visit. Third, the
papers' recommendations often seek to apply standards across
agencies, which forces the committees to face the difficulty of
system-wide standards.



 



     This is laudable and necessary. Is it sufficient? Clearly not.
The working subcommittees of the Federal Committee on Statistical
Methodology are temporary, normally have an agenda limited to the
report, and generally do not follow up on the report's logical
conclusions. Our dispersed statistical system, with all the benefits
that specialization offers, thus misses opportunities to implement
the recommendations of these working papers.



 



2.3. OMB Series as a Prod to New Developments



 



     Several of the papers treat topics on which only one or two
agencies are making major contributions while most others lag
behind. The papers on time series revision, industry coding, and
computer assisted surveys, for example, all fall into this category.



 



     If I may temporarily put on the hat of an OMB staff member,
this perspective seems the most central to the goals of the group.
If reports like these can serve to improve the quality of work under
way in several agencies, an investment by one agency might quickly
reap benefits in many.



 



     Some of the reports are poised for such effects, but the
statistical system seems to miss more opportunities than necessary.
Interagency agreements can be forged to promote such technology
transfer; that is, consultation or subcontracting can be arranged
within existing regulations. This, however, requires the target
agency to acknowledge the need for such upgrading. Could OMB
facilitate this process? I am too naive to know, but a pool of funds
at the OMB staff level, used to spread innovation across agencies
through staff details and other mechanisms, would be productive.



 



   Are there areas of innovation that can profit from
coordination? Certainly. The use of CATI/CAPI is one that comes
quickly to mind. It is an area in which several agencies are now
making separate expenditures, in which no standards have been well
defined, and in which different solutions with essentially the same
cost/benefit structure may evolve in different agencies.



 



   The prod to new developments, however, demands that the papers
end with a series of recommendations.  The authors should stimulate
the readers, dare I say challenge the readers, to improve current
practice.  After the detailed investigation needed for these
reports, the authors are uniquely qualified to offer such
recommendations.  Yet only a minority of the reports end with such
recommendations.  This should be part of the charge to each
committee.



 



 



3. The Need for a Structure of the Working Paper Series



 



   As I age, I must admit that I find more appeal in structures
that guide our research and development in survey design and
implementation, as opposed to reacting to each new idea without an
explicit framework.  In the academic world, major theories provide
that structure; they help to identify the important questions;
they guide the development of new ideas.  The application of the
word "theory" to social and economic data production is rare.  We
do work that is guided by statistical theories, social science
theories, organizational theories, and computer science theories.
We are, however, basically on the applied side of research and
development.  We have a data collection and estimation vehicle
(e.g., a survey) which is used for many substantive purposes.  We
are interested in knowledge that improves the vehicle and less
interested in anything else.



 



   As I understand the Federal Committee on Statistical
Methodology, the topics for papers are essentially the fruit of
discussions among the committee members.  This is fine for assuring
interest in the paper series among subcommittee members, but it
fails to assure coverage of important topics.  I have suggested a
total survey error structure above.  The reports should have both
measurement and reduction of error in mind.  The widely perceived
worth of sampling error as a criterion for evaluating data owes
its existence largely to well-accepted estimators of that error.
We currently lack comparably well-accepted measures for nonsampling
errors, but the report series could be used as a vehicle to
stimulate such measures.



 



   Finally, another way to structure the report series is around
major problems facing the Federal statistical system in the near
and far term (see Figure 4).  These, in my view, should form the
core of the working paper series.  The first I mention may be the
most controversial.  The statistical literature on survey design is
schizophrenic on costs.  On the one hand, there are models
demonstrating that design optimization can be achieved only by
knowing the cost components.  On the other hand, there is little
serious treatment of survey costs by statisticians or those from
other disciplines.
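The cost models the author has in mind are not specified here.  One standard textbook illustration, offered as an assumption and not as a method from this paper, is optimal allocation in stratified sampling under the linear cost function C = c0 + sum of c_h n_h.  Under that model, the variance of the stratified mean is minimized by taking n_h proportional to N_h S_h / sqrt(c_h).  The short Python sketch below uses a hypothetical function name and inputs.  It makes the point concrete: the allocation cannot be computed at all without the per-unit cost components c_h.

```python
import math


def optimal_allocation(N, S, c, budget, fixed_cost=0.0):
    """Cost-optimal stratified allocation (illustrative sketch).

    Minimizes the variance of the stratified mean, ignoring the finite
    population correction, for a fixed budget under the linear cost model
        budget = fixed_cost + sum(c[h] * n[h]).
    N, S, c: stratum sizes, standard deviations and per-unit costs
    (hypothetical inputs). Returns unrounded sample sizes n[h].
    """
    if not (len(N) == len(S) == len(c)):
        raise ValueError("N, S and c must have the same length")
    if budget <= fixed_cost or any(ch <= 0 for ch in c):
        raise ValueError("need budget > fixed_cost and positive unit costs")
    # n[h] is proportional to N[h] * S[h] / sqrt(c[h]).
    shares = [n * s / math.sqrt(ch) for n, s, ch in zip(N, S, c)]
    # Scale so that the variable cost sum(c[h] * n[h]) exhausts the budget.
    denom = sum(n * s * math.sqrt(ch) for n, s, ch in zip(N, S, c))
    variable_budget = budget - fixed_cost
    return [variable_budget * w / denom for w in shares]
```

Take two strata of equal size and variability, a budget of 300 cost units, and per-unit costs of 1 and 4.  The sketch allocates 100 and 50 cases.  Quadrupling a stratum's unit cost halves its relative share rather than quartering it.  Rounding to whole cases is left to the user.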



 



                                 Figure 4

      Likely Problems Facing Federal Data in the Near/Far Term

  1.  Identification of cost components associated with
      error-related design features

  2.  Integration of question changes motivated by cognitive
      research into ongoing surveys

  3.  Public cooperation with data collection requests and
      coverage of subpopulations on sampling frames

  4.  Development of mixed strategy designs, tailored to
      diverse subpopulations

  5.  Development of nonsampling error indicators;
      implementation of statistical quality control procedures

  6.  Training of statisticians and social scientists in survey
      research; recruitment/retention of trained staff



 



      The second issue has both a restricted and a more global
meaning.  First, the work ongoing in so-called cognitive
laboratories seeks to identify principles that influence
measurement error in question-answer sequences.  The Federal
statistical system at the current time has no good mechanism for
the orderly introduction of change in questionnaires.  For the vast
majority of ongoing surveys, questionnaires remain static despite
evidence of improved alternative measures.  The value of unbroken
time series and the assumption that biases cancel in over-time
comparisons are used to justify inactivity.  Americans have very
interesting reactions when they visit Cuba or see scenes of the
country.  They marvel at the maintenance of U.S.-manufactured cars
from the 1950's in their original state.  They are at once proud of
the ongoing use of older vehicles and amused by the lack of
progress.  A U.S. auto manufacturer would quickly go out of
business if it continued to market 1950's designs.  Indeed, the
watchword in that industry is continued investment in change:
designing systems to permit ongoing change, making change part of
the design.  Survey researchers are driving 1950's vehicles in the
1990's.  What we sorely lack is the will to mount programs of
continual improvement in data series.



 



   The third likely issue of importance is the role of voluntary
participation in surveys over the coming years.  Some countries in
Western Europe have experienced political shocks to response rates
(e.g., Sweden, West Germany).  Public debate about surveys in these
countries has led to lower cooperation with survey requests.  In
some cases, documented effects on survey statistics exist.  That is,
the nonresponse error becomes visible to even the most naive reader
of statistics.  At that point, there was little the researchers
were prepared to do, either in the reaction of field interviewers
or in the construction of adjustment schemes.  We must acknowledge
that public cooperation is a fragile base on which the scaffolding
of inference rests.  To improve participation, or to adjust
inference in the presence of lower participation, we must
understand the decision to participate.  This is an issue that
faces the entire statistical system, indeed, the entire industry of
information collection.



 



    The fourth issue is not unrelated to the problems of
participation.  As the diversity of the U.S. population increases,
survey designs that tailor procedures to different subpopulations
will grow in number.  Large portions of the population remain
covered by traditional frames and are cooperative and competent to
provide information through cheap data collection methods.  Others
are not covered by traditional frames, have difficulty providing
information, and fear harmful consequences from their
participation.  Mixed design strategies are likely to gain appeal
in the coming years: multiple frames, multiple data collection
modes, and questionnaires tailored to subpopulations.  The models
exist in the survey design literature, but they need careful
attention.
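The multiple-frame models mentioned above can be illustrated with the classical dual-frame estimator usually credited to Hartley.  It separately estimates units found only on frame A, only on frame B, and on both frames.  The two independent estimates of the overlap domain are then blended with a weight theta.  The sketch below assumes the four domain totals have already been estimated from the two samples.  The function name is hypothetical, and in practice theta would be chosen to minimize variance rather than fixed in advance.

```python
def dual_frame_total(y_a_only, y_ab_from_a, y_ab_from_b, y_b_only, theta):
    """Combine estimates from two overlapping frames A and B (sketch).

    y_a_only:    estimated total for units found only on frame A
    y_ab_from_a: overlap-domain total estimated from the frame-A sample
    y_ab_from_b: overlap-domain total estimated from the frame-B sample
    y_b_only:    estimated total for units found only on frame B
    theta:       weight in [0, 1] given to the frame-A overlap estimate
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    # The overlap domain is counted once, as a weighted blend of the
    # two independent estimates, so it is not double counted.
    return y_a_only + theta * y_ab_from_a + (1 - theta) * y_ab_from_b + y_b_only
```

Consider hypothetical domain estimates of 500 (A only), 300 and 340 (overlap, estimated from the A and B samples respectively) and 80 (B only), with theta = 0.5.  The combined total is 500 + 150 + 170 + 80 = 900.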



 



    The final problem listed above concerns a crisis looming ahead
for the social measurement industry in this country.  Like all
endeavors that require quantitative literacy, social and economic
statistics currently face a shortage of qualified personnel.  If
this were not bad enough, we also suffer from a worse problem: the
absence of ongoing training programs.  It's not merely that
students aren't entering the field; it's not clear how they could
do so within traditional academic programs.  Let's examine the
problem.  Sampling statistics was well developed by the early
1950's; it is not a "hot" area of development, attracting the best
and brightest students.  Instead, a variety of developments in
analytic statistics are drawing more attention.  Young Ph.D.'s
labelling themselves as sampling statisticians are unlikely to have
an easy route to tenure in an academic department.  Within the
social sciences the difficulties might be greater, with great
pressure on students to develop areas of expertise which are
central to the dominant paradigms in the discipline.  Survey
methodology is not one of them in any discipline.  There are two
results of this: 1) a gross inadequacy in the training of new staff
coming into the statistical system in topics relevant to survey
quality (this is not a comment on their training as statisticians,
psychologists, or economists), and 2) a reduction in the number of
academic researchers devoted to the craft of social measurement.
There is a clear conclusion here: the statistical system has to get
serious about training the staff it needs for the future.  This
means support of specialized graduate programs, focused continuing
education, onsite training, and other similar mechanisms.



 



     The two types of structure (quality/cost components of data
series, and problems facing the system) suggest two paper series,
one devoted to technical issues, another to administrative and
professional issues.



 



 



4. Other Comments, Not Elsewhere Classified



 



     I must admit confusion about the term "working paper series."
In an academic setting this term is used to describe papers in the
process of being refined or papers not worthy of being refined.
People are sometimes "working" on them.  The better ones change
over time; they evolve to a better state.  This doesn't seem to fit
well with the OMB Working Paper Series.  Nearly all remain in their
original state.



 



     I don't want to change the name of the series; I'd rather see
the series periodically updated.  Several of the papers were
valuable only for a short period of time (e.g., those on
microcomputers and on telephone data collection).  A well-defined
structure for the series might define a set of ongoing updates of
papers devoted to individual topics.



 



     There is another connotation of "working" when attached to a
paper series.  That is, the papers are "working" toward quality
improvements in the statistical system.  I like this connotation.
But it implies two burdens not uniformly accepted: a) a set of
recommendations at the end of reports, and b) follow-through by OMB
or individual agencies to implement change.  By this definition, I
think, the paper series has not achieved full success.



 



     Another problem with the series lies in the costs and benefits
that accrue to authors of the reports.  Unlike my colleagues in
academia, statistical system staff rarely experience
career-enhancing effects from writing such papers.  There is the
value of education about other agencies, of "networking" with other
members of the statistical system, and of learning more about
important issues facing the system.  On the other hand, I've
learned that this is work essentially performed at night and on
weekends by people who are already very busy.  Now, night and
weekend work is commonly very productive and I have no problem with
such a plan.  What I do regret (and think bad for the health of the
system) is that such work is given so little value by many of the
home agencies.  OMB might consider remedying this with more formal
recognition of the writers of these reports.  At the very least,
the authors of the report might be given a more prominent position
on the covers of the papers.



 



   It strikes me that this seminar is an ideal forum for
generating discussion on the future of this series.  I recommend
discussing several questions:

   Have the basic issues changed since the report?
        - because of the paper?
        - in spite of the paper?

   Is it time to redo the paper, to update it?

   Are there subtopics now of sufficient importance that they
   deserve separate treatment?



 



 



5. Personal note



 



   This working paper series consistently contains the name of
one person, from the first to the last: Maria Gonzalez.  The
Federal Statistical System often focuses its attention on data
series structures and organizations, not people, but the success of
any endeavor that spans decades depends on key people.  In this
paper series the key person is unambiguously Maria.  As those of
you who know her well can attest, she has been a rock of
rationality, courtesy, integrity, and absolute honesty in her work
on the Federal Committee on Statistical Methodology.  She alone can
succeed in pressing overworked federal statisticians to take on
projects for the benefit of the whole system.  Her near-unique
ability to suggest ideas in a manner that allows the hearers to
believe they are their own ideas is a marvel.  Her perseverance
toward important goals of quality improvement and coordination has
made the working paper series and this conference possible.



 



 



 



 



 



 



 



 






 



              Session 1



     SURVEY QUALITY PROFILES



 



 



 



 



 



 



 



 






 



 



                   THE SIPP QUALITY PROFILE



 



                       Thomas B. Jabine



                    Statistical Consultant



 



 



A. Introduction



 



   The Survey of Income and Program Participation (SIPP) is a
longitudinal national household survey which has been conducted by
the U.S. Bureau of the Census since 1983, following several years
of developmental research.  The goal of the survey, which uses a
rotating panel design, is to provide policy makers with
comprehensive and accurate data about the levels and determinants
of the income of U.S. persons and households and about their
participation in a broad range of income transfer and welfare
programs.



 



   The SIPP Quality Profile summarizes current knowledge about
the sources and magnitude of errors in estimates based on SIPP.  An
initial version of a SIPP quality profile was issued in 1987 (U.S.
Bureau of the Census, 1987), and an updated and expanded version
was prepared in 1989 (U.S. Bureau of the Census, 1990).



 



   This paper describes the purposes of developing a quality
profile for a survey or other statistical program and the process
of preparing and updating a quality profile, using the SIPP Quality
Profile as an illustration.  The contents of the updated version
will be discussed briefly.  Those who wish to evaluate the quality
of SIPP data on specific topics or to develop an overall judgment
about the quality of SIPP data are referred to the latest version
of the SIPP Quality Profile and the other sources of information
that it identifies.



 



   Section B outlines the development of the quality profile
concept and identifies some publications of the last four decades
that could be regarded as forerunners of the current model.
Section C explains the origin of the SIPP Quality Profile.  Section
D provides an overview of the updated version: its intended
audiences, purposes, sources of information, and structure.  The
contents are discussed briefly in Section E.  In the concluding
section, I discuss the role of a quality profile in the broad
context of survey quality control and improvement.



 



 



 



 






 



B. Some Forerunners of the Quality Profile



 



    The theoretical foundation for a quality profile rests on
various models that have been developed for the measurement and
analysis of errors in surveys, especially the Census Bureau model,
which integrates components of sampling and nonsampling error and
the interactions between them (Hansen, Hurwitz and Bershad, 1959).
Dalenius (1974) formalized the concept of total survey design,
using the Census Bureau model to guide the allocation of resources
to minimize total error in a survey.
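In its most generic form, the total-error idea behind these models can be written as the mean squared error of an estimator, which combines variance and squared bias.  The Hansen, Hurwitz and Bershad model is richer than this.  It splits the variance into sampling and response components and includes their interaction.  The display below shows only the generic decomposition, not that model.  Total survey design, in Dalenius's sense, would then choose design features to minimize this quantity subject to a cost constraint C <= C_0.

```latex
\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta}-\theta)^2\big]
  = \underbrace{\mathrm{Var}(\hat{\theta})}_{\text{sampling and response variance}}
  + \underbrace{\big[E(\hat{\theta})-\theta\big]^2}_{\text{squared bias}}
```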



 



    Based on this foundation, there have been several broad
qualitative and quantitative reviews of the quality of data from
censuses and surveys, featuring direct and indirect data about the
various components of error.  Zarkovich (1966) published what was
perhaps the first systematic treatment of nonsampling errors in
surveys, with emphasis on procedures for their measurement and
control, including numerous examples of specific information about
nonsampling errors from surveys and censuses in many countries.
Bailar and Lanphier (1978), in a pilot test of methodology for the
evaluation of survey practices, reviewed the quality-related design
features of 36 U.S. surveys.  Their review was not based on direct
measures of errors, but the frequency with which they found
indirect evidence of low quality was high enough to be disturbing
and to suggest a need for greater attention to the quality of
survey designs and practices.



 



   A United Nations (1982) manual on Nonsampling Errors in
Household Surveys, prepared for use in developing countries,
systematically explores the different sources and types of
nonsampling error and provides illustrative data from numerous
household surveys throughout the world.  Statistical Policy Working
Paper 15 (Office of Management and Budget, 1988) performs a similar
function for Federally sponsored establishment surveys in this
country.



 



   Compilations of information about the quality of surveys have
two main audiences: survey designers/managers and users of survey
data.  To ensure that the latter have access to such information,
standards have been developed for the dissemination, in survey
publications, of information about errors.  An early example of
such standards was Census Bureau Technical Paper 32 (1974).  Today,
several Federal statistical agencies apply similar standards in
their publication programs.



 



   There have been some publications devoted entirely to the
quality of data on a specific topic in a census or survey.  An
early example was a detailed appraisal of the income data from the
1950 Census of Population (Conference on Research in Income and
Wealth, 1958).  The most immediate forerunner of the SIPP Quality
Profile was Statistical Policy Working Paper 3 (Brooks and Bailar,
1978), which provided an error profile for estimates of
unemployment from the Current Population Survey (CPS).  Jabine
(1987) provided a detailed analysis of the quality of data on
chronic conditions reported in the National Health Interview
Survey.



 



     There are two fairly evident differences between the CPS error
profile and the SIPP quality profile.  The most obvious is the
switch from "error" to "quality" as the defining word for the
profile's content.  While this may seem to be only a semantic
change, it reflects a feeling, undoubtedly shared by the authors of
the CPS error profile, that the goals of such a publication are
constructive.  The term "quality" seems more in keeping with
today's emphasis on quality control and improvement in all kinds
of endeavors, including surveys.  The other basic difference is
that the SIPP quality profile covers the quality of estimates for
all of the topics included in SIPP, whereas the CPS error profile
covered only one of the many topics included in that survey.



 



     Other U.S. statistical agencies are undertaking similar,
although not identical, efforts.  The Energy Information
Administration, for example, periodically publishes reports in a
series called An Assessment of the Quality of Selected EIA Data
Series.  These reports rely largely on the technique of comparing
data from EIA surveys with more or less comparable data from other
sources and analyzing the differences that are observed.  Janet
Norwood, in a paper presented at the Census Bureau's Third Annual
Research Conference, stated that the Bureau of Labor Statistics was
planning to develop a comprehensive error profile for each of its
surveys (Norwood, 1987, pp. 217-218).



 



C. Origin of the SIPP Quality Profile



 



     The SIPP is a major longitudinal survey.  The start of the
survey was preceded by several years of research and development,
an effort known as the Income Survey Development Program.  The
evolution of SIPP's complex survey design did not end when the
survey became operational late in 1983.  Methodological research
and evaluation studies have continued at a substantial pace, and
the results of these studies, along with accumulated performance
statistics, feedback from users and adjustments made necessary by
reductions in funding, have led to significant changes in the
survey design and procedures.  Thus, SIPP is still in the early
stages of its evolution, in contrast to the Current Population
Survey which, although not immune to evaluation and improvement,
has reached a more mature and stable phase.



 



     In 1984 the Social Science Research Council and the Survey
Research Methods Section of the American Statistical Association,
with the encouragement and support of the Census Bureau,
established a Working Group on the Technical Aspects of SIPP to
provide advice to the Census Bureau on research priorities and the
translation of research findings into changes in the survey design
and procedures.  (The Social Science Research Council later
relinquished its sponsorship role.)  An early recommendation of the
Working Group was that the Census Bureau prepare a compendium of
research results and other information about the quality of SIPP
data.  Members of the Working Group believed that a systematic
account of information about the different kinds of errors that
affect estimates from SIPP would be invaluable as a guide in
setting research priorities and applying the principles of total
survey design to SIPP.  Given the substantial amount of ongoing
research, they recommended that such a quality profile be updated
periodically, perhaps every two years.



 



    The Census Bureau accepted the Working Group recommendation
and produced the Quality Profile for the Survey of Income and
Program Participation (King, Petroni and Singh, 1987), early drafts
of which were reviewed by several members of the Working Group.
New information continued to flow in at a rapid rate, and toward
the end of 1988 the Census Bureau decided that it was time to start
work on an update.  The updated version, published in mid-1990, was
prepared by the author of this paper with substantial assistance
from Karen King and Rita Petroni of the Census Bureau's Statistical
Methods Division.  Although the general structure of the two
versions is similar, the update contains much new material, and
some of the earlier sections were significantly revised.  It also
includes an index.  The new version benefited from reviews by
several members of the SIPP Working Group and Census staff.
Special thanks are due to Daniel Kasprzyk and Rajendra Singh for
their support of the project.



 



 



D. Overview of Version 2



 



    The SIPP Quality Profile is intended to serve two main
audiences: "users of SIPP data and those who are responsible for
or have an interest in the SIPP design and methodology."  The
interests of these two groups are different.  Users want to know
how the errors associated with specific categories or classes of
data are likely to affect their analyses.  SIPP designers and
managers need to know the magnitude of errors associated with
specific design features, in order to control the quality of the
survey estimates and to guide the allocation of resources available
for their improvement.  Besides these two primary audiences, it was
expected that the publication would be of interest to persons
concerned with the design of longitudinal surveys other than SIPP
and to two special groups: the ASA/SRM Working Group and a Panel
to Evaluate the Survey of Income and Program Participation,
convened by the Committee on National Statistics at the request of
the Census Bureau.



 



 



 






 



    Information about the components of error that affect SIPP
data comes from four sources:

    o    Performance statistics, such as unit and item nonresponse
         rates and reports based on quality control procedures used
         in data collection and processing operations.

    o    Methodological experiments.  Both in the developmental
         period and since the start of survey operations, there
         have been numerous methodological experiments involving
         design features such as length of questionnaire,
         respondent rules, use of respondent incentives, increased
         use of telephone interviewing and methods of adjustment
         for nonresponse.

    o    Micro-evaluation studies.  The outstanding example is the
         SIPP Record Check Study, in which individual survey
         responses to questions about program participation and
         benefits were compared with administrative data for each
         of several programs.

    o    Macro-evaluation studies.  There have been numerous
         comparisons of SIPP data with data on the same topics
         from other surveys, especially the Current Population
         Survey, and from program records.
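Macro-evaluation comparisons of the kind described in the last item often reduce to ratios of survey aggregates to independent benchmarks.  A minimal Python sketch follows.  The function name, program names and totals are hypothetical, and none of the figures come from the Quality Profile.

```python
def benchmark_ratios(survey_totals, benchmark_totals):
    """Ratio of each survey aggregate to an independent benchmark.

    survey_totals, benchmark_totals: dicts keyed by a (hypothetical)
    program or income-source name. Sources with a missing or zero
    benchmark are skipped. A ratio below 1 suggests net underreporting
    or undercoverage relative to the benchmark.
    """
    ratios = {}
    for source, survey_value in survey_totals.items():
        benchmark = benchmark_totals.get(source)
        if benchmark:  # skips None and 0
            ratios[source] = survey_value / benchmark
    return ratios
```

For example, a survey total of 81 against a benchmark of 90 gives a ratio of 0.9 for a hypothetical program_x.  Definitional and population differences between the two sources must be ruled out before such a ratio is read as reporting error.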



 



    Assembling the relevant documentation was a challenge.  SIPP
has probably generated more methodological documentation than any
other survey that has been in existence for a similar length of
time.  The list of 161 references provided in the updated version
of the Quality Profile, which includes only those items that were
actually cited in the report, is nearly double the size of the list
included in the first version.  The most commonly used sources
were: the SIPP Working Paper series; the annual proceedings of the
Survey Research Methods, Social Statistics, and Business and
Economic Statistics sections of the American Statistical
Association; the proceedings of the Census Bureau's Annual Research
Conferences; and internal Census Bureau memoranda.  The report
informs readers how to obtain copies of any of the internal
memoranda in which they are interested.



 



    Finding a suitable framework in which to present all of this
information about different components of error also presented a
challenge.  The traditional approach is to organize the material
according to the main phases of the survey: sample selection, data
collection, data processing and estimation.  The core of the
Quality Profile (Chapters 3 through 8) is, in fact, organized in
that manner, with one chapter devoted to sample selection, three to
data collection (covering data collection procedures, nonresponse
error and measurement error) and one each to data processing and
estimation.



 






 



    Two important topics did not fit neatly within this framework.
Chapter 9, Sampling Errors, covers the procedures used to estimate
sampling errors and the relationship between sampling errors and
sample size.  Chapter 10, one of the longer chapters, is called
"Evaluation of Estimates" and covers both comparisons of SIPP
estimates with data from other sources and indicators of errors of
undercoverage.  The remaining chapters, 1, 2 and 11, provide an
introduction, an overview of the survey and a summary,
respectively.
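The variance procedures in Chapter 9 are not reproduced here.  The sketch below is a rough illustration only.  It assumes that sampling error for an estimated proportion can be approximated by the simple random sampling formula inflated by a design effect (deff).  That assumption belongs to this sketch and is not the report's documented procedure.  Under it, the standard error falls with the square root of the sample size.  The function name is hypothetical.

```python
import math


def approx_standard_error(p, n, deff=1.0):
    """Approximate standard error of an estimated proportion.

    Assumes SE = sqrt(deff * p * (1 - p) / n): the simple random
    sampling formula inflated by a design effect deff that reflects
    clustering and weighting (an illustrative simplification).
    """
    if not 0.0 <= p <= 1.0 or n <= 0 or deff <= 0:
        raise ValueError("need 0 <= p <= 1, n > 0 and deff > 0")
    return math.sqrt(deff * p * (1.0 - p) / n)
```

With p = 0.2, quadrupling n from 400 to 1,600 halves the approximate standard error from 0.02 to 0.01.  A design effect of 2 would multiply each value by about 1.41.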



 



    The structure of the SIPP Quality Profile is similar to that
of its chief forerunner, the CPS Error Profile.  The main
differences are the division of the material on data collection
(called "Observational Design and Implementation" in the CPS Error
Profile) into three chapters, and the addition of the chapters on
sampling errors and evaluation of estimates.



 



    Our goal was to provide, insofar as available, quantitative
information about overall error and its components.  Hence, the
report includes 6 figures and 43 tables, a substantial increase
over the number included in the first version.  Space limitations
preclude inclusion of tables in this paper, but for those who may
be interested, the numbers of some key tables and figures from the
publication are given in the following section.



 



 



E. Summary of Findings



 



Major sources of error



 



   The SIPP Quality Profile does not contain any broad
conclusions about how successful SIPP has been so far in fulfilling
its goals.  Our goal was to provide enough information about the
quality of the survey data so that individuals and groups like the
Committee on National Statistics Panel to Evaluate SIPP could reach
their own conclusions.  The summary chapter does, however, identify
what stood out as the three main sources of error in SIPP
estimates: nonresponse, differential undercoverage and measurement
error.



 



   As in any longitudinal survey, unit nonresponse increases in
succeeding rounds (called "waves" in SIPP) of the survey.  Table
5.1 of the report (not reproduced in this paper) shows the data
available as of 1989 on unit nonresponse by wave for each panel of
the survey (households and individuals in each panel are
interviewed 8 or 9 times, at 4-month intervals).  The rates are
relatively low (4.9 to 7.6 percent) for the first wave, but
increase to over 20 percent at the final wave of each panel.  This
relatively high attrition is due in part to the difficulty of
tracking households and individuals that move, as is required by
the SIPP design.  The characteristics associated with unit
nonresponse have been analyzed in detail, and these analyses have
guided the development of estimation procedures designed to
minimize the biases that result from differences between the
characteristics of respondents and nonrespondents.
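The estimation procedures themselves are not described in this paper.  One common technique of this general kind is a weighting-class adjustment.  Within each adjustment cell (for example, a group defined by characteristics associated with nonresponse), respondents' weights are inflated so that they also represent the cell's nonrespondents.  The Python sketch below is generic and is not the SIPP procedure.  The function name, field names and cell labels are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjust(units):
    """Weighting-class nonresponse adjustment (illustrative sketch).

    units: list of dicts with keys 'weight', 'cell' and 'responded'.
    Within each cell, each respondent's weight is multiplied by
    (sum of all eligible units' weights) / (sum of respondent weights).
    Returns adjusted weights in input order; nonrespondents get 0.0.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            responding[u["cell"]] += u["weight"]
    adjusted = []
    for u in units:
        if not u["responded"]:
            adjusted.append(0.0)
            continue
        cell = u["cell"]
        if responding[cell] <= 0:
            raise ValueError(f"cell {cell!r} has no positive respondent weight")
        adjusted.append(u["weight"] * total[cell] / responding[cell])
    return adjusted
```

Consider a cell of four units with base weights of 100, three of which respond.  Each respondent's weight becomes 100 x 400/300, about 133.3, so the cell total of 400 is preserved.  Cells in which nobody responds would have to be collapsed with neighboring cells before this step; the sketch raises an error instead.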



 



    Item nonresponse has been low for core items on labor force
activity, income recipiency and asset ownership.  It has been
somewhat higher for income amounts, especially self-employment
earnings and interest.  In the topical modules (questions not asked
in every wave), especially high nonresponse has occurred for
questions on asset amounts.



 



    Indicators of differential undercoverage in SIPP for
population subgroups defined by age, race and sex are shown in
Table 10.13 of the report.  The table shows the reciprocals of the
weights that are applied in order to make the simple unbiased
estimate for each subgroup agree with an independent estimate that
uses the Population Census count as a benchmark.  The group most
affected is young adult black males.  The ratios for black females
in the same age group are also quite low.  At least for the males,
the coverage ratios shown understate the amount of undercoverage,
because the ratios do not include any adjustment for census
undercoverage, which is known to be above average for this
population subgroup.
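The coverage ratio described above can be computed for each subgroup as the estimate before the ratio adjustment divided by the independent control total.  The adjustment factor applied to the weights is its reciprocal.  The sketch below uses hypothetical group labels and figures.  As the text notes, census-based controls do not reflect census undercoverage, so ratios computed this way can overstate coverage for the most affected groups.

```python
def coverage_ratios(survey_estimates, controls):
    """Coverage ratio and weight adjustment factor for each subgroup.

    survey_estimates: subgroup population estimated from the sample
        before the ratio adjustment (hypothetical figures).
    controls: independent population estimates for the same subgroups.
    Returns {group: (coverage_ratio, adjustment_factor)}, where the
    adjustment factor scales weights so the estimate matches the
    control and the coverage ratio is its reciprocal.
    """
    result = {}
    for group, estimate in survey_estimates.items():
        control = controls[group]
        if estimate <= 0 or control <= 0:
            raise ValueError(f"non-positive total for group {group!r}")
        result[group] = (estimate / control, control / estimate)
    return result
```

For a hypothetical group estimated at 850 against a control of 1,000, the coverage ratio is 0.85, and its survey weights would be multiplied by about 1.18.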



 



    Similar patterns of undercoverage have been observed in the



Current Population Survey and other national household surveys.



The second-stage ratio adjustments used for both cross-sectional



and longitudinal estimates to compensate for undercoverage are



believed to reduce both the sampling error and bias of the



estimates.  The effects of these adjustments on sampling errors can



be estimated, but little is known about their effects on biases



associated with undercoverage.
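    A minimal sketch of a second-stage ratio adjustment, assuming each person record carries an adjustment cell label and a first-stage weight (both hypothetical). Weights are scaled so that each cell sums to its independent control. The sketch makes totals agree with the controls; it cannot remove bias arising from differences between covered and missed persons within a cell, which is the part the text notes is poorly understood.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # (adjustment cell, current weight)


def second_stage_adjust(records: List[Record], controls: Dict[str, float]) -> List[Record]:
    """Scale each record's weight by control / (weighted cell total).

    After adjustment the weights in every cell sum to that cell's
    independent control.
    """
    cell_totals: Dict[str, float] = defaultdict(float)
    for cell, weight in records:
        cell_totals[cell] += weight
    factors = {cell: controls[cell] / total for cell, total in cell_totals.items()}
    return [(cell, weight * factors[cell]) for cell, weight in records]


# Hypothetical sample: two persons in cell A, one in cell B.
sample = [("A", 100.0), ("A", 300.0), ("B", 500.0)]
adjusted = second_stage_adjust(sample, {"A": 500.0, "B": 500.0})
print(adjusted)
```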



 



    Measurement error takes many forms, but perhaps its most



significant manifestation in SIPP has been the seam problem, i.e.,



a pronounced tendency for survey respondents to report month-to-



month changes for months in adjacent waves at substantially higher



rates than for adjacent months within a single wave.  Figure 6.1 in



the report provides a graphic illustration of the seam effect on



reports of changes in earnings.  Pronounced effects have been noted



for most income recipiency and amount variables.  Because of the



rotation group design used in SIPP, cross-sectional estimates of



transitions are not likely to be seriously distorted by this



pattern of reporting, but it can affect estimates of the covariance



structure and may have adverse effects on multivariate analyses



dealing with transitions or length of spells.
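    The seam effect can be quantified by comparing month-to-month change rates across wave boundaries with those inside a wave. The sketch below assumes that each wave covers a four-month reference period (the months_per_wave parameter) and that each person's monthly values are aligned to the first month of wave 1; the data are hypothetical. Absent a seam effect, the two rates should be similar.

```python
from typing import Sequence, Tuple


def seam_vs_within_change_rates(
    person_months: Sequence[Sequence[int]], months_per_wave: int = 4
) -> Tuple[float, float]:
    """Return (change rate across wave seams, change rate within waves).

    Each inner sequence is one person's monthly value (e.g. 1 = received
    the income type, 0 = did not), starting at the first month of wave 1.
    """
    seam_changes = seam_pairs = within_changes = within_pairs = 0
    for series in person_months:
        for m in range(1, len(series)):
            changed = int(series[m] != series[m - 1])
            if m % months_per_wave == 0:  # month m opens a new wave, so (m-1, m) spans a seam
                seam_pairs += 1
                seam_changes += changed
            else:
                within_pairs += 1
                within_changes += changed
    return seam_changes / seam_pairs, within_changes / within_pairs


# Hypothetical 8-month histories for three persons (two waves each).
people = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # change reported at the seam
    [1, 1, 1, 1, 1, 1, 1, 1],  # no change
    [0, 1, 1, 1, 1, 1, 1, 1],  # change within wave 1
]
seam_rate, within_rate = seam_vs_within_change_rates(people)
print(f"seam: {seam_rate:.3f}, within wave: {within_rate:.3f}")
```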



 



     Table 6.6 in the report shows some early results from the SIPP



Record Check Study.  The sample sizes are small, and the table



shows results for only two of the four states included in the



study.   For the State of Wisconsin, significant levels of



underreporting were found for participation in two programs and



benefit amounts in one other program.  The full results from the



Record Check Study will provide the best direct information so far



available on levels of measurement error in SIPP and will be a



valuable resource for studying the sources and correlates of



response bias and response error variance.
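    Record check comparisons of this kind reduce to simple rates on matched cases. The sketch below, with hypothetical matched data, computes the share of record-confirmed participants who did not report participation and a binomial standard error that treats the cases as a simple random sample (an assumption; the SIPP design is more complex). With samples as small as those in the study, the standard error determines whether an apparent shortfall is significant.

```python
import math
from typing import Sequence, Tuple


def participation_underreporting(pairs: Sequence[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Estimate the underreporting rate and a simple binomial standard error.

    Each pair is (participant according to agency records,
    participation reported in the survey).
    """
    reports = [survey for record, survey in pairs if record]
    n = len(reports)
    rate = sum(1 for reported in reports if not reported) / n
    std_error = math.sqrt(rate * (1.0 - rate) / n)  # simple random sampling approximation
    return rate, std_error


def amount_ratio(reported: Sequence[float], recorded: Sequence[float]) -> float:
    """Ratio of survey-reported to record benefit amounts (below 1 means underreporting)."""
    return sum(reported) / sum(recorded)


# Hypothetical matched cases: 10 record-confirmed participants, 3 of whom did not report.
cases = [(True, True)] * 7 + [(True, False)] * 3 + [(False, False)] * 5
rate, se = participation_underreporting(cases)
print(f"underreporting: {rate:.2f} (SE {se:.3f})")
print(amount_ratio([180.0, 200.0], [200.0, 200.0]))
```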



 



Current research



 



   An active program of SIPP methodological and evaluation



research is continuing.  The main areas of research include:



 



o    The design of the questionnaires and the structure of the



   interviews.  Laboratory research is being conducted to



   study the cognitive aspects of SIPP interviews and how



   they relate to seam effects and other kinds of reporting



   errors.  Field experiments have been conducted to test



   the feasibility of providing feedback of prior wave



   information and encouraging greater use of records in



   interviews.



 



o    Interview mode.  An experiment with increased use of



   telephone interviewing is being evaluated to determine



   whether to adopt the procedures that were tested.  For



   the longer term the Census Bureau is arranging for the



   development of a prototype questionnaire for use in



   computer-assisted personal interviewing (CAPI), in order



   to evaluate the potential effectiveness of this



   collection mode in SIPP.



 



o    Estimation procedures. The broad goal for this area of



   investigation is to develop estimation procedures for



   SIPP that make effective use of auxiliary data available



   from both the Current Population Survey and



   administrative records.  An initial study of the



   feasibility of reducing variances by using IRS data as



   controls in the second-stage ratio estimation procedure



   showed considerable promise.



 



   Research in these and other aspects of the survey is



proceeding at a pace that suggests the desirability of preparing



updates of the SIPP Quality Profile on a regular basis.



 



   Areas of research that have been relatively untouched so far



include the effects of interviewer variance and the conditioning



effects of repeated interviews on response error.  For the latter,



the overlapping panel design used in SIPP offers the possibility of



comparing cross-sectional estimates for households and persons that



have been in the sample for varying lengths of time.  There is also



a need to update some of the earlier evaluation studies in order to



monitor the effects of design changes since the beginning of the



survey.  Much of the research reported in versions 1 and 2 of the



SIPP Quality Profile, including the Record Check Study, which is



the only source of direct information on the size of individual



reporting errors, is based on data from the 1984 panel.
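    The comparison suggested for conditioning effects can be sketched as an index of each panel's estimate for the same calendar month against the mean over panels. The grouping by months in sample and the numbers are hypothetical. A steady decline or rise in the index with time in sample would be consistent with conditioning, although differences among panels in attrition or procedures could produce a similar pattern.

```python
from typing import Dict


def time_in_sample_index(estimates: Dict[int, float]) -> Dict[int, float]:
    """Index each group's estimate against the mean over all groups (100 = no difference).

    Keys are months in sample at the reference month; values are the
    cross-sectional estimates computed separately for each group.
    """
    mean = sum(estimates.values()) / len(estimates)
    return {months: 100.0 * value / mean for months, value in estimates.items()}


# Hypothetical: the same calendar month estimated from three overlapping panels.
index = time_in_sample_index({4: 0.120, 16: 0.110, 28: 0.100})
for months, value in sorted(index.items()):
    print(f"{months:>2} months in sample: index {value:.1f}")
```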



 



F. Conclusions



 



    Judging from some comments by users of the initial version and



reviewers of the preliminary draft of the updated version of the



SIPP Quality Profile, the systematic compilation and publication of



information about the nature and sources of error in a major



continuing survey like SIPP, with periodic updates, is a worthwhile



undertaking.  A more definitive evaluation of utility will be



possible now that the updated version has been published and is



being widely distributed.  The author believes that the preparation



of quality profiles could be valuable in connection with efforts to



track and improve the quality of data from other major continuing



national surveys, such as the Current Population Survey, the



National Health Interview Survey, the National Crime Survey, the



Annual Survey of Manufactures and the Monthly Retail Trade Survey.



The technique is applicable to both household and establishment



surveys.



 



    Maintaining and improving the quality of survey data is a



never-ending job for survey designers and managers, and there is



room for a multiplicity of approaches.  Some Federal agencies are



making a strong commitment to applying, in survey operations,



Deming's philosophy and techniques for total quality



management.  That approach implies not just measurement of errors



and identification of their sources, but modification of the survey



process as needed to eliminate or reduce the effects of significant



sources of error.  The other paper presented at this session



(Hanuschak, 1990) provides an example of this model of survey



quality management, with active participation and commitment to



quality improvement by key managers in the organization.  The same



commitment to the quality of data can be seen in the work of the



sponsors and participants in this Conference, and they deserve our



thanks for it.



 



REFERENCES



 



Bailar, B. and Lanphier, M. (1978), Development of Survey Methods



to Assess Survey Practices, Washington, DC: American Statistical



Association.



 



Brooks, C. and Bailar, B. (1978), An Error Profile: Employment as



Measured by the Current Population Survey, Statistical Policy



Working Paper 3, Office of Federal Statistical Policy and



Standards, U.S. Department of Commerce.



 



Conference on Research in Income and Wealth (1958), An Appraisal of



the 1950 Census Income Data, Studies in Income and Wealth, Vol. 23,



National Bureau of Economic Research, Princeton:  Princeton



University Press.



 



Dalenius, T. (1974), Ends and Means of Total Survey Design,



Stockholm: University of Stockholm.



 



Energy Information Administration (1983), An Assessment of the



Quality of Principal Data Series of the Energy Information



Administration (first in a series of "state of the data" reports),



Publication DOE/EIA-0292(82).



 



Hansen, M., Hurwitz, W. and Bershad, M. (1959), "Measurement Errors



in Censuses and Surveys", Bulletin of the International Statistical



Institute, 38:359-374.



 



Jabine, T. (1987), Reporting Chronic Conditions in the National



Health Interview Survey: A Review of Findings From Evaluation



Studies and Methodological Tests, Data From the National Health



Survey, Series 2, No. 105, National Center for Health Statistics.



 



Jabine, T., assisted by King, K. and Petroni, R. (1990), Survey of



Income and Program Participation: SIPP Quality Profile, Bureau of



the Census, U.S. Department of Commerce.



 



King, K., Petroni, R. and Singh, R. (1987), Quality Profile for the



Survey of Income and Program Participation, SIPP Working Paper No.



8708, Bureau of the Census, U.S. Department of Commerce.



 



Norwood, J. (1987), "What is Quality?" in Proceedings, Third



Annual Research Conference, Bureau of the Census, U.S. Department



of Commerce: 215-222.



 



Subcommittee on Measurement of Quality in Establishment Surveys



(1988), Quality in Establishment Surveys, Statistical Policy



Working Paper 15, Statistical Policy Office, U.S. Office of



Management and Budget.



 



United Nations (1982), Non-sampling Errors in Household Surveys:



Sources, Assessment and Control, UN Publication DP/UN/UBT-81-



041/2, National Household Survey Capability Programme.



 



U.S. Census Bureau (1974), Standards for Discussion and



Presentation of Errors in Data, Technical Paper 32, U.S. Department



of Commerce.



 



Zarkovich, S. (1966), Quality of Statistical Data, Rome: Food and



Agriculture Organization of the United Nations.



 



 



 



                INITIAL REPORT ON THE QUALITY OF



                   AGRICULTURAL SURVEY PROGRAM



 



                       George A. Hanuschak



            National Agricultural Statistics Service



 



 



I. Background and Introduction



 



    In December 1988, the National Agricultural Statistics Service



(NASS) formed a Survey Quality Team (SQT) for its Agricultural



Survey Program (ASP).  The ASP is a series of integrated multiple



sampling frame (area and list) based surveys throughout the



agricultural calendar year.  Some major items on the surveys are



planted and harvested crop acreages, hog, cattle and sheep



inventories, crop yields and production and on-farm grain storage.



There was a major survey redesign from individual multiple frame surveys to an



integrated multiple frame survey program which was implemented over



several years (1984 - 1986).  The mission of the Survey Quality



Team is to identify and develop statistical process control (SPC)



methods for the management of the integrated Agricultural Survey



Program.  The SPC methods are based upon the fundamentals of total



quality management (TQM) techniques developed by W. Edwards Deming,



Joseph Juran, Philip Crosby and other well-known contributors to



the TQM and SPC literature.  However, since much of the literature



refers to "manufacturing" situations, it was adapted to fit the



government agricultural survey situation.  Several papers by Ron



Fecso developed the basic model of survey quality used by the SQT.



The first major milestone of the SQT was to be the development of



a baseline "state of the survey" quality report.



 



    The mission of the SQT is quite broad, challenging and



critically important to the Agency's long-term goal of routinely



and continually improving survey quality.  The team and the Agency



also face this challenge in light of the severe budget pressure on



Federal statistical programs in general.  However, the team feels



that TQM and SPC methods are quite powerful tools, when properly



applied, that can aid in measuring and improving survey quality



over time.



 



    One of the first lessons of total process control is to define



the major steps in the total process.  In the case of the ASP, one



needs to first define or identify the major steps or stages of the



ASP surveys.  The survey quality team has identified the following



steps (Exhibit I) as the 22 major processes of the survey.



Unfortunately, each one of these survey stages or processes is



probably susceptible to some type of error or bias.  The SQT



developed the following profile (Exhibit II) of 24 potential



sources of error or bias in the ASP.



 



    Like any good statistical organization, the Agency has tried



to minimize the probability of various nonsampling errors occurring



in the survey process.  Controls include training, survey manuals



and instructions, Agency Policy and Standards Memoranda, quality



control checks on enumeration, reinterview studies, etc. 



Controlling and measuring nonsampling errors for a complex survey



process will remain extremely challenging even with the best



efforts at statistical process control.  However, in the remainder



of this report, the SQT defines and demonstrates how to use



statistical process control and total quality management techniques



to reduce total survey error over time.



 



 



                   Exhibit I - Major Survey Stages



 



 Survey Clearance



 Area Sampling Frame



      (Construction, Maintenance and Sampling)



 List Sampling Frame



      (Construction, Maintenance and Sampling)



 Survey Specifications



 Design of Questionnaires



      (Design, Print and Distribution)



 Preparation of Manuals



     (Interviewers, Supervisory and Editing)



 Prepare Survey Software



     (Data Entry, Survey Coordinator, Edit, Analysis, Summary,



     Data Base, Mail and Maintenance System, Etc.)



 National/Regional Training Schools



 Survey Management - Headquarters and State Statistical Offices



     (Coordination of Procedures)



 Presurvey Coding/Handling/Processing by State Statistical Offices



 State Training Schools



 Data Collection



 Data Collection Quality Control



 Manual Data Review and Coding



 Data Entry and Validation



 Data Edit and Review



 Imputation, Analysis and Summarization



 State Statistical Office Review of Survey Results



     (including submission of estimates)



 Headquarters Review and Release Preparation



 Post Survey Updating



    (Data Base and List Sampling Frame)



 Post Survey Evaluations



 Survey Research



 



 



 Exhibit II - Some Potential Sources of Total Survey Error



             in the Agricultural Survey Program



 



Undetected List Sampling Frame Duplication



 



List Sampling Frame (Old or Incorrect Control Data)



List - Undetected Reporting Duplication or other



   reporting/enumeration errors or bias



List Sources of Questionable Quality used for List Sampling Frame



   Build/Maintenance



Area Sampling Frame (Outdated Land Use Stratification)



List Sampling Frame (Any large operations not covered by the



   frame)



Area Sampling Frame (Outdated Sample Segment - Aerial



   Photography)



Different Farm Operation Description Questions



   on Different Questionnaire versions



Incorrect overlap/nonoverlap Determination



Incorrect Exception Report Handling (One Type of Survey Weighting



   Factor)



Incorrect Coding (List Adjustment Survey Weighting Factors,



   Completion/Imputation Codes, etc.)



Undetected Data Entry errors (pass all the way through the



   editing system)



Shift in Mix of Data Collection Modes (Telephone, Computer



   Assisted Telephone, Mail and Personal)



Shift in Mix of Respondents (Operator vs. Spouse vs. Other)



Incorrect Survey Master Records



Questionnaire Design (or Print) Errors



Unmeasured Major Changes in Survey or Estimation Procedures



   (Headquarters or State Statistical Offices)



Error in Known Zero Determination (Is Respondent Validly out of



   Business?)



Overediting/Underediting of Survey Data



Potential Bias in Manual or Machine "Imputation" Procedures



Lack of Formal Outlier Handling Procedures (Non Robust or Non



   Smooth Time Series Estimation)



Survey Processing Software



Shifts in Characteristics or Skill Level of Work Force



   {(Enumerators, Statisticians, Programmers, Support Staff)



   Experience in their current job, survey procedures



   knowledge, farm knowledge, statistics knowledge, technology



   skills, etc.}



Farmer's or Respondent's level of understanding of



   survey reporting concepts and item definitions (Cognitive



   aspects).



 



 



II. The Components of Survey Quality



 



   When faced with the problem of measuring and improving the



quality of the ASP, one should consider the components of survey



quality.  Listing the components defines exactly what is meant by



the term "survey quality" and highlights specific sub-areas that



need to be explored.



 



    Figure 1 shows the components of survey quality.  It was



developed by the Nonsampling Errors Research Section in the Survey



Research Branch of NASS and adopted by the SQT.  There are four



major components related to survey quality: accuracy, resources,



timeliness, and relevance.



 



 



[Figure 1. Components of survey quality: graphic not reproduced]



 



 



   Accuracy is the component that first comes to mind when



thinking about survey quality.  NASS wants the survey indications



to be as accurate as possible.  Not only should the sampling errors



be small, but also the nonsampling errors should be minimized.  In



large-scale surveys the relative sampling errors can be smaller



than the relative size of the nonsampling errors.  Factors such as



undetected list sampling frame duplication, nonresponse,



questionnaire wording, mode of interview, change in respondent,



etc., can lead to substantial nonsampling errors.



 



   The second component of survey quality is resources.  Even if



a survey organization can control the sampling and nonsampling



errors, its ability to do so will be affected by the amount of



money available to spend on the survey.  The level of funding



has a direct impact on sample sizes, list frame quality,



pretesting, reinterview projects, editing programs, summary



programs, analysis, etc.  Also important is the amount and quality



of staff hours that can be devoted to a survey.  Staff hours are



affected by salaries, training, hiring practices, long-term career



development, and organizational climate, factors that are also



greatly affected by the funds available.  Most people



quickly realize that the crucial problem is to take the fixed set



of available resources and use those resources in a way that



maximizes the survey quality.



 



    The third component is timeliness.  Of course, time could be



considered another element of resources -- like dollars and staff.



However, timeliness needs to be considered a component by itself



because timeliness is crucial in the survey process.  The impact



and usefulness of survey indications are greatly affected by



whether the survey data were collected one month or one year



earlier.  NASS has always stressed the need to collect data quickly



and to release estimates as close to the survey reference date as



possible.  Thus, the survey calendar -- which is used to time all



the steps of the survey -- is important to the survey quality.



 



    The final component is relevance.  Relevance is dependent on



the needs of the users of NASS statistics, and those needs change



from day to day.  It is useless for NASS to collect a high-quality



piece of information on farming if that piece of information has no



relevance for the users of NASS statistics -- that piece of



information simply becomes a product without a buyer.  NASS must



constantly assess the needs of people using its statistics to make



sure that the collected information is relevant.  The second aspect



of relevance is internal to NASS.  An example of internal relevance



is whether the Agency wants direct expansion (level) or ratio



(percent change) or both types of estimators out of the ASP.



 



 



III.  Accuracy of Survey Soybean Acreage Estimates



 



    NASS has an expert panel of Agency statisticians called the



Agricultural Statistics Board (ASB), which reviews all survey



indications (often multiple indications for any one item), and



administrative or check data (such as the amount of soybeans



crushed in processing plants) and adopts or sets the official



estimates to be published.



 



    Two concepts need to be defined: use and fitness.  The ASB's



use of the ASP indications was chosen as the primary "use" of the



ASP.  "Fitness" for use is evaluated by setting a standard for use



and measuring adherence to the standard.



 



    Ideally we would have standards for all the components of mean



squared error (MSE) for the various commodity indications and



administrative data used by the ASB.  This would provide the



ability to create statistically well defined composites of the data



for use as the Board estimate or forecast.  At this time we have



measures of the variance for most indications, but have only enough



information about MSEs to recognize the importance of developing



more extensive MSE measures.  This section will provide information



for Agency management to assess which areas are most in need of



further study or research and/or corrective action.
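    The link between the MSE components and a composite Board estimate can be sketched as follows: MSE is variance plus squared bias, and one common way to combine indications is to weight each by the reciprocal of its MSE. The inverse-MSE rule, the assumption that the indications are independent, and all numbers are illustrative, not an Agency procedure; in practice the bias term is usually the unknown piece, which is the gap the text describes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Indication:
    name: str
    estimate: float
    variance: float
    bias: float = 0.0  # assumed known here; in practice usually the missing piece

    @property
    def mse(self) -> float:
        """Mean squared error = variance + bias squared."""
        return self.variance + self.bias ** 2


def inverse_mse_composite(indications: Sequence[Indication]) -> float:
    """Weight each indication by 1/MSE (assumes the indications are independent)."""
    weights = [1.0 / ind.mse for ind in indications]
    return sum(w * ind.estimate for w, ind in zip(weights, indications)) / sum(weights)


# Hypothetical indications (not NASS data).
area = Indication("area frame", estimate=100.0, variance=4.0)
imf = Indication("multiple frame", estimate=110.0, variance=1.0, bias=3.0)
print(area.mse, imf.mse)
print(inverse_mse_composite([area, imf]))
```

With these numbers the composite is about 102.9. Weighting by variance alone would give the multiple frame indication four times the weight of the area frame indication and a composite of 108, which shows how much an unmeasured bias can matter.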



 



 The ASB's specific need is to have indications which serve as



a solid basis for the official numbers.  The following chart on



soybean planted acreage displays the degree to which the ASB has



found the ASP indications to be "fit for use."
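    One way to make "fit for use" operational is to set a tolerance for the difference between an indication and the Board estimate and to record how often it is met. The sketch below uses a hypothetical 2 percent tolerance and made-up index values (Board estimate = 100 each year), not NASS data. A difference that keeps the same sign year after year, as here, points to bias rather than sampling noise.

```python
from typing import Dict


def relative_differences(indication: Dict[int, float], board: Dict[int, float]) -> Dict[int, float]:
    """Percent difference of a survey indication from the Board estimate, by year."""
    return {year: 100.0 * (indication[year] - board[year]) / board[year] for year in board}


def adherence(diffs: Dict[int, float], tolerance_pct: float) -> float:
    """Share of years in which the indication is within the tolerance of the Board estimate."""
    return sum(1 for d in diffs.values() if abs(d) <= tolerance_pct) / len(diffs)


# Hypothetical index values for four years (Board estimate = 100 each year).
board = {1: 100.0, 2: 100.0, 3: 100.0, 4: 100.0}
indication = {1: 102.0, 2: 105.0, 3: 105.0, 4: 106.0}
diffs = relative_differences(indication, board)
print(diffs)
print("mean difference:", sum(diffs.values()) / len(diffs))
print("all above Board:", all(d > 0 for d in diffs.values()))
print("adherence at 2%:", adherence(diffs, tolerance_pct=2.0))
```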



 



 In reviewing the soybean planted acreage chart on ASB use you



will observe the following:



 



1. The Agricultural Statistics Board finds the area sampling frame



based June acreage estimate quite "fit for use."



 



2. The ASB does not find the integrated multiple frame based June



acreage estimate "fit for use."  It has an observed substantial



upward bias which also changed substantially in magnitude between



1987 and 1988 and stayed at the larger magnitude in 1989 and 1990.



Using Pareto analysis and an expert panel using TQM principles



applied to surveys, the SQT identified the major suspected causes



of the upward bias in the multiple frame based soybean acreage



estimate.  These suspected causes are:



 



 



 



[Graphic not reproduced]



 



    1. Different Data Collection Methodologies



 



         The area frame based acreage estimate is based upon a



    sample of about 16,000 sample segments throughout the U.S.



    Data are collected entirely by personal interview.  The



    interviewer uses an aerial photograph to locate each crop field



    and records the data on a questionnaire with the farmer's direct



    participation.  Crop acreage data is collected



    and edited field by field.  Farmers are probed to report waste



    acreage for each field.  There are also five specific



    questions that define the land operated now, to which all



    the rest of the questions refer.



 



         On the integrated multiple frame survey, the majority of



    data collection is done by telephone (both conventional and



    computer assisted).  The crop acreage data is collected for the



    entire farm (not field by field).  Therefore farmers are



    probed for waste acreage only once, at best, when reporting



    crop acreage.  There is no photographic aid for the farmer to



    refer to.  There are only one or two questions on defining land



    operated now.



 



    2. Undetected List Sampling Frame Duplication



 



         There are sophisticated record linkage tools to identify



    and remove duplication on the list sampling frame.  However,



    because of limits on clerical resources and on funding for calls



    to farmers to resolve possible duplicates, and because multiple list



    sources are used, some duplication remains.  A special study was



    designed in 1989 to measure remaining duplication and the



    effect on the estimates.  The study showed that approximately



    10 percent of the acreage difference was due to obvious list



    frame duplication.
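    A simplified picture of the record linkage step, assuming hypothetical name, address and ZIP code fields: records are compared only within the same ZIP code (blocking), names and addresses are normalized so that word order and punctuation do not matter, and pairs whose similarity from Python's difflib.SequenceMatcher reaches a chosen threshold are flagged. This is not the Agency's record linkage system; flagged pairs would still need the clerical review or telephone follow-up that the text identifies as the resource constraint.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def match_key(record: Dict[str, str]) -> str:
    """Lower-case, strip punctuation and sort tokens so word order does not matter."""
    def tokens(text: str) -> List[str]:
        return sorted(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())
    return " ".join(tokens(record["name"])) + " | " + " ".join(tokens(record["address"]))


def likely_duplicates(records: List[Dict[str, str]], threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """Compare records only within the same ZIP code (blocking) and flag similar pairs."""
    blocks: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        blocks[record["zip"]].append(record)
    flagged = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, match_key(a), match_key(b)).ratio()
            if score >= threshold:
                flagged.append((a["id"], b["id"], round(score, 2)))
    return flagged


# Hypothetical list frame records.
records = [
    {"id": "r1", "name": "John A. Smith", "address": "Rt 2 Box 14", "zip": "50010"},
    {"id": "r2", "name": "Smith, John A", "address": "RT 2 BOX 14", "zip": "50010"},
    {"id": "r3", "name": "Mary Jones", "address": "125 Elm St", "zip": "50010"},
]
print(likely_duplicates(records))
```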



 



    3. No Formal Documented Outlier Handling Procedures



 



         While there are several good analysis tools to identify



    outliers, there is no formal procedure for handling them.  The



    area frame based acreage estimator is quite robust since the



    average expansion factor is about 200 and the segment size is



    640 acres, which puts an upper bound on "influential



    observations".  For the list sample, expansion factors are



    considerably larger and farm size does not have much of an



    upper bound.  Thus it is much easier to get highly influential



    observations in the list sample.  Development of a formal



    robust estimator for the list sample is highly recommended.
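    The arithmetic behind this point can be sketched briefly. With an expansion factor near 200 and a 640-acre segment, a typical area segment contributes at most about 200 x 640 = 128,000 acres to the expanded total, while a list unit's contribution has no comparable bound. The sketch computes each unit's share of the expanded total and, as one simple robust option, a winsorized total that truncates weighted contributions at a cap. The data are hypothetical, and the cap is set to the area frame bound only for illustration; truncation reduces variance at the cost of downward bias, which is why a documented procedure is needed.

```python
from typing import List, Sequence

# Rough bound from the text: average expansion factor of about 200
# times a segment of 640 acres.
AREA_FRAME_MAX_CONTRIBUTION = 200 * 640  # 128,000 acres


def influence_shares(weights: Sequence[float], values: Sequence[float]) -> List[float]:
    """Each unit's weighted contribution as a share of the expanded total."""
    contributions = [w * y for w, y in zip(weights, values)]
    total = sum(contributions)
    return [c / total for c in contributions]


def winsorized_total(weights: Sequence[float], values: Sequence[float], cap: float) -> float:
    """Expanded total with each weighted contribution truncated at `cap`.

    The cap is a hypothetical tuning constant; truncation trades
    variance for downward bias.
    """
    return sum(min(w * y, cap) for w, y in zip(weights, values))


# Hypothetical list-sample units: expansion factors and reported acres.
weights = [50, 50, 400]
acres = [1_000, 800, 5_000]
print([round(s, 3) for s in influence_shares(weights, acres)])
print(sum(w * y for w, y in zip(weights, acres)))
print(winsorized_total(weights, acres, cap=AREA_FRAME_MAX_CONTRIBUTION))
```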



 



 



   4.   Different Imputation Methodologies



 



         There are also different imputation methodologies.  All



   imputation for the area frame is done manually, either by



   interviewers from their own observations or by statisticians.



   In the case of crop acreage, if a farmer refuses, the



   interviewer can still observe most of the crop fields and identify



   the crop in each.  On the list sample, the imputation



   is a computerized algorithm that uses other reported survey



   data and list frame control data to impute for nonreported



   data cells.
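    The text does not spell out the list frame algorithm, so the following is only a generic ratio imputation sketch under stated assumptions: each unit has a hypothetical stratum label and a list frame control value, and a missing report is imputed as the unit's control value times the ratio of reported to control totals among respondents in the same stratum. Units in strata without usable respondents are left for manual handling.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def ratio_impute(units: List[Dict]) -> List[Optional[float]]:
    """Fill missing reports with control value x (reported / control) among
    respondents in the same stratum. Returns one value per unit."""
    sums = defaultdict(lambda: [0.0, 0.0])  # stratum -> [sum reported, sum control]
    for u in units:
        if u["reported"] is not None:
            sums[u["stratum"]][0] += u["reported"]
            sums[u["stratum"]][1] += u["control"]
    values: List[Optional[float]] = []
    for u in units:
        if u["reported"] is not None:
            values.append(u["reported"])
            continue
        reported_sum, control_sum = sums[u["stratum"]]
        # No respondents with positive control data in the stratum: leave for manual review.
        values.append(u["control"] * reported_sum / control_sum if control_sum > 0 else None)
    return values


# Hypothetical units: reported acres (None = not reported) and list control data.
units = [
    {"stratum": "S1", "control": 200.0, "reported": 180.0},
    {"stratum": "S1", "control": 300.0, "reported": 270.0},
    {"stratum": "S1", "control": 100.0, "reported": None},
    {"stratum": "S2", "control": 400.0, "reported": None},
]
print(ratio_impute(units))
```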



 



   5. Undetected Reporting Errors



 



         Because the questionnaire designs differ, the



   undetected reporting error structure may also be different.



   For example, the screening questions on land operated on the



   area side are more detailed than the list questionnaire and



   may do a more accurate job of screening out landlords who are



   not active farmers at survey time.  New farm programs may have



   also led to the formation of more complex farming operations,



   which in turn may have a different reporting error structure.



 



   6. Different Ratio Type Information and Sample Designs



 



        On the area frame sample there is an 80 percent overlap



   from one year to the next.   On the list frame sample



   (independent from year to year) there is negligible overlap.



   Thus the area frame sample also provides a paired sample ratio



   estimator.
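    The paired ratio idea can be sketched with hypothetical segment data: the year-to-year ratio is computed only from segments present in both years, and applying it to the previous year's level gives a current-year estimate. The sketch assumes a single stratum with equal expansion factors, so the weights cancel in the ratio.

```python
from typing import Dict


def paired_ratio(previous: Dict[str, float], current: Dict[str, float]) -> float:
    """Year-to-year ratio using only segments reported in both years."""
    common = previous.keys() & current.keys()
    return sum(current[s] for s in common) / sum(previous[s] for s in common)


# Hypothetical soybean acres by segment; s3 rotated out and s4 rotated in.
last_year = {"s1": 100.0, "s2": 200.0, "s3": 50.0}
this_year = {"s1": 110.0, "s2": 220.0, "s4": 80.0}
ratio = paired_ratio(last_year, this_year)
previous_level = 1_000_000.0  # hypothetical previous-year estimate
print(ratio, ratio * previous_level)
```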



 



   It is important to note that there have also been two largely



independent sources of data available to the ASB, both of which



support the area frame level.  The first is a Landsat satellite



based regression estimator (1980-1987), which for major soybean



states had variances no more than half those of the direct



expansion estimator and also appeared unbiased when compared with



the ASB and direct expansion estimates.  The second source is



the calculation of a



soybean balance sheet, which the ASB uses as an evaluation tool.  A



balance sheet takes the carryover from one crop year into the next,



adds crop production, and then subtracts crop utilization



(including exports) to obtain a current balance.



These balance sheets also support the area frame based crop acreage



level.  Thus the agency has attempted to verify the correct crop



acreage level using several methods and independent data sources.
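    Both checks can be sketched in a few lines. The balance sheet follows the description above (imports are not mentioned in the text and are omitted). The regression estimator is the standard survey form, sample mean of y plus slope times (population mean of x minus sample mean of x), with x standing in for satellite-classified acres known for every segment; the NASS classification and stratification details are not described here, and all numbers are hypothetical. One way to read the balance sheet as a check: a computed balance well above independently measured stocks would suggest that production, and hence acreage or yield, was set too high.

```python
from typing import Sequence


def ending_balance(carryover: float, production: float, utilization: float) -> float:
    """Balance sheet as described: carryover + production - utilization (incl. exports)."""
    return carryover + production - utilization


def regression_estimate(y: Sequence[float], x: Sequence[float], x_population_mean: float) -> float:
    """Survey regression estimator of the mean of y, using an auxiliary x
    whose population mean is known (e.g. satellite-classified acres per segment)."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar + slope * (x_population_mean - x_bar)


# Hypothetical quantities, for illustration only.
print(ending_balance(carryover=100.0, production=900.0, utilization=850.0))
# Hypothetical segments: reported acres y and classified acres x.
print(regression_estimate([2.0, 4.0, 6.0], [1.0, 2.0, 3.0], x_population_mean=2.5))
```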



 



   Even though there is an observed upward bias in the integrated



multiple frame estimator for soybean acreage, there are reasons for



keeping it and reducing the bias.  These reasons are:



 



1.  Later crop season yield and production estimates are tied



     to the integrated multiple frame (IMF) approach.



 



 2.  State and sub-state level estimates from the IMF have



     much better precision than the corresponding area frame



     estimates.



 



 3.  Solving the bias problem associated with soybean acreage



     may well improve the entire IMF, a survey conducted six times



     a year with an average of 20-40 items (multivariate in



     nature).  The Survey Quality Team has performed similar



     analyses for on-farm grain storage and for cattle and hog



     inventories.  Some of the bias issues are item specific



     but others are associated with the total survey process



     or components of the survey process.



 



 4.  The IMF approach is substantially more cost efficient and



     involves less respondent burden than the area frame



     approach.



 



     Most important, the Agency is taking action on all of



these suspected causes in 1989 and 1990.  As previously mentioned,



there is now an improved list frame duplication adjustment



procedure in place starting in June 1989.  There is a reinterview



research study being conducted in June 1990 to provide initial



measures of previously undetected reporting errors.  This study



will reinterview a subsample of the list sample farmers, record



the crop data field by field, ask the more detailed land operated



questions, and compare the results.  There are



also research efforts underway to examine the imputation



methodologies, to consider an across-year design for list frame



based estimators, and to evaluate several robust estimators.  In



addition the SQT has provided several quality measures to be



monitored on the resource, relevance, timeliness and accuracy



dimensions which should become operational in 1990-91.
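    The comparison planned for the reinterview study can be summarized by the mean net difference between the original and reinterview reports for the same farms. The sketch treats the field-by-field reinterview as the better measurement and the subsample as a simple random sample, both assumptions; the acreages are hypothetical. A positive mean would indicate overreporting in the original list interview.

```python
import math
import statistics
from typing import Sequence, Tuple


def net_difference(original: Sequence[float], reinterview: Sequence[float]) -> Tuple[float, float]:
    """Mean of (original - reinterview) per farm and its standard error."""
    diffs = [o - r for o, r in zip(original, reinterview)]
    mean = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, std_error


original = [100.0, 250.0, 80.0, 40.0]    # hypothetical acres from the list interview
reinterview = [95.0, 240.0, 80.0, 30.0]  # hypothetical acres from the reinterview
mean, se = net_difference(original, reinterview)
print(f"net difference {mean:.2f} acres per farm (SE {se:.2f})")
```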



 



     The Agency is also developing alternative "proxies" to the



true item values in addition to relying on the ASB process.  An



operational reinterview/reconciliation survey is being conducted in



six major grain producing states in December 1990.  There has also



been an extensive operational soybean yield validation survey (198?



- current) where farmers are asked to harvest specific fields and



take just that grain to a grain elevator to be weighed and



measured.



 



     These "proxies" for true values are important in a survey



evaluation program, but they are also complex and expensive to develop



and implement.



 



     As previously mentioned, earth resource satellite data



have also been used by the Agency to develop more precise and



accurate crop acreage estimates.



 



IV.  Summary



 



   It is the claim of the SQT that more consistent and timely



process improvements can take place by using the principles of



statistical process control and Total Quality Management.  More



formal survey quality measurement and monitoring mechanisms will



provide the Agency's management with more, and critically important,



information to manage the quality of the ASP.  Also, most of these



techniques will readily transfer to other survey programs in the



Agency such as Prices Paid and Received by Farmers, the Farm Costs



and Returns Survey, Objective Yield Surveys, Farm Labor Surveys,



and even to new programs such as Water Quality and Food Safety



Surveys, the National Animal Health Monitoring System and the



Monthly Yield Survey Program.



 



   There are several tools available for such a survey quality



management system.  First there are numerous charting techniques



such as bar and pie charts for resource information, Board



standardized indication graphs with standard errors, Gantt charts



to display project management and survey schedule information,



upper limit and lower limit control charts, multivariate control



charts, Ishikawa fishbone diagrams and Pareto charts and analysis.



Many of these were used in an earlier effort by the Nonsampling



Errors Research Section when a statistical process control study



was conducted on the Soybean Objective Yield Program.
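    As an illustration of the upper and lower limit charts mentioned above, the sketch below computes limits for an individuals control chart for a hypothetical quality measure tracked once per survey (for example, a percent of records failing edit). Sigma is estimated from the average moving range divided by the standard constant 1.128 for ranges of two consecutive points; the measure and the data are assumptions.

```python
from typing import List, Sequence, Tuple

D2_FOR_PAIRS = 1.128  # constant relating the mean moving range (n = 2) to sigma


def individuals_limits(baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Lower limit, center line and upper limit (3 sigma) for an individuals chart."""
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / D2_FOR_PAIRS
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(points: Sequence[float], limits: Tuple[float, float, float]) -> List[int]:
    """Indexes of points that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, p in enumerate(points) if p < lower or p > upper]


# Hypothetical quality measure for ten past survey rounds.
baseline = [10, 12, 11, 9, 10, 11, 12, 9, 10, 11]
limits = individuals_limits(baseline)
print([round(v, 2) for v in limits])
print(out_of_control([11, 15, 6], limits))
```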



 



   Pareto analysis is one of the most powerful tools in quality
monitoring systems.  It ranks the potential errors in a system from
most serious to least serious.  The reasoning is that in many
systems, not just surveys, there are a "vital few" and a "trivial
many" potential errors.  Thus, the most important first step in
evaluating the quality of a system is to identify where it is most
likely to break down or fail.  Once the potential errors have been
ranked, the recommended next step is to identify the resources
allocated to each potential error, to see whether management is
allocating resources in a way that will truly minimize total survey
error.  Many Pareto analyses have demonstrated that resource
allocation was not in proper alignment with the true error
structure.
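The ranking-and-comparison step described above can be sketched in Python. This is a minimal illustration, not the SQT's procedure: the error sources, their estimated error contributions, and their resource shares are all hypothetical, and the 80 percent cutoff used to mark the "vital few" is a common rule of thumb assumed here rather than taken from the text.

```python
from dataclasses import dataclass


@dataclass
class ErrorSource:
    name: str
    error_contribution: float  # estimated contribution to total survey error (any consistent unit)
    resource_share: float      # fraction of the quality budget currently spent on this source


def pareto_rank(sources, cutoff=0.80):
    """Rank sources from most to least serious and flag the "vital few".

    A source is flagged as vital while the cumulative error share of the
    sources ranked ahead of it is still below `cutoff`.
    Returns tuples: (name, error share, cumulative share, resource share, vital).
    """
    ranked = sorted(sources, key=lambda s: s.error_contribution, reverse=True)
    total = sum(s.error_contribution for s in ranked)
    rows, cumulative = [], 0.0
    for s in ranked:
        vital = cumulative < cutoff
        share = s.error_contribution / total
        cumulative += share
        rows.append((s.name, round(share, 3), round(cumulative, 3), s.resource_share, vital))
    return rows


# Hypothetical figures for illustration only.
EXAMPLE_SOURCES = [
    ErrorSource("list frame duplication", 5.0, 0.10),
    ErrorSource("respondent misreporting", 40.0, 0.15),
    ErrorSource("editing", 25.0, 0.05),
    ErrorSource("nonresponse", 20.0, 0.30),
    ErrorSource("keying", 10.0, 0.40),
]

if __name__ == "__main__":
    for name, share, cum, resources, vital in pareto_rank(EXAMPLE_SOURCES):
        flag = "vital few" if vital else "trivial many"
        print(f"{name:25s} error {share:.0%}  cumulative {cum:.0%}  "
              f"resources {resources:.0%}  {flag}")
```

With these made-up figures, the three highest-ranked sources account for 85 percent of the error, while keying, which absorbs 40 percent of the resources, contributes only 10 percent: the kind of misalignment described above.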



 



   Thus, survey managers and administrators are being given more
information on the true structure of total survey error and on
appropriate resource allocations, to form a basis for future
improvements in total survey quality.



 



  Considerable progress has been made by the Agency in



addressing quality issues in its integrated multiple frame



Agricultural Survey Program.   Many of the discoveries will



translate to improved quality on several other major Agency survey



programs as well.



 






 



References



 



Beller, N., "Error Profile for Multiple Frame Surveys," Statistical
Reporting Service, Research Report, Washington, DC, 1979.

Bosecker, R., "Integrated Agricultural Surveys," National
Agricultural Statistics Service, Research Report No. SSB-89-05,
Washington, DC, June 1989.

Fecso, R., "Survey Quality," presented at the 2nd Quality Assurance
in Government Symposium, Washington, DC, May 1989.

Fecso, R., Pafford, B., Tremblay, T., and Johnson, R., "Quality
Profile for Soybean Objective Yield Survey," National Agricultural
Statistics Service, Unpublished Case Study, Washington, DC, 1988.



 



 



 



 



 



 



 



 






 



                           DISCUSSION



 



                        Barbara A. Bailar



                American Statistical Association



 



I. What is a Quality Profile?



 



   The first quality profile was called an error profile, and it
concerned the CPS employment statistics.  To put a more positive
face on the idea, error profiles have since become quality
profiles.  The purpose is to prepare a systematic and comprehensive
account of survey operations, listing the operations, the potential
sources of error, and how the errors affect the uses of the survey
statistics.



 



   Quality profiles are still rare.  When asked why there are not
more, survey producers offer three main reasons:



 



o    The staff resources that would go into producing a



   quality profile are too great and are in competition with



   other, more urgent needs.



 



o    Producing a report that tells about the errors in surveys



   would lead to less credibility in the statistics



   produced.



 



o    Admitting that there are errors is admitting that we



   haven't done our jobs well.



 



   In fact there are many benefits to producing quality



profiles.  Some of these are as follows:



 



o    to minimize total error, not just sampling error, within
   given cost constraints;

o    to force a thorough documentation of the survey process;

o    to guide users on the effects of possible errors and
   their impact on specific uses;

o    to develop a sound quality control program;

o    to use in training programs for new staff in either
   operations or research; and

o    to use as the foundation for a sound research and
   analysis program.



 



   The development of a quality profile parallels the survey



process and would contain the following elements:



 






 



1.  Objectives and specifications of the survey



2.  Sampling design and implementation



3.  Observational design and implementation



4.  Data processing



5.  Estimation



6.  Analysis and publication



 



   Given this as my basic understanding, let me comment on the



quality profile for SIPP and the quality assessment of the



Agricultural Survey Program (ASP).



 



   The two reports have some differences and some similarities.



The SIPP profile summarizes what is known about sources and



magnitudes of errors of estimates and addresses accuracy.  The ASP



report is written from the point of view of total quality



management and uses many of the ideas of Deming, Juran, and Crosby.



This report considers resources, timeliness, and relevance as major



components of quality, along with accuracy.  The aims of the two



groups seem to be quite different.



 



   The two reports each identify the same groups as their targets



-- the users of the survey data outside the agency and producers of



the survey inside the agency.



 



   Another similarity is that both look at major phases of the



survey operation, something essential for a quality profile.



 



   One difference between the two reports was that the SIPP report
actually identified four main sources of information on nonsampling
errors:

   performance data,
   methodological experiments,
   micro-evaluation studies, and
   macro-evaluation studies.



 



   The ASP report was more concerned with process and how quality



would be assessed.  In fact, the report stresses the need not to



identify too many sources of error because tracking everything down



might take too long.  Actually, I think the total quality



management movement urges groups to use brainstorming techniques to



identify all possible problems and then Pareto analysis to decide



where to concentrate one's efforts.



 



   Another similarity is that both reports left out major steps in
the survey process.  The SIPP report briefly listed the objectives
of the survey but said nothing about the objectives conflicting.
Producing a survey that gives both cross-sectional and longitudinal
data has been a new experience for the Census Bureau, and the two
objectives do conflict, at least from the resource point of view.
There were some references to different needs in imputation, but
the resource needs have probably had more impact on the survey.



 






 



The ASP report did not even list the objectives of the survey as a
potential source of error.  Neither report really addressed the
effects of staff training or compared kinds of training, lengths of
training, and so on.  It is fairly well known that performance data
do not correlate well with interviewer accuracy.  Training could
make a difference, but almost nothing is known about its effects at
present.



 



   Let me move now to some separate comments on the two reports,
starting with the ASP report.  A large group of people worked on
this survey quality team.  Many of them have done excellent work in
survey methodology, so I think we can expect great things from this
group.  The mission of the group is to contribute to NASS's
long-term goal of routinely and continually improving survey
quality.



 



   The focus on quality at NASS has taken on the language of the
quality and productivity movement.  For example, the team uses a
simple definition of quality, "fitness for use."  This led them on a
search to decide what that meant and what the objective criteria
would be.  Finally, they decided to measure it by comparison with
the Agricultural Statistics Board (ASB) estimate: if the ASB value
is within plus or minus two standard errors of the survey
indication, then the survey indication is fit for use.  In fact,
they have five ratings: ideal, acceptable, workable, minimal, and
out-of-control.
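The two-standard-error criterion is simple enough to state as code. The sketch below, with hypothetical function names and made-up acreage figures, checks only the fit/not-fit boundary; the cut points separating the five ratings are not given here, so they are deliberately not coded.

```python
def standardized_gap(asb_estimate, survey_indication, standard_error):
    """Distance between the ASB value and the survey indication, in standard errors."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return (asb_estimate - survey_indication) / standard_error


def fit_for_use(asb_estimate, survey_indication, standard_error, k=2.0):
    """True if the ASB value lies within plus or minus k standard errors of the indication."""
    return abs(standardized_gap(asb_estimate, survey_indication, standard_error)) <= k


if __name__ == "__main__":
    # Hypothetical acreage indication: 60.0 million acres with a standard error of 1.2 million.
    print(fit_for_use(62.0, 60.0, 1.2))  # gap of about 1.67 standard errors
    print(fit_for_use(63.0, 60.0, 1.2))  # gap of about 2.5 standard errors
```

Because the check measures only agreement with the Board value, passing it says nothing about accuracy unless the Board value is itself accurate.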



 



   I find it hard to see why the Agricultural Statistics Board



estimate would be used as the standard.  In some cases, there are



long time series and other indicators that the ASB uses to make its



estimate.   However, for some surveys they have much less



information.  Perhaps NASS is pushing the ASB to use the survey



indicators or explain why they haven't.  Though the example given



in the paper about the integrated multiple frame based June acreage



estimate was interesting, there will not always be that kind of



other data available to compare with.



 



   There is nothing about a Board estimate that measures accuracy.



In some ways, it is as if the SIPP people looked at one of their



macro indicators and said that if SIPP didn't come within two



standard deviations of that estimate, then SIPP was not fit for



use.  At least, with a macro indicator, one might be able to



untangle why estimates differ; that may not be possible to do with



the ASB.



 



   Following Deming's principles, I think the careful



documentation of every survey for which millions of dollars are



spent and on which important decisions are based is important to



the profound understanding of which Deming speaks.  A quality



profile tells you what you know and what you don't know but



should.



 






 



   It was interesting to see that NASS also addressed resources,
timeliness, and relevance as major components of quality.  However,
it was not clear how criteria would be set or measurements taken.
The Gantt chart on the QAS was helpful in identifying time periods
and overlaps of one round of the survey with the next, but it did
not help individuals who work on many surveys identify overlapping
periods of high intensity.  Consider the sentence "Too frequent use
of overtime to correct a process that is out of control usually has
a devastating effect on overall performance."  What does "out of
control" mean?  How does it affect overall performance?  How do you
know these things unless you keep careful records of hours worked
on a survey and of overtime, and have some measure of a downturn in
performance?



 



    NASS has several good ideas about looking at relevance,
timeliness, and resources as well as accuracy.  It is an ambitious
undertaking.  I have one word of caution about their drive to use
total quality management techniques.  They focus on several tools
available for a survey quality management system, including
charting methods, and I agree that these are useful tools.  But what
has been most helpful in the manufacturing and service industries
where TQM is used is bringing in a team that has hands-on knowledge
of all the facets of the survey.  The team would include data
collectors from the states, edit specification people, estimation
people, and those who set objectives.  The tools would be something
the team is taught to use, and all of its members would need to
learn basic concepts of variability.  Only when all these people
participate do you get the profound knowledge you need to improve a
system rather than merely tamper with it.  As you will recall,
tampering with a system does not bring about the major changes
needed to remove high variability due to special causes.



 



   Let me now move to the SIPP report.  This is a good report that



gets periodic updating.  There are areas not covered in the report,



probably because they did not seem as urgent as the areas covered.



However, I do believe that we will need to see a section on



objectives, meeting multiple objectives, defining concepts,



translating concepts into questions, and so forth.  At the other



end of the survey, something needs to be said about analysis and



publication.



 



   Though the Census Bureau does not use the language of total



quality management, I know that they have thought along those



lines.  Using some of the performance measure standards flies in



the face of everything Deming preaches.  I'm talking about



standards for response rates:



 



     Rating                      Response rate (percent)
     Outstanding................ 97.5 - 100.0
     Commendable................ 95.5 -  97.4
     Fully successful........... 91.5 -  95.4
     Marginal................... 88.0 -  91.4
     Unsatisfactory............. 87.9 and less



 






 



    Instead of setting arbitrary standards for response rates and
production, the Bureau needs to gain a deeper understanding of what
is possible in each type of area in which it conducts surveys.  For
example, response rates for PSUs in New York City can be charted
with upper and lower control limits.  The response rates there
probably very seldom, if ever, reach the Commendable level;
however, they may be within normal variability for that area.  Only
through positive efforts to change the system can the response
rates be raised.  This is partly what Dr. Deming thunders about:
blaming the worker, who may be doing the best he or she can, when
it is the system that is at fault.  Again, this labelling of
people's work does not make the interviewer proud, and it is really
tampering with the system.
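Charting response rates against control limits, as suggested above, is commonly done with a p-chart (a Shewhart chart for proportions). The sketch below assumes a hypothetical New York City PSU with 200 eligible households a month; the counts are invented, and 3-sigma limits around the pooled rate are the usual Shewhart convention rather than anything the Bureau has specified.

```python
import math


def p_chart(completed, eligible, sigma=3.0):
    """Shewhart p-chart for response rates.

    completed[i] and eligible[i] are counts for period i.  The center line is
    the pooled rate; limits are the pooled rate plus or minus `sigma` standard
    errors for each period's sample size, clipped to [0, 1].
    """
    if len(completed) != len(eligible) or not eligible:
        raise ValueError("need matching, non-empty count lists")
    p_bar = sum(completed) / sum(eligible)
    points = []
    for c, n in zip(completed, eligible):
        half_width = sigma * math.sqrt(p_bar * (1 - p_bar) / n)
        lcl, ucl = max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)
        rate = c / n
        points.append({"rate": rate, "lcl": lcl, "ucl": ucl,
                       "signal": not lcl <= rate <= ucl})
    return p_bar, points


# Hypothetical monthly counts for one New York City PSU.
COMPLETED = [170, 174, 168, 172, 176, 150]
ELIGIBLE = [200] * 6

if __name__ == "__main__":
    center, points = p_chart(COMPLETED, ELIGIBLE)
    print(f"center line {center:.3f}")
    for month, pt in enumerate(points, start=1):
        status = "look for a special cause" if pt["signal"] else "common-cause variation"
        print(f"month {month}: {pt['rate']:.3f} "
              f"(limits {pt['lcl']:.3f}-{pt['ucl']:.3f}) {status}")
```

With these figures the pooled rate is about 84 percent and the limits run from roughly 76 to 92 percent. Every month would be rated Marginal or Unsatisfactory under the standards quoted above, yet only the last month falls outside the limits and merits a search for a special cause. In practice the limits would be recomputed once an out-of-control point has been explained and removed.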



 



   The report gave lots of interesting information on household,
person, and item response rates.  Some of the nonresponse rates on
asset data are high enough to raise the question of whether the
survey is the right vehicle for collecting such data.



 



   There is also emphasis on the seam problem, but this is nothing
new; as I recall, it also showed up in the crime survey.  It seems
that certain biases are endemic to longitudinal surveys.  So far
the Bureau has been content to catalog the measured effect.  We
really need some creative thinking, and some money, to get
experiments going on recall errors, the placement of events in
time, and time-in-sample problems.  Though dependent interviewing
may yield more consistent results, those results may be no more
accurate.  Before action is taken to fix a problem, there needs to
be a deeper understanding of why the problem exists.
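One simple way to quantify the seam problem is to compare the rate of reported month-to-month changes across the seam between interview waves with the rate within waves. The sketch assumes SIPP-style four-month reference periods and uses three made-up monthly histories; the function name and data are hypothetical.

```python
def seam_transition_rates(histories, months_per_wave=4):
    """Return (within-wave change rate, seam change rate).

    Each history is a per-person sequence of monthly statuses, where index 0
    is month 1 of wave 1.  A pair of adjacent months straddles a seam when the
    later month is the first reference month of a new wave.
    """
    counts = {"within": [0, 0], "seam": [0, 0]}  # [changes, month pairs]
    for seq in histories:
        for t in range(1, len(seq)):
            key = "seam" if t % months_per_wave == 0 else "within"
            counts[key][1] += 1
            if seq[t] != seq[t - 1]:
                counts[key][0] += 1
    within_changes, within_pairs = counts["within"]
    seam_changes, seam_pairs = counts["seam"]
    if within_pairs == 0 or seam_pairs == 0:
        raise ValueError("histories must span at least two waves")
    return within_changes / within_pairs, seam_changes / seam_pairs


# Hypothetical monthly program-receipt reports (1 = receiving) over two waves.
HISTORIES = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # change reported across the seam
    [0, 0, 0, 0, 1, 1, 1, 1],  # change reported across the seam
    [1, 1, 0, 0, 0, 0, 0, 0],  # change reported within wave 1
]

if __name__ == "__main__":
    within, seam = seam_transition_rates(HISTORIES)
    print(f"within-wave change rate {within:.3f}, seam change rate {seam:.3f}")
```

In this toy panel two of the three seam pairs show a change, against one of eighteen within-wave pairs. Cataloguing such a ratio is the easy part; as the paragraph argues, explaining it requires experiments.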



 



   There was very little information available on the extent of
editing, what it does, why changes are made, and where the line is
drawn between what we call editing and what we call imputation.
Beller made some very pertinent comments in his 1979 error profile
for NASS surveys: "The amount of editing on some questions resulted
in changing the level of cattle and calves by an amount two or three
times greater than the error caused by sampling.  This amount of
editing is cause for alarm in that it clearly shows a breakdown in
the survey process."  In both the NASS surveys and SIPP, we need to
get a better picture -- a profound understanding -- of what editing
is doing to the data.
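Beller's comparison, the shift produced by editing expressed in units of the sampling error, can be computed directly whenever both the unedited and edited values are retained. This minimal sketch assumes a simple weighted total as the estimator and a standard error computed elsewhere; the function name, records, weights, and standard error are invented.

```python
def edit_impact(raw_values, edited_values, weights, standard_error):
    """Return (shift in the weighted total caused by editing, in SE units; records changed)."""
    if not (len(raw_values) == len(edited_values) == len(weights)):
        raise ValueError("inputs must have the same length")
    raw_total = sum(w * y for w, y in zip(weights, raw_values))
    edited_total = sum(w * y for w, y in zip(weights, edited_values))
    changed = sum(1 for r, e in zip(raw_values, edited_values) if r != e)
    return abs(edited_total - raw_total) / standard_error, changed


# Hypothetical cattle-and-calves reports for three sampled operations.
RAW = [50, 0, 300]
EDITED = [50, 120, 300]   # an edit replaced a reported zero with 120 head
WEIGHTS = [100, 100, 100]

if __name__ == "__main__":
    ratio, n_changed = edit_impact(RAW, EDITED, WEIGHTS, standard_error=5000)
    print(f"{n_changed} record(s) edited; estimate moved {ratio:.1f} standard errors")
```

Here a single edit moves the estimated total by 2.4 standard errors. Keeping the pre-edit values is what makes this kind of measurement possible at all.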



 



   One last point on SIPP.  The only direct estimates of sampling
error were for the third quarter of 1983, using 1984 panel data
collected in wave one, and the survey at that time was based on the
1970 census.  It certainly seems time to recompute variances.  With
variances that are incorrect, it seems like gilding the lily for
analysts to make actual and implied comparisons using 1.6 times the
standard error; the interpretations and the comparisons could be
quite far off.
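The comparisons mentioned above treat a difference between two estimates as significant when it exceeds 1.6 times its standard error, roughly a 90 percent two-sided test. The sketch below, which assumes independent estimates (so no covariance term) and uses invented figures, shows how an understated standard error can reverse such a conclusion.

```python
import math


def compare_estimates(est_a, se_a, est_b, se_b, critical=1.6):
    """Return (significant, z) for the difference est_a - est_b.

    Assumes the two estimates are independent, so the covariance term is omitted.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (est_a - est_b) / se_diff
    return abs(z) > critical, z


if __name__ == "__main__":
    # Hypothetical percentages with standard errors taken from outdated variance tables.
    print(compare_estimates(12.0, 0.4, 11.0, 0.4))
    # The same comparison if the true standard errors are 25 percent larger.
    print(compare_estimates(12.0, 0.5, 11.0, 0.5))
```

Under the first set of standard errors the difference looks significant (z of about 1.77); with standard errors 25 percent larger, z falls to about 1.41 and the conclusion reverses.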



 



 






 



   All in all, I enjoyed reading these papers.  I think the
documentation of SIPP is more complete, but NASS is farther along
in trying to improve quality.  They do not want only to document;
their real goal is improvement.  I believe that is ultimately the
SIPP goal too, but no strategy has yet been put forward for moving
in that direction.



 



 



 



 



 



 



 



 






 



                           Discussion



 



                      Nancy A. Mathiowetz



                  U. S. Bureau of the Census



 



   The data collected by Federal statistical agencies are used both
to shape federal policy and to change the distribution of federal
expenditures; given the magnitude of the impact of these data, the
need for high quality goes without saying.  The agencies responsible
for developing the Quality Profiles are to be commended for
continuing to move the discussion of error beyond sampling error and
into the realm of measuring nonsampling error.  Although most
agencies have for years discussed sampling error when releasing
their data and research findings, we are just beginning to develop
a standard of reporting that includes a discussion of all of the
components of total survey error.



 



Sources of Nonsampling Error



 



   The sources of nonsampling error are many and include:



 



-    the design of the study (e.g., longitudinal vs.
   cross-sectional; length of recall period);



 



-    the questionnaire, both the contents and the structure;



 



-    the interviewer;



 



-    the respondent; and



 



-    the post-survey processing, including coding and keying



   of data.



 



   Rather than reiterate issues raised in the Quality Profiles,
I would like to suggest some other topics of investigation within
these sources of nonsampling error.  My goal in doing so is not to
criticize the work presented here but to provide some ideas on
where these Quality Profiles could be expanded.



 



Design



 



   With respect to design, we still know little about the effects
of longitudinal designs on the level of error and the error variance
structure of reports over time.  There has been research indicating
that respondents suffer from "conditioning" effects, that is,
changes in behavior or in the reporting of behavior in later
interviews that result from earlier interviews.  Some conditioning
may improve reporting, in that the respondent knows in advance the
nature of the questions; conditioning may also reduce reporting,
since respondents have learned how the interview is sequenced.  In
one study, the best predictor of error in reports of functional
status in the fourth round of interviewing was the length of time it
took to conduct the previous interviews.  The finding suggests that
conditioning effects may be reduced by something as subtle as
shortening an earlier interview.  We need further research to
understand how conditioning affects the analysis of change over time
and the structure of errors over time.



 



   Longitudinal designs may also be affected by changes in the
respondent, the interviewer, or even the interpretation and meaning
of critical concepts in the questions, if the panel has a long life.
With the proliferation of longitudinal data collection efforts
within the Federal Government, more research is needed into which
questions are sensitive and which are resistant to conditioning
effects, as well as which items are most affected by
between-interview changes.



 



 



Questionnaire



 



   In a lecture to the Society of Government Economists, Janet
Norwood stated that



 



   ...the quality of a statistical indicator is sometimes



   elusive and often difficult to define.  Effective



   measurement requires an underlying conceptual framework



   and careful identification of the phenomenon to be



   estimated....



 



   In the past 25 years, we have made great strides in
understanding how sensitive response distributions are to minor
changes in question wording.  The merging of literatures from
cognitive psychology, social linguistics, and social psychology
with survey methodology has given us new means of attempting to
reduce the levels of error associated with the questionnaire.  What
is now needed in the Federal statistical system is a means of
evaluating the various forms in which the "same" information is
collected and analyzed by different agencies.  For example, in
recent years, the proportion of individuals lacking health
insurance has been a critical issue.  The most widely cited data on
insurance coverage come from the Current Population Survey, which
asks whether each person in a household was covered at any time
during the preceding year.  Persons covered by any source at any
time during the year are counted as insured.  In 1987, the estimate
of the uninsured from the March CPS was 17.6 percent.  Notice that
this question asks whether the person has been covered "at any
time" during the previous year.  In contrast, data from the 1980
National Medical Care Utilization and Expenditure Survey (NMCUES)
and the 1987 National Medical Expenditure Survey (NMES), both
designed as one-year panel surveys, indicate that point-in-time
estimates of the uninsured (at the time the person was interviewed)
are approximately 14 to 16 percent at any one cross-section, but
that estimates of the full-year uninsured are approximately 9
percent.



 



   There is some conjecture that responses to the CPS may reflect a
respondent's status at the time of the interview rather than at any
time in the previous year, given the similarity between the CPS
estimate and the cross-sectional estimates from NMCUES and NMES.
From a policy perspective the difference is critical: whether to
provide health insurance for the chronically uninsured,
approximately 21 million people, or for all individuals ever
uninsured, which appears to be approximately 35 million people in a
given year.  Those attempting to address this issue would benefit
from a consistent definition of uninsured as well as a set of
questions that asks about a consistent time period.
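The definitional point can be made concrete with monthly coverage histories. The sketch below computes three rates from the same hypothetical ten-person panel: uninsured in the interview month, uninsured for the full year (what the CPS wording, taken literally, measures), and uninsured at any time during the year. The data and function name are invented for illustration.

```python
def uninsured_rates(histories, interview_month):
    """Three uninsured rates from 12-month coverage histories ("I" insured, "U" uninsured)."""
    if any(len(h) != 12 for h in histories):
        raise ValueError("each history must cover 12 months")
    n = len(histories)
    return {
        # uninsured in the month of the interview
        "point_in_time": sum(h[interview_month] == "U" for h in histories) / n,
        # uninsured in every month of the year
        "full_year": sum(set(h) == {"U"} for h in histories) / n,
        # uninsured in at least one month
        "ever": sum("U" in h for h in histories) / n,
    }


# Hypothetical ten-person panel.
COVERAGE = ["I" * 12] * 6 + [
    "U" * 12,                # uninsured all year
    "U" * 6 + "I" * 6,       # uninsured January-June
    "I" * 9 + "U" * 3,       # uninsured October-December
    "III" + "UU" + "I" * 7,  # uninsured April-May
]

if __name__ == "__main__":
    # Suppose everyone is interviewed about December (index 11).
    print(uninsured_rates(COVERAGE, interview_month=11))
```

Even in this tiny made-up panel the three definitions give 10, 20, and 40 percent, which is why a consistent definition and reference period matter.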



 



Interviewer



 



   Response rates, hours per completed interview, and item
nonresponse rates, the measures traditionally used to assess
interviewer quality, only begin to capture the errors potentially
associated with the interviewer's task.  While each of these
measures provides information that we believe is related to
quality, we need additional measures that help us understand the
error associated with individual questions.  How well do
interviewers understand the concepts underlying the questions they
are asking?  Do they have sufficient training and understanding to
ask non-directive probes when necessary to obtain an adequate
answer?  The increased movement toward telephone interviewing gives
us a means to routinely randomize interviews across interviewers to
obtain measures of interviewer variance.  We spend millions of
dollars training interviewers, yet we know little about the most
effective means of training them or of determining their ability to
conduct the interview as trained.  Supervisor review of one or more
interviews provides some information, but if we believe that
training interviewers to read questions exactly as written is worth
the cost, we should routinely evaluate the association between the
delivery of questions and the error in the responses.
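With interviews randomly assigned to interviewers (interpenetration), interviewer variance is often summarized by an intraclass correlation, rho, estimated from a one-way analysis of variance, together with the design effect 1 + (m - 1)rho for workloads of size m. The sketch below assumes equal workloads and uses invented yes/no answers to a single question; it illustrates the standard ANOVA estimator, not any agency's production method.

```python
from statistics import mean


def interviewer_icc(workloads):
    """ANOVA estimate of the interviewer intraclass correlation and design effect.

    workloads: one list of numeric responses per interviewer, all of length m,
    from a design in which cases were randomly assigned to interviewers.
    """
    k = len(workloads)
    m = len(workloads[0]) if workloads else 0
    if k < 2 or m < 2 or any(len(w) != m for w in workloads):
        raise ValueError("need at least 2 interviewers with equal workloads of at least 2 cases")
    means = [mean(w) for w in workloads]
    grand = mean(means)  # equal workloads, so the mean of means is the grand mean
    msb = m * sum((mi - grand) ** 2 for mi in means) / (k - 1)
    msw = sum((y - mi) ** 2 for w, mi in zip(workloads, means) for y in w) / (k * (m - 1))
    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("responses show no variation")
    rho = (msb - msw) / denominator
    return rho, 1 + (m - 1) * rho


# Hypothetical answers (1 = yes) for three interviewers with four cases each.
WORKLOADS = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    rho, deff = interviewer_icc(WORKLOADS)
    print(f"rho = {rho:.3f}, design effect = {deff:.2f}")
```

For these invented answers rho is about 0.47 and the design effect about 2.4, so the interviewer component alone would multiply the variance of a sample mean by about 2.4. With small samples the estimate of rho can come out negative; it is then usually read as no detectable interviewer effect.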



 



Editing and Coding



 



   As noted in the SIPP Quality Profile, much of the between-wave
difference in industry and occupation appears to be a spurious
result of either data collection or data processing.  A similar
problem can be found in the coding of medical conditions and
surgical procedures based on household-reported data.  Not only
coding but also editing procedures can contribute to the overall
level of error in estimates.  For example, Duncan and Mathiowetz
(1985), using microlevel validation data, found that trimming
estimates of change in income between two years (that is,
disbelieving reported changes beyond a certain level), a procedure
often used in editing data from longitudinal surveys of income,
resulted in biased estimates of change and biased coefficients in
models predicting income levels and change.  Retrospective reports
of income were more likely to be correct for individuals with a
large proportional change than for those with little or no change.
The finding suggests that editing procedures should be conservative
and based on empirically derived principles.
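The mechanism behind this finding can be illustrated with a toy example. The editing rule here, capping any proportional change larger than 25 percent, is a hypothetical stand-in for the trimming rules the text refers to, and the changes are invented; they are treated as validated, so any difference in the mean is bias introduced by the edit.

```python
def mean(values):
    return sum(values) / len(values)


def trim_changes(changes, limit):
    """Hypothetical edit: cap reported proportional changes at plus or minus `limit`."""
    return [max(-limit, min(limit, c)) for c in changes]


# Invented proportional income changes, assumed correct (e.g., confirmed by validation data).
TRUE_CHANGES = [0.02, 0.05, -0.03, 0.04, 0.60, 0.80]

if __name__ == "__main__":
    edited = trim_changes(TRUE_CHANGES, limit=0.25)
    print("true mean change:  ", round(mean(TRUE_CHANGES), 4))
    print("edited mean change:", round(mean(edited), 4))
    print("bias from editing: ", round(mean(edited) - mean(TRUE_CHANGES), 4))
```

The capped data understate the average change by 0.15, entirely because the large changes, which here are real, were not believed. Dropping such cases instead of capping them would also bias the estimate.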



 



   Although we have learned to be sensitive to question wording as a
potential source of bias, and therefore demand documentation of
question wording and study design, few, if any, studies provide
information on the effects of editing and coding processes.  If
consumers of the data are to understand all aspects of total survey
error, coding and editing decisions need to be researched and
documented.



 



 



Adjusting for Nonresponse



 



   For the most part, nonresponse adjustments are made using
demographic and segment information, and little if any information
about the nature of the nonresponse is factored into the
adjustment.  A growing body of literature suggests that using
information from call records in a nonresponse adjustment,
specifically separating refusals from those who could not be
located, may prove beneficial, since difficult-to-locate (but
eventually interviewed) sample persons look similar to the
nonrespondents who cannot be located.
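One way to use call-record information, as the paragraph suggests, is to form weighting classes from contact outcomes so that difficult-to-locate respondents stand in for sample persons who could not be located. The class definitions, outcome codes, and weights below are hypothetical; a production adjustment would typically cross these classes with the demographic and segment cells already in use.

```python
from collections import defaultdict

# Hypothetical mapping from call-record outcome to adjustment class.
CLASS_OF_OUTCOME = {
    "interviewed_easy": "located",
    "refused": "located",
    "interviewed_hard_to_locate": "hard_to_locate",
    "not_located": "hard_to_locate",
}
RESPONDENT_OUTCOMES = {"interviewed_easy", "interviewed_hard_to_locate"}


def adjust_weights(cases):
    """Weighting-class nonresponse adjustment.

    cases: iterable of (case_id, outcome, base_weight).  Within each class,
    respondents' weights are multiplied by (total base weight in the class) /
    (respondent base weight in the class).  Returns {case_id: adjusted weight}
    for respondents only.
    """
    cases = list(cases)
    eligible, responding = defaultdict(float), defaultdict(float)
    for _, outcome, weight in cases:
        cls = CLASS_OF_OUTCOME[outcome]
        eligible[cls] += weight
        if outcome in RESPONDENT_OUTCOMES:
            responding[cls] += weight
    for cls in eligible:
        if responding[cls] == 0:
            raise ValueError(f"class {cls!r} has no respondents; collapse classes first")
    return {
        case_id: weight * eligible[CLASS_OF_OUTCOME[outcome]] / responding[CLASS_OF_OUTCOME[outcome]]
        for case_id, outcome, weight in cases
        if outcome in RESPONDENT_OUTCOMES
    }


SAMPLE = [
    ("a", "interviewed_easy", 10),
    ("b", "interviewed_easy", 10),
    ("c", "refused", 10),
    ("d", "interviewed_hard_to_locate", 10),
    ("e", "not_located", 10),
    ("f", "not_located", 10),
]

if __name__ == "__main__":
    print(adjust_weights(SAMPLE))
```

The refusal's weight is absorbed by the easily located respondents, while the two unlocated cases are represented by the single hard-to-locate respondent, whose weight triples. The adjusted weights still sum to the base-weight total of 60.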



 



   These comments are intended to extend the excellent work
presented in the Quality Profiles, which provide details on the
measurement of nonsampling error and the results of several
experiments to reduce these levels of error.  In addition, I hope
that as others consider producing quality profiles, these profiles
are expanded to cover some of these other issues.



 



 



Reference



 



Duncan, G.J. and Mathiowetz, N.A. A Validation Study of Economic



Survey Data, Ann Arbor, MI: The Institute for Social Research,



1985.



 



 



 






 



 



 






 



            Session 2



 



    PARADIGM SHIFTS USING



    ADMINISTRATIVE RECORDS



 



 



 



 



 



 



 



 






 



 



 






 



 PARADIGM SHIFTS:  ADMINISTRATIVE RECORDS AND CENSUS-TAKING



 



                        Fritz Scheuren



                   Internal Revenue Service



 



 



  There is a lot in the news lately about problems with the 1990



decennial census in the United States.  Many opinions have already



been offered about what went wrong and what should be done.



Indeed, a paradigm shift may be needed in census-taking.



 



   This brief note talks about the possible role administrative



records might play in a new paradigm.  To get things started, the



word "paradigm" might deserve some elaboration: a paradigm is a way



of thinking and then doing; a pattern of belief and behavior; a way



of seeing reality and using that sense to accomplish something.



Paradigms are common -- the way we get to work would be a humble



example.  Conventional census-taking, under this definition, could



be characterized as a major scientific and technical paradigm.



 



   As long as our paradigms work well for us, we tend not to change
them.  Occasionally, however, paradigms break down and have to be
replaced; e.g., the bridge goes out and we need to find another
route to work.  As Kuhn pointed out in his seminal book on the
structure of scientific revolutions, paradigms break down in
science as well (Kuhn, 1970).  Perhaps the most famous example is
the revolution in the thinking of astronomers that occurred when
the Ptolemaic earth-centered view of the universe was replaced by
the Copernican view of an earth that revolved, with the other
planets, around the sun.



 



   If we look at the problems the U.S. Census Bureau has
encountered with the 1990 decennial census, it can easily be argued
that one of the major barriers to overcoming them is the
conventional census-taking paradigm.  Kish, in a recent paper for
Survey Methodology (1990), considers at length some possible
alternatives.  My objective here is to focus on two of those areas,
rolling censuses and administrative registers, and to explore a new
paradigm for the U.S. decennial census.



 



 



 



 



 



 






 



Conventional Census-Taking



 



    Conventional censuses, like those in Canada and the U.S.,



continue to do many things very well (e.g., Hammond, 1990).



Indeed, at present, we have no adequate substitute for them;



nonetheless, the need for at least some change seems compelling.



Rising costs are a big factor.  There have been many improvements



in census-taking in this century; still, in both Canada and the



U.S., total costs and even costs per person have risen



significantly:



 



o     The 1990 decennial census in the U.S. is budgeted at
    about $10 (U.S.) per person.  Even adjusting for
    inflation, this is a four-fold increase over per capita
    expenses in 1960.  Item content differences between the
    two censuses are small and essentially not a factor in
    explaining the difference.  Both the 1960 and 1990
    Censuses, for example, asked only 7 population questions
    of everyone (U.S. Bureau of the Census, 1989).  The 1960
    long-form sample contained 35 questions and was to be
    completed by 25% of the population; for 1990, the
    long-form sample had 33 questions and was given to 16% of
    U.S. households.



 



o     The situation in Canada is similar with regard to the
    costs of census-taking.  For example, the 1991 Canadian
    Census is budgeted at about $9.50 (CAN) per person.  As in
    the U.S. Census, just 7 population items, albeit somewhat
    different ones, are asked of everyone, and, as in the 1990
    U.S. Census, housing questions are included for everyone
    (2 in Canada and 7 in the U.S.).  In Canada, a 20%
    long-form sample will be employed in 1991, and the
    long-form questionnaire has 45 items.  The 1961 census in
    Canada was quite different from that planned for 1991, so
    meaningful cost comparisons are hard to make.
    Nonetheless, looking back 30 years in Canada, the same
    long-term trend in census-taking costs seems to exist;
    however, per capita costs have been roughly constant, even
    declining slightly in the last two or three censuses.



 



    The U.S. Census Bureau has looked at the growing cost of



conventional census-taking and concluded that a major change may be



needed (Browne, 1989).  Labor costs have grown appreciably in



recent decades in both Canada and the U.S.  Technological



improvements have not been great enough to offset these costs,



though some, like TIGER (Topologically Integrated Geographic



Encoding and Referencing) and CATI (Computer-Assisted Telephone



Interviewing), offer promise.   Greater attention in the U.S. to



improved population coverage is another important factor (Anderson,



1990).  The degree of public cooperation in the census also seems



to be dropping, at least as reflected by the poorer than



 






 



anticipated mail response rate for the 1990 U.S. Census.  (It



should be noted that this same tendency is not clearly apparent in



Canada.)



 



   Increasing cost is not the only major problem facing



conventional census-taking.  Perhaps of even greater importance is



the growing rate of obsolescence of the information collected.  The



combination of rising costs and growing information obsolescence



has had the effect of reducing the benefit/cost ratio for



conventional censuses steadily and dramatically.



 



   To obtain more frequent small area data, some countries have



introduced quinquennial censuses.  For example, in Canada this was



first done nationally in 1956.  Budget problems led to the 1986



Canadian Census being cancelled and then reinstated.  Indeed, it is



unclear whether there will be a Canadian census in 1996.  While a



quinquennial census was also legislated in the U.S., funds were



never made available.



 



 



Rolling Censuses



 



    Conventional census-taking, of necessity, must sacrifice both



timeliness and item content (on a 100% basis) to achieve complete



spatial detail and high population coverage.



 



    One of the alternatives that Kish asks us to look at is a



"rolling census."  His proposal envisions the sampling of a country



over a decade in such a way that every area is eventually covered.



In its purest form, space and time become a single dimension and



content remains fixed, such that, at decade's end, we have obtained



cumulative information on the entire country for a given set of



items.
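Kish's proposal is stated here only in outline. The Python sketch below is purely illustrative. It assumes the country is divided into hypothetical area units that are randomly allocated to ten annual panels, one per year of the decade. Under that assumption, each year covers a different tenth of the country, and the cumulated panels cover every area exactly once by decade's end. The function name and the allocation rule are assumptions, not Kish's specification.

```python
import random
from typing import List, Optional, Sequence


def assign_rolling_panels(area_ids: Sequence[str], n_panels: int = 10,
                          seed: Optional[int] = None) -> List[List[str]]:
    """Hypothetical rolling-census design: split area units into annual panels.

    Panel k is enumerated in year k of the cycle, so cumulating all panels
    covers every area exactly once. Panel sizes differ by at most one unit.
    """
    if n_panels < 1:
        raise ValueError("n_panels must be positive")
    rng = random.Random(seed)
    shuffled = list(area_ids)
    rng.shuffle(shuffled)  # random order, so each panel is a random subsample
    panels: List[List[str]] = [[] for _ in range(n_panels)]
    for i, area in enumerate(shuffled):
        panels[i % n_panels].append(area)  # deal units out to panels in turn
    return panels


if __name__ == "__main__":
    areas = [f"area-{i:03d}" for i in range(25)]  # hypothetical area codes
    for year, panel in enumerate(assign_rolling_panels(areas, seed=1990), start=1):
        print(f"year {year}: {len(panel)} areas")
```

Because each annual panel is a random subsample of areas, yearly results can feed national and major subnational estimates. Any single small area, however, is still observed only once per cycle.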



 



    The chief advantage of a rolling census is that it can avoid



the problem of information obsolescence at national and major



subnational levels.    For small geographic areas, though, there



would, of course, still be only one observation per decade.  Unlike



a conventional census, comparisons among small geographic areas



would be very difficult to interpret because the data are being



collected at different points in time (Fellegi, 1981).



 



    For a rolling census or survey, unit costs could be higher, as



Kish notes, than in a more conventional enumeration (indeed,



ceteris paribus, maybe even higher than the cost of existing survey



efforts).  In an age of fixed or declining resources, therefore, it



might not be possible to do a complete "enumeration" each decade,



even if content were significantly scaled back.  Rolling samples



would seem to have their greatest attractiveness not as a



replacement for conventional censuses, but, say, as part of a



strategy to link together census-taking with ongoing surveys and



 



 






 



local area population estimates for the intercensal years (Herriot,



Bateman and McCarthy, 1989).



 



   Both the United States and Canada employ monthly surveys to



estimate the national (and some subnational) labor force



characteristics.  The Canadian Labor Force Survey (LFS) of 64,500



households covers 0.67% of the total Canadian population each



month.  "Given the rotation pattern in effect for the LFS, the



0.67% sample per month rolls up into a 6.7% sample of unique



households over a 5-year period" (Drew, 1989).  In the Canadian



context, at least, Kish's proposal may be feasible.  A sample



survey vehicle could be designed, with some reduction in the month-



to-month household overlap, which could achieve many of the



benefits he has stated for a rolling sample, while also meeting the



information needs currently met by ongoing household surveys (Drew,



1989).  This sample would not replace the 100% census count data,



itself, but might be a partial substitute for Canada's 20% long-



form census sample.
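Drew's rollup can be checked with simple arithmetic. Assume a steady-state rotation in which households stay in the LFS for six consecutive months, so one-sixth of the monthly sample is new each month, and assume no household is selected twice. The share of unique households reached over a period is then the monthly sampling fraction, divided by the months in sample, times the number of months. A minimal Python sketch follows; the function name is illustrative.

```python
def cumulative_unique_coverage(monthly_fraction: float,
                               months_in_sample: int,
                               months: int) -> float:
    """Approximate fraction of households ever sampled over `months`.

    Assumes a steady-state rotation: 1/months_in_sample of the monthly
    sample is newly rotated in each month, and no household is reselected.
    """
    newly_sampled_per_month = monthly_fraction / months_in_sample
    return newly_sampled_per_month * months


# Canadian LFS: 0.67% of the population per month, six months in sample,
# cumulated over five years (60 months).
lfs_five_years = cumulative_unique_coverage(0.0067, months_in_sample=6, months=60)
print(f"{lfs_five_years:.1%}")  # prints 6.7%
```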



 



  Because the United States has a population about 10 times



larger than Canada, the tradeoffs involving rolling samples and



overall country coverage are not as favorable as they are in



Canada.  The U.S. Current Population Survey (CPS), for instance, at



about 60,000 households, covers only 0.06% of the total U.S.



population monthly.  Even if cumulated over a whole decade (but



with no change in its rotation pattern), the CPS would cover just



roughly 1% of all U.S. households.  This does not compare well in



size to the overall 16% long-form sample being conducted as part of



the 1990 U.S. Census.



 



  To bring the rolling sample population coverage nearer to the



1990 U.S. decennial sample, major changes in the CPS rotation



pattern would be needed.  Other U.S. Census Bureau surveys might



also have to be redesigned if the objective were to achieve even a



partial substitute.  Despite these changes, moreover, the resulting



decade-long sample would still be only a small percent of the total



U.S. population -- perhaps, at best, in the 2% to 3% range,



assuming resources and other requirements remained essentially



fixed.
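The same steady-state arithmetic, under the same simplifying assumptions, reproduces the CPS figure. Under the CPS 4-8-4 rotation, a household is interviewed in 8 months in all, so roughly one-eighth of the sample is new each month. For illustration only, the sketch also shows how shortening time in sample at a fixed sample size would raise decade coverage. These redesign scenarios are hypothetical and are not the author's derivation of the 2% to 3% range.

```python
def decade_coverage(monthly_fraction: float, months_in_sample: int,
                    years: int = 10) -> float:
    """Approximate share of households ever sampled over `years`.

    Steady-state rotation: 1/months_in_sample of the sample is new each
    month; households are never reselected; sample size stays fixed.
    """
    return monthly_fraction / months_in_sample * 12 * years


# CPS: about 0.06% of U.S. households per month, 8 months in sample (4-8-4).
print(f"4-8-4 rotation: {decade_coverage(0.0006, 8):.1%}")  # prints 0.9%

# Hypothetical redesigns with less month-to-month overlap.
for months_in_sample in (4, 3, 2):
    share = decade_coverage(0.0006, months_in_sample)
    print(f"{months_in_sample} months in sample: {share:.1%}")
```

Under these assumptions, three or four months in sample would put decade coverage at about 1.8% to 2.4%. That is the same order as the estimate above. Any real redesign would also face the effective-sample-size trade-offs for level and change estimates.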



 



  In both Canada and the U.S., the likely higher unit costs of



a rolling sample may need to be addressed by changes in survey



procedures: how area segments are listed (Royce and Drew, 1988);



how first contact with households is made, etc.  Where is it



written, for example, that a personal interview contact is needed



before using other modes of collection?



 



 It will be no mean challenge to keep effective sample sizes



equal for the major level and change components now obtained from



ongoing surveys (e.g., Tegels and Cahoon, 1982).  Some compromise



may be needed, moreover, in the extent to which the basic content



of the current long-form census samples can be included.  Despite



 






 



these challenges, or perhaps because of them, rolling samples



deserve continued serious attention and should be the focus of



extensive practical experimentation.



 



Administrative Registers



 



   With the flowering of scientific sample survey methods in the



1940's (Bailar, 1990), the use of administrative records for



statistical purposes became relatively less important in many



national statistics programs.  By the early 1980's, however, at



least in the developed countries, the pendulum had begun to swing



back.  Philip Redfern has been the major chronicler of this



phenomenon internationally (Redfern, 1987).  While the Danes seem



to have gone the farthest (Jensen, 1983 and 1987), major efforts



have been made in Canada (e.g., Statistics Canada, 1990) and even



some in the U.S. (e.g., Alvey and Kilss, 1990).



 



   A good summary of most of the key barriers to the greater use



of administrative registers for census-taking is found in Redfern



(1989), including the extensive discussion published with that



paper.  Perception barriers among citizens (e.g., in Germany) are



mentioned as problems.  Psychological barriers within the national



statistical service may, however, be of equal or even greater



importance.  Major scientific "paradigm shifts" generally have this



problem (Kuhn, 1970).  Certainly, this seemed to be part of the



reason for the reception given to the proposal (made by me in 1980)



to explore the feasibility of making administrative records an



integral part of the U.S. Census of Population.  While a sketch of



such a proposal was eventually given at the 1982 American



Statistical Association meetings (Alvey and Scheuren, 1982), it



seems, with a few fairly limited exceptions (e.g., Irwin, 1984;



Citro and Cohen, 1985), that serious interest at the Census Bureau



has been notably lacking.



 



   Suffice it to say that in the U.S. very little of the needed



research has been undertaken.  This is true despite continuing



efforts to give the proposal prominence (Jabine and Scheuren, 1985



and 1987) and to get it discussed widely (Butz, 1985).  Sadly,



therefore, it appears that, in the United States, at least for the



year 2000, we should not expect administrative registers to replace



censuses.



 



   The 1990 U.S. decennial census could have been used as a



proving (or disproving) ground for some of the needed research into



administrative record alternatives.  Why that didn't happen is a



matter that can only be speculated about.  A contributing factor,



quite possibly, is a case of "paradigm paralysis" (Barker, 1988).



The literally decades-long controversy about whether to adjust



census "counts" seems to have locked the U.S. Bureau of the Census



into what some, at least, would call an increasingly sterile



intellectual position (Fienberg, 1990).  The viewpoint that they



 






 



have adopted makes it very hard for them to see any alternative,



like a (partial) administrative record approach, that starts out



with the notion that adjustments would be required.



 



    The situation is different in Canada.  Since the late 1970's,



Statistics Canada has assembled many of the building blocks needed



to conduct an administrative record census (e.g., Drew, 1989;



Podoluk, 1987; Verma and Raby, 1989).  While much remains to be



done, such a change could even happen as early as 1996.  For



example, the coverage of the Canadian tax return system, alone, is



quite high and growing.  In 1987, for instance, it has been



estimated that the coverage was about 94% -- i.e., about 3 percentage points less



than the 96.8% coverage achieved in the 1986 Canadian Census.  By



1991, the tax return coverage, alone, should be up to about 97% or



better, with overall administrative record coverage still higher



and likely to grow further in the 1990's. (See Table I for more



details on administrative record coverage in Canada and the U.S.)



 



 



[Table I.  Administrative record coverage in Canada and the U.S. (graphic not reproduced)]



 



 



 



 



 



    One concern often raised is that administrative registers,



even after they become adequate in quality and coverage, will be



 






 



limited to only a few, bare demographic variables: head counts,



age, sex and little more.  An immediate observation concerning this



remark is that conventional censuses themselves do little more than



this, at least for the 100% items.  It is also evident that,



while the variables on administrative records are not the same as



those collected in a traditional census, there is more already



available than critics may realize (e.g., Meyer, 1990; Alvey and



Scheuren, 1982).



 



   More important even than any current item content comparison



is the need to emphasize that the proposal to use administrative



registers in census-taking does not envision that administrative



records have to be used as they are.  Administrative records will



need to be changed.  In my personal opinion, limited optimism about



achieving needed changes is justified.  However, without a doubt



it is too much to expect of administrative records that they will



be able to capture exactly the same concepts now measured in



censuses and surveys.  Additionally, there almost certainly will



need to be special efforts, using existing census-taking



techniques, to separately enumerate certain groups.  The efforts in



the 1990 U.S. Census to count the homeless would be one such



example.



 



   Censuses and administrative records each have inherent



limitations.  Unavoidable conceptual differences will be a major



barrier to any shift from one medium to another.  Administrative



feasibility is another issue; however, some hard-to-duplicate



census concepts (e.g., households) may not be as important to the



measurement process as they once were.



 



   Shifts in methodology (from conventional census to



administrative records) for some uses would potentially be



accompanied by a parallel shift in the underlying concepts



measured.  Some concepts may alter or expand in meaning, as may



our ability to measure them (e.g., families).  We also must



ascertain the extent to which respondents answer survey questions



the same way they fill out administrative forms that may have a real,



direct impact on their lives.



 



    In recent years, traditional survey methodology has been



enhanced by new tools from the field of cognitive psychology.



These cognitive research tools could be used to understand any



conceptual differences between the meaning of terms when they are



used in surveys or drawn from administrative records.  We may not



have what we think we have anyway (Bates and DeMaio, 1989).  In any



case, there is already an extensive body of cognitive research that



can be drawn on (e.g., Dippo, 1987; Fienberg and Tanur, 1989; Jobe



and Mingay, 1990).



 



    It should also be pointed out that, most likely,



administrative registers will not be able to completely meet the



demands of modern society for richer sources of statistics.  Such



 






 



demands, of course, appear to be insatiable.   Even if they were



not, administrative records will never have the flexibility and



responsiveness of surveys.  Registers, however (including partial



ones like those that exist in the U.S.), when linked to survey data,



can be extremely important as auxiliary variables in making



improved direct national survey -- and even subnational survey --



estimates.  Research for the U.S. Census Bureau's Survey of Income



and Program Participation (SIPP) on the use of Internal Revenue Service data



for improving the precision of national survey estimates is a good



recent example (Huggins and Fay, 1988).  Indirect (e.g., synthetic)



estimates for small areas would still be needed for variables not



on the administrative registers (Platek, Rao, Sarndal, and Singh,



1987).  The registers, though, might provide a source of valuable



symptomatic indicators.
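This paragraph describes two roles for register data. One is as auxiliary information for direct survey estimates; the other is as an input to indirect (synthetic) small-area estimates. The Python sketch below shows textbook forms of both. The first is a simple ratio estimator; it assumes an equal-probability sample in which each sampled unit is linked to its register record. The second is a synthetic estimator that applies group rates from a national survey to a small area's register counts. This is not the method of Huggins and Fay; the function names and all numbers are hypothetical.

```python
from typing import Dict, Sequence


def ratio_estimate(sample_y: Sequence[float], sample_x: Sequence[float],
                   register_total_x: float) -> float:
    """Ratio estimator of a population total using a register auxiliary.

    Assumes an equal-probability sample whose units are linked to the
    register, so sample_y[i] and sample_x[i] describe the same unit.
    """
    if len(sample_y) != len(sample_x) or not sample_y:
        raise ValueError("need matched, non-empty samples")
    ratio = sum(sample_y) / sum(sample_x)  # survey-to-register ratio
    return ratio * register_total_x        # scale by the known register total


def synthetic_estimate(group_rates: Dict[str, float],
                       area_counts: Dict[str, int]) -> float:
    """Synthetic small-area estimate.

    Applies rates estimated from a national survey (by group) to the
    small area's register counts for the same groups.
    """
    return sum(group_rates[g] * n for g, n in area_counts.items())


if __name__ == "__main__":
    # All numbers below are hypothetical.
    y = [52.0, 31.0, 47.0]   # survey-reported values
    x = [50.0, 30.0, 45.0]   # linked register values for the same units
    print(ratio_estimate(y, x, register_total_x=10_000.0))

    rates = {"15-24": 0.14, "25-54": 0.06, "55+": 0.05}
    area = {"15-24": 1_200, "25-54": 3_000, "55+": 800}
    print(synthetic_estimate(rates, area))
```

The synthetic estimator assumes that each small area shares the national group rates, which is its main source of bias. Register-based symptomatic indicators are one way to check or correct for that.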



 



 



Concluding observations



 



   The case for considering a "paradigm shift" in census-taking



seems compelling, at least in developed countries like Canada and



the U.S.  The rolling census alternative Kish proposes is probably



too expensive to fully implement as a complete substitute for a



census.  Rolling samples do offer real promise, however, if they



can be integrated into the current ongoing survey operations of



Canadian and U.S. national statistical programs.  Such samples



could provide a needed link in addressing small area estimation



needs that might otherwise not be met.  Less promising, but still



possible, is their use as a (partial) substitute for the census



long-form samples.



 



   As far as administrative registers are concerned, critics may



have been unduly pessimistic.  The Canadian situation, however,



differs from that in the United States:



 



o    In Canada, it is already within the realm of feasibility



   to combine rolling samples with administrative records as



   an alternative to conventional census-taking.  This is



   not to say that enormous practical challenges don't



   remain.  The 100% count portion of the Canadian census,



   though, could be done with administrative records as a



   starting point, augmented by a large-scale survey to



   measure and potentially adjust for undercoverage.  The



   Canadian 20% census long-form sample might be, at least



   partially, replaced by a rolling sample.  The content of



   the Census long-form is considerably richer than that of



   household surveys, but the content differences could be



   made up through additional questions "piggybacked" onto the



   ongoing surveys at regular intervals.  Coverage issues



   surrounding the use of administrative records could also



   be addressed directly with rolling samples, especially to



   calibrate for changes in administrative records between



   censuses.



 






 



o   In the United States, the U.S. Census Bureau has begun to



   look at alternatives other than conventional census-



   taking (Bounpane, 1988).   Unfortunately, the research



   needed to look at an administrative register alternative



   has barely begun.  Whether the Census Bureau will find a



   better approach than the use of administrative records



   and rolling samples remains to be seen.  Whatever other



   alternatives they study, however, the use of



   administrative registers as a partial replacement for the



   conventional 100% counts definitely needs to be



   considered.  A preliminary research agenda updating



   earlier ideas will appear in Scheuren, Alvey and Kilss,



   1990.



 



   Naturally, with such radical proposals, the answer is



uncertain.  Like Kish, I believe that "the balance of variance



components" favors a change from conventional census-taking in most



cases.  However, as Kish states, "theoretical as well as empirical



investigations will be needed to decide matters" (Kish, 1990).



 



   In a change as big as the one proposed here, the "balance"



that needs to be struck goes, of course, well beyond looking at



variance (and bias) components.   One issue that needs to be



emphasized more is that some aspects, at least, of the paradigm



shifts being considered could go to the heart of the social



contract that exists between national statistical agencies and the



people that those agencies have a mission to serve.  For instance,



in the U.S. Constitution, there is a requirement that an



"enumeration" of the population take place every ten years.  Would



the use of administrative records or rolling censuses fit within



this "Constitutional paradigm?"  Perhaps the starting place is to



adopt a broader definition of "enumeration."



 



   Another example where social contract issues arise is the



extent to which the greater use of existing (or expanded)



administrative data for statistical purposes might be seen as an



unwelcome increase in the intrusiveness of the State into the



private lives of its citizens (Grace, 1989).  As legitimate as



concerns about "intrusiveness" might be, though, there is no



evidence in a North American context, at least, that they pose an



insurmountable barrier.  On the contrary, there have been virtually



no adverse public reactions to past U.S. additions to administrative



records for statistical purposes (e.g., of residential address



information in 1972, 1974 and 1980 tax returns).  To my knowledge,



the issue has not yet come up directly in Canada, at least



at the Federal level.



 



    In summary, making these kinds of changes will require a great



deal more scientific research.  Studying the implementation



technologies will be an even bigger job.  Finally, the issues go



beyond our profession and may well be settled in other arenas.



Wherever they are decided, it is incumbent on us, as statisticians,



 






 



to frame the debate in terms of feasible options.  Hopefully,



exchanges such as ours today will help lead the way along that



path.



 



 



References



 



Alvey, Wendy and Kilss, Beth (eds.) (1990).  Statistics of Income



and Related Administrative Record Research, U.S. Department of the



Treasury, Internal Revenue Service.  See also Kilss, Beth and



Alvey, Wendy (eds.) (1984).  Statistical Uses of Administrative



Records: Recent Research and Present Prospects, vols. 1 and 2, U. S.



Department of the Treasury, Internal Revenue Service.



 



Alvey, Wendy and Scheuren, Fritz (1982).  "Background for an



Administrative Record Census," 1982 American Statistical



Association Proceedings, Social Statistics Section, pp. 137-146.



 



Anderson, Margo (1990).  "'According to Their Respective Numbers



...'  for the Twenty-First Time," Chance, vol. 3, no. 1, pp. 12-18.



 



Bailar, Barbara (1990).  "Contributions to Statistical Methodology



from the Federal Government," Survey Methodology, vol. 16, no. 1,



Statistics Canada.



 



Barker, Joel Arthur (1988).  Discovering the Future: The Business



of Paradigms, Institute for Information Studies.



 



Bates, Nancy A. and DeMaio, Theresa J. (1989).  "Using Cognitive



Research Methods to Improve the Design of the Decennial Census



Form," Proceedings of the Fifth Annual Research Conference, U.S.



Bureau of the Census, pp. 267-285.



 



Bounpane, Peter (1988). "A Sample Census:  A Valid Alternative to



a Complete Count Census?" 46th Session of the International



Statistical Institute.



 



Browne, David (1989).  "U. S. Bureau of the Census:  Facing the



Future Labor Shortage," Asian and Pacific Population Forum, vol. 3,



no. 4.



 



Butz, William (1985).  "Comment: The Future of Administrative



Records in the Census Bureau's Demographic Activities," Journal of



Business and Economic Statistics, vol. 3, no. 4, pp. 393-395.



 



Citro, Connie and Cohen, Michael L., eds. (1985).  The Bicentennial



Census: New Directions for Methodology in 1990.  National Academy



Press, Washington, DC.



 



Dippo, Cathy (1987).  "A Review of Statistical Research at the U.S.



Bureau of Labor Statistics," Journal of Official Statistics, vol.



3, no. 3, pp. 289-297.



 






 



Drew, J. Douglas (1989).  "Address Register Development and its



Possible Future Role in Integration of Census, Survey and



Administrative Data," A paper presented at the U.S. Bureau of the



Census/Statistics Canada Interchange. (Unpublished)



 



Fellegi, I.P. (1981).  "Comments," Discussion of a paper by Leslie



Kish entitled "Population Counts from Cumulated Samples," Using



Cumulated Rolling Samples to Integrate Census and Survey operations



of the Census Bureau, An Analysis, Review, and Response,



Congressional Research Service, the Library of Congress.



 



Fienberg, Stephen (1990).  "An Adjusted Census in 1990?  An Interim



Report," Chance, vol. 3, no. 1, pp. 19-21.



 



Fienberg, Stephen and Tanur, J. (1989).  "Combining Cognitive and



Statistical Approaches to Survey Design," Science, 243, pp.



1017-1022.



 



Grace, John W. (1989).  "The Use of Administrative Records for



Social Research," Statistics Canada Workshop, December 12, 1989,



Ottawa, Ontario.



 



Hammond, Robert B. (1990).   "The 1990 Decennial Census: An



Overview," Conference Proceedings, Advanced Computing for the



Social Sciences, sponsored by the Oak Ridge National Laboratory and



the U.S. Bureau of the Census, April 16-12, 1990, Williamsburg,



Virginia.



 



Herriot, Roger; Bateman, David V.; and McCarthy, William F. (1989).



"The Decade Census Program -- New Approach for Meeting the Nation's



Needs for Sub-National Data," to appear in American Statistical



Association Proceedings, Social Statistics Section.



 



Huggins, Vicki and Fay, Robert (1988).  "Use of Administrative Data



in SIPP Longitudinal Estimation," American Statistical Association



Proceedings, Section on Survey Research Methods.



 



Irwin, Richard (1984).  "Feasibility of an Administrative Records



Census in 1990," Special Report on the Use of Administrative



Records, Committee on the Use of Administrative Records in the 1990



Census, unpublished Census Bureau report.



 



Jabine, Thomas B. and Scheuren, Fritz (1985).   "Goals for



Statistical Uses of Administrative Records: The Next Ten Years,"



Journal of Business and Economic Statistics, vol. 3, no. 5, pp.



380-391.



 



Jabine, Thomas B. and Scheuren, Fritz (1987).  "Statistical Uses of



Administrative Records in the United States: Where Are We and Where



Are We Going?"  Proceedings of an International Symposium on



Statistical Uses of Administrative Data, J.W. Coombs and M.P. Singh



(eds.), Statistics Canada, December 1988, Ottawa, pp. 43-72.



 






 



Jensen, Poul (1983).  "Towards a Register-Based Statistical System



-- Some Danish Experiences," Statistical Journal of the United



Nations Economic Commission for Europe, vol. 1, no. 3, pp. 341-365.



 



Jensen, Poul (1987).  "The Quality of Administrative Data from a



Statistical Point of View:  Some Danish Experience and



Consideration," Proceedings of an International Symposium on



Statistical Uses of Administrative Data, J.W. Coombs and M.P. Singh



(eds.) Statistics Canada, Ottawa.



 



Jobe, Jared B. and Mingay, David J. (1990).  "Cognition and Survey



Measurement: History and Overview," Applied Cognitive Psychology,



in press.



 



Kish, Leslie (1990).  "Rolling Samples and Censuses," Survey



Methodology, in press.



 



Kuhn, Thomas S. (1970).  The Structure of Scientific Revolutions,



Second Edition, Enlarged, The University of Chicago Press, Chicago.



 



Meyer, Bruce (1990).  "The Tax System: Comparisons of Demographic,



Labour Force and Income Results for Individuals and Families,"



Small Area and Administrative Data Division, Statistics Canada.



 



Platek, R.; Rao, J.N.K.; Sarndal, E.E.; and Singh, M.P. (1987).



Small Area Statistics, New York: Wiley-Interscience.



 



Podoluk, J. (1987).  "Administrative Data as Alternative Sources to



Census Data," Proceedings of an International Symposium on



Statistical Uses of Administrative Data, J.W. Coombs and M.P. Singh



(eds.), Statistics Canada, December 1988, Ottawa, pp. 273-290.



 



Redfern, Philip (1987).  "A Study of the Future of the Census of



Population: Alternative Approaches," Eurostat Theme 3 Series C,



Luxembourg:  Office for Official Publications of the European



Communities.



 



Redfern, Philip (1989).  "Population Registers:  Some



Administrative and Statistical Pros and Cons," The Journal of the



Royal Statistical Society, Series A (Statistics in Society), vol.



152, pt. 1, pp. 1-41.



 



Royce, Don and Drew, J. Douglas (1988).  "Address Register



Research: Current Status and Future Plans," 1991 Research and



Testing Project, 1991 Census, Statistics Canada, Ottawa.



 



Scheuren, Fritz; Alvey, Wendy and Kilss, Beth (1990).  "Paradigm



Shifts: The Integration of Administrative Records and Surveys," a



paper delivered at the 151st Annual Meeting of the American



Statistical Association, August 7, 1990, in Anaheim, CA.



 



 






 



Statistics Canada (1990).  "Research Papers and Reports,"



Bibliography, Small Area and Administrative Data Division, Ottawa,



Ontario. (unpublished)



 



Tegels, Robert and Cahoon, Lawrence S. (1982).  "The Redesign of



the Current Population Survey: The Investigation into Alternate



Rotation Plans," Proceedings of the American Statistical



Association, Survey Research Methods Section.



 



U.S. Bureau of the Census (1989). 200 Years of U.S. Census Taking:



Population and Housing Questions, 1790-1990, Superintendent of



Documents, U.S. Government Printing Office, Washington, DC.



 



Verma, Ravi B.P. and Raby, Ronald (1989).  "The Use of



Administrative Records for Estimating Population in Canada," Survey



Methodology, vol. 15, no. 2, pp. 261-270.



 



 



 



 



 



 



 



 






 



                 AN ADMINISTRATIVE RECORD PARADIGM:



                        A CANADIAN EXPERIENCE 



 



                              John Leyes



                          Statistics Canada



 



1.0. Introduction



 



   In 1979, Statistics Canada began a formal review of the poten-



tial of using administrative records for social statistical



applications for small area data (Statistics Canada, 1979).  Based



on this review, it was concluded that the highest coverage of the



population and the greatest potential for social administrative



data would arise through the use of the personal income tax re-



cords.  With few exceptions, then, this paper considers data



derived from the personal income tax file in Canada.



 



   The Canadian tax system differs from the U.S. tax system.  For



example, in Canada, there is no joint filing; and the tax system is



used as an instrument to provide benefits to persons and families



with low incomes.  The personal income tax return is known as the



T1.  The T1 serves a purpose similar to the IRS Form 1040.



 



     In its earliest days, Statistics Canada's work with the



personal income tax file was subject to a number of expected a



priori shortcomings.  These shortcomings represented an adminis-



trative records paradigm (or rules of the game).  The shortcomings



included the following:



 



o     Population Coverage.  The income tax system is based on



     individuals only.  Since only 60% of Canadians were



     filing tax returns in the mid-1970's, coverage was deemed



     inadequate for social statistical applications.



 



o     Population Coverage Bias.  The age profile of taxfilers



     differed from the age profile of the population.  This



     was judged to be an unacceptable bias.



 



o     Income Coverage.  Not all income received by Canadians is



     taxable; hence, the income coverage of the T1 was



     considered incomplete.



 



o     Income Distribution Coverage.  Since both the elderly and



     the young frequently have low incomes and do not file tax



     returns, data from the tax file would be inadequate for



     public policy purposes directed at these target groups.



 



 



 



 






 



o    Dimensionality of Variables.  Since any single ad-



    ministrative record has a specific and narrow application



    in program administration, the range of data variables



    was also judged inadequate as a source of social data.



 



o    Concepts and Definitions.  The concepts and definitions



    used in household surveys and censuses of population can



    only be approximated through the use of an annual tax



    file.



 



    Each of the above represented a limitation or shortcoming for



data derived from administrative records in general and from the T1 in



particular.  In spite of these shortcomings, the work began, and



this paper is directed at a few findings that resulted from work in



Canada with the personal income tax records in the development of



family data since 1984.  Perhaps this paper may even indicate some



of the potential of using the T1 as a source of small area data in



post-censal periods in Canada.



 



 



2.0. The Development of Taxfiler Family Data



 



    The taxfiler family concept has been designed to emulate the



census family concept.  A census family:



 



  "[r]efers to a husband and a wife (with or without



  children who have never married, regardless of age), or



  a lone parent of any marital status, with one or more



  children who have never married, regardless of age,



  living in the same dwelling.  For census purposes,



  persons living in a common-law type of arrangement are



  considered as now married, regardless of their legal



  marital status; they accordingly appear as a husband-wife



  family in census family tables."  (Statistics Canada,



  1982, p. 29)



 



  This concept is suitable for household collection methods



since respondents are asked to report on the relationships between



all residents of a dwelling.  With administrative records,



secondary information such as reported marital status, value of



exemptions/tax credits, ages of taxfilers, addresses, child care



expenses, and so forth, is used for forming families.



    It has not, therefore, been possible to emulate the exact



census family concept.  The major sources of difficulty arise with



older children (whether they have ever been married or not when



they reside with their parents) and with common law couples.  In



 



 



 



                              67



 



general, the census family concept works reasonably well for



families with dependent children, and some success has been



achieved in estimating single parent and common law families, as



can be seen in Table 1.



 



     In 1984, Statistics Canada began estimating families from the
individual taxfiler (T1) data.  The creation of families from the
T1 is based on a six-step process:

  i. Taxfilers reporting the Social Insurance Numbers (SINs)
     of their spouses are matched to form husband-wife
     families;

 ii. Other husband-wife families are formed from taxfilers who
     declare themselves married but do not report spousal
     SINs;

iii. Non-dependent filing children who reside with their
     parents are matched to their parents;

 iv. An intermediate step unduplicates records, identifies
     one-filer husband-wife family units, assigns a unique
     postal code to family members, and assigns a family
     composition type to each family unit;

  v. Common law spouses are matched from the pool of
     individuals classed as single parent families and
     non-family persons; and

 vi. Non-filing family members are imputed.
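
    Steps i to iii above are essentially record-matching rules.
The sketch below shows one way steps i and ii, and a simplified
step iii, might be expressed in code.  It is a hedged illustration
under stated assumptions: the `T1Record` fields, the acceptance of
a one-sided spousal SIN report, and the same-address test for
filing children are hypothetical and are not the documented
Statistics Canada procedure.  Only the age cap of 29 for a matched
filing child comes from this paper (section 3.2).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_FILING_CHILD_AGE = 29  # cap on a matched filing child's age, per section 3.2


@dataclass
class T1Record:
    """Hypothetical, simplified view of one T1 return (illustrative fields only)."""
    sin: str
    marital_status: str            # e.g. "married" or "single"
    spouse_sin: Optional[str]      # spouse's SIN as reported on the return, if any


def form_husband_wife_families(
    records: List[T1Record],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Sketch of steps i and ii.

    Step i: pair a filer with the filer whose SIN appears as the reported spouse SIN.
    Step ii: remaining filers who declare themselves married become one-filer units.
    """
    by_sin: Dict[str, T1Record] = {r.sin: r for r in records}
    pairs: List[Tuple[str, str]] = []
    used = set()
    for r in records:
        if r.sin in used or not r.spouse_sin:
            continue
        spouse = by_sin.get(r.spouse_sin)
        # A one-sided report is accepted here; a production rule might demand more.
        if spouse is not None and spouse.sin not in used and spouse.sin != r.sin:
            pairs.append((r.sin, spouse.sin))
            used.update((r.sin, spouse.sin))
    one_filer = [r.sin for r in records
                 if r.sin not in used and r.marital_status == "married"]
    return pairs, one_filer


def matches_filing_child(child_age: int, child_address: str, parent_address: str) -> bool:
    """Sketch of step iii under an assumed rule: same address and age within the cap."""
    return child_age <= MAX_FILING_CHILD_AGE and child_address == parent_address
```

In this simplification, a married filer whose reported spouse did
not file also ends up as a one-filer unit, which is the situation
step iv later has to identify.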

    With this brief introduction and description of the taxfiler
family data, it is now possible to consider some data findings.

3.0. The Coverage Shortcomings, Some Empirical Findings

3.1. Population Coverage Comparison: 1985 Taxfiler Family File
     (T1FF) to 1986 Census of Population

    The taxfiler family data have been placed into four
classifications: husband-wife families, single parent families,
common law families, and non-family persons.  The data in Table 1
reflect the first three of these classifications (common law
families are counted twice, once as husband-wife families and then
separately).

   In creating the taxfiler family (T1FF) data, a record is
created for each family member and for each non-family person.
Thus, there is a record for each taxfiler and a record for each
person who is imputed.  Line three of Table 1, therefore, is an
estimate of the T1FF population that can be identified through the
tax system.

   There are several highlights in Table 1:

o    The T1FF population has varied between 93.7 and 95.8
    percent during the 1982-87 period (line 6 of Table 1);

o    The number of total tax family records (taxfilers plus
    imputed persons) increased at a slightly slower rate than
    the number of taxfilers alone (compare lines 2 and 4);

o    Coverage of husband-wife families was slightly higher
    than coverage of total family records (compare lines 6
    and 9); and

o    Single parent overcoverage decreased in 1986 and 1987.

3.2. Population Coverage Bias: 1985 T1FF to 1986 Census

    In Table 2, some broad age range comparisons have been
included.  The first age range is, perhaps, a bit unusual since it
includes the population 29 years of age and below.  This age range
resulted from an arbitrary decision, namely, that the maximum age
of a matched filing child could be 29.  Furthermore, for imputed,
non-filing dependent children, there is limited age information and
no gender information.  Thus, children, whether imputed as
dependents or identified as taxfilers who reside with their
parents, have been placed into one age range.

    In column 4 of Table 2 (the % ratio of T1FF to 1986 Census
counts), T1FF coverage of the 1986 Census population was
approximately 50% or higher for age ranges under 60, and it
declined more rapidly for the population 65+.
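
    The coverage comparisons in Tables 1 and 2 rest on a simple
percentage ratio of the administrative count to the census count
for each group.  A minimal sketch follows; the function names and
the counts are hypothetical and are not the Table 2 figures.

```python
from typing import Dict, Tuple


def coverage_ratio(admin_count: float, census_count: float) -> float:
    """Administrative (T1FF) count as a percentage of the census count."""
    if census_count <= 0:
        raise ValueError("census count must be positive")
    return 100.0 * admin_count / census_count


def ratios_by_group(counts: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Coverage ratio, rounded to one decimal, for each (T1FF, census) pair."""
    return {group: round(coverage_ratio(t1ff, census), 1)
            for group, (t1ff, census) in counts.items()}


# Hypothetical counts in thousands; NOT the figures in Table 2.
example = {
    "0-29": (11_000.0, 12_000.0),
    "30-59": (9_500.0, 10_000.0),
    "65+": (2_100.0, 2_700.0),
}
ratios = ratios_by_group(example)
```

A ratio above 100 signals overcoverage, as with the single parent
families noted for Table 1.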

3.3. Coverage of Aggregate Sources of Income

    In conducting the 1986 Census of Population, sources of income
data were collected for the 1985 calendar year.  Table 3 compares
sources of income in the 1985 T1FF and the 1986 Census.

     For both data sources, the largest component of income was
wages and salaries; the T1FF estimate was 96% of the Census
estimate.  In the government transfers section of Table 3,
considerable variability existed, primarily because some transfer
payments either were not subject to taxation or were received by
individuals with low incomes who did not file a T1.

3.4. Income Distribution Coverage

    Table 4 includes a time series comparison of median incomes
between the T1FF and the Survey of Consumer Finances (SCF) for the
period 1982-87.  The T1FF medians were lower in all years: about
95% of the SCF medians in the first four years, declining to about
92% in the fifth and sixth years.  This decline can be partly
attributed to the introduction of a refundable Federal Sales Tax
Credit.  This credit was available to individuals and families
with low incomes, some of whom may file a tax return only to
obtain this credit.

    Since it is generally assumed that taxfilers have higher
incomes than non-taxfilers, one would expect the SCF to have lower
medians, since some of its respondents would have low incomes and
not file tax returns.  Clearly, these findings are inconsistent
with such an expectation.
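
    The expectation above, and the Sales Tax Credit explanation
for the later decline, both follow from one property of the
median: adding observations below it pulls it down.  A minimal
sketch with made-up incomes (not SCF or T1FF data) shows both
effects.

```python
from statistics import median

# Hypothetical annual incomes in dollars, for illustration only.
filers = [18_000, 22_000, 27_000, 31_000, 40_000, 52_000]
low_income_nonfilers = [4_000, 6_500, 8_000]

# A household survey such as the SCF reaches non-filers as well as filers,
# so the expectation is a lower median than the tax file.
median_tax_file = median(filers)                                 # 29000.0
median_survey = median(filers + low_income_nonfilers)            # 22000

# A refundable credit draws some low-income people into filing,
# which pulls the tax-file median down.
credit_only_filers = [5_000, 7_000]
median_tax_file_with_credit = median(filers + credit_only_filers)  # 24500.0
```

Because the observed T1FF medians were nonetheless lower than the
SCF medians, the comparison points to differences in coverage or
income measurement rather than to the simple filer/non-filer
effect alone.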

3.5. Dimensionality of Variables

    Since any single administrative record (for example, the T1)
has a specific and narrow application in program administration,
the range of data variables might be judged inadequate as a source
of social data.  Although the T1FF data are oriented to the income
tax system, Table 5 indicates (mainly by reference to the
footnotes) that some comparability in the variables existed between
the 1985 T1FF and the 1986 Census.

    From Table 5, it is clear that the T1FF data lack the richness
of the census data.  The T1FF has low coverage of non-taxable
sources of income and low coverage of taxable income received by
low income persons who do not file tax returns.

4.0. Major Directions for 1989+

  Two new initiatives have been started:

o  The development of a pilot Longitudinal Administrative
   Database (LAD) to enable research studies of poverty/welfare/
   income dynamics in Canada for the period 1982-86.  The LAD
   was designed as a 10% sample to parallel the Panel Study of
   Income Dynamics (PSID) begun by the Survey Research Center,
   University of Michigan, about 20 years ago (Duncan, 1984).

o  The development of an Administrative Record Consolidation
   File (ARC) through the linking of multiple records on a
   sample basis for the purpose of (a) improving the coverage
   of the population and (b) improving some of the variables
   on the taxfile.
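
    The paper gives the LAD sampling rate (10%) but not how the
sample is selected.  One common way to keep a longitudinal sample
stable across annual files is to select on a fixed function of a
persistent identifier, so the same person is selected every year.
The sketch below assumes, purely for illustration, selection by
hashing an identifier; the hash, the bucket rule, and the function
names are hypothetical and are not the LAD design.

```python
import hashlib
from typing import Dict, List

SAMPLE_RATE_PCT = 10  # the 10% rate is from the text; the selection rule is assumed


def in_sample(person_id: str, rate_pct: int = SAMPLE_RATE_PCT) -> bool:
    """Deterministic selection: a given identifier is always in, or always out."""
    digest = hashlib.sha256(person_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # pseudo-random bucket 0..99 derived from the identifier
    return bucket < rate_pct


def draw_panel(annual_files: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Keep the same sampled persons in every year's file, producing a panel."""
    return {year: [pid for pid in ids if in_sample(pid)]
            for year, ids in annual_files.items()}
```

Selecting on the identifier rather than drawing afresh each year
is what lets sampled persons be followed from one tax year to the
next.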

5.0. Summary and Related Observations

   The T1FF data possess some positive characteristics.  The data
are annual, and small area estimates can be produced.
Furthermore, if 95% is regarded as high coverage, the comparisons
in this paper indicate fairly high coverage of the population by
the T1FF.

  One potential benefit of administrative data seems to lie in
the domain of longitudinal databases.  A longitudinal survey begun
on the basis of current decisions and funding can only yield data
for the future, whereas a longitudinal administrative database can
be created retrospectively on the same basis.  For example, a
decision was made in late 1988 to begin creating a longitudinal
database for the period 1982 to 1986.  While the database has not
yet been completed, early indications are that it will be a source
of useful information for the development of social policy and for
the analysis of income dynamics.
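
    Retrospective construction of this kind amounts to linking
annual files that already exist.  A minimal sketch, assuming each
archived annual file can be reduced to (identifier, total income)
pairs keyed by a persistent identifier; the function names and
fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_panel(annual_files: Dict[int, List[Tuple[str, float]]]) -> Dict[str, Dict[int, float]]:
    """Link archived annual files on a persistent identifier.

    annual_files maps a tax year to (person_id, total_income) pairs.
    Returns person_id -> {year: income}: one longitudinal record per person.
    """
    panel: Dict[str, Dict[int, float]] = defaultdict(dict)
    for year in sorted(annual_files):
        for person_id, income in annual_files[year]:
            panel[person_id][year] = income
    return dict(panel)


def income_change(history: Dict[int, float], start: int, end: int) -> Optional[float]:
    """Income change between two years; None if either year is missing (no return filed)."""
    if start in history and end in history:
        return history[end] - history[start]
    return None
```

Years with no return simply leave gaps in a person's history,
which is the administrative-data counterpart of panel attrition.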

  To conclude, this paper has been prepared to illustrate some
findings that may not be widely known.  In preparing this
incomplete report on an evolving new paradigm in Canada, it is
hoped that members of the research and statistical community will
provide comments and insights that will improve and stimulate the
continued evolution of this work.

References

Duncan, Greg J. (1984)  Years of Poverty, Years of Plenty.  Survey
Research Center, Institute for Social Research, The University of
Michigan, Ann Arbor, Michigan.  1984.

Statistics Canada. (1979)  Final Report of the Task Force on
Administrative Data Development.  Unpublished.  Ottawa, Canada.
March 16, 1979.

Statistics Canada. (1982)  1981 Census Dictionary.  Statistics
Canada, 1981 Census of Canada.  Catalogue Number 99-901.  Ottawa,
Canada.  May 1982.

Statistics Canada. (1988)  Postcensal Estimates of Families,
Canada, Provinces and Territories.  Catalogue Number 91-204.
Ottawa, Canada.  1988.

Statistics Canada. (1989)  Total Income: Individuals.  1986
Census of Canada.  Catalogue Number 93-114.  Ottawa, Canada.
March 1989.

Statistics Canada. (1990)  Bibliography: Research Papers and
Reports, Small Area and Administrative Data Division.  Small Area
and Administrative Data Division, Statistics Canada.  Ottawa,
Canada.  Unpublished.  February 22, 1990.

Statistics Canada. (Annual)  Family Incomes: Census Families.
Survey of Consumer Finances.  Catalogue No. 13-208 (Annual).
Ottawa, Canada.  1989.

Vigder, Michele and John Leyes. (1989)  Administrative Data to
Mid-Decade Census Data Comparison.  Small Area and Administrative
Data Division, Statistics Canada.  Ottawa, Canada.  Unpublished.
January 31, 1989.

                         DISCUSSION

                        Gerald Gates
                 U.S. Bureau of the Census

  The theme of these papers by Fritz Scheuren (IRS) and John
Leyes (Statistics Canada) is shifting the paradigm of census taking
to allow for more frequent detailed information for small
geographic areas at reasonable cost.  Scheuren points to two
weaknesses in the U.S. census taking process -- 1) the increasing
costs of enumeration and 2) the increasing obsolescence of the
information between censuses.  He discusses new paradigms that have
been proposed by Kish and others which employ rolling samples and
other techniques to obtain more frequent small area data.  His
primary focus, however, is on administrative registers that could
be modified to serve a census function as well as their intended
administrative uses.  His intent is to frame the debate over
feasible options in a way that will lead to much more scientific
research on this topic.

  Leyes addresses the census paradigm in terms of research
undertaken by Statistics Canada using administrative records.
Primarily, he describes the development of a family tax file
representing approximately 95% of the census in terms of population
covered.  He looks at coverage of this file in comparison with the
census, with surveys conducted by Statistics Canada, and with
administrative data maintained by other agencies.  Finally, he
describes a project to develop a linked administrative file that
would allow Statistics Canada to estimate the characteristics of
the population missed in the family tax file.  The work Leyes
describes has implications for shifting the census paradigm to
address cost, accuracy, and timeliness issues.

  Turning first to Leyes' paper, I have a few specific reactions
to the role Statistics Canada plays with Revenue Canada and to the
content and coverage of the family tax file.  The family tax file
could only have been created with a great deal of cooperation from
Revenue Canada.  The Canadian tax form contains demographic
characteristics such as age, sex and marital status that have no
practical tax program application.  In addition, all information
from the tax return is available to Statistics Canada -- this is
not the case in the U.S.  Another major difference between the two
countries is the negative income tax provisions in Canada, which
increase coverage of the tax file (from 89% in the U.S. to 96% in
Canada).  Despite these advantages, the Canadian tax form, like the
IRS Form 1040, contains mailing address rather than physical
address.

   I also found the Canadian work on record linkage quite
impressive, especially as it relates to creating retrospective
longitudinal databases to deal with emerging issues (e.g., income
and health care issues relating to the elderly).  Also, as Leyes
states, these linkages permit adjustment of the family tax file
for undercoverage.  This feature allows Statistics Canada to use
the family tax file as an independent source for producing
population estimates between census years.  Since these linkages
are done only on a sample basis because of privacy concerns, their
utility is diminished somewhat.  In the current U.S. situation, the
reduced coverage and content of the Form 1040 file make 100%
record linkage critical, while similar privacy concerns need to be
addressed.  (I should add that Form W2 earnings records could
improve the coverage possible with only 1040 tax returns, but this
will continue to miss nonworkers and omit some of the detail
available on the 1040.)

   Scheuren's paper raises some important issues regarding the
need for research on alternatives to the traditional once-a-decade
enumeration.  I compliment Fritz on his persistence over the years
in exploring alternatives to the traditional census.  His current
paper addresses the need for a census alternative to deal with
"problems" facing the 1990 census in terms of costs (low mail
response rates) and increasing data obsolescence.  Although
administrative records remain his primary focus, Fritz sees a need
for research in other areas, such as rolling samples.  He believes
that rolling samples offer real promise if they can be integrated
into current ongoing survey operations.  Although "rotating" sample
techniques have been proposed for 2000 census planning (Herriot,
Bateman, McCarthy, 1989), little research has been done and we have
no plans to incorporate this technique into the current surveys.
There are several reasons for this, which reflect the different
goals of current surveys and intercensal estimates:

o    a rolling design will create inefficiencies because of
    increased interviewer travel (and reduced workloads)
    which will come from abandoning Primary Sampling Units
    (PSUs) in favor of more geographically dispersed samples;

o    survey procedures that, as a cost saving feature,
    incorporate an alternative to the traditional first time
    personal visit could result in lower response rates
    (telephone) or delays in the interviewing process (mail);

o    for surveys such as SIPP, the sample may be too small to
    spread out geographically; and

o    sponsors may not want long form questions added to their
    questionnaires, nor want intensive sample in areas with
    small population.
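
    The objections above are easier to weigh with the basic
mechanics of a rolling sample in view.  In the general scheme, a
different, non-overlapping part of the address list is interviewed
each period, and the periodic samples cumulate to complete coverage
over the cycle.  The sketch below assumes, for illustration only, a
60-period cycle (for example, monthly over five years) and simple
interleaved assignment over a geographically sorted list; neither
choice comes from the papers discussed, and the sketch ignores the
PSU, travel-cost, and sample-size issues listed above.

```python
from typing import Dict, List


def rolling_panels(addresses: List[str], n_periods: int) -> Dict[int, List[str]]:
    """Assign every address to exactly one period (0..n_periods-1) by interleaving.

    If the address list is sorted geographically, interleaving spreads each
    period's panel thinly over the whole area (an assumption of this sketch).
    """
    if n_periods <= 0:
        raise ValueError("n_periods must be positive")
    panels: Dict[int, List[str]] = {p: [] for p in range(n_periods)}
    for i, address in enumerate(addresses):
        panels[i % n_periods].append(address)
    return panels


def cumulated_coverage(panels: Dict[int, List[str]], periods_elapsed: int) -> float:
    """Share of all addresses interviewed after the first `periods_elapsed` periods."""
    total = sum(len(v) for v in panels.values())
    seen = sum(len(panels[p]) for p in range(min(periods_elapsed, len(panels))))
    return seen / total
```

The thin spread of each period's panel is precisely what produces
the interviewer travel and workload problems raised in the first
objection.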

    My second major point concerns what Fritz called our "missed
opportunity" to use the 1990 census as a proving ground for the
use of administrative records in the census process.  The 1990
census Research and Experimentation (REX) program considered many
applications for administrative records, including all uses made
in 1980 plus an administrative records census and a coverage
improvement program designed to enumerate parolees and
probationers through state administrative records.  All of these
uses were abandoned because of limited resources and the minimal
improvements expected given the costs.  (The parolee/probationer
operation was accomplished by parole officers who distributed and
collected questionnaires from persons in their charge.)  An
additional use, which was tested on a small scale as part of the
1988 dress rehearsal, involved supplementing the Post-Enumeration
Survey (PES) with names obtained from administrative records in
order to improve the PES as a coverage measurement tool (Wolfgang,
1989).  An evaluation of this test, which will be released
shortly, may encourage further research in this area.

    Several administrative records uses were adopted in the 1990
census:

o    use of local lists of shelters and street locations to
    assist in enumerating the homeless;

o    use of vendor lists for developing the mail register;

o    macro-level consistency checks for content evaluation; and

o    encouraged use by local jurisdictions as a way of
    improving outreach activities.

    Like Fritz, I believe that more extensive and productive use of
administrative records will require changing administrative
records.  But it will take more than that.  It will take
institutional changes in the way administrative agencies view
their role in the census statistical process.

    By way of tying this challenge to the future research
activities of the Census Bureau, allow me to expand slightly on
Fritz' paradigm analogy and relate it to the environment in which
we operate.  Both Leyes and Scheuren see administrative records
playing a key role in shifting the "census" paradigm.  Under this
assumption, I suggest that, rather than a single census paradigm,
there are actually three interrelated paradigms that require equal
consideration.  These are the once-a-decade enumeration,
intercensal population estimates, and administrative records
information systems.

    Before we consider approaches to shifting these three
paradigms, we need to think about the role the public and
bureaucracies may play.  We need to consider the social contracts
that exist between the government and the American people.  The
statistical agency has a specific obligation to census respondents
to ensure privacy (confidentiality) and reduce burden to the extent
possible.  In addition, the statistical agency must fulfill its
obligation to the American taxpayer to use its resources in the
most efficient manner in providing the information needed by
society.  Balancing these tradeoffs will determine which direction
the paradigm shift takes.

    Shifting the administrative records paradigm also requires a
new partnership between Federal agencies and, possibly, between
Federal agencies and the states.  The administrative agency must
accept new, unrelated tasks that are not part of its primary
mission.  Traditionally, agencies avoid taking on tasks that differ
significantly from those at the heart of the organization's
mission (Wilson, 1989).  Even within an administrative agency, the
statistical functions often take a back seat to administrative
functions.  Despite laws and additional funds that reflect these
new tasks, when push comes to shove, the primary mission (in the
case of the IRS, collecting taxes) will most likely win out.

    A census reliance on administrative records requires a
commitment by the administrative agency to the census function
which heretofore has not existed.  Where information is lacking,
such as physical address and household relationships, change must
be encouraged.  Where change in the administrative process could
negatively affect the census use, accommodations must be made.  In
the past, changes have occurred, but they have not always been
anticipated or beneficial.  For example:

o    Physical location information was previously added to the
    Form 1040 by the Census Bureau for the General Revenue
    Sharing Program.

o    The 1986 Tax Reform Act required the IRS to collect SSNs
    for children (a plus) but eliminated the personal
    exemption for persons 65 or older (a negative).

o    The SSA recently introduced a program of assigning SSNs
    to infants at birth using state birth records.  Despite
    Census Bureau objections and concerns of its own
    statistical office, SSA did not require that race of
    child (or mother) be part of the application process.

    If we assume that planning for paradigm shifts is good --
which I think we must -- then we need to consider, as Fritz
suggests, which options are feasible.  First let me discuss options
as they relate to the traditional census.  For apportionment, the
Constitution requires an actual enumeration every ten years.
Seven items are requested from each resident to provide:  1) the
basis for the apportionment of Congress; 2) a sampling frame for
use in the next decade; and 3) a base for developing intercensal
estimates.  Obtaining this information from administrative records
(i.e., an administrative records census) may require a
Constitutional amendment in addition to changes in the way
administrative agencies do their jobs.  Research on this aspect
should concentrate on the most useful sources of information with
the least amount of change required.

    A second component of the census consists of the housing
questions asked of every household.  The Census Bureau is exploring
the possibility of obtaining this information in future censuses
from records of city or county tax offices, assessors' offices, or
recorders' offices.  Such an option has the potential to reduce
the burden and costs of census taking while offering comprehensive
coverage of the nation's housing.  One of the key requirements for
such an operation would be fostering interest among local
jurisdictions in changing or standardizing their information
systems to maintain the items needed for the census.  This could
be done by promoting the changes as an improvement to existing
administrative systems and as a rich source of data for
administering housing related programs.  In this way, we win
acceptance for the changes needed for statistical purposes through
the administrative benefits they provide.

    The final component of the census consists of the long form
sample questions.  This component provides a source of detailed
information for small geographic areas -- but only once a decade.
As Scheuren suggests, these data could come from a rolling census
design in the event that the basic census (count) is done through
administrative records, but there are many problems, as I have
noted.

    The intercensal estimates paradigm is certainly tied to the
census paradigm, and any change to the census will most likely
necessitate changing the way we do intercensal estimates.  The
current population estimates program was a byproduct of the General
Revenue Sharing Program.  We will evaluate alternative designs in
the years ahead to see if the current program is meeting the needs
of users.  The work of Statistics Canada on developing family tax
files definitely needs to be considered.  In addition, recently
proposed legislation would put greater reliance on currently
available population estimates for funds allocation formulas, which
will in turn put pressure on the Census Bureau to expand the
utility of these estimates.  A possible alternative, which is being
given some consideration by the Census Bureau, would involve
conducting a large sample survey at mid-decade and modeling the
results to administrative records linked to TIGER geography.  The
administrative file could be constructed by linking the tax returns
obtained by Census with the social security number applicant file
to be obtained from the Social Security Administration (assuming we
can address the privacy issues).
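
    The text does not say what kind of model would relate the
mid-decade survey to the linked administrative file.  One simple
possibility, shown here only as a hedged illustration, is a
ratio-synthetic estimator: the survey supplies a ratio of a
characteristic to an administrative count for a domain large
enough to be estimated reliably, and that ratio is applied to the
administrative counts of each small area within the domain.  The
function name, the area labels, and the numbers are hypothetical.

```python
from typing import Dict


def ratio_synthetic(survey_total: float,
                    admin_total_same_domain: float,
                    admin_by_area: Dict[str, float]) -> Dict[str, float]:
    """Apply a survey-to-administrative ratio, estimated for a large domain,
    to administrative counts for each small area within that domain.

    Assumes the ratio is the same in every small area, which is the
    standard (and strong) assumption of synthetic estimation.
    """
    if admin_total_same_domain <= 0:
        raise ValueError("administrative total must be positive")
    ratio = survey_total / admin_total_same_domain
    return {area: ratio * count for area, count in admin_by_area.items()}


# Hypothetical mid-decade figures for one domain (illustrative only).
admin_by_tract = {"tract_1": 4_000.0, "tract_2": 6_000.0}
estimates = ratio_synthetic(survey_total=10_500.0,
                            admin_total_same_domain=10_000.0,
                            admin_by_area=admin_by_tract)
```

The small-area estimates add back to the survey total for the
domain; their quality depends on how well the administrative
counts track the characteristic from area to area.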

   In conclusion, greater reliance on administrative records in
the census process needs public acceptance and a commitment from
all those affected to make it work.  Perhaps the increasing costs
and respondent burden involved in traditional census taking will
encourage this change.  Scheuren and Leyes have shown us some
options.  We will need to explore these and others -- and fund the
necessary research -- so that, as we move into the 21st Century, we
are able to avoid the pitfalls and take advantage of the
opportunities that lie ahead.

References

Herriot, R.; Bateman, D.; and McCarthy, W. (1989).  "The Decade
Census Program -- New Approach for Meeting the Nation's Needs for
Sub-National Data," to appear in American Statistical Association
Proceedings, Social Statistics Section.

Wilson, J. (1989).  Bureaucracy -- What Government Agencies Do and
Why They Do It.  New York: Basic Books, Inc., p. 190.

Wolfgang, G. (1989).  "Using Administrative Lists to Supplement
Coverage in Hard-to-count Areas of the Post-Enumeration Survey for
the 1988 Census of St. Louis."  Proceedings of the Section on
Survey Research Methods, American Statistical Association.

                         DISCUSSION

                        Edward J. Spar
                      Market Statistics

  The Scheuren paper is provocative and challenging.  At the
same time, some of the ideas presented here should be challenged
back.  For example, Scheuren mentions how expensive the decennial
census has become -- $10 per capita.  But expensive compared to
what?  If each individual has to spend about $1 a year for the
decennial census, is this still considered too expensive?  Maybe
we should have a check-off box on the 1040 form for those who wish
to contribute a dollar to the census instead of to presidential
election campaigns.  Money better spent.
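
    The $1 figure is the $10 per capita cost spread over the ten
years between censuses:

```latex
\frac{\$10\ \text{per person}}{10\ \text{years per census cycle}}
  = \$1\ \text{per person per year}
```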

   Scheuren also points out the problem of declining public
cooperation.  However, when all the bodies are counted, what figure
makes a successful census?  In 1980, 98.6 percent of the population
was counted.  Let's say that this time 96.8 percent of the
population is accounted for.  Does this make the decennial census
effort a failure?  This will depend upon the differential
undercount.  Should we begin to find other ways to reach people
based upon this?  We will still, for the most part, have usable
small area data to work with.  Most decisions will not change at
all if the response rate does not decline drastically.  Perhaps
adjustment will adequately solve much of the undercount problem.
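
    The distinction drawn here between the overall count and the
differential undercount can be made concrete.  In the sketch below,
only the 98.6 and 96.8 percent overall figures come from the text;
the two-group split and its coverage rates are hypothetical.  The
net undercount rate is 100 minus the coverage rate, and the
differential undercount is the gap between two groups' undercount
rates.

```python
def net_undercount_pct(coverage_pct: float) -> float:
    """Net undercount rate in percent, given the percent of the population counted."""
    return 100.0 - coverage_pct


def differential_undercount_pct(coverage_a: float, coverage_b: float) -> float:
    """Undercount rate of group A minus that of group B, in percentage points."""
    return net_undercount_pct(coverage_a) - net_undercount_pct(coverage_b)


def overall_coverage(share_a: float, coverage_a: float, coverage_b: float) -> float:
    """Population-weighted coverage of two groups; share_a is group A's population share."""
    return share_a * coverage_a + (1.0 - share_a) * coverage_b


# Overall figures from the discussion.
undercount_1980 = net_undercount_pct(98.6)
undercount_scenario = net_undercount_pct(96.8)

# Hypothetical split: group A is 20% of the population. Both cases give
# 96.8 percent overall, but only the second has a differential undercount.
even = overall_coverage(0.2, 96.8, 96.8), differential_undercount_pct(96.8, 96.8)
uneven = overall_coverage(0.2, 92.0, 98.0), differential_undercount_pct(92.0, 98.0)
```

The same overall rate is compatible with very different
distributions of the miss, which is why the differential, not the
total, decides whether small area uses are harmed.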

   We should certainly accept the possibility of the need for
"paradigm shifts."  But there seems to be a problem.  The paper
tells us that the rolling sample approach and the use of
administrative registers just won't do the job that's needed and,
all things being equal, might even be more expensive.

   If accurate data are needed not only for redistricting and
reapportionment, but also for the allocation of funds for over 100
federal programs, and if local communities need information to
update their plans and allocations, you immediately have to fall
back on some intensive decennial census activity.  And what about
private sector uses?  Correct market decisions based upon detailed
information still pay the bills, including the tax bill.  If you
eliminate detailed information for local areas, efficiency will
decline, which is something we as a nation cannot afford.

   As Fritz knows, I'm a very strong supporter of the use of
administrative records for making intercensal estimates.  And it
has been shown in this country, and in Canada as the next paper
shows, that excellent work can be done in linking administrative
data sets.

   Therefore, I believe that our best approach so far is not to
throw out the present paradigm.  Instead, we have to find ways to
convince the American people that they have an important stake in
knowing what they are about.  Further, we have to convince the
policy makers that once in ten years is far too infrequent, a point
that Scheuren makes quite well.  Also, we mustn't abandon the
concept of a quinquennial census, and we have to convince the
policy makers that more intercensal work is needed.

   For the first time in many years, you, the statistical
agencies, have a special opportunity.  Over the years, there has
been no one in very high circles who had a real interest in
statistics and was also close to the decision makers.  At present,
the Chairman of the Council of Economic Advisers has the ear of the
President.  We know that he believes in the need for timely,
accurate data.  Therefore, the Federal statistical system needs
his support, and you should ask for it.

   On to the Leyes paper, which was a pleasure to read.  This
paper portrays a cogent attempt to build a file over time which
will eventually yield excellent information between census efforts.
However, the Canadian Privacy Act seems to limit the use of these
data.

   Statistically, however, this is the kind of model, in which
different data sets are linked, that we in the United States should
explore to make better intercensal estimates.  Perhaps this is
where the paradigm shift should take place.  Finally, I wonder
whether the private sector in Canada has taken advantage of these
files for marketing purposes.  How does the private sector in
Canada interact with these data, if at all?

   Two points on both papers.  First, both discuss the inability



to generate household information.  I think that this would be



harmful to both the public and private sectors, and I urge that more



work be done to solve this shortcoming.



 



   Second, the private sector has developed many linked files,



some good, some bad.  There are claims that over 80 million



households can be reached with at least one of these files, and



demographic data are attached to these files.  I suggest that your



agencies, at the very least, learn what has been done in the



private sector and maybe take advantage of it by getting us all



together and sharing our knowledge.



 



 



 



 



 



 



 



 






 



                         Session 3



                  SURVEY COVERAGE EVALUATION



 



 



 



 



 



 



 



 






 






 



   CONTROL, MEASUREMENT, AND IMPROVEMENT OF SURVEY COVERAGE



 



                       Gary M. Shapiro



                     Bureau of the Census



 



                     Raymond R. Bosecker



          National Agricultural Statistics Service



 



I.  Introduction



 



   Coverage errors can cause serious biases in estimates based



upon sample survey data.  Undercoverage may be substantial in many



surveys, especially of selected subpopulations.  For example, the



estimated undercoverage of Hispanic males aged 14 and over is 23



percent in the Current Population Survey (Hainer et al., 1988).  In



economic surveys, new businesses may be missed at a higher rate



than older ones.  If the characteristics of the missed portion of



the population are very different from those of the covered



portion, serious biases in the survey estimates for the total



population will result.
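The size of this bias can be sketched with the usual decomposition for a mean. Assuming the survey measures the covered portion without error, the population mean is a mix of the covered and missed means, so the bias of the covered mean equals the missed share times the difference between the two means. The 23 percent missed share echoes the CPS figure above; the two group means are hypothetical.

```python
def coverage_bias(mean_covered: float, mean_missed: float, missed_share: float) -> float:
    """Bias from using the covered-population mean as the estimate of the full-population mean.

    Assumes undercoverage is the only error: the covered portion is measured
    perfectly and nothing is known about the missed portion.
    """
    if not 0.0 <= missed_share <= 1.0:
        raise ValueError("missed_share must be between 0 and 1")
    true_mean = (1.0 - missed_share) * mean_covered + missed_share * mean_missed
    # Algebraically equal to missed_share * (mean_covered - mean_missed).
    return mean_covered - true_mean


if __name__ == "__main__":
    # Hypothetical rates: 0.10 among covered members, 0.30 among missed members.
    print(round(coverage_bias(0.10, 0.30, 0.23), 3))  # -0.046
```

In this hypothetical case the survey understates the rate by about 4.6 points, and a larger sample would not shrink that gap, since the bias does not depend on sample size.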



 



   This paper is a condensation and editing of Survey Coverage,



Statistical Policy Working Paper 17 (U.S. Office of Management and



Budget, 1990).  The 115-page working paper was prepared by the



Subcommittee on Survey Coverage of the Federal Committee on



Statistical Methodology.  Subcommittee members are Cathryn S. Dippo



(Co-chair), Gary M. Shapiro (Co-chair), Raymond R. Bosecker,



Vicki Huggins, Roy Kass, Gary L. Kusch, Melanie Martindale, and



D.E.B. Potter.  Robert Casady, Charles Cowan, John Paletta and



Richard Pratt also wrote parts of the working paper.  This paper



has numerous unattributed quotes from the full working paper.



Although the authors of this short paper accept responsibility for



all errors, credit for the good ideas and concept of the paper



belongs to all subcommittee members.  We would also like to thank



Melanie Martindale and Vicki Huggins for their useful comments on



this paper and Cora Wisniewski, Sue Chandler and Bessie C. Johnson



for their typing.



 



   The purpose of both this paper and the full report is to



heighten the awareness of survey program planners and data users



concerning the existence and effects of coverage error and to



provide survey researchers with information and guidance on how to



assess and improve coverage in sample surveys.



 



 



 



 



 






 



   This report utilizes a broad definition of coverage error.



This is defined to include all possible sources of error which are



not classified as observational or content errors (U.S. Department



of Commerce 1978).



 



   Section II of this paper discusses selected major sources of



coverage error.  IIA discusses errors which might occur before the



first stage of sampling and IIB those that might occur after the



first stage.  Issues associated with the creation and maintenance



of sampling frames, the choice of sampling frame and strategy,



field listing and interviewing are included.  Section III discusses



selected methods for preventing, reducing and evaluating coverage



errors.



 



 



II. Major Sources of Coverage Error



 



A. Sources of Coverage Error Before Sample Selection



 



   (1) Conceptual Issues -- The importance of thinking carefully



about the research goals, concepts, and targeted population(s) for



a survey cannot be overemphasized.  Coverage errors can be



inadvertently designed into a survey from the beginning by



incorrect specification of the concepts to be measured or the



population(s) to be targeted by the survey.  Vague definitions of



populations and concepts tend to create coverage errors because



they lead to inappropriate unit inclusions on, or exclusions from,



a frame and even to naming a population which cannot be adequately



represented by a frame.



 



   (2) Frame Construction -- Once a decision is made concerning



the target population, either the sample design must be based upon



available sampling frames or a frame must be constructed



specifically for the study.  Dalenius (1985) notes the following



three important properties of a frame:



 



o    Makes it possible to compute estimates concerning a



   population which is sufficiently "close" to the target



   population.



 



o    Serves to yield a sample of elements which can be



   unambiguously identified.



 



o    Makes it possible to determine how the units in the frame



   are associated with the elements of the (sampled)



   population.



 



   The first stage of sampling is usually dependent upon a frame



consisting of a physical listing of units.  This may be a list of



names of individuals, establishments, institutions, counties,



cities, streets, etc., or a list of numbers attached to city



blocks, land area segments, houses, pages, or any number of unique,



 






 



definable entities.  However, as Kish (1965, p. 53) notes, a "Frame



is a more general concept: it includes physical lists and also



procedures that can account for all the sampling units without the



physical efforts of actually listing them."  Deming (1960) cites one



exception to a list of units.  This occurs when a watch is used to



sample time intervals during which customers leaving a store are



interviewed.
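Deming's store example shows a frame that is a rule rather than a list. A minimal sketch, assuming the business day is cut into equal intervals and a systematic sample of intervals is drawn; the function name and parameters are illustrative, not from the source. Every customer leaving during a selected interval is in the sample, and no list of customers is ever built.

```python
import random
from typing import List, Optional, Tuple


def sample_time_intervals(open_minutes: int, interval_len: int, n: int,
                          seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Systematic sample of n intervals from a business day of open_minutes.

    The implicit frame is the set of intervals [k*interval_len, (k+1)*interval_len)
    for k = 0 .. open_minutes // interval_len - 1. Each interval is selected with
    probability n / (number of intervals).
    """
    n_intervals = open_minutes // interval_len
    if not 0 < n <= n_intervals:
        raise ValueError("n must be between 1 and the number of intervals")
    step = n_intervals / n                        # sampling interval (may be fractional)
    start = random.Random(seed).random() * step   # random start in [0, step)
    picks = [int(start + i * step) for i in range(n)]
    return [(k * interval_len, (k + 1) * interval_len) for k in picks]


if __name__ == "__main__":
    # A 10-hour day in 15-minute intervals (40 intervals); interview during 8 of them.
    for begin, end in sample_time_intervals(600, 15, 8, seed=1):
        print(f"minute {begin} to {end}")
```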



 



   The units listed in the initial frame may not correspond to



the units about or from which information is sought.  Often,



additional frames are needed for successive stages of sampling in



order to progress from available sampling units to the units to be



contacted or measured.  For example, areas may be selected from a



listing or array of all blocks in an area frame.  Housing units



inside sampled areas may then be listed and sampled in order to



achieve a listing of persons to be sampled that are members of the



target population from which information is sought.
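The succession of frames in this example can be sketched as a three-stage selection. A minimal sketch, assuming simple random sampling of blocks, a fixed number of housing units per sampled block, and one person per occupied unit; the data layout and function name are hypothetical. Housing units and persons are listed only inside units already selected, and each selected person carries a weight equal to the inverse of the product of the stage-wise selection probabilities.

```python
import random
from typing import Dict, List, Tuple

# block id -> {housing unit id -> persons found there}; hypothetical layout.
Blocks = Dict[str, Dict[str, List[str]]]


def three_stage_sample(blocks: Blocks, n_blocks: int, hu_per_block: int,
                       seed: int = 0) -> List[Tuple[str, str, str, float]]:
    """Return (block, housing unit, person, weight) for each selected person."""
    rng = random.Random(seed)
    block_ids = sorted(blocks)                   # stage 1 frame: list of all blocks
    p_block = n_blocks / len(block_ids)
    selected = []
    for block in rng.sample(block_ids, n_blocks):
        units = sorted(blocks[block])            # stage 2 frame: field listing of this block only
        if not units:
            continue
        k = min(hu_per_block, len(units))
        p_unit = k / len(units)
        for unit in rng.sample(units, k):
            persons = blocks[block][unit]        # stage 3 frame: household roster
            if not persons:
                continue                         # vacant unit: nobody to select
            person = rng.choice(persons)
            p_person = 1 / len(persons)
            selected.append((block, unit, person, 1 / (p_block * p_unit * p_person)))
    return selected


if __name__ == "__main__":
    blocks = {"B1": {"h1": ["ann", "bob"], "h2": ["cy"]},
              "B2": {"h3": ["di", "ed", "flo"], "h4": []},
              "B3": {"h5": ["gus"]}}
    for row in three_stage_sample(blocks, n_blocks=2, hu_per_block=1, seed=3):
        print(row)
```

The weights sum, in expectation, to the number of persons in the blocks; a person left off a household roster has no chance of selection at the last stage, whatever the weights.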



 



   A more complex example is the procedure for selecting items to



be priced in the Consumer Price Index.  The sample of priced items



is selected from items sold by a sample of outlets which, in turn,



was selected from a list of outlets created from information



provided by interviews with consumer units in addresses sampled



from the decennial census, new construction permits, and area



listings.  In this case, interviews are conducted in a sample of



housing units to create a sample frame of establishments, not a



population frame, from which a sample is selected.  Within the



sample outlets, probability methods are used to select increasingly



more detailed classes of goods until a particular item is selected.



A complete list of all the items available for sale is never



constructed.
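The narrowing-down step can be sketched as a walk through a hierarchy of goods classes, choosing one branch at each level. The source says only that probability methods are used; this sketch assumes each branch is chosen with probability proportional to a size measure, such as its share of the outlet's sales, and the class names and shares are hypothetical. The item's overall selection probability is the product of the branch probabilities, so only the subclasses of classes on the chosen path ever need to be listed.

```python
import random
from typing import Any, Dict, List, Tuple

# class name -> (size measure, subclasses); subclasses is None for a specific item.
Hierarchy = Dict[str, Tuple[float, Any]]


def select_item(tree: Hierarchy, rng: random.Random) -> Tuple[List[str], float]:
    """Choose one item by successive probability-proportional-to-size draws.

    Returns the path of classes chosen and the item's overall selection probability.
    """
    path: List[str] = []
    prob = 1.0
    node = tree
    while node is not None:
        names = list(node)
        sizes = [node[name][0] for name in names]
        choice = rng.choices(names, weights=sizes, k=1)[0]
        prob *= node[choice][0] / sum(sizes)     # conditional probability at this level
        path.append(choice)
        node = node[choice][1]                   # descend; None means an item was reached
    return path, prob


if __name__ == "__main__":
    outlet = {
        "Apparel": (60.0, {"Shirts": (1.0, None), "Shoes": (3.0, None)}),
        "Food": (40.0, {"Bread": (1.0, None)}),
    }
    print(select_item(outlet, random.Random(7)))
```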



 



 (3)   Frame Errors -- Kish (1965) states that a "frame is



perfect if every element appears on the list separately, once, only



once, and nothing else appears on the list," and classifies



possible frame errors into four types: missing elements, clusters



of elements appearing on the list, blanks or foreign elements, and



duplicate elements.
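Kish's four types can be checked mechanically when, as in a frame evaluation study, each listing can be linked to the population elements it leads to. A minimal sketch with a hypothetical data layout; in practice, establishing these links is the hard part.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def frame_errors(frame: Dict[str, List[str]], population: Iterable[str]) -> Dict[str, Set[str]]:
    """Classify Kish's four kinds of frame error.

    frame maps each listing to the population elements it actually leads to.
    An empty list marks a blank; elements not in the population are foreign.
    """
    pop = set(population)
    # For each listing, keep only links to genuine population elements.
    valid = {listing: {e for e in elems if e in pop} for listing, elems in frame.items()}
    # How many listings lead to each population element.
    links = Counter(e for elems in valid.values() for e in elems)
    return {
        "missing": pop - set(links),
        "clusters": {listing for listing, elems in valid.items() if len(elems) > 1},
        "blanks_or_foreign": {listing for listing, elems in valid.items() if not elems},
        "duplicates": {e for e, count in links.items() if count > 1},
    }


if __name__ == "__main__":
    frame = {"L1": ["a"], "L2": ["b", "c"], "L3": [], "L4": ["z"], "L5": ["a"]}
    print(frame_errors(frame, ["a", "b", "c", "d"]))
```

Here d is missing, L2 is a cluster listing, L3 is a blank, L4 leads only to a foreign element z, and a is duplicated by L1 and L5.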



 



   Missing elements constitute the frame error that causes the greatest



concern.  Because they are missing, no examination of the sample



from the frame will reveal the nature of that component of the



population.  Often, conclusions are erroneously extended beyond an



incomplete frame on the tenuous assumption that missing units are



like or very similar to those represented on the frame.



 



   The initial sampling units may contain clusters of subunits



which must be incorporated into the sampling design.  An example is



a list of farm operator names of which the vast majority represent



a one-name/one-farm relationship but some represent a one-name/



multiple-farm relationship.  In this situation, there is a distinct



 



 






 



possibility for coverage error unless the interviewer has been



thoroughly trained.



 



    If a frame is created or an existing list modified for a



particular one-time survey, elements on the list which are blank or



are not members of the population of interest should be removed.



If they are not removed, those appearing in the sample must be



identified and properly handled in the survey process.



 



    Duplication of units on the frame may result in overcoverage,



i.e., some members of the population are represented more than



once.  Population totals may then be overstated and means could be



biased.
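The overstatement, and one standard remedy, can be shown by averaging an expansion estimator over every possible sample from a tiny frame. The remedy sketched here, dividing each element's value by the number of listings that lead to it (a multiplicity adjustment), is a common technique rather than one prescribed in the source, and it assumes each sampled element's multiplicity can be determined. The frame and values are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean
from typing import Dict, Optional, Sequence


def estimate_total(sample: Sequence[str], listing_to_element: Dict[str, str],
                   values: Dict[str, float], n_listings: int,
                   multiplicity: Optional[Dict[str, int]] = None) -> float:
    """Expansion estimate of a total from a simple random sample of frame listings.

    With multiplicity given, each element's value is divided by the number of
    listings that lead to it, so a duplicated element is counted once overall.
    """
    weight = n_listings / len(sample)
    total = 0.0
    for listing in sample:
        element = listing_to_element[listing]
        m = multiplicity[element] if multiplicity else 1
        total += weight * values[element] / m
    return total


def expected_estimate(listing_to_element: Dict[str, str], values: Dict[str, float],
                      n: int, multiplicity: Optional[Dict[str, int]] = None) -> float:
    """Average the estimator over every possible sample of n listings."""
    listings = sorted(listing_to_element)
    return mean(estimate_total(s, listing_to_element, values, len(listings), multiplicity)
                for s in combinations(listings, n))


if __name__ == "__main__":
    frame = {"L1": "a", "L2": "b", "L3": "b", "L4": "c"}   # element b is listed twice
    values = {"a": 10.0, "b": 20.0, "c": 30.0}             # true total is 60
    print(expected_estimate(frame, values, 2))                           # 80.0
    print(expected_estimate(frame, values, 2, Counter(frame.values())))  # 60.0
```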



 



    (4) Frame Maintenance -- Frame maintenance procedures are



discussed as they relate to the classes of coverage error just



described.  These procedures can be classified as follows:



 



o     Adding new frame elements or births,



 



o     Eliminating or identifying inactive frame elements or



    deaths,



 



o     Correcting misclassified frame elements,



 



o     Identifying existing frame elements no longer in scope,



    or in scope for the first time, and



 



o     Determining whether or not elements have combined with



    other elements or have split from existing elements



    (e.g., change in ownership, mergers, and divestitures in



    an economic setting).



 



    When the research population is dynamic, it is important that



a frame which represents it be updated to reflect births.  Section



III discusses several methods for doing this.



 



    The failure to identify deaths on a sampling frame does not



necessarily imply a bias, since any deaths sampled would be



representative of the universe of deaths.  However, biased sample



estimates can result if an inactive element is sampled and imputed



for when no response is obtained.
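The contrast between the two cases can be checked by enumerating all samples from a small hypothetical frame holding three active units and two deaths. If sampled deaths are recognized and contribute zero, the expansion estimator stays unbiased; if they are treated as nonrespondents and given the mean of the active respondents (one simple imputation rule, assumed here for illustration), the estimate is pushed upward.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Optional, Sequence

# unit id -> value; None marks a death (an inactive unit still on the frame).
Frame = Dict[str, Optional[float]]


def estimate_total(sample: Sequence[str], frame: Frame, impute_deaths: bool) -> float:
    """Expansion estimate of the total from a simple random sample of frame units."""
    active = [frame[u] for u in sample if frame[u] is not None]
    n_dead = len(sample) - len(active)
    if impute_deaths:
        if not active:
            raise ValueError("no active respondents to impute from")
        # Each sampled death is treated as a nonrespondent and given the respondent mean.
        sample_sum = sum(active) + n_dead * sum(active) / len(active)
    else:
        # Deaths are identified as inactive and contribute zero.
        sample_sum = sum(active)
    return len(frame) * sample_sum / len(sample)


def expected_total(frame: Frame, n: int, impute_deaths: bool) -> float:
    """Average the estimator over every possible sample of size n."""
    return mean(estimate_total(s, frame, impute_deaths)
                for s in combinations(sorted(frame), n))


if __name__ == "__main__":
    frame = {"a": 10.0, "b": 20.0, "c": 30.0, "d": None, "e": None}  # true total 60
    print(round(expected_total(frame, 3, impute_deaths=False), 6))  # 60.0
    print(round(expected_total(frame, 3, impute_deaths=True), 6))   # 100.0
```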



 



    A problem associated with many frames is not that elements are



missing, but that they are misclassified or are not classified at



all with respect to one or more variables.  This assumes importance



if the variable or variables that are misclassified determine



either the elements eligible for sampling or the subpopulations for



which estimates are produced.  Housing occupancy status (vacant or



occupied), geographic codes, SIC codes, etc., are examples of such



variables.



 



 






 



   Closely related to the problem of misclassification is the



problem of out-of-scope elements, i.e., elements that if properly



classified would not be part of the universe of interest.  As with



death elements, the presence of out-of-scope elements on a sampling



frame does not result in any biased sample results should they be



sampled (assuming the sample process identifies them as out-of-



scope).



 



   The composition of elements comprising a frame will often



change over time.  This is especially true for economic-based



frames, where, for example, individual plants are bought and sold



by companies, two or more companies merge, or companies divest.



From a coverage point of view, ownership is important because the



continued sample status of a sold establishment often depends upon



the status of the buying company.



 



 



B. Sources of Coverage Error After Sample Selection



 



    The full Survey Coverage report discusses three broad kinds of



error occurring after the initial selection of a sample from a



frame:  (1) incorrect association of sampling units with reporting units;



(2) editing errors; and (3) other nonsampling errors.  We discuss



only the first of these in this paper.



 



    Misclassification of occupied housing units as vacant units is



a frequent type of classification error in household surveys.  In



many surveys, the population of interest consists of occupied



housing units, but the frame consists of other types of units as



well.  In the Current Population Survey (CPS), for example, an



interviewer is generally given specific addresses for interview.



When an interviewer is repeatedly unable to find anyone home at an



address (s)he must classify it either as a vacant noninterview (out



of scope) or as a noninterview unit occupied by persons eligible



for interview.  In October 1966, the CPS reinterview concentrated



on measuring this type of coverage error (U.S. Bureau of the



Census 1968).  This research revealed that more than 10 percent of



the units classified as vacant were actually occupied by eligible



persons.



 



    In two separate evaluation projects in the 1970 Decennial



Census, 11.4 percent and 16.5 percent of the units initially



enumerated as vacant were misclassified (U.S. Bureau of the Census



1973).
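Results like these can be turned into a rough adjustment. A minimal sketch, assuming a simple random subsample of units originally classified vacant is reinterviewed and that the reinterview classification is correct; the counts are hypothetical, and the Wilson score interval is one common choice for a binomial proportion, not a method from the source.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95 percent Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def adjusted_occupied(occupied: int, vacant: int, reinterviewed: int,
                      found_occupied: int) -> Tuple[float, float]:
    """Return (misclassification rate, occupied count corrected for it)."""
    rate = found_occupied / reinterviewed
    return rate, occupied + rate * vacant


if __name__ == "__main__":
    # Hypothetical: 9,000 units classified occupied, 1,000 vacant;
    # 200 "vacant" units reinterviewed and 22 found occupied by eligible persons.
    rate, occ = adjusted_occupied(9000, 1000, 200, 22)
    low, high = wilson_interval(22, 200)
    print(f"rate {rate:.3f} (95% CI {low:.3f} to {high:.3f}); occupied about {occ:.0f}")
```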



 



    We believe that error in listing persons within interviewed



households (within-unit) is the most serious source of coverage



error occurring after sample selection.  Alexander (1986) has



estimated that within-unit error results in overall undercoverage



of four percent for persons 12 and over in the National Crime



Survey.  Within-unit error is probably more serious for blacks and



Hispanics.  Hainer et al. (1988) point out that in the CPS, black



 






 



female undercoverage is close to the overall undercoverage of seven



percent, but black male undercoverage is about 20 percent,



suggesting that most of this undercoverage results from within-unit



error.



 



     There are several instances in which authors have speculated



on large biases caused by within-unit error.  One example of this



is discussed by Hainer et al. (1988): "... Cook (1985) presents



evidence suggesting that the National Crime Survey may



underestimate the number of gun assaults by as much as one-third.



He offers the explanation that the National Crime Survey does not



adequately cover the kinds of people criminologists believe are



most likely to be involved in the life of the streets (including



participation in criminal activity...)" (Cook 1985, see also Martin



1981).



 



     Hainer et al. (1988) discuss at length the ethnographic



research that has been done on household survey coverage.  They



suggest there are two main causes of respondent reporting error



resulting in missed persons:



 



o     Some people, especially black and Hispanic males, are



     deliberately omitted because of potential loss of



     household income if their presence in the household were



     known to authorities.



 



o     There is a lack of correspondence between survey



     definitions of household residency and how people



     actually live.



 



 



III.  Methods for Dealing with Coverage Errors



 



     The previous discussion focused on sources of coverage error



in selecting and maintaining sampling frames.  Solutions to



problems arising from the limitations of available frame sources



are a major challenge to the survey design statistician.  Some



options, however, are available for dealing with coverage error.



The options discussed are: questions to specify concepts, current



sampling frame, updated frame for births, random digit dialing,



multiple frames, reinterview, estimation procedures, and evaluation



methods.



 



 



A. Preventing Incorrect Specifications of Concepts



 



   To avoid coverage errors caused by incorrect specifications of



concepts, it is useful to ask a series of questions:



 



o  To what population(s) of units does this problem refer?



 



 



 






 



  Distinguish among populations about which information is



sought, those which will be frame units, and those which may be



reporting units, if different from the frame units.  For example,



suppose one wished to do research on "the scholastic achievement



(as measured by grades) of children of recent immigrants."  In this



case, "children of recent immigrants," more suitably specified



perhaps as, "persons aged roughly 5 to 17 enrolled in Grades 1



through 12 of the U.S. public schools and living in a household in



which at least one related head has been resident in the United



States 5 or fewer years," would be the population about which



information is sought.  However, it seems likely that one might need



to construct two or more frames in order to reach this population.



One of the frames might have U.S. public schools as units, while



another might consist of residential addresses to be screened.  In



this example, reporting units might well consist of two groups,



school recordkeepers and parents or guardians.



 



o    Is (are) this (these) population(s) observable or



   potentially measurable?  How?



 



   Continuing from the example above, one can see that the



suggested specification of "children of recent immigrants" takes



account of some of the presumably unobservable "children of recent



immigrants", such as those who may be homeless and those who may



not be currently enrolled in school.   Among recent immigrants,



those who entered the country illegally may not be observable, as



well as those who died following entry, leaving school-age



dependents.  Sources for frames of U.S. public schools and



residential addresses might be lists from various agencies.



Thinking through all possible categories of the populations of



interest should reveal those subsets which cannot be measured or



reached; those whose measurement (observation) might be achieved;



and those which seem reachable with some existing or proposed



methodology.   Thus, the "children" may be reached by means of a



household survey, school survey, and/or institutional survey



(hospitals, orphanages).



 



o    Are there one or more subsets of this (these)



   population(s) which cannot be measured/observed in some



   way?  What are these?  Would they ever be measurable?



 



   Continuing the example of "children of recent immigrants,"



some of the unobservable components of the populations discussed



have already been mentioned.  The potentially measurable components



might be those who cannot be reached now but who might be reached



using a methodology that is prohibitively expensive, such as



scanning all death certificates or other sources of information to



identify deceased recent immigrants.    Thus, it may be useful to



distinguish the inherently unobservable from the practically



unobservable components of populations of interest.



 



 



 






 



o     Does time enter into the answer to one or more of the



    questions above, in the sense that the measurable



    population(s) may change or may have changed?



 



    Continuing the example of "children of recent immigrants," one



may find that a change in a legal boundary or definition can turn



"internal migrants" to "recent immigrants" or vice versa.  This



would happen, for example, if Puerto Rico became a U.S. state, thus



solving the problem of how technically to classify migrants to the



mainland, who would become "internal migrants".  Such a change



might force a redefinition of the size and location of the



populations of interest.



 



o     Have previous efforts been made to build a frame of this



    (these) population(s)?  What problems were encountered in



    frame construction?  Was one of these faulty



    conceptualization?  Which of these problems has been



    solved?



 



    This series of questions focuses on the need to locate



previous research, to attempt to contact those who designed and



conducted the research, or to obtain procedural histories about it



and to evaluate carefully the definitions and language used by



others.  An assessment of previous research often reveals use of



frames built for other purposes by still earlier researchers,



especially when the frames are very expensive to assemble.



Information needed for adequate frames may now be available (such



as improved school lists) due either to improvements in information



processing or to changes in laws regarding availability of



administrative data.



 



 



B. Current Frames and Updating Old Frames



 



    Use of old frames can result in serious coverage problems,



because births may be partially or totally excluded and other units



may be misclassified.  An obvious but important solution is to use



current or recently built or updated frames whenever possible.



 



    When an old frame must be used, it is important to have



updating procedures to include births.  One effective method for



detecting new units is to periodically canvass the existing frame



elements.  As an example, all of the larger multiunit companies and



some of the smaller companies on the Standard Statistical



Establishment List are canvassed on a yearly basis.  Companies are



questioned as to whether or not they have started new operations.



 



    A second method of identifying new units results from coverage



maintenance operations performed for samples selected from the



frame.  As part of the questionnaire administration process in



nearly all surveys, inquiries are made about the status of the



sampled units and whether any changes in their status have occurred



 






 



since the last data collection period.  Although the inquiries are



targeted to sampled units believed not to be births, sometimes



incidental information about other units (including births) can be



obtained.



 



   Several methods can be used for including new units in



household surveys.  The Bureau of the Census includes most new



housing starts in its household surveys by sampling from building



permit files.  This is an efficient procedure, but building permit



files do not identify illegal new construction, conversions, and



new mobile home placements; nor do they identify new special



places, such as dormitories, fraternity houses, boarding houses,



and public housing.  To illustrate, it was estimated for the 1985



American Housing Survey that approximately 25 percent of all new



mobile homes were missed (Schwanz, 1988).



 



 



C. Random Digit Dialing



 



   One household sampling method employed in an attempt to avoid



omission problems is random-digit dialing (RDD) (Waksberg, 1978).



The use of telephone directories as sampling frames often results



in unacceptable levels of undercoverage because they omit unlisted



numbers for some nontypical portions of the population.  With RDD,



a sample of telephone households is located through the use of



randomly generated telephone numbers.  In this way only those



households without telephones are omitted.  For many surveys, this



could be considered a trivial exclusion.  In others, differences



between telephone and nontelephone households may have a profound



impact on the characteristics being measured.  For example,



measures of poverty and income from entitlement programs would most



likely be biased.
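The two-stage design described by Waksberg (1978) can be sketched as follows. Stage one picks random banks of 100 numbers (area code, exchange, and two more digits) and dials one random number in each; a bank is kept only if that number is residential. Stage two keeps dialing random numbers in each kept bank until k residential numbers are found, which gives residential numbers equal selection probabilities provided every kept bank holds at least k of them. This is a simplified sketch: `is_residential` is a hypothetical stand-in for the outcome of actually dialing, the prefix list is assumed to contain working exchanges, and kept banks are not allowed to repeat.

```python
import random
from typing import Callable, List, Set


def mitofsky_waksberg(prefixes: List[str], n_banks: int, k: int,
                      is_residential: Callable[[str], bool],
                      rng: random.Random, max_draws: int = 100_000) -> List[str]:
    """Two-stage RDD sample of 10-digit numbers: k residential numbers from each of n_banks banks.

    prefixes holds 6-digit area code + exchange strings assumed to be in service.
    """
    sample: List[str] = []
    kept: Set[str] = set()
    draws = 0
    while len(kept) < n_banks:
        draws += 1
        if draws > max_draws:
            raise RuntimeError("too few residential banks found")
        bank = rng.choice(prefixes) + f"{rng.randrange(100):02d}"  # 8-digit bank of 100 numbers
        if bank in kept:
            continue                                    # simplification: no repeated banks
        first = bank + f"{rng.randrange(100):02d}"      # stage 1: one random number in the bank
        if not is_residential(first):
            continue                                    # bank rejected
        kept.add(bank)
        found = [first]
        for suffix in rng.sample(range(100), 100):      # stage 2: dial the rest in random order
            if len(found) == k:
                break
            number = bank + f"{suffix:02d}"
            if number != first and is_residential(number):
                found.append(number)
        sample.extend(found)
    return sample


if __name__ == "__main__":
    def fake_is_residential(number: str) -> bool:
        # Hypothetical: only banks whose 7th digit is 0 hold households, and in
        # those banks numbers ending in an even digit are residential.
        return number[6] == "0" and int(number[-1]) % 2 == 0

    print(mitofsky_waksberg(["919555", "919556"], n_banks=3, k=5,
                            is_residential=fake_is_residential, rng=random.Random(42)))
```

Because stage one discards banks without households, most dialing after the first call in each bank goes to banks that contain households, which is the efficiency the design aims for. Like any RDD sample, it still omits households without telephones.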



 



 



D. Multiple Frames



 



   Coverage may be improved through the use of multiple frames.



Sometimes no single frame fully covers the target population and



merging independent source lists would be impractical.  In this case,



separate probability samples from different frames can be used to



expand coverage beyond any available single frame.



 



   The application of overlapping multiple frame sampling most



commonly found in Federal surveys is the use of an area frame and



an overlapping list frame.  The area frame is generally designed to



provide complete coverage by including all U.S. land parcels as



sampling units.  The list frame is nearly always incomplete (a



common attribute of lists), but its use provides certain sampling



efficiencies which enable the multiple frame survey to provide the



same precision at a much lower cost than would an area frame survey



alone.
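One way to combine the two samples without double counting is the screening estimator: the list sample estimates the list-frame part of the total, and the area sample contributes only the units it finds that are not on the list. A minimal sketch, assuming simple random samples of list units and of area segments, that every unit found in an area segment can be checked against the list, and hypothetical identifiers and values. Estimators that also use area-sample units found on the list, such as Hartley's, are more general; this is only the simplest case.

```python
from typing import List, Sequence, Set, Tuple

Segment = List[Tuple[str, float]]   # (unit id, value) for each unit found in an area segment


def screening_estimate(list_sample: Sequence[float], list_frame_size: int,
                       area_sample: Sequence[Segment], area_frame_size: int,
                       list_ids: Set[str]) -> float:
    """Dual-frame estimate of a total: list part plus area-only (not on list) part."""
    list_part = list_frame_size / len(list_sample) * sum(list_sample)
    not_on_list = sum(value for segment in area_sample
                      for unit, value in segment if unit not in list_ids)
    area_part = area_frame_size / len(area_sample) * not_on_list
    return list_part + area_part


if __name__ == "__main__":
    list_values = {"u1": 10.0, "u2": 20.0, "u3": 30.0, "u4": 40.0}
    segments = {"S1": [("u1", 10.0), ("u5", 5.0)], "S2": [("u2", 20.0)],
                "S3": [("u3", 30.0), ("u6", 15.0)], "S4": [("u4", 40.0)]}
    est = screening_estimate([list_values["u1"], list_values["u2"]], 4,
                             [segments["S1"], segments["S3"]], 4, set(list_values))
    print(est)  # 100.0 (the true total is 120.0)
```

Dropping the list check (passing an empty `list_ids`) would count u1 through u4 in both samples, which is exactly the overlap the design must avoid.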



 



 






 



E. Reinterview



 



  Reinterview can often be profitably used for both evaluation



and control of coverage error.  In the CPS, the regular reinterview



program is able to detect misclassification of occupied housing



units as vacant units, errors made in listing housing units in area



segments, and omissions of persons within interviewed



units.  However, the CPS reinterview program serves many purposes



and consequently fails to detect a number of these errors.  A



special intensive coverage check was done in the 1966-67 CPS



reinterview.  This check was much more successful than regular



reinterview in detecting vacant unit misclassification and area



segment listing errors, but still found few instances of within-



unit errors (U.S. Bureau of the Census 1968).



 



  A type of reinterview can also be used for nonresponse follow-



up. A subset of original noninterviews can be more aggressively



pursued to obtain complete, or at least partial, interviews, or



alternatively, refusal households can be sent a very brief mail



questionnaire asking why they refused and collecting basic



demographic information.
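When a random subset of the original noninterviews is pursued, the follow-up interviews can stand in for all nonrespondents in the estimate. A minimal sketch of the classic Hansen-Hurwitz estimator of a mean for this design, a standard technique named here rather than one taken from the source; it assumes every subsampled nonrespondent is eventually interviewed, and the numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def hansen_hurwitz_mean(respondents: Sequence[float], n_nonrespondents: int,
                        followup: Sequence[float]) -> float:
    """Estimate the population mean when a random subsample of nonrespondents is followed up.

    The follow-up mean stands in for all n_nonrespondents original nonrespondents.
    """
    n1, n2 = len(respondents), n_nonrespondents
    if n1 + n2 == 0:
        raise ValueError("empty sample")
    if n2 and not followup:
        raise ValueError("need at least one follow-up interview")
    total = (n1 * mean(respondents) if n1 else 0.0) + (n2 * mean(followup) if n2 else 0.0)
    return total / (n1 + n2)


if __name__ == "__main__":
    # Hypothetical: 80 respondents averaging 50; 20 nonrespondents, 5 followed up averaging 30.
    print(hansen_hurwitz_mean([50.0] * 80, 20, [30.0] * 5))  # 46.0
```

Using only the first-round respondents would give 50, so in this hypothetical case the follow-up moves the estimate by four units.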



 



F. Estimation Procedures



 



   Estimation procedures may also be used to decrease the bias of



survey estimates relative to the target population.  One such



procedure is the use of ratio estimation or benchmarking.  The



Bureau of Labor Statistics employs a benchmarking procedure to



revise monthly employment estimates from the Current Employment



Statistics survey (U.S. Bureau of Labor Statistics 1989).  Sample



estimates are compared each year with later summarizations of



mandatory unemployment insurance (UI) reports filed by employers.  The UI data, which serve



as a benchmark, are an aggregation from the same source as the



micro-data used to construct the frame from which the sample was



selected, except that the benchmark data are one year newer.  Hence,



the benchmark file takes into account new firms or changes in



industrial classification to ensure more accurate coverage.  The



completeness of the UI administrative data affords the opportunity



to analyze and adjust for frame deficiencies (Thomas, 1986).
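A simplified ratio benchmark conveys the idea: within each estimation cell, the sample series is scaled so that its value for the benchmark month matches the administrative count. This sketch assumes only that; the actual BLS procedure for revising the series around the benchmark month is more involved, and the cell names and figures are hypothetical.

```python
from typing import Dict

Series = Dict[str, float]   # month -> estimated employment


def ratio_benchmark(series: Dict[str, Series], bench_month: str,
                    benchmarks: Dict[str, float]) -> Dict[str, Series]:
    """Scale each cell's sample series so its benchmark-month value equals the benchmark."""
    adjusted: Dict[str, Series] = {}
    for cell, by_month in series.items():
        factor = benchmarks[cell] / by_month[bench_month]   # benchmark-to-sample ratio
        adjusted[cell] = {month: value * factor for month, value in by_month.items()}
    return adjusted


if __name__ == "__main__":
    sample = {"Retail": {"Mar": 1000.0, "Apr": 1010.0},
              "Construction": {"Mar": 500.0, "Apr": 520.0}}
    benchmarks = {"Retail": 1020.0, "Construction": 490.0}   # e.g., from UI counts
    print(ratio_benchmark(sample, "Mar", benchmarks))
```

Firms that appear in the newer UI counts but not on the older sampling frame raise the benchmark, and with it the adjustment factor for their cell.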



 



G. Macro and Micro Level Evaluation



 



   Evaluation methods to independently determine the



representativeness of the sampling frame(s) used are very useful



for quality control.  One method of measuring the degree of frame



coverage error is comparative analysis.  Comparative analysis can



occur at two levels.  The first is a macro level evaluation, which



compares known population values with totals derived from summing



characteristics for each sampling frame unit.  The second type of



analysis is performed at the micro or individual sampling unit



 






 



level.  This most often involves matching of data available from



different sources for individual units.



 



   The Bureau of the Census utilizes a macro-level approach for



frame completeness evaluation called demographic analysis.  With



this method, demographic data from various sources are used to



develop expected values for the population as a whole and by race,



age, and sex to compare with the census counts.
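The comparison step can be sketched as a net undercount rate for each age-sex cell. A minimal sketch, assuming the expected population for each cell has already been built from demographic components (in the simplest case, births minus deaths plus net migration for the cohort); the cells and counts are hypothetical.

```python
from typing import Dict


def expected_population(births: float, deaths: float, net_migration: float) -> float:
    """Simplest demographic accounting for one birth cohort."""
    return births - deaths + net_migration


def net_undercount_rates(expected: Dict[str, float], census: Dict[str, float]) -> Dict[str, float]:
    """Net undercount rate by cell: (expected - counted) / expected; negative means a net overcount."""
    return {cell: (expected[cell] - census[cell]) / expected[cell] for cell in expected}


if __name__ == "__main__":
    expected = {
        "male 25-29": expected_population(1_050_000, 30_000, 20_000),     # 1,040,000
        "female 25-29": expected_population(1_000_000, 15_000, 15_000),   # 1,000,000
    }
    census = {"male 25-29": 988_000, "female 25-29": 990_000}
    for cell, rate in net_undercount_rates(expected, census).items():
        print(f"{cell}: {rate:.1%}")
```

A net rate can hide offsetting gross errors, since omissions and erroneous inclusions cancel in the comparison.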



 



   On a micro-level basis the Bureau of the Census matches census



returns against administrative records for drivers' licenses from



State departments of motor vehicles and against registers of



resident aliens supplied by the Immigration and Naturalization



Service.
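At its simplest, the micro-level comparison counts how many administrative records fail to find a census match. The sketch assumes an exact match on a normalized name and date of birth; real matching uses more fields, clerical review, or probabilistic record linkage, and the records shown are hypothetical. The unmatched share estimates census omission only among people on the administrative list, such as licensed drivers, and false nonmatches caused by reporting differences push it upward.

```python
from typing import Iterable, Tuple

Record = Tuple[str, str]   # (name, date of birth as YYYY-MM-DD)


def match_key(name: str, dob: str) -> Record:
    """Normalize case and internal spacing so trivial differences do not block a match."""
    return " ".join(name.upper().split()), dob.strip()


def unmatched_share(census: Iterable[Record], admin: Iterable[Record]) -> float:
    """Share of administrative records with no exact-key match among census records."""
    census_keys = {match_key(name, dob) for name, dob in census}
    admin_keys = [match_key(name, dob) for name, dob in admin]
    if not admin_keys:
        raise ValueError("no administrative records")
    missed = sum(1 for key in admin_keys if key not in census_keys)
    return missed / len(admin_keys)


if __name__ == "__main__":
    census = [("John  Smith", "1950-01-02"), ("Ana Ruiz", "1980-05-06")]
    drivers = [("JOHN SMITH", "1950-01-02"), ("Ana Ruiz ", "1980-05-06"),
               ("Lee Wong", "1975-03-04"), ("Mary Poe", "1960-07-08")]
    print(unmatched_share(census, drivers))  # 0.5
```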



 



IV. Conclusion



 



   This paper has presented many of the major points treated in



the full Survey Coverage report, whose purpose is to provide



information about the types and effects of coverage error in



surveys and guidance on how to assess and improve survey coverage.



We found few studies, however, which actually measure coverage



errors in surveys and even fewer which address the impact of



coverage error on survey estimation.  The paper implies that



significant resources should be allocated to the conceptual and



planning stages of surveys, and that procedures providing for the



evaluation of coverage and for minimizing and controlling coverage



error be clearly established and included in the survey design.



 



   As to the seriousness of coverage error, the largest single



source of coverage error identified in the full Survey Coverage



report for an economic survey is a 20 percent underestimate in the



1988 Economic Census statistic of receipts for nonemployer



establishments due to misclassification.  For household surveys,



a large single source of overall coverage error is an estimated 4



percent undercoverage in the National Crime Survey estimates of



persons aged 12 and over due to within-housing-unit listing errors.



(Undercoverage from this source for some subgroups is much worse.)



Since we know that single sources themselves can be significant,



the overall effect of all sources of coverage error on survey



products is of great concern.



 



   Several leading methods for identifying and assessing coverage



error and for improving coverage have been mentioned here.  The



full report treats these and other methods in detail.  It also



provides case studies of specific Federal surveys which illustrate



various frame and coverage issues.



 



   The methods that apply to most surveys and which can lead to



significant improvements in data quality are the use of multiple



 






 



frames to improve coverage at the sampling stage and weighting



adjustments to reduce bias from coverage error.



 



References



 



Alexander, C. (1986), "The Present Consumer Expenditure Survey's



Weighting Methods," in Population Controls and Weighting Sample



Visits, Washington, DC: U.S. Bureau of Labor Statistics.



 



Cook, P. (1985), "The Case of the Missing Victims: Gunshot



Woundings in the National Crime Survey," Journal of Quantitative



Criminology, 1, 91-102.



 



Dalenius, T. (1985), "Elements of Survey Sampling," Notes prepared



for the Swedish Agency for Research Cooperation with Developing



Countries (SAREC).



 



Deming, W. (1960), Sample Design in Business Research, New York:



John Wiley and Sons, Inc.



 



Hainer, P., Hines, C., Martin, E., and Shapiro, G. (1988),



"Research on Improving Coverage in Household Surveys," Proceedings



of the Fourth Annual Research Conference, U.S. Bureau of the



Census, pp. 513-539.



 



Kish, L. (1965), Survey Sampling, New York: John Wiley and Sons,



Inc.



 



Martin, E. (1981), "A Twist on the Heisenberg Principle: Or, How



Crime Affects Its Measurement," Social Indicators Research, 9,



197-223.



 



Schwanz, D. (1988), "1985 Type-A Unable-to-Locate Rates for the AHS



National Unit Samples," Internal memorandum, U.S. Bureau of the



Census.



 



Thomas, A. (1986), "BLS Establishment Estimates Revised to March



1985 Benchmarks," Washington, DC: U.S. Bureau of Labor



Statistics.



 



U.S. Bureau of the Census (1968), "The Current Population Survey



Reinterview Program, January 1961 through December 1966,"



Technical Paper 19, Washington, DC: U.S. Government Printing



Office.



 



U.S. Bureau of the Census (1973), "The Coverage of Housing in the



1970 Census," Report PHC(E)-5, Washington, DC: U.S. Government



Printing Office.



 



U.S. Bureau of Labor Statistics (1989), Employment and Earnings, 36



(12).



 



U.S. Department of Commerce (1978), "Glossary of Nonsampling Error



Terms: An Illustration of a Semantic Problem in Statistics,"



Statistical Policy Working Paper 4, Washington, DC: U.S. Government



Printing Office.



 



U.S. Office of Management and Budget (1990), "Survey Coverage,"



Statistical Policy Working Paper 17, Washington, DC.



 



Waksberg, J. (1978), "Sampling Methods for Random Digit Dialing,"



Journal of the American Statistical Association, 73, 40-46.



 



 



 



 



 



 



 



 



                    QUALITY OF SURVEY FRAMES



 



                        Judith T. Lessler



                   Research Triangle Institute



 



1. Introduction



 



    This paper focuses on the quality of sampling frames with



particular emphasis on the relationship of the sampling frame to



the overall error of survey estimates.  It also presents some



examples from studies that have been conducted by the Research



Triangle Institute (RTI).



 



   The frame is a fundamental element of scientific survey



research.  Probability sampling involves selecting a subset of units



from a finite collection of units in a manner that lets one



determine the probability of obtaining that subset.  The sampling



frame is the finite population of units to which the probability



sampling mechanism is applied.  Thus, the type of frame used for a



survey and any deficiencies or inefficiencies in it affect the



total error of the survey estimates.
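To make the idea of a known subset probability concrete, consider simple random sampling without replacement. It is only one of many probability sampling schemes and is used here because its probabilities are easy to state: each of the C(N, n) possible subsets of n units from a frame of N units has probability 1 / C(N, n), and each frame unit has inclusion probability n / N. A minimal sketch:

```python
from fractions import Fraction
from math import comb


def srswor_subset_probability(N, n):
    """Probability that simple random sampling without replacement of n
    units from a frame of N units yields one particular subset."""
    return Fraction(1, comb(N, n))


def srswor_inclusion_probability(N, n):
    """Probability that a given frame unit appears in the sample."""
    return Fraction(n, N)


print(srswor_subset_probability(10, 3))     # 1/120
print(srswor_inclusion_probability(10, 3))  # 3/10
```

The inclusion probability follows from the subset probability: a given unit belongs to C(N-1, n-1) of the C(N, n) equally likely subsets, and that ratio reduces to n / N.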



 



2. Definition of a Frame



 



    The population of frame units is not necessarily equivalent to



the population for which information is to be collected.  In this



paper, I refer to the population the survey researcher wishes to



make measurements on as the target population and the individual



components of that population as elements.  This population may not



be the same as the inferential population.  For example, the



National Human Monitoring Program of the U.S. Environmental



Protection Agency (EPA) conducted a special study of mirex residues



in human adipose tissues (Leininger et al., 1980).  Mirex is a



persistent insecticide that has been used to control fire ants.



Human adipose tissue specimens were collected from selected



surgical patients and cadavers and chemically analyzed for the



presence of mirex residues.  The inferential population in this



study was not the sick and the dead but rather all persons living



in the areas subject to application of the insecticide.



 



    Just as the target population is not necessarily the same as



the inferential population, neither is the population of frame



units the same as the population of target elements.  Thus, noting



this distinction and the role of the frame in survey sampling, I



once defined a frame as follows:



 



    "The frame consists of materials, procedures, and devices



    which identify, distinguish, and allow access to the



    elements of the target population.  The frame is composed



    of a finite set of units to which the probability



    sampling scheme is applied.  Rules or mechanisms for



    linking the frame units to the target population elements



    are an integral part of the frame.  The frame also



    includes auxiliary information (measures of size,



    demographic information) used for (1) special sampling



    techniques such as stratification and probability



    proportional to size sample selections, or (2) special



    estimation techniques, such as ratio or regression



    estimation."



 



     I like this definition because it clearly recognizes that



different types of frames support different types of sampling and



estimation procedures.



 



     However, I think that it fails to recognize a key aspect of



sampling frames, namely, the types of measurement designs they



support.  To illustrate, if a survey is to be conducted by asking



questions or by gathering information from records, the reporting



units are not always equivalent to the target elements.  For



example, suppose we wanted to know the family income of all



children who attended the Saturday afternoon swimming classes at



Sometown Community Park.  A sampling frame consisting of a list of



all swimming classes and the times that they met would give us



easy and efficient access to the target population of children.  We



could go to the class and identify each child; however, this would



not be very helpful because few children know their family incomes.



Thus, we need to insert a key word in the above definition --



measurement -- yielding:



 



     "The frame consists of materials, procedures and devices



     which identify, distinguish, and allow access to and



     measurements on the elements of the target population.



     The frame is composed of a finite set of units to which



     the probability sampling scheme is applied.  Rules or



     mechanisms for linking the frame units to the target



     population elements are an integral part of the frame.



     The frame also includes auxiliary information (measures



     of size, demographic information) used for (1) special



     sampling techniques such as stratification and



     probability proportional to size sample selections, or



     (2) special estimation techniques, such as ratio or



     regression estimation."



 



 



3. Components of Quality



 



     Researchers who are choosing a sampling frame for a survey



need to consider a number of factors when making that choice.



These include:



 



-  coverage of the target population



 



 



-  efficiency of the sample designs that are supported by



   the frame



 



-  effect of the frame on nonresponse errors



 



-  types, costs, and quality of the measurement designs



   supported by the frame



 



-  cost of constructing the frame



 



-  accuracy of information on the frame



 



   Coverage of the target population:  It is widely recognized



that several aspects of a sampling frame can cause bias in survey



estimates.  Missing target elements, inclusion of nontarget



population elements, unrecognized multiplicities, and failure to



account for the clustered nature of frames during sampling and



estimation can all introduce bias in survey estimates.
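When the frame records enough linkage information, both multiplicities and nontarget elements can be handled at the estimation stage. The sketch below is one standard approach (a multiplicity adjustment), shown under the simplifying assumption that each element's multiplicity, the number of frame units through which it can be reached, is known. Each element's value is divided by its multiplicity, nontarget elements are dropped, and the usual inverse-probability expansion is applied. Averaging over every possible sample from a tiny invented frame shows that the estimator recovers the true total.

```python
from itertools import combinations


def multiplicity_total(sampled_units):
    """Estimate a target-population total from sampled frame units.

    Each sampled unit is a dict with
      'prob'     -- the unit's inclusion probability
      'elements' -- (y, multiplicity, in_target) tuples for the elements
                    linked to the unit; multiplicity is the number of frame
                    units through which that element can be reached.
    """
    total = 0.0
    for unit in sampled_units:
        # Share each element's value among all frame units linked to it,
        # and skip elements that are not in the target population.
        shared = sum(y / m for y, m, in_target in unit["elements"] if in_target)
        total += shared / unit["prob"]  # inverse-probability expansion
    return total


# Hypothetical frame of three units sampled two at a time (SRSWOR, pi = 2/3).
# Element A (y = 4) is listed under units 0 and 1; element B (y = 6) under
# unit 2; element C (y = 10) is listed under unit 0 but is not a target element.
frame = [
    [(4, 2, True), (10, 1, False)],
    [(4, 2, True)],
    [(6, 1, True)],
]
estimates = [
    multiplicity_total([{"prob": 2 / 3, "elements": frame[i]} for i in pair])
    for pair in combinations(range(3), 2)
]
print(sum(estimates) / len(estimates))  # the true total, 10, up to rounding
```

Without the division by multiplicity, element A would be counted twice whenever both of its frame units were selected; without the eligibility check, element C would inflate every sample containing unit 0.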



 



   Efficiency of sampling and estimation: The structure of the



frame, the information it contains, and the quality of that



information will determine the types of sample designs and



estimation procedures that can be used in a survey.  Simple frames



lacking auxiliary information support simple sample designs;



complex frames containing auxiliary information support more



complex designs, which are generally more efficient.  Frames used



for sampling business establishments are a good example.  Lists



that also include information on the size of the establishment will



permit sample designs that are much more efficient than those that



could be designed using a simple listing of establishments.
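A small numerical sketch shows why size information matters. Suppose the frame lists each establishment's employment and the survey measures receipts, which are roughly proportional to employment. The code computes exact variances for two with-replacement designs, chosen only because their variance formulas are short: equal-probability selection, and selection with probability proportional to employment using the Hansen-Hurwitz estimator. All figures are invented for illustration.

```python
def srswr_variance(y, n):
    """Variance of N * (sample mean) under simple random sampling with
    replacement of n units from a population with values y."""
    N = len(y)
    mean = sum(y) / N
    sigma2 = sum((v - mean) ** 2 for v in y) / N  # population variance
    return N * N * sigma2 / n


def ppswr_variance(y, size, n):
    """Variance of the Hansen-Hurwitz estimator of the total of y when n
    units are drawn with replacement with probability proportional to size."""
    Y = sum(y)
    total_size = sum(size)
    return sum(
        (s / total_size) * (v / (s / total_size) - Y) ** 2
        for v, s in zip(y, size)
    ) / n


# Invented list of five establishments: employment (the size measure on
# the frame) and annual receipts in millions of dollars (the survey variable).
employment = [5, 10, 20, 50, 200]
receipts = [0.6, 1.1, 2.0, 5.5, 19.0]

v_srs = srswr_variance(receipts, n=2)
v_pps = ppswr_variance(receipts, employment, n=2)
print(f"SRS variance {v_srs:.1f}, PPS variance {v_pps:.2f}")
```

Because receipts track employment closely, each ratio y/p is nearly constant, and the PPS variance is a small fraction of the equal-probability variance. With all sizes set equal, the PPS formula reduces to the equal-probability one.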



 



   Effect of the frame on nonresponse errors: The type of frame



that is chosen also has a major impact on nonresponse errors.



Often, a frame that provides efficient access to large segments of



a target population will also be guarded by "gatekeepers" who can



deny access to the target elements.  For example, if one would like



to conduct a survey of young people aged 12-17, using a



school-based sampling frame rather than an area household frame will



provide more efficient access to the great majority of this target



population.  To use such a frame, one usually needs permission from



school district personnel who can, in a single decision, deny



access to large segments of the target population.  In a national



survey, failure to obtain cooperation from the large city school



districts can have a devastating impact on our ability to control



nonresponse errors.



 



   Types of measurement designs supported/cost of making



measurements: The frame that is chosen for the survey also affects



the types of measurements that can be made.  Frames of telephone



numbers using random digit dialing provide access to a very large



part of the household population.  Using this frame, however,



generally limits one to making measurements by asking questions.



One cannot weigh the person or collect a blood sample although one



could, of course, obtain the person's address and make the direct



measurements in subsequent visits.  These subsequent visits would



cost more than visits to a sample drawn from an area housing unit



frame because the sampled elements would be widely dispersed.



 



  RTI recently completed a survey of participants in WIC (Women,



Infants, and Children Feeding Program) for the Food and Nutrition



Service.  Much of the information that needed to be



collected could be abstracted from the WIC records; however, other



information required an interview with the WIC participant.  A



sampling frame that consisted of lists of WIC agencies and lists of



persons served would have been the most efficient for collecting



the record data; however, it would have been very inefficient for



conducting the interviews.  Because of this, we developed



procedures for listing people as they arrived at WIC clinics for



their initial enrollment into the program.



 



  Cost of constructing the frame: When assessing the relative



quality of various sampling frames, we must consider the cost of



constructing the frame.  A frame that includes "size measures" for



the units may permit more efficient sampling; however, it may be



too costly to determine the size of the units.  The money spent on



constructing the frame might be better spent on increasing the



sample size.
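The trade-off can be expressed as simple arithmetic under a fixed budget. The sketch below assumes that the variance of the estimator equals a per-unit variance divided by the sample size, and that obtaining size measures costs a fixed amount taken out of the data collection budget. Every number is hypothetical. With these figures, a modestly priced frame improvement pays for itself, while an expensive one leaves too little money for data collection.

```python
import math


def variance_for_budget(budget, cost_per_unit, frame_cost, unit_variance):
    """Sampling variance achievable once the frame has been paid for.

    Assumes the estimator's variance is unit_variance / n and that whatever
    remains of the budget after frame_cost buys n sample units.
    """
    n = (budget - frame_cost) // cost_per_unit
    return unit_variance / n if n >= 1 else math.inf


budget, cost_per_unit = 100_000, 200
# Hypothetical: size measures cut the per-unit variance from 100 to 30.
plain = variance_for_budget(budget, cost_per_unit, 0, 100)
cheap_sizes = variance_for_budget(budget, cost_per_unit, 40_000, 30)
costly_sizes = variance_for_budget(budget, cost_per_unit, 90_000, 30)
print(plain, cheap_sizes, costly_sizes)  # 0.2 0.1 0.6
```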



 



   Accuracy of information on the frame:  If the auxiliary



information on the sampling frame is inaccurate, the efficiency of



sample designs and estimation procedures that make use of this



information will be reduced.
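A small invented example illustrates the loss. It assumes a PPS-with-replacement design with the Hansen-Hurwitz estimator, a textbook choice rather than one taken from this paper, and recomputes the variance after two size measures on the frame have been recorded incorrectly. In this case the errors are large enough that the design does worse than ignoring size altogether.

```python
def ppswr_variance(y, size, n):
    """Variance of the Hansen-Hurwitz estimator of the total of y when n
    units are drawn with replacement with probability proportional to size."""
    Y = sum(y)
    total_size = sum(size)
    return sum(
        (s / total_size) * (v / (s / total_size) - Y) ** 2
        for v, s in zip(y, size)
    ) / n


receipts = [0.6, 1.1, 2.0, 5.5, 19.0]  # invented survey values
accurate = [5, 10, 20, 50, 200]        # correct size measures
outdated = [50, 10, 20, 5, 200]        # two size measures recorded wrongly
equal = [1, 1, 1, 1, 1]                # no size information used

variances = {
    label: ppswr_variance(receipts, sizes, n=2)
    for label, sizes in [("accurate", accurate), ("outdated", outdated),
                         ("equal", equal)]
}
print(variances)
```

The large penalty comes from the small size measure wrongly attached to a unit with sizable receipts: that unit is rarely drawn, but when it is, its inflated ratio y/p dominates the estimate.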



 



 



4. Examples



 



    RTI has conducted many types of surveys using many kinds of



sampling frames including area household surveys and random digit



dialing surveys, as well as surveys of schools, businesses,



military personnel and families, nursing homes, hospitals, and so



on.  We also do a number of environmental surveys, and I will



describe two of these to illustrate the points discussed earlier.



 



4.1. Of Flowing Waters



 



    The first example shows several ways in which a frame can



influence the quality of a survey's estimates.  The goal of the 1982



National Fisheries Survey was to measure the biological quality of



the Nation's flowing waters.  After some discussion of exactly what



was intended by the phrase "the Nation's flowing waters," the



statisticians on the project turned to the task of developing an



operational definition of sampling units and target elements for



use in the survey.  It turns out that the EPA has developed a



cataloging system in which each body of water in the United States



is segmented into well-defined units called reaches, described



according to the following definition (Horn, 1981):



 



   "Most reaches represent the approximate centerlines of



   streams and extend between points of confluence with



   other streams.  The reaches constructed within open



   waters are generally straight lines connecting tributary



   streams with assumed transport paths through the open



   waters."



 



   In addition, the U.S. Geological Survey (USGS) has a system in



which the United States is divided into nonoverlapping areas based



upon the configuration and sizes of watersheds.  There are 2,100



cataloging units (CUs) contained in larger regions called water



basins or hydrologic regions.  When we designed the survey, EPA



maintained a River Reach File that contained some 68,000 reaches



defined within these CUs.  This file was not complete; the total



number of reaches was estimated at around 179,000.



Moreover, a clustered design was not needed to control data



collection costs because the survey was to be conducted by mailing



questionnaires to local fisheries biologists who were familiar with



each waterbody.  In addition, a very accurate (but costly)



digitizing procedure for identifying reaches and for measuring



their length was available.  Thus, staff decided to select the



sample in two stages: (1) sampling CUs, then (2) reaches within CUs



using maps to identify the reaches.



 



   We established the following operational definition of the



target population:



 



   All reaches of rivers and streams that were:



 



a.  contained in the 48 contiguous States;



 



b.  shown on 1:500,000 USGS maps;



 



c.  including watercourses shown on the maps as being



    seasonally intermittent, impoundments, reservoirs, canals



    and constructed channels, and waterways; and



 



d.  excluding the Great Lakes and other lakes, marine waters,



    estuaries, and wetlands (Glauz, 1984).



 



  One interesting feature of this definition is the specification



of the map scale.  At a scale of 1:500,000, one inch on the map



represents 500,000 inches (about 7.9 miles) on the ground.  Because



reaches are defined by points of confluence, maps with higher



resolution would show more reaches and maps with lower resolution



fewer reaches.  Larger-scale (more detailed) maps were not



available; thus, our definition of the target population was



limited by the materials we had available for identifying its



elements (given the available budget).
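The scale arithmetic is easy to check. Below is a hypothetical helper that converts a length read from a 1:500,000 map (for example, with a map meter) into ground miles, using 63,360 inches per mile.

```python
INCHES_PER_MILE = 63_360  # 5,280 feet x 12 inches


def map_inches_to_miles(map_inches, scale_denominator=500_000):
    """Convert a length measured on a 1:scale_denominator map into ground
    miles."""
    return map_inches * scale_denominator / INCHES_PER_MILE


print(round(map_inches_to_miles(1.0), 2))   # 7.89
print(round(map_inches_to_miles(12.5), 1))  # 98.6
```

Since only relative sizes matter for PPS selection, map-meter readings could serve as measures of size without conversion; the conversion only matters when reporting lengths in ground units.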



 



 



    Measures of size were constructed for the first-stage sample



by obtaining maps of all the 2,100 CUs and measuring the length of



all the eligible waterways using a map meter.  Grids were drawn on



the maps to facilitate keeping track of the measurements, and



cataloging units were randomly assigned to the staff performing the



measurements.



 



    A first-stage sample of 302 CUs was selected with



probabilities proportional to size.  Within this first-stage



sample, a second-stage frame was constructed using automated



digitizing equipment to trace, list, and record the size of each



reach in the 302 selected cataloging units.  A total of 1,303



reaches were selected from this second-stage frame.
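The paper does not say which PPS selection algorithm was used. One common choice, sketched here as an assumption, is systematic PPS sampling: cumulate the measures of size, choose a random start within the first sampling interval, and take every interval thereafter. The same function could be applied again within each selected CU using the digitized reach lengths. The CU lengths below are invented.

```python
import bisect
import itertools
import random


def systematic_pps(sizes, n, rng=random):
    """Select n units with probability proportional to size, systematically.

    Returns the indices of the selected units in frame order. A unit whose
    size is at least the sampling interval (total / n) is certain to be
    selected and may be hit more than once; real designs usually set such
    units aside as certainty selections before sampling the rest.
    """
    cumulative = list(itertools.accumulate(sizes))
    total = cumulative[-1]
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    selected = []
    for i in range(n):
        point = start + i * interval
        # The unit whose cumulative range [previous, current) holds the point;
        # min() guards against floating-point round-off at the upper end.
        unit = min(bisect.bisect_right(cumulative, point), len(sizes) - 1)
        selected.append(unit)
    return selected


# Hypothetical map-meter lengths (in map inches) for eight cataloging units.
cu_lengths = [12.0, 30.5, 8.2, 44.1, 19.7, 5.3, 27.9, 15.6]
rng = random.Random(1982)
first_stage = systematic_pps(cu_lengths, 3, rng)
print(first_stage)
```

Each unit smaller than the interval is selected with probability n times its share of the total size, which is the property the size measures were constructed to support.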



 



    This example illustrates several ways in which the frame



influenced the quality of the survey estimates.  First, the



materials and procedures that could be afforded for constructing



the frame limited the target population to reaches that were



visible on the 1:500,000 scale maps.  Smaller reaches could have



been identified by selecting a sample of areas and using a counting



and listing procedure; however, the budget for the survey did not



permit such an activity.  Second, the use of size measures for



selecting first- and second-stage sampling units increased the



efficiency of sampling.  Third, the cost of constructing a complete



list of all reaches required the use of a two-stage design.



 



4.2. Of Passing Time



 



     The second example illustrates the relationship between the



frame, the definition of a target population, and the measurement



design.  RTI recently completed the National Alachlor Well Water



Survey (NAWWS), which required distributing a sample in both time and



space (Whitmore et al., 1990).  The goal of the survey was to



estimate the frequency of occurrence of the herbicide alachlor in



private rural wells used for domestic consumption.  Because the



water in wells is not static, sample wells could not be monitored



at arbitrary points in time without introducing an unknown temporal



bias into the sample.  Data were to be collected over a 1-year



period; thus, one of the first tasks was to decide on a definition



of the target population by dividing the year into units for which



it was possible to collect measurements.



 



     A major constraint on the choice of a time period for the



survey was the amount of time it would take to make a measurement.



A year can be divided into months, weeks, days, hours, minutes, and so on.  The



lower limit would be the time required to draw and package the



amount of water required for an accurate chemical analysis from the



well -- a few hours.  Partitioning the year into hours and



selecting a sample of hours, however, would have required a survey



team to be at the wellhead standing at the ready while they waited



for the sample hour.  In truth, the entire process of collecting



water samples for the survey was much more complicated.



 



  The survey team needed to contact the owner of the well, obtain



his or her consent to draw water from the well, make an appointment



with the resident (not necessarily the owner) for obtaining the



water sample, travel to the site, identify a tap or hole for



collecting the water (collection of water before any treatments was



preferred), measure water temperature by running water through a



flow-through cell, continue to run the water until a stable



temperature was achieved or 10 minutes had passed, fill three large



sample bottles, collect an additional water sample and mix it with



a stabilizing reagent, and package the water bottles for shipping.



In addition, observations and photograph(s) of the well site and



surrounding area were needed as were questionnaire data on water



use, well characteristics, and the surrounding area.  After



considering the time required for the survey teams to implement the



entire measurement process, we decided that (with the resources



available) dividing the year into observational units smaller than



a month would not be feasible.  Therefore, the target population



for the survey was defined as well-months.



 



    An assumption that underlay all NAWWS estimates was that the



herbicide concentrations would be stable for the entire month.



Dividing the year into smaller units would have reduced



measurement error; however, this would have also resulted in more



missing data because the data collection team would have had severe



difficulty in obtaining the measurements at the prescribed time.



 



   To increase the chance that the concentrations were stable for



the sample month, temporal strata were formed based upon ground-



water recharge conditions.  Prior information was used to classify



each month into a historically low, medium, or high recharge



stratum.  Because the first-stage spatial sampling units were



counties, temporal strata were created for each county.
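The paper does not give the rule used to assign months to strata. Purely as an illustration, the sketch below assumes that each county's twelve months are ranked by a historical recharge index and split into three equal strata of four months, and that a sampled well is then assigned one month chosen at random from a designated stratum. This gives each well-month a known within-stratum selection probability. The recharge values are invented.

```python
import random


def recharge_strata(monthly_recharge):
    """Split a county's 12 months (0 = January) into low, medium, and high
    recharge strata of four months each, ranked by historical recharge."""
    order = sorted(range(12), key=lambda month: monthly_recharge[month])
    return {"low": order[:4], "medium": order[4:8], "high": order[8:]}


def sample_month(strata, stratum, rng=random):
    """Choose one month for a sampled well from the given temporal stratum.

    Returns the month and its selection probability within the stratum.
    """
    months = strata[stratum]
    return rng.choice(months), 1 / len(months)


# Invented historical recharge index for one county, January..December.
recharge = [3.1, 3.4, 4.0, 4.2, 2.5, 1.2, 0.6, 0.4, 0.8, 1.5, 2.2, 2.9]
strata = recharge_strata(recharge)
month, prob = sample_month(strata, "high", random.Random(7))
print(strata["high"], month, prob)
```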



 



 



References



 



Glauz, W.D. (1984) 1982 National Fisheries Survey.  Volume II:



Survey Design.  FWS/OBS-84/14.  U.S. Fish and Wildlife Service,



Washington DC.



 



Horn, C. Robert (1981) The reach file: a digital base of streams



and lakes (memo).



 



Leininger, Carol, Donna L. Watts, Charles Sparacino, and Stephen



Williams (1980) Mirex Residue Levels in Human Adipose Tissue: A



Statistical Evaluation.  RTI Project Report, Contract 68-01-5848.



U.S. Environmental Protection Agency, Washington, DC.



 



 



 



Whitmore, Roy W., et al. (1990) National Alachlor Well Water



Survey.  Volume I: Survey Design and Data Collection Final Report.



RTI Project No. RTI/3895/04-03F.  U.S. Environmental Protection



Agency, Washington, DC.



 



 



 



 



 



 



 



 



                            DISCUSSION



 



                          Fritz Scheuren



                    Internal Revenue Service



 



    Judith Lessler, Gary Shapiro, and Ron Bosecker deserve our



thanks today for their thorough "coverage of coverage."  They have



very ably reminded us of the important quality features of this



aspect of a survey.



 



General



 



    Taken together, the two papers provide a valuable summary of



current practice.  The papers complement each other nicely.  In



particular, we have been given two viewpoints today -- one from



the public sector and the other from the private sector of survey



research.  Differences in emphasis arise from these perspectives.



One example would be the degree to which frame construction is ad



hoc (private sector) versus ongoing (public sector).  More



specifically, maintenance of frames is covered in detail in the



Shapiro-Bosecker paper, but only touched on in the Lessler one.



 



    A key issue in frame construction arises when we have a target



finite population, but our real purpose is to make inferences



about an ill-defined superpopulation.  Judy's phrase "of flowing



waters" says it all.  Frame construction is part of learning what



is already known before conducting a survey.  It is part of



connecting the measurement process with the "thing" to be measured.



Coverage adjustments have this flavor of connection, too.



 



   The cognitive research movement needs to be at least mentioned



in the context of survey coverage issues, if only because of the



conceptual challenges in defining the target population and the



even more difficult challenge of "defining" the population of



inference.  Just look at the problem of within-household



undercoverage, for example.  Maybe Judy Lessler or our Chair, Cathy



Dippo, would like to comment on these cognitive aspects, since they



have been heavily involved in this emerging area.



 



   Both speakers have constructed somewhat different taxonomies



of survey coverage errors.  One could profitably relate and refine



their approaches; however, I found both useful as is.



 



   On the whole, the papers do an excellent job of describing



(albeit in broad terms) the main technological aspects of frame



construction, maintenance and coverage.  I have only one quibble:



I was surprised by the complete omission of any mention of record



linkage.



 



 



   Finally, one last point of a general nature: the



Shapiro-Bosecker paper should whet your appetite for the larger effort



conducted by the Federal Committee on Statistical Methodology



(FCSM).  The FCSM subgroup led by Gary and Cathy Dippo conducted an



excellent series of case studies (Subcommittee on Survey Coverage



1990).  These studies are, however, largely descriptive, rather



than prescriptive -- a point I will turn to at the end of these



brief remarks.



 



 



Quality



 



   This two-day workshop is supposed to be about quality, so I



would like to connect the present papers somewhat more to that



theme than has been done already.  In doing this, I want to shift



the focus from PRODUCT quality to PROCESS quality and look more at



how to improve the processes that we use to construct frames and



conduct surveys.



 



   At IRS, we are following an action-oriented quality management



approach advocated by Juran (1986), Deming (1986) and others.  This



is in contrast to the mainstream statistical emphasis which has



long focused more on measurement and perhaps not enough on



improvement.  Anyway, Juran divides quality, like Gaul, into three



parts:



 



o    Planning. -- The steps to be taken to prepare, including



   establishing the desired level of quality (implicitly or



   explicitly).



 



o    Control. -- The steps needed to implement and to achieve



   the desired level of quality.



 



o    Improvement. -- The efforts undertaken to make further



   improvements in quality over those initially planned.



 



   Figure A provides a generic example giving you some typical



steps taken at each of these three stages of quality management.



This is an approach that we, at IRS, have begun to use to help the



Census Bureau avoid a repetition of the 20 percent underestimate



(for 1987) in the economic census statistics on receipts for



nonemployer establishments -- among the largest coverage problems



mentioned in the Shapiro-Bosecker paper (Greenia 1990; Konschnik



and Moore 1990).



 



 



Conclusion



 



   Let me conclude by making some recommendations on possible



next steps for a follow-up to the fine FCSM efforts to study survey



coverage quality issues:



 



 



o    Complete the learning from each of the FCSM case studies



   by subjecting them to a checklist like that in Figure A,



   to summarize for each case what the quality management



   steps were for survey coverage.



 



o    Choose the "best of the best" approaches.  The Japanese



   word here is DANTOTSU.  This (partly subjective) step is



   the beginning of an initial conjecture on a prescription



   for potentially system-wide improvements.



 



o    Use some of the results of this prescriptive exercise to



   initiate improvements and to gain (back) a deeper



   knowledge of the once-American-now-partly-Japanese ideas



   that surround the second quality revolution.



 



   In the last session, I talked about paradigm shifts in census-



taking.  I am unable to resist doing so again.  In particular, I



would like to refer you to an excellent article in Scientific



American (Gomory 1990) on two improvement paradigms: ladders and



cycles.  My belief is that a big -- or ladder -- paradigm shift



(like cognitive methods) may not be needed in the coverage area



(unlike in census-taking).  But, whether it is or not, we must make



better use of small -- or cycle -- paradigm shifts and learn faster



from each other's successes (and failures).  The Federal



Committee's work, as summarized today by Gary and Ron, plus Judy's



ideas, offers a platform for at least some of the improvements



needed.



 



 



References



 



Deming, W. Edwards (1986).  Out of the Crisis, Cambridge:



Massachusetts Institute of Technology.



 



Gomory, Ralph (1990).  "Of Ladders, Cycles and Economic Growth,"



Scientific American, June, 140.



 



Greenia, Nick (1990).  "Sole Proprietor/ship IMF/BMF Connection: An



Application of the Juran Trilogy," Statistics of Income working



paper (unpublished), Internal Revenue Service.



 



Juran, J. M. (1986).  "The Quality Trilogy," Quality Progress,



American Society for Quality Control, Inc., August, 19-24.



 



Konschnik, Carl A. and Moore, Richard A. (1990).  "EC-14, A Study



of the Methodology for Removing Employer Duplicates from the



Nonemployer Universe for the 1987 Censuses of Retail and Services,"



Business Division internal memorandum (September 19), Bureau of the



Census.



 



 



 



 



Subcommittee on Survey Coverage, Federal Committee on Statistical



Methodology (1990).  Survey Coverage, Statistical Policy Working Paper 17,



Office of Management and Budget.



 



 



 



 



[Graphics not reproduced in this text version.]
 



                         DISCUSSION



 



                         Joseph Waksberg



                           Westat, Inc.



 



 



1. Content of the Two Papers Presented



 



    The two papers present a good review of issues relating to



sampling frames.  Their emphasis is on coverage, but they are not



exclusively devoted to coverage.  They would be useful reading for



anyone developing a design for a new survey, or reconsidering



sampling and related methods for a continuing survey.  Although much



of the material in the two papers covers the same subjects, there



is considerable difference in focus.  As a result, the authors



provide a well-balanced discussion of options normally available



and considerations that should be kept in mind in choosing among



alternatives.  Shapiro and Bosecker mostly describe properties of



frames that affect sample designs.  Judy Lessler places more



emphasis on how the frames can affect measurement methods, and



conversely the way measurements can influence the choice of frames.



The two papers thus complement each other nicely.  The papers contain



definitions, properties of frames, important problems inherent in



some frames, and in some cases suggestions and recommendations for



dealing with the problems.  I'd like to discuss in more detail



several of the points made in the papers.



 



 



2. Minimizing Total Mean Square Errors



 



    The authors of both papers imply, although they do not



specifically say, that efforts to improve coverage by choice of a



suitable frame and of procedures for working with that frame are all



part of attempts to minimize the total mean square error of survey



estimates.  Although the minimization usually cannot be done in



precise mathematical form, it is almost always part of the



background thinking in developing survey procedures.  Judy Lessler



discusses the relationship of the frame to measurement methods.  In



practice, the situation is even more complex, with frame,



measurement methods, sample design, and sometimes estimation



methods intertwined.  All four frequently have to be taken into



account in decisions on choice of frame and intensity of efforts to



improve coverage.  Let me give some examples:



 



a.  About 25 years ago, the sample design for the CPS and the



    other Census-conducted national population and housing



    surveys changed from using area sample frames to list



    samples in most of the U.S.  The list samples consist of



    the set of addresses in the preceding census, plus



    building permits issued for new construction since the



    census date.  In considering pros and cons of the two



    types of frames, it was clear there were biases in both



    systems.  Building permits do not quite cover all



    additions to the residential stock of housing, even in



    areas requiring permits for construction.  In addition,



    there is some loss because permits cannot always be



    located in the building permit office.  Finally, in



    theory, the building permit frame should consist of



    permits for units constructed after the date of the



    census.  The time period is somewhat fuzzy, and permits



    issued in the year or so preceding the census cannot be



    unambiguously classified as to whether they were included in



    the census (at least not without an inordinate effort and



    cost).  Area frames have other types of bias.  The maps



    Census has used over the years are frequently outdated



    and many are difficult for interviewers to use.  In



    addition, experience over the years indicated that



    interviewers cannot locate all units in area segments and



    a small loss consistently appeared.  This undoubtedly



    affects the quality of the frame although how much and in



    what direction are difficult to quantify.  However, one



    aspect of the comparison of the two frames is quite clear.



    The list sample had a smaller variance.  This is because



    over the 10 to 15 years following each census, the



    measures of size of the area segments became seriously



    out of date.  Starting a few years after each census, the



    area segments became quite variable in size, and this



    variability increased progressively over the years.  The



    list sample provides relatively consistent segment sizes.



    The change from area to list sample was mainly introduced



    to reduce the variance arising from variability in



    segment size.  It appeared probable that coverage would



    also improve, although the evidence on this was weak.



 



b.  Westat has carried out three cycles of the National



    Survey of Family Growth for the National Center for Health Statistics.  The



    sample designs for the first two were based on traditional



    area samples.  For the third cycle, the National Health



    Interview Survey (NHIS) was treated as the sampling



    frame, and the sample consisted of a subsample of



    eligible persons in the NHIS in the preceding year and a



    half.  The original purpose of this revision in the frame



    was to reduce the cost of the extensive screening



    necessary to locate the required number of eligible



    persons.  In order to keep the screening costs in the two



    earlier cycles in check, a complex sample design with



    variable sampling rates was necessary.  The NHIS



    permitted the elimination of most of the variable rates,



    resulting in substantial reductions in variances for many



    statistics.  Although it was recognized that there would



    be a small loss in coverage from inability to locate some



    of the persons who moved after the NHIS interview, it was



    felt that the reduction in variances compensated for it.



    There was a side benefit to the procedure adopted.  The
    NHIS contained considerable data on social and health



   characteristics of the persons in the frame.  This



   information was very useful in the nonresponse adjustment



   procedure.



 



c. Random digit dialing (RDD) is, of course, much cheaper



   than face-to-face interviewing, especially when screening



   for a target population is necessary.  The difference in



   cost is so great that except for the major complex



   national surveys requiring an extraordinary degree of



   accuracy and surveys requiring physical measurements,



   most surveys in both the government and private sectors



   are now carried out over the telephone.  Although RDD is



   presumably only a sampling device and the sample persons



   can be interviewed over the telephone or in home visits,



   telephone interviewing is so much cheaper that



   researchers generally pick it.  The frame thus influences



   the choice of measurement methods.  It's interesting that



   the emergence of RDD has spurred research into the



   quality of telephone and face-to-face interviews, and the



   findings have made telephone interviewing a more



   respectable measurement method.
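
The variance argument in item a can be illustrated with a small simulation. The sketch below is hypothetical throughout: segments are drawn with probability proportional to a census measure of size of 100 units, with replacement, and the total number of housing units is estimated with the Hansen-Hurwitz estimator. Segment sizes are assumed to drift away from their census measures according to a made-up lognormal model whose spread grows with time since the census. It is not the design actually used for the CPS.

```python
import random
import statistics


def hh_estimate_of_total(actual, mos, n, rng):
    """Hansen-Hurwitz estimate of total housing units from a PPS
    with-replacement sample of n segments, where selection probabilities
    are proportional to the (possibly outdated) census measure of size."""
    total_mos = sum(mos)
    picks = rng.choices(range(len(actual)), weights=mos, k=n)
    # actual[i] / p_i with p_i = mos[i] / total_mos
    return sum(actual[i] * total_mos / mos[i] for i in picks) / n


def relative_se(years_since_census, n_segments=500, n=50, reps=500, seed=7):
    """Relative standard error of the estimated total under a hypothetical
    model in which segment sizes drift away from their census measure."""
    rng = random.Random(seed)
    mos = [100] * n_segments                  # census measures of size
    spread = 0.1 * years_since_census         # hypothetical drift rate
    actual = [max(1, round(100 * rng.lognormvariate(0, spread)))
              for _ in range(n_segments)]
    estimates = [hh_estimate_of_total(actual, mos, n, rng) for _ in range(reps)]
    return statistics.pstdev(estimates) / sum(actual)


for years in (0, 3, 6, 10):
    print(years, round(relative_se(years), 4))
```

When actual sizes still equal the measures of size (year 0), every sampled segment gives the same estimate of the total and the variance is zero; as sizes drift apart, the relative standard error rises. A list frame with roughly constant segment sizes keeps the sample close to the first situation.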



 



 



3. Narrowing Definition of Target Population



 



   Shapiro and Bosecker mention that in some circumstances it is



useful to narrow the definition of the target population to one



that permits use of a more accessible frame.  In some sense, this



is almost always done.  Surveys using area samples implicitly



define the target population as those persons who are normally



reported in area samples, thus excluding the undercoverage normally



found.  Business surveys frequently use businesses with one or more



employees instead of all businesses, etc.  I'd like to discuss two



aspects of a narrower definition.



 



 



3.1. Risks of Narrowing Definition



 



   I think most researchers would agree that the redefined target



population should satisfy two criteria:



 



a. It accounts for a very high proportion of the true



  target, preferably 85 to 90 percent or more.



 



b. Characteristics relating to the subject of the study



  should not be wildly different in the narrower population



  and the missing piece.



 



   The second criterion is quite important.  It's not always



recognized that even if the missing part is a small part of the
inferential population, in some cases it can have big effects.  Let



me give some examples.



 



   RDD telephone surveys are probably the most common method by



which a population is restricted to permit use of less expensive



sampling and interviewing methods.  About 93 percent of the U.S.



population live in telephone households, so that the first



criterion is satisfied.  The extent to which the second criterion



is satisfied depends on the statistic being studied.  For example,



in examining the feasibility of using RDD for a study of school



drop-outs, the following results emerged.  Figure 1 shows drop-out



rates in telephone and nontelephone households for 14-21 year old



youths.  The shaded and cross-hatched boxes represent all drop-



outs, and youths who dropped out in the past year.  It can be seen



that drop-out rates in nontelephone households are about five times



the rates in telephone households.  The discrepancy is large enough



to substantially affect the total, even though the nontelephone



households account for only seven percent of all households.  In
fact, estimates of drop-out rates from telephone households alone



would understate the actual drop-out rates by about 25 percent.



These estimates can be improved somewhat by post-stratifying the



telephone household results, but they still seriously underestimate



the true drop-out rates.  Figure 2 shows drop-out rates for



telephone households as a percentage of drop-out rates for the



total population, and similar ratios when post-stratification is



used to compensate for known deficiencies in using telephone



households as a surrogate for all households.  The post-



stratification cells comprised single years of age, race/ethnicity,



and highest grade attended by the head of the household.  As can be



seen, post-stratification improves these rates considerably.  The



ratio for total drop-outs goes from 77 to 85 percent, but the rates



are still much below the actual numbers.
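
The arithmetic behind the drop-out example is worth making explicit. If the covered group makes up a share c of the population, the population mean is c times the covered mean plus (1 - c) times the noncovered mean, so a covered-only estimate is biased by (1 - c) times the difference between the two means. The sketch below applies this to the rough figures quoted above (about 7 percent of households without telephones, and a drop-out rate about five times as high in them); treating those rounded figures as exact is an assumption.

```python
def covered_to_total_ratio(noncovered_share, rate_ratio):
    """Rate among covered (telephone) households divided by the rate in the
    whole population, when the noncovered group is `noncovered_share` of the
    population and its rate is `rate_ratio` times the covered rate."""
    total_relative_to_covered = (1 - noncovered_share) + noncovered_share * rate_ratio
    return 1 / total_relative_to_covered


def noncoverage_bias(covered_share, mean_covered, mean_noncovered):
    """Bias of a covered-only mean: (1 - c) * (mean_covered - mean_noncovered)."""
    return (1 - covered_share) * (mean_covered - mean_noncovered)


ratio = covered_to_total_ratio(noncovered_share=0.07, rate_ratio=5)
print(f"telephone-only rate is {ratio:.0%} of the true rate")
```

With those inputs the telephone-only rate comes out at about 78 percent of the true rate, consistent with the 77 percent shown for unadjusted total drop-outs. The second function expresses the same effect in absolute terms; with hypothetical means of 10 percent (covered) and 50 percent (noncovered), noncoverage_bias(0.93, 0.10, 0.50) is about -0.028.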



 



   Telephone households showed up much better for other



statistics studied in the same feasibility study.  An analysis of



enrollment in education programs for three- to five-year olds



showed only trivial bias in restricting a study to telephone



households.  Figure 3 shows ratios of enrollment rates in telephone



households to all households.  As can be seen, post-stratification



practically eliminates whatever bias exists in the data.



 



    Thornberry and Massey similarly report wide differences among



health-related items in the extent to which telephone households



can be considered to represent all households.  For the vast



majority of items, there is no problem, but problems exist for



items related to income.  For example, estimates of the number of



persons with private health insurance would be overstated by about
four percent if they were based only on telephone households.  Most



other health items would be affected only slightly.



 



 



 



 



3.2. Population for Which Estimates Are Prepared



 



   When a survey uses a frame that does not include the total



target population, there should be a clear and unambiguous



statement on how the sample was selected.  However, the estimation



methods should attempt to adjust the results for the narrow population so that



inferences can be made about the broader population.  Some



researchers feel that there is something wrong in expanding the



results beyond the boundaries of the frame.  I don't think it makes



any sense to tell data users who are interested in a specific



population that, because it is cheaper or easier, you've done a study



of another group and they can't infer anything about the population



they're interested in from the study.



 



   Of course, no one would make such a strong statement.



However, there is an implication that the results tell you about



the inferential population but as a scientist you're not allowed to



say so.  It seems to me that since the only reason for having done



the survey was to shed light on the inferential population, it



makes sense to do whatever is necessary to produce the best



estimates you can for that population.  This is, in fact, a



commonly accepted procedure.  The weighting or imputing procedures



used to reduce nonresponse biases implicitly assume that one wants



to produce statistics for the total rather than the respondent



population.  Similarly, results of telephone surveys are usually



inflated up to the level of the full population.



 



   There are some real dangers in not taking the trouble to



produce estimates for the inferential population.  Let me cite an



example where even the producers of the statistics forgot that the



statistics referred to a narrow population.



 



   In November 1989, the Census Bureau issued a report on the



Black population in the U.S.  One of the statistics cited in the



report was that the black female-to-male ratio was 100 to 88



compared to 100 to 96 for whites.  The difference is startling and,
if true, has serious social implications.  However, the text
statement of this statistic is followed by a sentence which



mentions that the ratios may be affected by greater census



undercoverage of males than females.  Elsewhere in the report is a



footnote stating that the numbers reflect only the civilian



noninstitutional population.  The phrase "may be affected" is a gross



understatement of the effect.  If one takes coverage and



institutional population into account, the discrepancy in the sex



ratios between blacks and whites is cut by more than half.  The



full report gave no hint that the sex ratios are affected that much



by these two factors.  Furthermore, by the time a press release was



issued by the Bureau of the Census, the fine line between the



population actually covered in the CPS and the total population was



lost, and the numbers were described as reflecting the difference



between the total black and white population.  The only way one can



avoid these kinds of misinterpretations is to make the best
adjustments one can to have the data reflect the population that



readers of the report assume is referred to.
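
The kind of adjustment called for here can be sketched as a simple ratio correction: divide each sex's survey count by an estimated coverage rate, then add back persons outside the survey universe, such as the institutional population. All inputs, including the coverage rates, would have to come from an independent source such as demographic analysis; the numbers in the usage line are hypothetical and are not Census Bureau estimates.

```python
def adjusted_sex_ratio(males_covered, females_covered,
                       male_coverage, female_coverage,
                       males_excluded=0.0, females_excluded=0.0):
    """Males per 100 females after (1) inflating survey counts by the inverse
    of an assumed coverage rate for each sex and (2) adding back persons
    outside the survey universe (e.g., the institutional population)."""
    males = males_covered / male_coverage + males_excluded
    females = females_covered / female_coverage + females_excluded
    return 100 * males / females


# Hypothetical: 88 males per 100 females observed; males assumed to be
# covered less completely than females.
print(round(adjusted_sex_ratio(88, 100, male_coverage=0.92, female_coverage=0.96), 1))
```

Because undercoverage is heavier for males, any such correction raises the ratio toward parity; how far it moves depends entirely on the coverage and institutional figures supplied.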



 



 



References



 



1. J.M. Brick and J. Burke, "Undercoverage Bias in the Field Test



for the National Household Education Survey," report by Westat Inc.



to the National Center for Education Statistics, 1990.



 



2. O.T. Thornberry and J.M. Massey, "Trends in U.S. Telephone
Coverage Across Time and Subgroups," in Telephone Survey Methodology,
edited by R.M. Groves et al., John Wiley & Sons, 1989.



 



 



 



 



 



 



 



 



               Session 4



         TELEPHONE DATA COLLECTION



 



 



 



 



 



 



 



 



 



           QUALITY IMPROVEMENT IN TELEPHONE SURVEYS



 



                         Leyla Mohadjer



                       David Morganstein



                          Westat, Inc.



 



1. Introduction



 



   The use of the telephone as an alternative mode of data collection



in surveys has become very popular in recent years.  Considerable



research has been devoted during the past decade to evaluating the



quality of data collected in telephone surveys and to compare that



with data collected by face-to-face interviewing.  Accompanying
the increased use of this methodology have been efforts at improving



its efficiency and reducing the total error of telephone survey



estimates.  This paper provides a summary of recent methods for



improving the quality of telephone surveys and reviews the recent



literature on the results of these efforts.



 



   Below we discuss several aspects of telephone surveys that



fall into the category of "quality improvement." Most of these



issues are design decisions that affect the expected total survey



error.  From its very beginning, the choice of telephone sampling



over face-to-face sampling was one of improved efficiency.  That



is, the cost per completed interview is in almost every case significantly



less than that of face-to-face sampling while the 'quality' of the



results, as measured by total survey error, is little if any



reduced.  By way of comparison, mail-out surveys may have a very



low cost per completed interview, but they suffer from large and generally



unknown biases.  Increasing efficiency is a traditional argument
for system changes whose principal purpose is quality improvement,
such as the choice of telephone sampling over face-to-face
interviewing.



 



    In the following sections, we discuss several aspects of



telephone survey operations in which the quality of a telephone



sample design is established.  We begin with decisions regarding



the survey methodology.  These decisions typically include the



trade-off of greatly reduced survey cost for what might be, at



most, a small increase in mean square error (MSE).  Less



quantifiable in cost terms are the reduced time to completion of



survey operations afforded by a telephone methodology and



improvements in the level of quality assurance.
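
The cost-versus-MSE trade-off can be made concrete with the textbook decomposition MSE = bias squared + variance. The sketch assumes simple random sampling (no design effect), a fixed budget, and hypothetical costs per completed interview and bias; none of the figures come from this paper.

```python
def mse_for_budget(budget, cost_per_complete, unit_variance, bias):
    """MSE of a sample mean when the budget buys n = budget / cost_per_complete
    completed interviews under simple random sampling."""
    n = budget / cost_per_complete
    return bias ** 2 + unit_variance / n


# Hypothetical figures: a proportion near 0.5 (unit variance 0.25), a
# $100,000 budget, $200 vs. $50 per completed interview, and a
# one-percentage-point noncoverage bias for the telephone survey only.
face_to_face = mse_for_budget(100_000, 200, 0.25, bias=0.0)
telephone = mse_for_budget(100_000, 50, 0.25, bias=0.01)
print(face_to_face, telephone)
```

With these made-up numbers the telephone survey has the smaller MSE despite its bias. Because the bias term does not shrink as the budget grows, a large enough budget (or bias) would reverse the comparison.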



 



    Next, we discuss a number of sample design aspects which



affect survey cost, schedule, and error.  We mention the much



discussed issue of coverage and the general problem of frame



construction as they relate to total error.  A number of sample



design improvements have been developed in the past few years which



can decrease the expected number of wasted calls needed in the
process of identifying eligible respondents.  These are described



and compared.



 



   In contrast to other methodologies, telephone surveys



contain a number of operational features which result in improved



quality.  Among these are aspects of management and supervision and



of direct data entry through Computer Assisted Telephone



Interviewing (CATI).  We discuss and quantify some of the benefits



which accrue from these approaches.  Lastly, we review the topic of



estimation as it relates to minimum mean square error estimates.  Several



estimation procedures required by these sample designs are worthy
of note.



 



2. Overview of the Properties of Telephone Surveys



 



   Telephone surveys have become a frequently chosen alternative to



face-to-face interviewing for several reasons.  Telephone surveys



can be conducted at a much lower cost when compared to face-to-face



interviewing.  They also allow for the sample results to be



available more quickly than face-to-face surveys.  There are



greater opportunities for quality control through more rigorous



supervision and through frequent monitoring of the interviewing



staff.  Also, telephone interviewing makes it possible to contact



otherwise hard-to-reach respondents, such as those living in
difficult-to-visit or dangerous neighborhoods, and to reach
respondents in bad weather or late at night (Groves and Kahn, 1979).  The sample



design effects for estimates derived from telephone surveys are



smaller than those coming from more heavily clustered area



probability designs.  Finally, telephone surveys have smaller



interviewer effects.  Discussions on these issues are provided in



different sections of this paper.



 



   Considerable research has been dedicated to improving sampling



techniques, to increasing response rates, and to reducing



noncoverage bias.  Research has also focused on the issue of data



quality, a comparison of collection modes, and the influence of



collection mode on the quality of the data.  Several authors such



as Groves (1979) and Jordan (1980) have stated that one of the



causes of lower performance for telephone surveys when compared to



face-to-face surveys is the lower degree of operational experience



with telephone surveys.  Leeuw and Zouwen (1988) have analyzed the



results of a number of studies in this area.  Their work confirmed



that the difference between the face-to-face and telephone



interviews is becoming smaller over time.



 



   Leeuw and Zouwen (1988) integrated findings on interviewing



mode differences and have provided a review of this topic.  The



method of analysis they used made it possible to present an



overview of mode differences found with respect to data quality and



to estimate the size of these differences.  The main conclusions of



their paper are the following:



 



-   Response rates are generally higher for face-to-face



   interviews than for telephone interviews;



 



-   The majority of studies did not find statistically significant



   differences between modes.  When differences were found, however,



   they were in favor of face-to-face interviews; and



 



-   Only small differences were found between random digit dialing



   (RDD) and face-to-face, and the differences have become



   smaller over time.



 



   Leeuw and Zouwen (1988) also point out that one major



difference between the two modes is the lack of visual support in



telephone surveys.  This makes the respondent's task of answering



some questions difficult in telephone surveys.  It also results in



reduced control over the respondent's behavior in telephone



surveys.  On the other hand, since the questions come through the



phone, responses are meaningless to other persons in the same room



with the respondent, especially for closed questions.  This reduces



the potential influence of "bystanders" on the respondents.



 



   The fact that telephone interviewing can be contained in a



small area offers many potential benefits.  Interviews done by



telephone are subject to more supervisory control than field



surveys, resulting in a positive effect on the quality of data from



telephone surveys.  Unlike in the face-to-face mode, supervisors can



monitor telephone interviewing anonymously and frequently with



little impact on survey costs.  This allows for rapid modification



of questionnaire wording found to be problematic.  In addition,



they can arrange for needed interviewer re-training or they can



make appropriate re-assignments if an interviewer is observed to be



unsuitable for the assignment.  Moreover, with CATI systems,



it is much easier to put checks and probes in different parts of



the interview to ensure that answers provided by respondents are



consistent throughout the questionnaire.  All of these features



should result in reduced non-sampling error.
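
A CATI consistency check is simply a rule evaluated while the interview is in progress, so that the interviewer can probe immediately rather than leave the problem for later editing. The sketch below is hypothetical: the Person record, its fields, and the two edit rules are illustrative only and are not taken from any particular CATI system.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    marital_status: str  # e.g. "married", "never married"


def consistency_checks(household_size, persons):
    """Return messages for the interviewer to resolve with the respondent
    before the interview moves on. Both rules are illustrative examples."""
    problems = []
    # Rule 1: the roster must agree with the reported household size.
    if len(persons) != household_size:
        problems.append(
            f"household size {household_size} but {len(persons)} persons listed")
    # Rule 2: flag implausible age / marital status combinations.
    for i, p in enumerate(persons, start=1):
        if p.age < 15 and p.marital_status == "married":
            problems.append(f"person {i}: age {p.age} reported as married")
    return problems


print(consistency_checks(3, [Person(40, "married"), Person(12, "married")]))
```

In a real instrument each message would trigger a scripted probe at the point in the questionnaire where the inconsistency first arises.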



 



   Two disadvantages of telephone surveys are the noncoverage of



persons living in households without telephones and lower response



rates when compared to face-to-face surveys.  Section 3 provides a



discussion of undercoverage in telephone surveys and the methods



available to compensate for the undercoverage.  Section 4 discusses



nonresponse issues in telephone surveys.



 



3. Undercoverage in Telephone Surveys



 



   Households without telephones are not included in telephone



surveys since the sampling frames do not include such households.



A considerable amount of information has been published on the



nature of possible biases resulting from the use of a telephone



sampling frame.  Thornberry and Massey (1988) have analyzed trends
in telephone coverage in the U.S. across time and subgroups of the



population.  They indicate that estimates for the entire U.S.
population may experience only minor biases because of the high
rate of telephone coverage: about 93 percent of the population can be
reached by telephone.



 



    Although overall telephone coverage has risen to a very high



level, it is not uniformly distributed across the population.



Thornberry and Massey (1988), Groves and Kahn (1979), and Banks



(1983) have shown striking differences between telephone and non-



telephone households with respect to demographics, economics, and



health characteristics.



 



    As might be expected, telephone coverage correlates highly



with income.  Massey (1988) points out that other variables such as



employment status, education, marital status, and race are also



correlated with income and thus affect telephone coverage.
Lower-income persons are more likely to be missed in telephone screening.



This, in effect, results in higher telephone penetration for whites



than blacks.  Telephone coverage is lower in the South than in the



rest of the U.S., and it is lower in rural than urban areas.



 



    Massey (1988) points out that noncoverage bias is a function



of the noncoverage rate of a telephone survey frame and of the



difference in characteristics between the covered and uncovered



population.  Even if the percentage of households with
telephones increases and the overall noncoverage rate becomes



smaller, large differences between telephone and nontelephone



households can result in significant noncoverage bias.  Surveys



which focus on income or variables related to income may experience



high noncoverage bias.  It is true that the estimates of



characteristics for the total population may not be drastically



affected by the omission of nontelephone households; however, for



some subdomain estimates there could be large biases due to the



exclusion of households without telephones.



 



 



3.1 Methods to Compensate for Undercoverage



 



    Several methods are available in telephone surveys to address



the problem of noncoverage bias.  One approach which may eliminate



certain kinds of undercoverage bias is the use of dual frames.



Dual frame, mixed mode surveys use a combination of RDD and face-



to-face samples to overcome the noncoverage of households without



telephones.  Research in the area of such mixed mode surveys



includes Sirken and Casady (1988), Groves and Lepkowski (1985),



Lepkowski and Groves (1984), Biemer (1983), and Casady et al.



(1981).
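
In a dual-frame, mixed-mode design, telephone households can be reached through both samples, while nontelephone households are reached only through the area sample. One common way to combine them, in the spirit of Hartley's dual-frame estimator, is sketched below. The mixing weight theta is taken as given here, whereas in practice it would be chosen to reduce variance (possibly allowing for relative costs). This is an illustration, not necessarily the estimator used in the studies just cited.

```python
def dual_frame_total(rdd_phone_total, area_phone_total, area_nonphone_total, theta=0.5):
    """Composite (Hartley-type) estimate of a population total from an RDD
    sample and an area sample.

    rdd_phone_total     -- estimate for telephone households from the RDD sample
    area_phone_total    -- estimate for telephone households from the area sample
    area_nonphone_total -- estimate for nontelephone households (area sample only)
    theta               -- weight given to the RDD estimate in the overlap domain
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be between 0 and 1")
    overlap = theta * rdd_phone_total + (1 - theta) * area_phone_total
    return overlap + area_nonphone_total


# Hypothetical estimates (thousands of persons with some characteristic).
print(dual_frame_total(900.0, 950.0, 70.0, theta=0.7))
```

Setting theta to 1 ignores the area sample's telephone households and corresponds to a screening design, in which the area sample is used only for households without telephones.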



 



    Sample weighting adjustments in the form of post-stratification
factors can be used to decrease the effects of



noncoverage.  The post-stratified weights are frequently employed
in national surveys to compensate for noncoverage bias.  The



subgroups established for the purpose of post-stratification are



specifically tailored to each study.  Subgroups are defined on the



basis of variables thought to be correlated with the major



statistics to be obtained from the survey as well as variables



correlated with telephone penetration and nonresponse distribution.



Massey and Botman (1988) have investigated the impact of post-



stratification survey adjustments in national surveys.  They



discuss several post-stratified weighting adjustment methods for



RDD surveys, and show the effect of these adjustments on the



estimates.  Other work done in this area includes Banks (1983),



Banks and Undersign (1982), and Thornberry and Massey (1978).



Their results show that, although these methods reduce the effects



of undercoverage, they do not completely eliminate the bias.
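
The mechanics of a post-stratification adjustment are simple: within each cell, every weight is multiplied by the ratio of an independent control total to the weighted sample total for that cell. In the sketch below, the cell labels and control totals are hypothetical; in practice the cells would be built from variables such as age, race, or region, and the controls would come from a source such as the census or the CPS.

```python
from collections import defaultdict


def poststratify(weights, cells, control_totals):
    """Multiply each respondent's weight by (control total / weighted sample
    total) for that respondent's post-stratum, so that weighted counts match
    independent population controls."""
    weighted = defaultdict(float)
    for w, c in zip(weights, cells):
        weighted[c] += w
    factors = {c: control_totals[c] / weighted[c] for c in weighted}
    return [w * factors[c] for w, c in zip(weights, cells)]


# Hypothetical: four respondents in two cells, each with base weight 10.
new_weights = poststratify([10, 10, 10, 10], ["a", "a", "b", "b"],
                           {"a": 30, "b": 50})
print(new_weights)
```

A cell with no respondents would need to be collapsed with a neighboring cell beforehand; the sketch does not do that. As the text notes, the adjustment corrects only the part of the noncoverage bias explained by the cell variables.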



 



 



3.2 Within Household Coverage



 



     The main focus of research in the area of coverage in random



digit dialing surveys has been on sampling frame inadequacies,



i.e., the exclusion of nontelephone households from the frame, as



discussed earlier.  However, there is another cause of



undercoverage that arises from failure to obtain complete listings



of household members in responding households.  This is usually



referred to as within-household coverage.  Within-household
undercoverage also occurs in face-to-face surveys.  Maklan and Waksberg



(1988) used two surveys conducted by Westat and compared their



within-household coverage rates with those obtained by the Current



Population Survey (CPS).  They concluded that the coverage of
persons in households with telephones generally available in RDD
surveys is at least as good as, if not better than, that provided by
the CPS.



 



 



4. Nonresponse Issues in Telephone Surveys



 



    Groves (1988) gives an overview of nonresponse issues in



telephone surveys, and distinguishes between those factors common



to both face-to-face and telephone surveys and those factors that



are specifically related to the selection mode.  Factors such as



length of the questionnaire, subject matter (topic of the survey),



sensitivity of the questions, refusal conversion and callback



routines are common to both modes.  The differences in



response rates that Groves (1988) cites between face-to-face and



telephone surveys are that refusal rates are higher for telephone



surveys, and relatively more of the refusals take place immediately



after interviewers have introduced themselves prior to describing



the purpose of the survey.  However, as pointed out earlier, Leeuw



and Zouwen (1988) have shown that these differences have become



smaller over time.  Researchers have varied the introductory



section in an effort to reduce early refusals.  A number of



researchers have reported some improvements by using advance
letters to alert sample persons about the survey and the upcoming



telephone call.



 



 



5. Choice of a Frame, List vs. RDD



 



    Essentially, there are three types of sampling frames



available for telephone surveys.  List frames use information



available in telephone directories, or other frames based on



telephone directories, to generate a sample of telephone numbers.  This



is the alternative with the greatest undercoverage problem.



Second, random digit dialing provides a frame of all possible



telephone numbers, and thus covers both listed and unlisted



numbers.  Third is a multi-frame approach which uses both



directories and RDD.  Lepkowski (1988) provides a description of



these frames and methods of sample selection used with them.



 



6. Two-Stage RDD



 



   Random digit dialing was originally developed to overcome the



coverage problems inherent in directory samples; however, surveys



of residential respondents were burdened by the excessive effort



required to filter many nonworking or business telephone numbers.



The Mitofsky-Waksberg cluster sample technique eliminates much of



this inefficiency by utilizing the manner in which the telephone



industry initiates new phone exchanges, namely by assigning a prefix



to either a business or a residential clientele.  Accordingly, it



is possible to select a probability sample that is significantly



richer in residential numbers than would be obtained by conducting



a simple random sample of telephone numbers.



 



6.1 Waksberg Method for Reducing Effort



 



   The method frequently used for large-scale residential



telephone surveys is a two-stage cluster procedure.  This method



was originally developed by Mitofsky (1970) and Waksberg (1978),



and is usually referred to as the Mitofsky-Waksberg method.  In a



1978 article, Waksberg demonstrated mathematically that this



procedure provides a probability sample of households with



telephones, in which all telephone numbers have the same



probability of selection.  Further, the method was shown to require



a smaller number of telephone calls than the sampling procedures



previously used for RDD, and thus, as a quality improvement,



significantly reduces the cost and time involved in such surveys in



comparison with dialing numbers at random.



 



   The majority of numbers dialed completely at random are



nonworking, business and other nonresidential numbers.  Current



estimates are that about 75 percent of the potential numbers within
existing telephone prefixes are nonworking and another three



percent are businesses or institutions of some type.  Given that



only about 20 percent are residential numbers, a typical RDD simple



random sample requires that five calls be made to locate a single



household.  In some cases, the telephone companies do not provide



a message that the number dialed is not a working number.



Additional checking necessary to distinguish between not-at-homes



and nonworking numbers adds further to the cost of achieving



completed interviews.



 



  The Mitofsky-Waksberg sampling method is designed to reduce



the number of nonproductive calls.  It takes advantage of the fact



that a high proportion of nonworking and commercial numbers occur



in consecutive sequences.  Essentially, the procedure involves two



steps: first, "household cluster identification" (identifying and



selecting a sample of blocks of 100 numbers called "telephone



clusters," which contain working, residential telephone numbers);



and second, dialing random numbers within the clusters.  Users of



this technique typically locate three residential numbers for every



five attempted within each cluster, a significant improvement in



efficiency for minimal additional effort.
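
The two-stage logic can be made concrete with a small simulation. Everything about the frame below is hypothetical: 100-number banks, a share of banks (65 percent) with no residential numbers, and a residential density of 60 percent in the remaining banks, chosen only so that roughly 20 percent of all numbers are residential. For simplicity the sketch takes banks in random order without replacement; it is an illustration of the idea, not a production implementation of the Mitofsky-Waksberg design.

```python
import random


def make_frame(n_banks, share_empty, density, rng):
    """Hypothetical frame of 100-number banks; True marks a residential number."""
    frame = []
    for _ in range(n_banks):
        if rng.random() < share_empty:
            frame.append([False] * 100)  # bank with no residential numbers
        else:
            frame.append([rng.random() < density for _ in range(100)])
    return frame


def mitofsky_waksberg(frame, n_clusters, k, rng):
    """Stage 1: dial one random number in a randomly chosen bank and keep the
    bank only if that number is residential. Stage 2: dial further random
    numbers in a kept bank until k residential numbers (including the first)
    are found. Returns (residential numbers found, numbers dialed)."""
    found, dialed, kept = [], 0, 0
    for b in rng.sample(range(len(frame)), len(frame)):  # banks in random order
        if kept == n_clusters:
            break
        order = rng.sample(range(100), 100)  # random dialing order within bank
        dialed += 1
        if not frame[b][order[0]]:
            continue  # bank rejected at stage 1
        kept += 1
        hits = [(b, order[0])]
        for num in order[1:]:
            if len(hits) == k:
                break
            dialed += 1
            if frame[b][num]:
                hits.append((b, num))
        found.extend(hits)
    return found, dialed


def simple_rdd(frame, n_dials, rng):
    """Dial n_dials numbers completely at random; return residential hits."""
    hits = 0
    for _ in range(n_dials):
        bank = frame[rng.randrange(len(frame))]
        hits += bank[rng.randrange(100)]
    return hits


rng = random.Random(1978)
frame = make_frame(n_banks=5000, share_empty=0.65, density=0.6, rng=rng)
found, dialed = mitofsky_waksberg(frame, n_clusters=200, k=5, rng=rng)
print("two-stage residential hit rate:", round(len(found) / dialed, 2))
print("simple RDD residential hit rate:", round(simple_rdd(frame, dialed, rng) / dialed, 2))
```

With these made-up parameters the second-stage dialing runs at roughly the 60 percent within-bank density, so the overall two-stage hit rate should come out at about twice the simple RDD rate.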



 



 



6.2  Modified Waksberg Method



 



   The "standard" Mitofsky-Waksberg method, which produces a



self-weighting sample, involves designating a desired number of



household clusters, and sampling a constant number of households



per cluster.  There are, however, some awkward operational features



arising from the requirement for a constant number of households



per cluster.  For example, before the need for more telephone



numbers in specific clusters can be determined it is necessary to



wait until the required number of households have been identified



and interviewed.  Since a large number of calls are required to



determine whether a telephone number is residential and, if so, to



obtain the cooperation of the household, the standard method is



rather time-consuming.



 



   To improve the data collection process and to reduce the data



collection time, researchers have come up with different ways to



speed up the data collection (for example, refer to Alexander



[1988], Potthoff, JASA [1987], and Potthoff [1987]).  The modified



Waksberg procedure that Westat sometimes applies is based upon a



fixed number of telephone numbers (instead of households) per



cluster.  There is thus no necessity to wait until the original



sample of clusters has been completed to determine whether the



desired number of households within clusters has been achieved.



With the modified method, sample size becomes a random variable and



the tight control on sample size offered by the original procedures



is loosened.  What is more, the modified procedure results in a



sample that requires sample weights to adjust for differential



probabilities of selection.  Accordingly, its reduced data



 






 



collection time is purchased at the price of increased sampling



error.
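
A rough sketch of why the modified procedure needs weights, and of what it costs in precision, follows. It assumes that a cluster's chance of being retained is proportional to its residential rate and that every retained cluster receives the same number of dials; a household's selection probability is then proportional to its cluster's residential rate, so its weight is the inverse of the estimated rate. The variance penalty is summarized with Kish's standard approximation for unequal weighting, n times the sum of squared weights divided by the squared sum of weights; the paper itself does not specify this measure, and the function names and cluster counts are hypothetical.

```python
from typing import List, Tuple


def relative_weights(clusters: List[Tuple[int, int]]) -> List[float]:
    """Relative household weights under a fixed-numbers-per-cluster design.

    Each tuple is (numbers_dialed, residential_found) for a retained cluster.
    Assumes retention probability proportional to the residential rate and
    equal dialing per cluster, so weight = 1 / estimated residential rate.
    Weights are rescaled to average 1 over households.
    """
    raw: List[float] = []
    for dialed, found in clusters:
        if dialed <= 0 or found <= 0:
            continue
        raw.extend([dialed / found] * found)  # 1 / estimated residential rate
    if not raw:
        raise ValueError("no households found")
    avg = sum(raw) / len(raw)
    return [w / avg for w in raw]


def kish_deff(weights: List[float]) -> float:
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Hypothetical: 20 numbers dialed in each of three retained clusters.
w = relative_weights([(20, 12), (20, 6), (20, 3)])
print(len(w), round(kish_deff(w), 2))
```

The number of households (here 21) depends on what the dialing turns up, and the design effect exceeds 1 whenever residential rates differ across clusters. With equal rates the weights are constant and the design effect is 1, as in the self-weighting standard method.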



 



 



7. Efficiency of Estimates Derived from RDD Studies



 



   In designing a two-stage RDD sample, the number of sample



clusters and the average number of sample households per cluster



must be specified.  The choice of the sample sizes is usually made



on the basis of cost and variance considerations.  The extent to



which the variances are increased due to clustering depends on the



intraclass correlation between households within cluster and the



average number of eligible households per cluster.
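
This dependence is commonly summarized by Kish's approximation, deff = 1 + (m - 1) * rho, where m is the average number of sample households per cluster and rho is the intraclass correlation. The sketch below assumes roughly equal cluster sizes; the value rho = 0.05 and the sample size are hypothetical.

```python
def clustering_deff(avg_cluster_size: float, icc: float) -> float:
    """Approximate variance inflation from clustering: 1 + (m - 1) * rho.

    avg_cluster_size is m, the average number of sample households per
    cluster; icc is rho, the intraclass correlation within clusters.
    """
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, avg_cluster_size: float, icc: float) -> float:
    """Simple random sample size giving the same variance as the clustered design."""
    return n / clustering_deff(avg_cluster_size, icc)


# Hypothetical values: rho = 0.05, 1,000 interviewed households.
for m in (3, 5, 20):
    print(m, round(clustering_deff(m, 0.05), 2),
          round(effective_sample_size(1000, m, 0.05)))
```

Even a small intraclass correlation matters once clusters are large: with rho = 0.05, moving from 5 to 20 households per cluster cuts the effective sample from about 833 to about 513.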



 



   Clustering generally reduces survey data collection costs.



The magnitude of the savings, however, is very different for face-to-face
than for telephone surveys.  In face-to-face surveys, clustering to reduce
travel costs is a virtual necessity, because travel can make up a substantial
portion of the total survey cost.  In telephone surveys, clustering is used to



reduce the cost of dialing and reaching telephone numbers that



belong to households.  Considering the minimal cost of dialing



telephone numbers (especially when compared to travel cost in face-



to-face surveys), cluster sizes in telephone surveys need not be as



large as they are in face-to-face surveys.  As a result, statistics



derived from RDD surveys are generally more efficient (have smaller



variances) than those coming from face-to-face surveys.
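
The link between per-cluster cost and cluster size can be made concrete with the classical optimum for a two-stage design. This is a textbook result, not a formula given by the authors. It assumes total cost C = n_clusters * (c1 + m * c2), where c1 is the cost of setting up a cluster and c2 the cost per household, and variance proportional to 1 + (m - 1) * rho. The cost ratios below are hypothetical.

```python
import math


def optimal_cluster_size(cost_per_cluster: float, cost_per_household: float,
                         icc: float) -> float:
    """Classical optimum number of households per cluster.

    Assumes total cost C = n_clusters * (c1 + m * c2) and variance
    proportional to 1 + (m - 1) * rho; minimizing variance for fixed cost
    gives m_opt = sqrt((c1 / c2) * (1 - rho) / rho).
    """
    if not 0.0 < icc < 1.0:
        raise ValueError("icc must be strictly between 0 and 1")
    return math.sqrt((cost_per_cluster / cost_per_household) * (1.0 - icc) / icc)


# Hypothetical cost ratios c1/c2 with rho = 0.05: a telephone cluster is cheap
# to open relative to an interview; a face-to-face cluster (travel) is not.
print(round(optimal_cluster_size(2.0, 1.0, 0.05), 1))   # telephone-like
print(round(optimal_cluster_size(20.0, 1.0, 0.05), 1))  # face-to-face-like
```

Because the optimum grows with the square root of c1/c2, the low cost of opening a telephone cluster points toward smaller clusters and therefore smaller clustering design effects.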



 



 



8. Improvements in Locating Rare Populations in Telephone



  Surveys



 



   Studies of specific subgroups of the population that comprise



relatively small proportions of the total population have always



been the focus of many research efforts.  With any method of sample



selection, surveys of rare populations almost always require a



considerable amount of screening.  The frame generally used for



RDD, a computer file provided by AT&T, comprises all telephone



households.  Subsets cannot be determined except as part of a



screening procedure.  Extensive screening is necessary to locate



members of the rare population, and as a result, it is usually very



costly to sample rare populations through telephone surveys.



 



  One efficient option for sampling members of rare populations



is to use a commercially available tape (the Donnelley tape) that



contains census population characteristics for prefix areas.



Mohadjer (1988) provides an evaluation of the quality of the



information on this tape.  Furthermore, Mohadjer (1990) discusses



the effectiveness of using the Donnelley tape to improve the sample



efficiency in an education study.  She shows that sampling



efficiency is greatly improved by using the Donnelley tape to



oversample blacks and Hispanics.
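
A simple calculation shows why prefix-level information of this kind reduces screening. The sketch assumes the prefix areas can be split into strata with known shares of telephone households and known prevalence of the rare group. It computes the expected number of households screened per eligible household found under proportional sampling and under oversampling of the high-density stratum. All stratum figures are hypothetical.

```python
from typing import List, Tuple


def screens_per_eligible(strata: List[Tuple[float, float]],
                         rates: List[float]) -> float:
    """Expected households screened per rare-population household found.

    strata: (share of all telephone households, prevalence of the rare
    group) for each stratum of prefix areas.
    rates: relative sampling rate applied to each stratum.
    """
    screened = sum(share * f for (share, _), f in zip(strata, rates))
    found = sum(share * prev * f for (share, prev), f in zip(strata, rates))
    if found == 0:
        raise ValueError("no eligible households expected")
    return screened / found


# Hypothetical: prefix areas flagged as high-density hold 10% of households
# with 40% prevalence; the rest hold 90% with 5% prevalence.
strata = [(0.10, 0.40), (0.90, 0.05)]
print(round(screens_per_eligible(strata, [1.0, 1.0]), 2))  # proportional
print(round(screens_per_eligible(strata, [4.0, 1.0]), 2))  # 4x oversampling
```

Under these assumptions, oversampling the high-density stratum fourfold cuts screening from about 11.8 to about 6.3 households per eligible case. The unequal sampling rates must then be undone by weighting, which offsets part of the gain through a weighting design effect.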



 






 



9. Interviewer Effects in Telephone Surveys



 



   Many studies have compared interviewer effects in telephone



and face-to-face surveys.  They mainly speculate that the



interviewer effect is smaller for telephone surveys than for face-



to-face surveys.  In this section we examine the potential causes



for interviewer effects and the way these causes relate to the data



collection mode.



 



   Stokes and Yeh (1988) give the following as the potential



causes for interviewer effect:



 



1. Not following directions exactly;



 



2. Variations in personalities, tone of voice;



 



3. Respondent's reaction to characteristics of the



   interviewers; and



 



4. Different response rates for different interviewers.



 



   The main belief is that the variability among interviewers is



smaller in centralized telephone surveys than in face-to-face



surveys.  The reason often given is that these effects can be



controlled better by monitoring and supervision in a centralized



data collection facility.  Telephone interviewers can be much more



easily monitored and training can be more uniform as well as more



frequent.  Furthermore, interviewers have the opportunity to



observe and learn more from each other in a centralized facility



such as a telephone center.  This makes interviewer behavior more



uniform in telephone surveys than in face-to-face surveys.



Differences between interviewers can be detected much more easily,
especially in centralized facilities.  When differences are
observed between interviewers, steps can readily be taken to reduce
them.  For example, changes in training or instructions to



interviewers can be implemented more quickly.
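
One common way to quantify differences between interviewers is the intra-interviewer correlation, rho_int, estimated by one-way analysis of variance when cases are assigned to interviewers at random. Its effect on variance is usually approximated as 1 + (m - 1) * rho_int, where m is the interviewer workload. The sketch below is illustrative only: it assumes equal workloads, and the scores are hypothetical and deliberately exaggerated.

```python
from statistics import mean
from typing import Dict, List


def interviewer_icc(responses: Dict[str, List[float]]) -> float:
    """One-way ANOVA estimate of the intra-interviewer correlation rho_int.

    Assumes cases were assigned to interviewers at random (interpenetrated)
    and each interviewer completed the same number m of interviews.
    Negative estimates are usually truncated to 0 in practice.
    """
    groups = list(responses.values())
    if len(groups) < 2:
        raise ValueError("need at least 2 interviewers")
    k, m = len(groups), len(groups[0])
    if m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need equal workloads of at least 2 interviews")
    means = [mean(g) for g in groups]
    grand = mean(means)  # equals the overall mean when workloads are equal
    msb = m * sum((gm - grand) ** 2 for gm in means) / (k - 1)
    msw = sum((x - gm) ** 2 for g, gm in zip(groups, means) for x in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def interviewer_deff(workload: int, rho_int: float) -> float:
    """Variance inflation from interviewer effects: 1 + (m - 1) * rho_int."""
    return 1.0 + (workload - 1) * rho_int


# Hypothetical item scores for three interviewers with equal workloads.
scores = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
print(round(interviewer_icc(scores), 3))
print(round(interviewer_deff(50, 0.01), 2))
```

The second line shows why even a small rho_int deserves attention: with 50 interviews per interviewer, rho_int = 0.01 inflates variance by about half.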



 



   The interviewer's personality and the respondent's reactions
to the interviewer also have smaller effects in telephone surveys.
Tone of voice is the only variable thought to have a
larger effect in telephone surveys than in face-to-face surveys,
because the lack of visual contact in telephone surveys increases
the influence of the interviewer's tone of voice on respondents.



 



   A number of steps can be taken to limit these interviewer



effects even further.  There are several quality control measures



which can provide a quick assessment of interviewer performance and



which can identify the need for action.  Strict supervision is



especially important in the early stages of data collection to



 






 



ensure that all interviewers are following directions and have a



clear understanding of the survey purpose and instrument.  Group



meetings to emphasize important aspects of the procedures and



individual conferences with weaker interviewers should be used to



limit the effect of interviewer differences.  All interviewers



should be monitored when they first begin data collection.  Staff



who fail to meet or exceed standards should not be allowed to



continue until they have undergone remedial training.



 



 



10. Computer-Assisted Telephone Interviewing (CATI)



 



   The use of CATI was a quantum step in telephone survey quality



improvement.  Survey organizations have used CATI with increased



frequency in recent years because of its many benefits.  It is



believed that CATI improves the quality of the data collected,
reduces the cost of data collection, and increases the
timeliness of telephone surveys.



 



   A CATI system has the potential for providing clean data



immediately after interview completion.  Three CATI features



contribute to this capability.  First, most data edits can be done



on-line as the responses are entered.  A CATI program can prevent



interviewers from entering out-of-range responses ("hard range



check") and can be programmed to require verification of unlikely



responses ("soft range check"), such as an age of 100 years.
Second, consistency checks can be programmed into the CATI instrument
so that inconsistent responses are identified and verified during the
interview.  Third, CATI can be set up so as to prevent the



interviewer from leaving a question incomplete.  If the interviewer



has difficulty recording an answer (e.g., difficulty categorizing



the answer into the precoded choice on the CATI screen), they can



be trained to enter a "comment" explaining the circumstances.  A



quality control monitor can be responsible for reviewing all



interviewers' comments on a daily basis to resolve difficulties and



to update the data files as required.
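
The hard and soft range checks and the consistency checks described above can be sketched as follows. The item names, the limits, and the cross-item rule are hypothetical, chosen only to illustrate the three outcomes: reject the entry, ask the interviewer to verify it, or accept it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RangeCheck:
    """Hard limits block an entry; soft limits require interviewer verification."""
    hard_min: int
    hard_max: int
    soft_min: int
    soft_max: int

    def check(self, value: int) -> str:
        if not self.hard_min <= value <= self.hard_max:
            return "reject"   # hard range check: value cannot be entered
        if not self.soft_min <= value <= self.soft_max:
            return "verify"   # soft range check: confirm with the respondent
        return "accept"


# Hypothetical limits for an age item.
AGE = RangeCheck(hard_min=0, hard_max=120, soft_min=0, soft_max=99)


def consistency_problems(answers: Dict[str, int]) -> List[str]:
    """Hypothetical cross-item rule checked during the interview."""
    problems = []
    age, years = answers.get("age"), answers.get("years_at_address")
    if age is not None and years is not None and years > age:
        problems.append("years_at_address exceeds age")
    return problems


print(AGE.check(150), AGE.check(100), AGE.check(45))
print(consistency_problems({"age": 30, "years_at_address": 35}))
```

Running the checks as each answer is keyed is what allows problems to be resolved with the respondent still on the line, rather than by later imputation.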



 



  It was previously observed that a telephone interview



methodology helps to reduce between-interviewer variances because



of greater opportunity for monitoring and supervision.  Since



interviewers can be easily observed without disturbing the



interview process, frequent monitoring can be used to uncover



interpretation and presentation difficulties, all of which



contribute to reduced interviewer variance.  A CATI approach
represents yet another step in this same direction.
Between-interviewer differences in understanding the flow of the instrument



can be virtually eliminated.



 



  Often sampling efficiency can be improved through the use of



complex respondent selection procedures.  Unfortunately, complexity



can breed errors, especially when interviewers are tired or are



dealing with a difficult set of questions.  Through the use of



 






 



CATI, very complicated sampling rules can be implemented, virtually



without error.  The interviewer enters the household composition



and the software selects a random respondent using pre-specified



sampling rates.
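
As an illustration of software-driven respondent selection, the sketch below picks one household member with probability proportional to a pre-specified rate for that member's group. It also returns the within-household selection probability, which is needed later for weighting. This is one possible rule among several. The group labels, rates, and function name are hypothetical, and the paper does not say which selection rule its CATI systems used.

```python
import random
from typing import Dict, List, Optional, Tuple

Person = Tuple[str, str]  # (name, age group) as keyed by the interviewer


def select_respondent(roster: List[Person], rates: Dict[str, float],
                      rng: random.Random) -> Optional[Tuple[Person, float]]:
    """Pick one household member with probability proportional to the
    pre-specified rate for that member's group.

    Returns the selected person and the within-household selection
    probability, or None if no member is eligible.
    """
    weights = [rates.get(group, 0.0) for _, group in roster]
    total = sum(weights)
    if total <= 0:
        return None
    idx = rng.choices(range(len(roster)), weights=weights, k=1)[0]
    return roster[idx], weights[idx] / total


# Hypothetical rule: persons 65 and over get three times the adult rate.
household = [("member 1", "18-64"), ("member 2", "18-64"), ("member 3", "65+")]
rates = {"18-64": 1.0, "65+": 3.0}
print(select_respondent(household, rates, random.Random(7)))
```

Because the interviewer only keys the roster, even a rule with several groups and unequal rates is applied the same way in every household.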



 



    As pointed out by Nicholls (1988), additional advantages of



the CATI systems can be summarized in the following way:



 



-     The status of each sampled case is maintained by the
    computer rather than by the interviewers, thereby improving
    sample management.



 



-     The scheduling and assignment of cases are done by



    computer.  The scheduler selects an appropriate time to call
    each respondent, taking into account time-zone differences
    across the U.S.



 



-     On-line interviewing makes it possible to display the
    instructions to the interviewers, the survey questions,
    and the response categories without any need for paper
    and pencil.



 



-     Answers to closed questions that are not in the
    permissible range can be detected at the instant the
    response is entered.  The software can also warn the
    interviewer that an answer contradicts another response
    given by the same respondent at an earlier point in the
    interview, so that a correction can be made immediately.  This



    reduces the need for data retrieval.



 



-     Branching or skipping to the next item is done by the



    computer.  This improves the quality of data collected



    for more complex data collections that involve



    complicated skip patterns and subsampling at different



    stages of data collection.



 



-     Interviewers may interrupt, resume or repeat some of the



    sections.  Also they can go back and correct previous



    answers or write notes on the screen in appropriate



    places.



 



-     The system improves supervision.  A supervisor can see the
    interviewer's screen and hear the telephone conversation with
    the respondent without disturbing the interviewing process.
    This allows faster reaction to the need for clarification,
    re-training, or re-assignment.



 



-     The survey results are virtually ready for weighting and



    tabulation upon completion of the data collection phase.



    This more timely data collection makes possible survey



    schedules that could not have been met in the past.



 






 



-     The CATI system maintains records of on-line calls,



    outcomes of the calls, response rates, and the amount of



    time spent by interviewers.  It can also be used to time



    different parts of the questionnaire if the survey length



    becomes a problem.
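
The time-zone feature of the scheduler mentioned in the list above can be sketched briefly. The sketch assumes fixed standard-time offsets from UTC (daylight saving time is ignored) and a hypothetical local calling window of 9:00 a.m. to 9:00 p.m. A case is offered to an interviewer only when the respondent's local clock falls inside that window.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Dict, List, Tuple

# Hypothetical fixed offsets from UTC (standard time; daylight saving ignored).
ZONE_OFFSETS: Dict[str, int] = {"Eastern": -5, "Central": -6,
                                "Mountain": -7, "Pacific": -8}
CALLING_HOURS: Tuple[time, time] = (time(9, 0), time(21, 0))  # local time


def local_time(now_utc: datetime, zone: str) -> datetime:
    """Convert an aware UTC datetime to the respondent's local time."""
    return now_utc.astimezone(timezone(timedelta(hours=ZONE_OFFSETS[zone])))


def callable_now(now_utc: datetime, zone: str) -> bool:
    """True if the respondent's local clock is inside the calling window."""
    start, end = CALLING_HOURS
    return start <= local_time(now_utc, zone).time() < end


def cases_to_call(cases: List[Tuple[str, str]], now_utc: datetime) -> List[str]:
    """Return ids of (case_id, zone) pairs that may be dialed at now_utc."""
    return [case_id for case_id, zone in cases if callable_now(now_utc, zone)]


cases = [("case-1", "Eastern"), ("case-2", "Central"), ("case-3", "Pacific")]
morning = datetime(1991, 3, 4, 14, 0, tzinfo=timezone.utc)  # 9:00 a.m. Eastern
evening = datetime(1991, 3, 5, 2, 0, tzinfo=timezone.utc)   # 9:00 p.m. Eastern
print(cases_to_call(cases, morning))
print(cases_to_call(cases, evening))
```

At 14:00 UTC only the Eastern case may be dialed. Seven hours later the Eastern window has closed while Central and Pacific respondents can still be reached.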



 



 



  Since effective CATI interviewers must be able to perform a



number of demanding tasks simultaneously, the task of training



suitable staff is more challenging.  Interviewers must establish



rapport with a respondent, accurately read the questions shown on



the terminal screen, correctly code the response, and enter



messages to the respondent's file indicating that a probe (e.g.,
rereading a question, prescribed clarification of an item, etc.) was
required.  In addition, they must record verbatim a respondent's
comments on a question and keep the respondent's interest long
enough to complete the interview.  This set of qualifications
requires interpersonal, computer, and typing skills that
surpass those needed for traditional telephone interviewing.  Fortunately,
the improved ability to monitor telephone interviews conducted via
CATI helps ensure that suitable staff are adequately prepared



for the survey.



 



 



11. Summary



 



  Face-to-face interviewing has long been the standard data



collection method selected when the highest quality survey results



were required.  The authors have reviewed those features of



telephone surveys which can result in improved survey quality, that



is, reduced total survey error at the same or even lower cost than
other modes such as face-to-face interviewing.  After review of



these features, survey designers are better able to choose between



a telephone sampling approach and a face-to-face methodology.



 



 



References



 



Alexander, C.H., "Cut Off Rules for Secondary Calling in a Random



Digit Dialing Survey," Telephone Survey Methodology, edited by



Robert M. Groves et al, John Wiley & Sons, 1988.



 



Banks, M.J., "Comparing Health and Medical Care Estimates of the



Phone and Nonphone Populations," Proceedings of the Section on



Survey Research Methods, American Statistical Association, 1983,



page 569-574.



 



Banks, M.J., and Anderson, R.M., "Estimating and Adjusting for



Nonphone Coverage Bias Using Center for Health Administration



Studies Data," in National Center for Health Services Research,



Health Survey Research Methods: Proceedings of the 4th Biannual
Conference, National Center for Health Services Research Proceedings



 






 



Series, Department of Health and Human Services Publication No.
(PHS) 84-3346, 1982.



 



Groves, R.M., and Lyberg, L.E., Telephone Survey Methodology,
edited by Robert M. Groves et al, John Wiley & Sons, 1988, page



191.



 



Groves, R.M., and Kahn, R.L., Surveys by Telephone: A National



Comparison With Personal Interviews, New York, Academic Press,



1979.



 



Jordan, L.A., Marcus, A.C., and Reeder, L.G., "Response Styles in



Telephone and Household Interviewing: A Field Experiment," Public



Opinion Quarterly, Vol. 44, No. 2, Summer 1980, page 210-222.



 



DeLeeuw, E.D. and Van der Zouwen, J., Telephone Survey Methodology,



edited by Robert M. Groves et al, John Wiley & Sons, 1988, page



283.



 



Lepkowski, J.M., Telephone Survey Methodology, edited by Robert M.



Groves et al, John Wiley & Sons, 1988, page 73.



 



Maklan, D., and Waksberg, J., Telephone Survey Methodology, edited



by Robert M. Groves et al, John Wiley & Sons, 1988, page 51.



 



Massey, J. T., and Botman, S.L., Telephone Survey Methodology,



edited by Robert M. Groves et al, John Wiley & Sons, 1988, page



143.



 



Massey, J. T., Telephone Survey Methodology, edited by Robert M.



Groves et al, John Wiley & Sons, 1988, page 3.



 



Mitofsky, W., "Sampling of Telephone Households," unpublished CBS



memorandum, 1970.



 



Mohadjer, L., "A Study of the Effectiveness of Oversampling



Telephone Clusters with High Concentrations of Blacks and Hispanics



in the NHES Field Test," unpublished report to NCES, 1990.



 



Mohadjer, L., Telephone Survey Methodology, edited by Robert M.



Groves et al, John Wiley & Sons, 1988, page 161.



 



Nicholls, W.L., II, Telephone Survey Methodology,  edited by Robert



M. Groves et al, John Wiley & Sons, 1988, page 377.



 



Potthoff, R.F., "Some Generalizations of the Mitofsky-Waksberg



Technique for Random Digit Dialing," Journal of the American



Statistical Association, Vol. 82, No. 398, June 1987, pp. 409-418.



 



Potthoff, R.F., "Generalizations of the Mitofsky-Waksberg Technique



for Random Digit Dialing: Some Added Topics," Proceedings of the



 






 



Section on Survey Research Methods, American Statistical



Association, 1987, pp. 615-620.



 



Sirken, M.G., and Casady, R.J., Telephone Survey Methodology,



edited by Robert M. Groves et al, John Wiley & Sons, 1988, page



175.



 



Stokes, L., and Yeh, M., Telephone Survey Methodology, edited by



Robert M. Groves et al, John Wiley & Sons, 1988, page 357.



 



Thornberry, O.T., Jr., and Massey J.T., Telephone Survey



Methodology, edited by Robert M. Groves et al, John Wiley & Sons,



1988, page 25.



 



Thornberry, O.T., Jr., and Massey J.T., "Correcting for



Undercoverage Bias in Random Digit Dialed National Health Surveys,"



Proceedings of the Section on Survey Research Methods, American



Statistical Association, 1978, page 224-229.



 



Waksberg, J., "Sampling Methods for Random Digit Dialing," Journal



of the American Statistical Association, Vol. 73, No. 361, March



1978, page 40-46.



 



 



 



 



 



 



 



 






 



     COMPUTER ASSISTED SURVEY TECHNOLOGIES IN GOVERNMENT:



                           AN OVERVIEW



 



                           Marc Tosiano



            National Agricultural Statistics Service



 



 



Introduction



 



     CATI is an acronym for computer assisted telephone



interviewing.  It is the interactive use of computers to assist in



data collection activities typically performed in a centralized



telephone facility of a survey organization.(27)(31)  CATI is only one



use of the computer in the growing realm of computer assisted



survey work.  Other uses of computer assisted surveys include: 1)
computer assisted personal interviewing (CAPI), 2) computerized
self administered questionnaires (CSAQ), and 3) computer assisted data
entry (CADE) of information on paper questionnaires into an
electronic format.(31)(26)   Each of these computer assisted survey



techniques may be used alone for a survey or in combination



depending on survey management requirements and the various modes



used to collect data for a given survey.



 



 



Features of computer assisted surveys



 



     During an interview, the minimum use of computer assisted



survey technology is the presentation of survey questions and their



response categories on the computer screen.  Interviewers read the



question to the respondent and key the answer on the screen by



using the computer keyboard.  However, computer assisted survey



techniques offer many capabilities above and beyond the traditional



paper questionnaire.  These features include enhancements to the



interview proper as well as the automation of survey management



activities.  Obviously, the features available depend on the



software chosen for computer assisted interviewing.(10)   Common



features offered by various computer assisted survey software



include:  (26)(10)(24)



 



 



On-line interviewing:



 



o     Instructional or reference information appears on the



     screen or is available via help screens to assist the



     interviewer.



 



o     Fills are used to customize question wording by inserting



     input from records prior to the survey or from answers to



     previous questions.



 



 



 



 






 



o    Answers to closed questions are checked against



    permissible entries.  Some software also supports questions
    that accept multiple responses.



 



o    Numeric answers are checked against a pre-defined range.



 



o    Consistency checks are made against data collected



    earlier in the interview.



 



o    Answers detected as invalid can invoke an error



    correction routine or additional probing questions.



 



o    Formats are available for special answers, e.g., date,



    time, money, zip code, etc.



 



o    Open-ended questions or interviewer notes are answered by



    typing text.



 



o    Question order as well as response categories may be



    randomized to reduce order effects.



 



o    Item-based design offers one question per screen or



    multiple related questions per screen; the interviewer is



    forced to answer the questions in a pre-determined



    sequence.



 



o    Form-based design presents a screen that simulates a



    paper form.  The interviewer is free to move the cursor



     around the form and fill in the form in any order.



 



o    Automatic branching is done based on input from records



    prior to the survey, previous answers in the interview,



    logical conditions, or arithmetic checks.
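
Three of the on-line features listed above (fills, automatic branching, and randomized response categories) can be sketched in a few lines. The question identifiers, routing rules, and wording are hypothetical and do not represent any particular CATI package; real systems express these rules in their own questionnaire languages.

```python
import random
from typing import Callable, Dict, List

Answers = Dict[str, str]


def fill(template: str, answers: Answers) -> str:
    """Customize question wording with preloaded data or earlier answers."""
    return template.format(**answers)


# Hypothetical routing table: question id -> rule returning the next question id.
ROUTES: Dict[str, Callable[[Answers], str]] = {
    "OPERATES_LAND": lambda a: "ACRES" if a.get("OPERATES_LAND") == "yes" else "END",
    "ACRES": lambda a: "CROPS",
    "CROPS": lambda a: "END",
}


def next_question(current: str, answers: Answers) -> str:
    """Automatic branching: the computer, not the interviewer, applies skips."""
    return ROUTES[current](answers)


def randomized_categories(categories: List[str], rng: random.Random) -> List[str]:
    """A fresh random order of response categories for each case."""
    return rng.sample(categories, k=len(categories))


answers = {"operation_name": "Hillside Farm", "OPERATES_LAND": "yes"}
print(fill("How many acres did {operation_name} operate on June 1?", answers))
print(next_question("OPERATES_LAND", answers))
print(randomized_categories(["Corn", "Soybeans", "Wheat", "Cotton"], random.Random(3)))
```

Keeping the skip rules in a table separate from the question text is one way to let the same routing be reviewed, printed, or flow-charted, as some packages do.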



 



 



Creating the computer assisted questionnaire:



 



o    Some packages offer a menu driven approach to building



    the questionnaire while others require the use of a



    special programming language.



 



o    Some packages come with their own editor to write or



    change the questionnaire, but other editors or word



    processors may be used as well.



 



o    Questionnaire debugging tools of various strengths may be



    available.



 



o    A paper copy of the questionnaire including screen prints



    and a flow chart of the questionnaire may be available.



 



 



 






 



Survey management:



 



o    The sample is stored in computer media and the software



     maintains the status of each questionnaire.



 



o    Sampling procedures may be available including random



     digit dialing facilities.



 



o    Call scheduling delivers the next case to be called by



     the CATI interviewer.  The call scheduler prioritizes and



     sequences the calls made in the CATI environment.  This



     includes the retrieving of cases at the appointed time



     for a call-back, establishing follow-up calls for busy



     signals or no answers, and targeting groups of cases such



     as strata or replicates.



 



o    Survey managers may generate reports including such



     things as: completions, response rates, refusal rates,



     time per interview or question, call-back appointments,



     etc.  These reports may be by interviewer, by day, by



     shift, cumulative, etc.



 



o    Monitoring individual CATI interviews may be done by



     viewing the interviewer's screen at a supervisor's



     workstation where audio monitoring may be available as



     well.
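
The call-scheduling behavior described above can be sketched as a small queue. The assumptions are hypothetical and simplified: times are in minutes since the start of the shift, busy signals return after 15 minutes and no-answers after 120, and among cases already due, appointments come first, then follow-ups, then fresh sample.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical follow-up delays, in minutes, for unresolved outcomes.
RETRY_DELAY = {"busy": 15, "no_answer": 120}
APPOINTMENT, FOLLOW_UP, FRESH = 0, 1, 2  # lower value = higher priority


@dataclass
class Case:
    case_id: str
    due: int        # minutes since the start of the shift
    priority: int


class CallScheduler:
    """Delivers the next case to an interviewer (a simplified sketch)."""

    def __init__(self) -> None:
        self._cases: List[Case] = []

    def add(self, case_id: str, due: int, priority: int = FRESH) -> None:
        self._cases.append(Case(case_id, due, priority))

    def record_outcome(self, case_id: str, outcome: str, now: int) -> None:
        """Requeue busy signals and no answers; other outcomes are final here."""
        if outcome in RETRY_DELAY:
            self.add(case_id, now + RETRY_DELAY[outcome], FOLLOW_UP)

    def next_case(self, now: int) -> Optional[str]:
        """Among cases already due, prefer appointments, then follow-ups,
        then fresh sample; break ties by the earliest due time."""
        due = [c for c in self._cases if c.due <= now]
        if not due:
            return None
        chosen = min(due, key=lambda c: (c.priority, c.due))
        self._cases.remove(chosen)
        return chosen.case_id


s = CallScheduler()
s.add("A", due=0)                         # fresh case
s.add("B", due=30, priority=APPOINTMENT)  # respondent asked for a callback
print(s.next_case(10))                    # A
s.record_outcome("A", "busy", now=10)     # A comes back at minute 25
print(s.next_case(20))                    # None
print(s.next_case(30))                    # B: the appointment outranks the follow-up
print(s.next_case(30))                    # A
```

Because the scheduler holds every case's status, the paper callback records that supervisors once shuffled are no longer needed.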



 



Data handling and analysis:



 



o    Post-survey processing may be done to review, edit,



     clean, or code each interview.



 



o    A codebook may be created containing questions, the



     variable names and location in the dataset, etc.



 



o    An audit trail may be maintained with all previous



     answers if an answer is changed.



 



o    Output files are created in a form ready for the next



     processing stage; these could include SPSS and SAS



     datasets.



 



o    Some packages include their own statistical analysis
     routines, including histograms, distributions,



     regression, Analysis of Variance (ANOVA), etc.
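
The audit trail feature listed above can be illustrated with a minimal record class. It keeps the current answers and appends the old and new values, with a timestamp, whenever an answer is changed. The class name, item name, and case identifier are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


class AuditedRecord:
    """One interview's answers plus a trail of every value that was replaced."""

    def __init__(self, case_id: str) -> None:
        self.case_id = case_id
        self.answers: Dict[str, Any] = {}
        # Each entry: (ISO timestamp, item name, old value, new value).
        self.trail: List[Tuple[str, str, Any, Any]] = []

    def set(self, item: str, value: Any, when: Optional[datetime] = None) -> None:
        """Store an answer, logging the previous value if it changes."""
        when = when or datetime.now(timezone.utc)
        if item in self.answers and self.answers[item] != value:
            self.trail.append((when.isoformat(), item, self.answers[item], value))
        self.answers[item] = value


rec = AuditedRecord("case-0042")
rec.set("AGE", 34)
rec.set("AGE", 43)   # corrected later in the interview
print(rec.answers, rec.trail[0][1:])
```

Keeping prior values makes it possible to study how often, and on which items, interviewers correct entries.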



 



     The features listed above are not available in all computer



assisted survey software.  A survey organization procuring software



for a computer assisted application would have to decide which



features are important and select software accordingly.  In



 






 



addition, not all software packages will operate on all computer
hardware; matching software to hardware is a problem that must be
resolved for any computer system.



 



    Computer assisted survey software is relatively new and



constantly evolving; enhancements are usually inspired by the needs



and requirements of end users.(10)  Therefore, another consideration



in choosing software might be the existence of a user support group



and the willingness of the software company to enhance the system



as new features are requested by users and the cost of these



modifications.



 



    These different features of computer assisted survey software



have various effects on costs and quality of data.  For example,



the use of interactive interviewing may improve the quality of



data, but without call scheduling, the productivity of interviewers



may be unaffected.(26)  If improved interviewer efficiency and the



elimination of paper callback records is important, software with



call scheduling would be more attractive.  However, systems with



call scheduling may not be strong in other areas such as form-based



design or software portability across various hardware.  Evaluating



these trade-offs is a difficult but critical task in choosing (or



developing) this type of software.



 



Costs and Data Quality



 



    The CATI concept was originally proposed by the American



Telephone & Telegraph Company; in 1971 they sponsored the first



CATI survey to measure customer satisfaction.(26)  After this



experience, CATI was believed to have three advantages over



conventional data collection methods: "accuracy, speed and reduced



costs".(22)  Since then there have been many studies and papers



evaluating the accuracy or extent of the validity of these original



beliefs.(26,22,9,19,35,41,37,16,17,18,33,36,38)  Some authors have also reviewed



the impact of CATI on survey administration and the internal
structure of survey organizations.(5,6,13,21,50,32)  This section of the



paper does not intend to review all of these sources but to briefly



review some of the implications and consequences that arise from



using this new computer assisted survey technology.  Some of the



topics discussed here are not easily definable as advantages or



disadvantages; it often depends on the methods used to implement



this new technology.



 



   The first set of reasons for implementing computer assisted



surveys is to expedite surveys and thereby reduce costs.(24)  There is



always the initial cost of procuring and maintaining hardware and



software.  This overhead cost could be alleviated by utilizing the



hardware and/or software for projects other than computer assisted



surveys.(35)  Some of the hardware configurations used in the past



have been 'dumb' terminals attached to a centralized mainframe



 






 



computer.(14)  Later, terminals or 'intelligent' microcomputers were



attached to a minicomputer.(27,35)  The latest hardware innovation is
the use of microcomputers, either stand-alone or in a Local Area Network
(LAN) environment.(8)  After these items are procured, there are the



costs for training the staff to implement the new technology, and



training the interviewers on use of the system.  Interviewer



training costs also depend on the turnover rate of interviewers.



CATI questionnaire design will take longer than paper design



because it employs many of the features listed previously such as



automatic branching, use of fills, interactive editing and



consistency checks, interviewer 'helps' and special processing



needed for other activities previously done on paper.(20)  This



special processing includes resolution of busy signals, no answers,



refusals, arranging callbacks and other administrative activities.



As with other programming, CATI-questionnaire designers typically



'steal' code from previous studies whenever possible.  This



efficient use of previous code is enhanced by the use of modular or



structured programming.  The CATI questionnaire setup for some



surveys could be faster and simpler than creating a paper



questionnaire, but only if the CATI instrument emulates the paper
form, which is seldom the case.



 



   Once past these overhead costs, there are other cost



considerations.  Interviews typically take longer with computer



assisted surveys because of the edit checks and additional



questions generated to probe for corrections or clarifications;



another reason could be the interviewer's lack of familiarity with



the keyboard, especially if there is a lot of text to be entered.



These higher costs are somewhat offset by other features of



computer assisted surveys.  The use of an automatic scheduler can



improve interviewer efficiency and reduce the cost of supervision



by eliminating voluminous and tedious paper shuffling; supervisors



are freed to do more real supervising rather than managing



callbacks.(50)  Status systems automatically keep track of each case



in the sample including its current disposition and any actions



taken on the case.  Immediately after each interview, the data are
already in electronic form; this eliminates the data entry stage



necessary in conventional data collection.  At any time during the



survey, output files are available for preliminary analysis and/or



administrative reports needed to allocate resources during the



remainder of the survey period.



 



    The second and probably the more important set of reasons to



implement computer assisted surveys is to improve survey data



quality and enhance the ability to implement complex surveys.(24,9)



One major source of improved data quality is the ability to perform



on-line edit and consistency checks which means corrections can be



made during the interview with the help of the respondent.  Post-



survey edit checks can be eliminated or greatly reduced.(35)   Many



times, post-survey corrections to the interview are done without



re-contacting the respondent; this results in more unknown or



 



 






 



imputed data.  Computer assisted surveys also result in increased



standardization among interviewers, especially in a central



telephone facility.(35,15)  This standardization may help reduce some



interviewer effects typically seen in paper questionnaires such as



following proper question sequence.(16)  However, there are sources of



error possible which did not exist in the paper environment such as



simply keying an incorrect number for an answer while using



touch typing.  Some of the benefits of complex instruments include:



creation of multiple versions of a questionnaire within the same



instrument, inclusion of pre-programmed probes, use of historic



data from previous surveys, table look-up routines, and other



techniques difficult to employ in a paper questionnaire.  In



addition, computer assisted technology permits easier



implementation of research than does its paper counterpart.  Some



examples are: randomizing questions and answer categories, use of



historic data, use of randomized probes to check respondents'



understanding of questions, re-interview and reconciliation



studies, and item-based versus form-based questionnaire design.
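To illustrate the kind of on-line edit described above, the following Python sketch applies one range edit and one consistency edit to the answers entered so far, so that a failure can be raised with the respondent while still on the line. The item names (acres_operated, acres_cropland, hogs_on_hand) and the limits are hypothetical and are not taken from any agency instrument; this is a minimal sketch, not a description of how any particular CATI package implements its edits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EditResult:
    """Failed edits for one interview; an empty list means the record passed."""
    failures: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def run_online_edits(answers: Dict[str, int]) -> EditResult:
    """Apply hypothetical range and consistency edits to the answers so far.

    Items not yet asked are simply absent from `answers` and are skipped.
    """
    result = EditResult()
    operated = answers.get("acres_operated")
    cropland = answers.get("acres_cropland")
    hogs = answers.get("hogs_on_hand")

    # Range edit: a single item must fall within plausible bounds.
    if hogs is not None and not 0 <= hogs <= 100_000:
        result.failures.append("hogs_on_hand outside 0..100000")

    # Consistency edit: a part cannot exceed the whole it belongs to.
    if operated is not None and cropland is not None and cropland > operated:
        result.failures.append("acres_cropland exceeds acres_operated")

    return result
```

For example, run_online_edits({"acres_operated": 320, "acres_cropland": 400}).failures returns ["acres_cropland exceeds acres_operated"], which the interviewer could resolve with the respondent during the call rather than leaving it for post-survey editing.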



 



 



Government CATI Implementations(39)



 



    Early CATI systems were developed by United States market



research organizations in the late 1960's and early 1970's.(14)



University survey research centers became involved in this



technology in the middle 1970's.(27)  U.S. government agencies did not



begin work with CATI until 1980 when both the Census Bureau and the



National Agricultural Statistics Service (NASS) established



working groups to investigate this technology.(2,25,35)



 



   The largest installations of CATI in the federal government



are in operation in four agencies: Bureau of Labor Statistics



(BLS), Census Bureau, National Agricultural Statistics Service



(NASS), and the National Centers for Disease Control (CDC).  BLS has about



70 workstations in 14 sites.  This includes a 10 workstation test



site for developing CATI methods for the Consumer Price Index (CPI)



Surveys; this site is planned for expansion to a 50 workstation



production facility by 1994.  Another 20 workstation site is in BLS



headquarters for special surveys of the BLS Office of Employment



and Unemployment Statistics.  The largest BLS use of CATI is 40



workstations in 12 sites for the monthly establishment survey



supporting the Current Employment Statistics Program.  CATI is used



for interviewing, non-response follow-up, and failed edit



reconciliation.  If this effort is successful, BLS plans to expand
from these 12 sites to all 51 State offices, with about 200
workstations, in 1994.



 



   The Census Bureau has two CATI sites with about 100



workstations.  One site of 30 workstations is the Field Division's



Hagerstown Telephone Center which collects data for surveys of



household residents and small surveys of industry.  This site is



expected to expand from 30 to between 250 and 300 workstations by



 



 



1994.  The second Census CATI site is in Jeffersonville, Indiana,



where 70 workstations are used to collect data from establishments



for the Retail and Wholesale Trade Industries.  Here, CATI is used



for telephone interviews, data capture from paper questionnaires



and failed edit follow-up.



 



 The National Agricultural Statistics Service (NASS) surveys



farm operators and agricultural businesses with the largest CATI



network in the Federal government.  NASS has about 200 workstations



operational in 14 State offices.  Four additional State offices



have recently installed the hardware and software for CATI and will



soon become operational.  This brings the NASS CATI capabilities to



about 260 workstations in 18 State offices.  Current plans are to



install Local Area Networks in 42 State offices by 1992; this will



increase the CATI workstation count to about 750 nationwide.  While



mostly used by CATI interviewers after business hours, these same



workstations will also be used by the office staff during the day



for normal office operations.  These daytime operations include



survey activities (e.g., data capture of paper questionnaires,



interactive error detection and correction of data collected,



survey management) and all other office work (e.g., word



processing, spreadsheet operations, graphics).



 



   The National Centers for Disease Control (CDC) operates about



150 workstations in 21 State offices to collect data for the



Behavioral Risk Factors Surveillance Survey and other random digit



dialed household surveys.  Little expansion of the CDC CATI network



is expected over the next few years because data collection is



commonly contracted out to other survey organizations.



 



   These Federal agencies are expanding their CATI capabilities



and plan to complete their initial CATI implementation by 1994.



Unlike many private and university survey organizations, government



CATI installations are not generally implemented in a national or



regional centralized telephone facility.  Most of the federal



resources are directed toward smaller State offices where the same



equipment is used for other survey related activities and office



automation (BLS, CDC, NASS).  Even with this increase in CATI



activities, CATI will not become the only mode of data collection.



Mailed questionnaires are still important in the mixed mode method



of data collection in NASS and BLS.  Personal interviewing is still



important to all agencies as well, often as part of mixed mode data



collection; the field interviewing staff numbers about 3,000 in the



Census Bureau and about 2,800 in the National Agricultural



Statistics Service.  With these large field staffs, implementing



CAPI may be the next large task facing computer assisted survey



work in these agencies.



 



   In private and university survey organizations the use of



CATI is generally associated with a single centralized telephone



facility.   CATI encourages a centralized facility to benefit from



some of the features listed earlier such as automatic call



 



scheduling, monitoring, and administrative reports.(13,40,1)  A central



facility is better suited to computer assisted operations because



of the shared hardware, software, sample, and technical support.



While CATI improves standardized interviewing and quality control



(by automatic branching, tailored question wording, and probes for



on-line edits), centralization contributes to survey management



with consolidated and more standardized training and supervision of



interviewers.  One disadvantage of centralization may be that the
interviewers lack the local knowledge and cultural understanding
that local interviewers may share with the respondents.(21)



 



   A major challenge for federal agencies implementing CATI is
resolving the associated question of centralized versus
decentralized interviewing.  Many agencies already have national,



regional, and/or State offices with commitments to Federal-State



agreements, office staff, and an interviewer staff including office



and field interviewers.  These commitments may have as much impact



on implementation decisions as the goals of operational efficiency



and maximizing data quality.  The Census Bureau has transferred the



Retail and Wholesale Trade survey from traditional regional
telephone calling to one centralized CATI facility in Indiana.  The
other three agencies mentioned previously have maintained their



dispersed data collection techniques by implementing CATI in the



existing regional or State offices.  However, these dispersed CATI



facilities can be used as central sites as well.  For example, if



the sample for a survey is widely dispersed across the country, one or more



State offices can be designated as regional CATI centers for that



survey.(5)  NASS has successfully tested the centralization of CATI



interviewing in regional centers while personal interviews were



still administered from the State offices.  However, this mixed



mode with centralization for only part of the data collection



requires strong communication, coordination, and overall survey



management.(5)



 



   Other organizational considerations revolve around the



question, "How do computer assisted techniques fit in with the



current mode of operations?"  Some of these considerations may be



specific to a survey or addressed for overall computer assisted



operations.  A few examples follow:  What is the role of the



supervisory interviewer?  Should CATI edit checks during the



interview or during post survey processing totally replace existing



batch edits?  How should technological advances in software and



hardware be incorporated into an existing CATI operation?  When a



mailed questionnaire is followed up with CATI or CAPI, how closely



should the interview instrument follow the paper questionnaire?



 



   In many cases, the difficulties of implementing computer



assisted techniques in government agencies arise from



organizational requirements, not the technology itself.  Some of



the problems encountered with CATI are due to use of a central



 



facility; these problems would be the same if paper questionnaires



were used in the same central environment.  Therefore, it is



important to understand whether potential problems are
technological or organizational in origin when advocating or
implementing a computer assisted system.



 



The Future of Computer Assisted Technology(39) 



 



   As these four government agencies approach full CATI



implementation, newer technologies are developing which go beyond



telephone interviews and some re-evaluation is necessary.  Very



little research has been done to measure the cost, timeliness, and



data quality of surveys done with these new approaches.  This paper



reviews some of the major new technologies and their possible use



by survey organizations.  These technologies can be divided into



five groups: computer assisted personal interviews, computer



assisted self administered questionnaires, geographic and



communication technologies, voice technology, and artificial



intelligence.



 



Computer Assisted Personal Interviews



 



    Now that computers are getting smaller and smaller, computer



assisted personal interviewing (CAPI) is the next natural extension



of computer assisted interviewing beyond CATI.  As mentioned



before, personal interviewing is still important in federal survey



agencies; CAPI can be used to benefit from the advantages listed



earlier and also improve the data transfer between personal and



telephone interviewing for mixed mode surveys.  Unlike the course



of CATI development, government agencies are in the forefront of



CAPI development both for their own use and in sponsoring CAPI



investigations by universities and the private sector.  Also, the



government's implementation of CAPI is proceeding rapidly compared



to CATI.  CAPI investigations have found that CAPI data collection



is acceptable to most respondents and that most experienced field



interviewers can be trained in its use.(7,23,42,44,3,43,49)



 



    In addition to the organizational considerations of



implementing CAPI, there are some technological problems which



need to be addressed.  Assignments and questionnaires must be given



to CAPI interviewers and completed interview data must be sent back



to the office.  National Analysts have used the mail, UPS, and



courier services for this transmittal during the Nationwide Food



Consumption Survey.(44)   Another method is to use automated



telecommunications with modems attached to computers.  Research



Triangle Institute (contracted by the Environmental Protection



Agency) and the Netherlands Central Bureau of Statistics have used



telecommunications with some success.(43,49)  However, the Netherlands



is returning to the use of mail for data transmission as a simpler



 



and less costly approach.(46)  If a workable solution is found, rapid



telecommunications between the office and interviewers may be



especially advantageous when operating on tight deadlines and using



mixed mode methods.  The Census Bureau and NASS plan to investigate



an integrated CATI-CAPI system where cases can be transmitted



between CATI interviewers in central or state offices and CAPI



interviewers dispersed throughout the field.



 



     The software used in CAPI is typically the same as used for



CATI or personal interviewing in an office environment.  However,



personal interviews in the respondent's home or at the doorstep can



be more demanding and distracting.  This may call for special



software features for question formats, entry modes and



questionnaire movement commands which are easier to use.  These



features specific to CAPI interviewers have not yet been determined



or shared.



 



     The hardware used for CAPI applications is still evolving and



being investigated.  Machines must also be evaluated based on the



environment expected for conducting interviews: on a table top,



standing and holding the machine, or both.  The machines generally



available for CAPI include laptop computers, hand held computers,



and slate computers (handwritten character recognition devices).



The laptops generally weigh 4 to 15 pounds and have screens and
keyboards of various sizes and types.  Hand held computers are much



smaller but offer very small screens, keyboards, and limited



computing power, which rules out some software packages.  Slate



computers range from 3 to 4 pounds and are held like a clipboard



while the interviewer reads questions and writes the answers on the



screen with a stylus.  This device emulates paper questionnaire



data entry and some machines are able to recognize special



functions such as tallies, diagrams, maps, and signatures.  Unlike
the machines available a year ago, these devices now run DOS-based
systems, and NASS can



run both BLAISE and CASES computer assisted software for CAPI



applications on the Gridpad machine.



 



     The weight of these machines is an important factor in an



interviewer's acceptance of using a machine as a data collection



tool.  Most recent tests of CAPI have been qualitative reports with



inconsistent findings.(42,3,43)  However, recent laboratory research has



studied ergonomic properties of CAPI, interviewer attitudes, and



logistical features of the technology.  This work investigated the



maximum weight of laptop computers which would lead to the



acceptance of CAPI by interviewers for doorstep interviewing;



further research is being done to include newer, lighter laptops and



slate computers.(45)



 



    Once the technological problems are resolved, survey



organization and management will require review and modification to



meet the needs of a computer assisted survey environment.  CAPI may



change the methods of assigning, conducting, supervising,
checking in, and reviewing interviews.  These changes will affect staffing



requirements and how to most effectively organize and manage survey



personnel.  For example: 1) CAPI field supervisors must cope with



hardware, software, and telecommunications problems in addition to



exercising interpersonal skills. 2) Interviewer training must include machine



maintenance, CAPI interviewing, and transmission of assignments and



data.  To reduce costs, some of this training could be done as home



study with on-line tutorials.  3) The software will eliminate



survey-specific errors such as inappropriate skips or data



inconsistencies; however, supervisors will need to identify



interviewers needing further training in CAPI operations. 4) Field



supervisors and office staff must use new techniques to check-in,



review, and edit CAPI interviews.  Due to computerization, some of



these functions may also disappear, requiring clerical staff to be



replaced with technical staff.  5) With better communications and



data transmission, the relationship of State, regional, and



headquarters staff may change as well.  Data and messages could



travel directly between field interviewers and headquarters.  All



these possibilities and more will affect how CAPI is implemented in



the various survey agencies.



 



Computerized Self-Administered Questionnaires



 



    Establishment surveys usually collect brief numeric responses



from the same respondents time after time.  New technologies may be



welcomed by these respondents if they reduce respondent burden or
are perceived as doing so.  This area is ripe for the



investigation of computerized self-administered questionnaires



(CSAQ).



 



    BLS is experimenting with voice simulation of the questions



and touchtone data entry of the answers by the respondent.(51,29,4)



When respondents have prepared their reports, they dial a local



telephone number at a nearby BLS office; a voice simulation module



requests the entry of their identification number on the



telephone's touchtone pad.  The voice module then asks survey



questions that the respondent answers by keying the numeric



response on the touchtone pad.  Since this system operates 24
hours a day, respondents can report at their convenience; because
no telephone interviewer or data entry staff is involved, costs
are minimal.  Of course, a BLS interviewer is still needed to



call non-respondents after a cutoff date or to resolve data



inconsistencies.  A further extension of this project is voice



recognition of the respondent which would eliminate the need to key



answers on the touchtone pad.(48)
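The touchtone data entry call flow described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions: print() stands in for the voice module, a pound key ('#') is assumed to end each entry, an empty or non-numeric entry is assumed to repeat the prompt, and the two questionnaire items are hypothetical. It is not based on the design of the BLS system itself.

```python
from __future__ import annotations

from typing import Iterable, Iterator, Optional

# Hypothetical two-item questionnaire: (item name, spoken prompt).
QUESTIONS = [
    ("employees", "Enter total employees, then press pound."),
    ("payroll", "Enter total payroll in dollars, then press pound."),
]


def read_entry(keys: Iterator[str]) -> str:
    """Collect touchtone keys until '#' ends the entry."""
    digits = []
    for key in keys:
        if key == "#":
            return "".join(digits)
        digits.append(key)
    raise EOFError("caller hung up before finishing the entry")


def tde_session(keypresses: Iterable[str], valid_ids: set[str]) -> Optional[dict[str, int]]:
    """Simulate one touchtone data entry call.

    Returns the keyed report, or None if the identification number
    is not recognized.
    """
    keys = iter(keypresses)
    print("Enter your identification number, then press pound.")
    if read_entry(keys) not in valid_ids:
        return None  # unknown reporter: left for interviewer follow-up
    report: dict[str, int] = {}
    for item, prompt in QUESTIONS:
        print(prompt)
        entry = read_entry(keys)
        while not entry.isdigit():  # empty or '*' entry: repeat the prompt
            print("Invalid entry. " + prompt)
            entry = read_entry(keys)
        report[item] = int(entry)
    return report
```

Calling tde_session(list("12345#42#98000#"), {"12345"}) returns {"employees": 42, "payroll": 98000}. An unrecognized identification number returns None, leaving the case for an interviewer, much as non-respondents and data inconsistencies still require a BLS interviewer in the procedure described above.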



 



   The Energy Information Administration (EIA) is investigating CSAQ by



using respondents' personal computers.(34)  Respondents who have



access to personal computers are given diskettes containing the



monthly CSAQ, menu-driven procedures to obtain the necessary



 



information from other files, and programmed procedures to



electronically transmit the completed questionnaire to the EIA



computer.



 



 



Geographic and Communication Technologies



 



Other technological developments may assist field interviewers



in some of the administrative work accompanying personal



interviews.  These include automobile telephones, beepers, and



navigational and position-recognizing systems to provide reliable



geographic coordinates.  This technology could be used to:  1)



assist rural field interviewers in locating sampling units by using



coordinate position of landmarks and buildings; 2) update maps by



driving through new streets not on current maps; 3) define



coordinates of area frame boundaries for sampling because these



coordinates are not affected by changes in physical boundaries or



political borders; and 4) record precise locations of dwellings and



establishments to allow summation of data to any area definable by



geographic coordinates.  On recent examination, the Census Bureau



found that current systems are not sufficiently accurate, reliable



and cost-effective for typical survey applications.(39)
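Item 4 above, summing data to any area definable by geographic coordinates, amounts to a point-in-polygon test applied to each recorded location. The sketch below uses the standard even-odd (ray casting) rule. It assumes coordinates can be treated as planar, which is a reasonable approximation only for small areas; the record layout is hypothetical, and points lying exactly on a boundary may be classified either way.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar


def inside(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test for whether pt lies inside polygon."""
    x, y = pt
    crossings = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Count edges that straddle the horizontal line through pt...
        if (y1 > y) != (y2 > y):
            # ...and cross that line to the right of pt.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                crossings += 1
    return crossings % 2 == 1


def total_within(records: List[Tuple[Point, float]], polygon: Sequence[Point]) -> float:
    """Sum reported values for units whose recorded location is inside polygon."""
    return sum(value for location, value in records if inside(location, polygon))
```

With the square [(0, 0), (10, 0), (10, 10), (0, 10)] as the area and records [((2, 3), 5.0), ((8, 8), 7.0), ((12, 5), 100.0)], total_within returns 12.0, since only the first two locations fall inside. Because the test uses only recorded coordinates, the same records can be re-summed for any new area boundary without re-collecting data.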



 



Voice Technology



 



The National Bureau of Standards has recommended that this



technology be investigated by the Census Bureau as the next step in
computer assisted methods.(28)  This technology includes both voice



simulation and speech recognition.  It could be used to conduct



telephone interviews without human interviewers or as an auxiliary



computer tool to reduce the keyboard skill necessary for



interviewers using computer assisted methods.  As mentioned



earlier, some voice technology is being investigated for gathering



data from establishments at their convenience.  For household or



other personal surveys, the feasibility of a fully automated computer



interview seems to depend upon respondent acceptance.  However, the



potential cost savings possible from voice technology will probably



stimulate further research in this area; survey agencies will need



to evaluate these new systems as they become available to judge



their applicability to surveys.



 



Artificial Intelligence



 



 Artificial Intelligence is a computer discipline which builds



computer programs that perform tasks requiring intelligence when



done by humans.  This discipline is used to develop expert systems



for problem solving which involve the use of appropriate



information acquired previously from human experts.(11)   This



technology has been used by Westat for computer assisted coding of



 



open-ended questions on a paper questionnaire.  Initially, humans



do all the coding, which is recorded by the computer; from this



human input, the computer "learns" how to do this coding as well.



As the coding process continues, the computer program can code



increasingly more open-ended responses while the human operator can



verify these codes and handle the responses not yet "learned" by



the program.(47)  Although this technology may have limited use during



data collection, this may be a computer assisted technique which



could benefit other survey management tasks like case assignments,



questionnaire coding, and automatic call scheduling.
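The learning process described for computer assisted coding can be illustrated with a deliberately simple Python sketch. Every code a human assigns is recorded; once an answer has received the same code often enough, the program proposes that code for a human to verify, while unfamiliar answers are routed to the human coder. Exact matching on normalized text and the min_votes threshold are assumptions made for illustration; this is not the method used in the Westat work cited above.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Optional


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivially different answers match."""
    return " ".join(text.lower().split())


class LearnedCoder:
    """Suggest codes for open-ended answers from codes humans assigned earlier."""

    def __init__(self, min_votes: int = 2) -> None:
        self.min_votes = min_votes  # human codings required before auto-coding
        self.history: dict[str, Counter] = defaultdict(Counter)

    def record(self, answer: str, code: str) -> None:
        """Store a code assigned or verified by a human coder."""
        self.history[normalize(answer)][code] += 1

    def suggest(self, answer: str) -> Optional[str]:
        """Return the most frequent earlier code, or None to route to a human."""
        votes = self.history.get(normalize(answer))
        if not votes:
            return None
        code, count = votes.most_common(1)[0]
        return code if count >= self.min_votes else None
```

For example, after a human codes "Dairy farmer" as the hypothetical code "111" twice, suggest("dairy  FARMER") returns "111", while "hog farmer" still returns None and goes to the human operator. As the history grows, a larger share of responses is coded automatically.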



 



Conclusions



 



 Government survey agencies have taken about 20 years to



implement one new technology, CATI.  Meanwhile, technology has



advanced into many other areas such as computer assisted personal



interviews, computerized self-administered questionnaires,



geographic and communication technologies, voice technology, and



artificial intelligence.  This technology explosion means that



survey agencies need to evaluate an ever-increasing number of



methods which may improve data collection and survey management.



In addition to investigating new technologies, the associated



organizational and methodological factors must be addressed so that



all implications are considered before implementing advanced



computer assisted survey methods.  All the while, studies must



continue to evaluate the effects of these factors on survey costs,



timeliness, and data quality.



 



Acknowledgement



 



A note of gratitude goes to Bill Nicholls for material



presented in his report as referenced in [39].



 



References



 



 



1 American Statistical Association, Proceedings of the Section on



Survey Research Methods, Washington, D. C.: American Statistical



Association, 1978.  Experiences with CATI in a Large-Scale Survey,



by William L. Nicholls II, pp. 9-17.



 



2 American Statistical Association, Proceedings of the Section on



Survey Research Methods, Washington, D.C.: American Statistical



Association, 1983.  Measuring CATI Effects on Numerical Data, by



Carol C. House and Betsy Morton, pp. 135-138.



 



3 American Statistical Association, Proceedings of the Section on



Survey Research Methods, Washington, D.C.: American Statistical



 



 



Association, 1988.  Development of a Computer Assisted Personal



Interview For the National Health Interview Survey, by Stewart C.



Rice Jr., Robert A. Wright and Ben Rowe.



 



4  American Statistical Association, Proceedings of the Section on



Survey Research Methods, Washington, D.C.: American Statistical



Association, 1989.  Developing a Cost Model for Alternative Data



Collection Methods: Mail, CATI, and TDE, by Richard Clayton and



Louis Harrell.



 



5 Bass, Robert T., and Robert D. Tortora, "A Comparison of



Centralized CATI Facilities for An Agricultural Labor Survey," in



Robert M. Groves et al. (editors), Telephone Survey Methodology,



New York, Wiley Press, 1988.



 



6  Berry, Sandra H. and Diane O'Rourke, "Administrative Designs for



Centralized Telephone Survey Centers: Implications of the Transfer



to CATI," in Robert M. Groves et al. (editors), Telephone Survey



Methodology, New York, Wiley Press, 1988.



 



7  Birkett, N. J., "Computer-Aided Personal Interviewing: A New



Technique for Data Collection in Epidemiologic Surveys," American



Journal of Epidemiology, Vol. 127, No. 3, 1988, pp. 684-690.



 



8  Carpenter, Edwin H., "Software Tools for Data Collection: Micro-



Assisted Interviewing," Social Science Computer Review, Fall 1988.



 



9  Catlin, Gary and Susan Ingram, "The Effects of CATI on Costs and



Data Quality: A Comparison of CATI and Paper Methods in Centralized



Interviewing," in Robert M. Groves et al. (editors), Telephone



Survey Methodology, New York, Wiley Press, 1988.



 



10 deBie, Steven E., Ineke A. L. Stoop and Katrinus L. M. deVries,



CAI Software -- An Evaluation of Software for Computer Assisted



Interviewing, Amsterdam, Association of Social Research Institutes,



1989.



 



11 Dictionary of Computing, New York, Oxford University Press, 1986.



 



12  Dillman, Don A.  Mail and Telephone Surveys -- The Total Design



Method.  New York: John Wiley & Sons, 1978.



 



13  Dillman, Don A. and John Tarnai, "Administrative Issues in Mixed



Mode Surveys," in Robert M. Groves et al. (editors), Telephone



Survey Methodology, New York, Wiley Press, 1988.



 



14  Fink, James C., "CATI's First Decade: The Chilton Experience,"



Sociological Methods & Research, Vol. 12, No. 2, 1983, pp. 153-168.



 



15  Freeman, Howard E., "Research Opportunities Related to CATI,"



Sociological Methods & Research, Vol. 12, No. 2, 1983, pp. 143-152.



 



16 Groves, Robert M. and Nancy A. Mathiowetz, "Computer Assisted



Telephone Interviewing: Effects on Interviewers and Respondents,"



Public Opinion Quarterly, Vol. 48, 1984, pp. 356-369.



 



17  Groves, Robert M. and Lou J. Magilavy, "Measuring and Explaining



Interviewer Effects in Centralized Telephone Surveys," Public



Opinion Quarterly, Vol. 50, 1984, pp. 251-266.



 



18  Groves, Robert M., "Implications of CATI," Sociological Methods &



Research, Vol. 12, No. 2, 1983, pp. 199-215.



 



19 Groves, Robert M. and William L. Nicholls II, "The Status of



Computer-Assisted Telephone Interviewing: Part II -- Data Quality



Issues," Journal of Official Statistics, Vol. 2, No. 2, 1986, pp.



117-134.



 



20 House, Carol C., "Questionnaire Design with Computer-Assisted



Telephone Interviewing," Journal of Official Statistics, Vol. 1,



No. 2, 1985, pp. 209-219.



 



21  Lyberg, Lars, "Administration of Telephone Surveys," in Robert M.



Groves et al. (editors), Telephone Survey Methodology, New York,



Wiley Press, 1988.



 



22  Nelson, R. O., B. L. Peyton and B. Z. Bortner, "Use of an On-Line



Interactive System: Its Effects on the Speed, Accuracy, and Costs



of Survey Results." Presented at the 18th Advertising Research



Foundation Conference, New York City, 1972.



 



23  Netherlands Central Bureau of Statistics, Automation in Survey



Processing, Select Report 4, Voorburg, Netherlands, Central Bureau



of Statistics, 1987.



 



24 Nicholls II, W. L., "Computer-Assisted Telephone Interviewing: A



General Introduction," in Robert M. Groves et al. (editors),



Telephone Survey Methodology, New York, Wiley Press, 1988.



 



25  Nicholls II, William L., "CATI Research and Development at the



Census Bureau," Sociological Methods & Research, Vol. 12, No. 2,



1983, pp. 191-197.



 



26 Nicholls II, William L. and R. M. Groves, "The Status of



Computer-Assisted Telephone Interviewing: Part I -- Introduction



and Impact on Cost and Timeliness of Survey Data," Journal of



Official Statistics, Vol. 2, No. 2, 1986, pp. 93-115.



 



27 Palit, Charles and Harry Sharp, "Microcomputer-Assisted Telephone



Interviewing," Sociological Methods & Research, Vol. 12, No. 2,



1983, pp. 169-189.



 



 



 



 



28 Pallett, D.S. (editor), Automation of Data Capture for the



Census in the Year 2000.  Final report to the U.S. Bureau of the



Census from the National Bureau of Standards under the interagency



agreement for fiscal year 1987.



 



29 Ponikowski, Chester, and Sue Meily, "Use of Touchtone Recognition



Technology in Establishment Survey Data Collection."  Presented at



the First Annual Field Technologies Conference, St. Petersburg,



Florida, 1988.



 



30 Schuman, Howard and Stanley Presser.  Questions and Answers in



Attitude Surveys -- Experiments on Question Form, Wording, and



Context.  Orlando, FL: Academic Press, Inc., 1981.



 



31  Shanks, J. Merrill, "The Current Status of Computer-Assisted



Telephone Interviewing," Sociological Methods & Research, Vol. 12,



No. 2, 1983, pp. 119-142.



 



32  Sharp, Harry, and Charles Palit, "Sample Administration with



CATI: The Wisconsin Survey Research," Journal of Official



Statistics, Vol. 4, No. 4, 1988, pp. 401-413.



 



33  Sudman, Seymour, "Survey Research and Technological Change,"



Sociological Methods & Research, Vol. 12, No. 2, 1983, pp. 217-230.



 



34 Swann, T. C., "Electronic Data Collection in the Petroleum Supply



Reporting System."  Presented at the American Statistical



Association Committee on Energy Statistics, April 28-29, 1988.



 



35 Tortora, Robert D., "CATI in an Agricultural Statistical Agency,"



Journal of Official Statistics, Vol. 1, No. 2, 1986, pp. 301-314.



 



36 Tucker, Clyde, "Interviewer Effects in Telephone Surveys," Public



Opinion Quarterly, Vol. 47, 1984, pp. 84-95.



 



37  U.S. Department of Agriculture, A Comparison of CATI and NonCATI
on a Nebraska Hog Survey.  Statistical Reporting Service Staff



Report No. 89, by Richard Coulter.  Washington, D.C.: Statistical



Reporting Service, May, 1985.



 



38  U.S. Department of Agriculture, Computer Assisted Telephone



Interviewing on the Cattle Multiple Frame Survey.  Statistical



Reporting Service Staff Report No. 82, by Carol C. House.



Washington, D.C.: Statistical Reporting Service, October, 1984.



 



39 U.S. Department of Commerce, The Impact of High Technology on



Data Collection.  CATI Research Report N. GEN-1, by William L.



Nicholls II.  Washington, D.C.: U.S. Bureau of the Census,



February, 1989.



 



 



 



40 U.S. Department of Commerce, Proceedings of the Bureau of the



Census First Annual Research Conference, Washington, D.C.: Bureau



of the Census, 1985.  Cost and Error Modeling for Large-Scale



Surveys, by Robert M. Groves and James M. Lepkowski, pp. 330-357.



 



 



41 U.S. Department of Commerce, Proceedings of the Bureau of the



Census Third Annual Research Conference, Washington, D.C.: Bureau



of the Census, 1987.  Use of Historical Data in a Current Interview



Situation, by Bradley V. Pafford and Dick Coulter, pp. 281-298.



 



42 U.S. Department of Commerce, Proceedings of the Bureau of the



Census Fourth Annual Research Conference, Washington, D.C.: Bureau



of the Census, 1988.  "Discussion" of papers by Rothschild and



Wilson and by Sebestik et al., by William L. Nicholls II, pp.



340-342.



 



43 U.S. Department of Commerce, Proceedings of the Bureau of the



Census Fourth Annual Research Conference, Washington, D.C.: Bureau



of the Census, 1988.  Initial Experiences with CAPI, by Jutta



Sebestik, Harvey Zelon, Dale DeWitt, James M. O'Reilly and Kevin



McGowan, pp. 357-365.



 



44 U.S. Department of Commerce, Proceedings of the Bureau of the



Census Fourth Annual Research Conference, Washington, D.C.: Bureau



of the Census, 1988.  Nationwide Food Consumption Survey Using



Laptop Computers, by Beth B. Rothschild and Lucy B. Wilson, pp.



347-356.



 



45 U.S. Department of Commerce, Proceedings of the Bureau of the



Census 1990 Annual Research Conference, Washington, D.C.: Bureau of



the Census, 1990.  Building Predictive Models of CAPI Acceptance in



a Field Interviewing Staff, by Mick Couper, Robert M. Groves and



Curtis A. Jacobs, forthcoming.



 



46 U.S. Department of Commerce, Proceedings of the Bureau of the



Census 1990 Annual Research Conference, Washington, D.C.: Bureau of



the Census, 1990.   The Impact Of Microcomputers on Survey



Processing at the Netherlands Central Bureau of Statistics, by



Wouter J. Keller and Jelke G. Bethlehem, forthcoming.



 



47 U.S. Department of Commerce, Proceedings of the Bureau of the



Census 1990 Annual Research Conference, Washington, D.C.: Bureau of



the Census, 1990.  Improvling Data Quality in National Surveys:



Experience with Computer-assisted Methods in the National Post-



secondary Student Aid Studieg, by James E. Smith and Carmen



Vincent, forthcoming.



 



48  U.S. Department of Labor, Voice Recognition and Voice Response



Applications for Data Collection in a Federal/State Establishment



Survey.  By Richard Clayton and Debbie Winter.  Washington, D.C.:



U.S. Bureau of Labor Statistics, 1989.



 



                             153



 



49 vanBastalaer, Alois, Frans Kessamakers and Dirk Sikkel, "Data



Collection with Hand-Held Computers: Contributions to Questionnaire



Design," Journal of Official Statistics, Vol. 4, No. 2, 1988, pp.



141-154.



 



50  Weeks, Michael F., "Call Scheduling with CATI:, Current



Capabilities and Methods," in Robert M. Groves at al. (editors),



Telephone Survey Methodology, New York, Wiley Press, 1988.



 



51  Werking, George, Alan Tupek and Richard Clayton, "CATI and



Touchtone Self-Response Applications for Establishment Surveys,"



Journal of Official Statistics, Vol. 4, No. 4, 1988, pp. 349-362.



 



 



 



 



 



 



 



 






 



                          DISCUSSION



 



                   William L. Nicholls II



                 U. S. Bureau of the Census



 



Marc Tosiano's paper has a didactic purpose. He presents basic
information about CATI and related topics as background for a more
technical paper on computer assisted survey information collection
(CASIC) to follow. Since his paper is primarily a condensation of
summary articles on CATI and CAPI previously prepared by others, it
contains much that is familiar and little that is original. Rather
than add another layer of commentary to this well-worked material,
I will use the discussant's time to counterpose the tone of
technological and methodological optimism that seems to
characterize many papers of this conference with some historical
reality. This also will be familiar material to some readers,
since it is based on the same sources as Tosiano's paper.

CATI and its associated technologies provide many opportunities to
improve the timeliness and quality of survey data, often at the
same or lower cost per case (Catlin and Ingram, 1988; Nicholls and
Groves, 1986; Groves and Nicholls, 1986). But those increasingly
documented benefits have not necessarily prompted Federal data
collection agencies to implement CATI expeditiously in their major
surveys or in ways that optimize those benefits.

The first CATI survey was conducted by Chilton Research in 1971,
and by 1980 CATI was in widespread use in commercial market
research and in university survey research (Nicholls, 1988). But
even those Federal agencies moving most quickly, such as NASS, will
not fully implement CATI in their major continuing surveys before
1992. That will be 21 years, or a full generation, after CATI was
invented. For the Census Bureau's major household surveys, such as
the Current Population Survey and the National Crime Survey, the
earliest conceivable date for full CATI (and CAPI) implementation
is 1994, but slippage, say to 1996, seems increasingly likely under
current budgetary constraints. That would represent a quarter
century after the first CATI survey in the private sector. Federal
agencies have introduced CATI more quickly into new and
infrequently conducted surveys. But why has it taken so long to
implement CATI for major, continuing Federal surveys?

There are many reasons. In the early 1970s, according to Dillman
and Tarnai (1988), the managers of most Federal surveys regarded
the telephone interview as a generally inferior data collection
method and were reluctant to try it. Where a readiness for change
was present, the technology often was lacking. CATI software was
initially designed for market research and was not adequate for
many government applications until enhanced by university
organizations with government support in the late 1970s.

When U.S. government agencies began active internal development of
CATI, around 1980, they often started with research programs to
assess its effects on costs, data quality, and estimates. This
research still continues, although both the Census Bureau and
Statistics Canada produced major summaries of results in the late
1980s (U.S. Census Bureau, 1987; Catlin and Ingram, 1988). The
familiar delays of government planning, budgeting, and procurement
also undoubtedly played a role in delaying CATI implementation.

CATI's extended incubation period in government also may be partly
explained by its initial association with two related
methodologies, random digit dialing (RDD) and centralized telephone
interviewing, which also are topics of this session. Together, RDD,
centralized telephone interviewing, and CATI are sometimes
described as "modern telephone methods." In their influential 1979
volume Surveys by Telephone, Groves and Kahn described the joint
evolution of these methods as one of the major developments in the
history of survey methods, ranking with area probability sampling
and the use of computers for survey analysis. As Berry and
O'Rourke (1988), among others, have noted, by 1980 modern telephone
methods (RDD sampling, centralized interviewing, and CATI) had
become the dominant survey methods in U.S. commercial market
research and in university survey research centers. Government
agencies were the exception.

"Modern telephone methods" did not transfer readily to government
data collection as a package. This is most apparent for random
digit dialing, whose potential to reduce survey costs attracted
major interest among government statisticians (Biemer et al.,
1985). The National Center for Health Statistics, the Census
Bureau, and Statistics Canada all began their investigations of
modern telephone methods with years of careful testing of random
digit dialing (Marquis and Blass, 1985). But, as Drew, Choudhry,
and Hunter (1988) have observed, RDD sampling methods are used in
few government surveys conducted in the U.S. or elsewhere in the
world. The omission of nontelephone households (about 7 percent of
the U.S. total) and the typically higher refusal rates of
cold-contact telephone interviews have presented major barriers to
the use of RDD in many or most government survey applications.

Random digit dialing remains a valuable sampling method for
populations with high telephone subscribership (such as Canada and
Sweden) and for surveys that can tolerate its coverage and
nonresponse problems. For some governmental statistical agencies,
however, the early emphasis on RDD proved a diversion from what now
appear to be more fruitful uses of CATI. Only when RDD was ruled
out as a sampling method for most U.S. government household
surveys, which at the Census Bureau occurred around 1986, could
plans to implement CATI in single-frame, mixed-mode designs
proceed. The somewhat faster adoption of CATI by establishment and
agricultural surveys may be partly attributable to their
traditional reliance on list frame samples: for them, a change to
RDD was not an issue.

The second major element of modern telephone methods that has not
translated easily to government data collection is centralized
telephone interviewing. In U.S. university and commercial market
research, the shift from "dispersed" local interviewers making
calls from their own homes to "centralized" telephone interviewers
calling from large national or regional offices was largely
completed by the late 1970s. Government household surveys are
among the few major users of dispersed telephone interviewing to
persist into the 1980s.

Mr. Tosiano's paper has reviewed the ways in which centralized
telephone interviewing and CATI can be mutually supporting
methodologies. Computer assistance is easier to arrange for
centralized interviewers who share the same hardware, programs,
sample, and technical staff. At the same time, CATI encourages
centralized interviewing to gain these efficiencies and to benefit
from such large-staff CATI features as automatic call scheduling,
online supervision, and field report generation. Centralization
contributes to standardized field procedures and interviewing
quality control through easier recruitment, training, and
supervision of interviewers, while CATI contributes to these same
goals through tailored question wordings, computer-controlled
branching, and online editing. Supervisory audio-visual monitoring
of interviewer performance, currently feasible only with
centralized CATI interviewers, provides feedback ensuring that CATI
quality enhancement features are appropriately used and that
interviewers deviating from performance standards are identified
and retrained when necessary.

"Centralization" has a different meaning for government
establishment surveys than for government household surveys.
Because the establishment surveys typically began with mailed
questionnaire methods, later supplemented with telephone prompting
and interviews, they generally are conducted from offices. The
choice typically is among national, regional, and state offices.
The introduction of CATI strengthens the arguments for greater
centralization. The Census Bureau's Business Division is perhaps
unique among Federal agencies in withdrawing its Retail and
Wholesale Trade Surveys from a set of regional offices to
centralize them in a national site before placing them on CATI.
More commonly, existing organizational arrangements, Federal-State
agreements, and formal or informal commitments to employees have
resulted in continuation of state-based offices averaging about 10
interviewing stations per state but ranging from 2 to perhaps 30
stations (Nicholls, 1988). In national private sector survey and
market research organizations, CATI installations more typically
reach 45-100 stations.

The introduction of CATI into mixed-mode personal-telephone
household surveys presents even greater organizational problems.
This is illustrated by the Census Bureau's plans to phase CATI into
the Current Population Survey (CPS) and the National Crime Survey
(NCS). Both surveys have a rotating panel design. The first visit
to each sample address is by personal visit, to identify ineligible
housing units and to encourage household participation. The fifth
CPS and NCS visits also are in person, to re-establish personal
contact with the household partway through the sequence of
interviews. Other interviews are by telephone when possible and
acceptable to the respondent, and by personal visit otherwise. The
same local interviewers traditionally conduct both the personal and
telephone visit interviews, placing the telephone interview calls
from their own homes.
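The visit-by-visit mode rule described above for the CPS and NCS
rotating panels can be written as a small decision function. This
is a simplified sketch, not Census Bureau software: the function
name interview_mode, the phone_ok flag (the household can be
reached by telephone and accepts a telephone interview), and the
cati flag (telephone cases are routed to a centralized CATI
facility instead of local interviewers calling from home) are all
hypothetical, and real case management involves many more outcomes.

```python
def interview_mode(visit: int, phone_ok: bool, cati: bool = False) -> str:
    """Return the interview mode for one household at one panel visit.

    Hypothetical, simplified version of the rotating-panel rule:
    - visits 1 and 5 are always personal visits;
    - other visits are by telephone when the household can be reached
      and accepts it, otherwise by personal visit;
    - with cati=True, telephone cases go to a centralized CATI facility
      instead of local interviewers calling from their own homes.
    """
    if visit < 1:
        raise ValueError("visit numbers start at 1")
    if visit in (1, 5) or not phone_ok:
        return "personal visit"
    return "centralized CATI" if cati else "dispersed telephone"


# Six visits, chosen only for illustration, for a household that
# accepts telephone interviews.
for v in range(1, 7):
    print(v, interview_mode(v, phone_ok=True, cati=True))
```

Visits 1 and 5 stay hard-coded as personal visits whether or not
cati is set: in this sketch CATI changes only who places the
telephone calls, not which visits are made in person.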



 



When CATI is introduced into these surveys, no change is made in
the initial visits to each sample address. These remain personal
visit interviews, since comparable response and panel retention
rates have not been attainable with cold-contact telephone
interviews (Marquis and Blass, 1985). CATI replaces dispersed
telephone interviews from the local interviewers' homes in the
second and later visits of the panel design. This field design has
several potential benefits: (1) reduced field costs; (2) reduced
interviewer recruitment problems in tight labor markets; and (3)
possibly improved survey estimates. Nevertheless, the transition
poses a number of design and organizational problems that require
time and effort to resolve.

The first is developing appropriate methods for rapid but
controlled transfer of individual case records between personal
visit and CATI interview modes. When the first-visit personal
interview is complete, household enumeration data and field
records must be keyed into computer files for second and later
interviews by CATI. Case records also move from CATI to the local
interviewers for CPS and NCS fifth visits and for personal followup
of households unreachable by CATI.

The second transition problem is the temporarily reduced
efficiency of the sample designs. Both the CPS and NCS employ
cluster samples chosen initially to minimize costs for interviewing
assignments containing both personal visit and dispersed telephone
interviews. When the dispersed telephone interviews are moved to
CATI, the remaining personal visit cases may no longer constitute
acceptable or efficient field assignments. Since the CPS and NCS
samples are based on the decennial census, they can be efficiently
revised only once a decade.

The third problem in moving dispersed telephone interviews to CATI
is the need to reduce the field staff while increasing the CATI
staff. For the CPS, the Census Bureau's largest current survey,
the transition will initially rely on field interviewer attrition,
so its pace is constrained by the rate at which attrition occurs.

The fourth and final transition problem is finding a sufficient
volume of work for the CATI interviewers. The CPS conducts its
interviews in the third week of each month and the NCS in the first
week, with some carryover into the second. Centralized CATI
interviewing is restricted to even fewer days per month to permit
field followup of cases unreachable by telephone. These two surveys
will therefore provide the CATI staff with relatively few days of
employment per month.

Of the four transition problems, only the first derives from the
CATI technology. Case transfers between dispersed local
interviewers and centralized CATI interviewers are complicated by
the move between paper-and-pencil records and computer files. The
problems of field sampling efficiency, field staff phase-down, and
insufficient work at the CATI facilities arise from the
centralization of previously dispersed interviews. They would be
the same whether the central facility used CATI or paper-and-pencil
methods.

The most difficult problems of implementing CATI in government
agencies appear to derive from the organizational issues CATI
typically raises about centralized vs. decentralized interviewing.



 



References

Berry, S. H. and D. O'Rourke, "Administrative Designs for
Centralized Telephone Survey Centers: Implications of the Transfer
to CATI," in R. M. Groves et al. (editors), Telephone Survey
Methodology, New York, Wiley Press, 1988, pp. 457-474.

Biemer, P., D. W. Chapman, and C. Alexander, "Some Research Issues
in Random-Digit Dialing Sampling and Estimation," Proceedings of
the Bureau of the Census First Annual Research Conference,
Washington, U.S. Department of Commerce, Bureau of the Census,
1985, pp. 71-86.

Catlin, G. and S. Ingram, "The Effect of CATI on Cost and Data
Quality: A Comparison of CATI and Paper Methods in Centralized
Interviewing," in R. M. Groves et al. (editors), Telephone Survey
Methodology, New York, Wiley Press, 1988, pp. 437-452.

Dillman, D. A. and J. Tarnai, "Administrative Issues in Mixed Mode
Surveys," in R. M. Groves et al. (editors), Telephone Survey
Methodology, New York, Wiley Press, 1988, pp. 509-528.

Drew, J. D., G. H. Choudhry, and L. A. Hunter, "Nonresponse Issues
in Government Telephone Surveys," in R. M. Groves et al. (editors),
Telephone Survey Methodology, New York, Wiley Press, 1988,
pp. 233-246.

Groves, R. M. and R. L. Kahn, Surveys by Telephone, New York,
Academic Press, 1979.

Groves, R. M. and W. L. Nicholls II, "The Status of
Computer-Assisted Telephone Interviewing: Part II -- Data Quality
Issues," Journal of Official Statistics, Vol. 2, No. 2, 1986,
pp. 117-134.

Marquis, K. and R. Blass, "Nonsampling Error Considerations in the
Design and Operation of Telephone Surveys," Proceedings of the
Bureau of the Census First Annual Research Conference, Washington,
U.S. Department of Commerce, Bureau of the Census, 1985,
pp. 301-329.

Nicholls II, W. L., "The Impact of High Technology on Survey Data
Collection," CATI Research Report GEN-1, U.S. Department of
Commerce, Bureau of the Census, February 1989.

Nicholls II, W. L. and R. M. Groves, "The Status of
Computer-Assisted Telephone Interviewing: Part I -- Introduction
and Impact on Cost and Timeliness of Survey Data," Journal of
Official Statistics, Vol. 2, No. 2, 1986, pp. 93-115.

U.S. Census Bureau, Evaluation of CATI Data Quality and Costs in
the Current Population Survey, CATI Research Report No. CPS-2 of
the Computer-Assisted Interviewing Central Planning Committee,
CATI Research and Analysis Subcommittee, September 1988.



 



 



 



 



 



 



 



 






 



                        DISCUSSION



 



                      James T. Massey



          National Center for Health Statistics



 



 



The paper by Leyla Mohadjer and David Morganstein enumerates and
provides a brief overview of almost all of the key methodological
issues related to telephone surveys. The concept of total survey
design mentioned in the first section of the paper is an excellent
way to compare and summarize the advantages and disadvantages of
telephone surveys versus other modes of data collection. However,
the paper never fully developed the total survey design concept as
a way to compare the different modes of data collection. Most of
the paper focused on the operational and sample design
efficiencies of telephone surveys to improve data quality.

The advantages and disadvantages of telephone surveys given by
Mohadjer and Morganstein are listed below, along with several
additional ones:

Advantages of Telephone Surveys

-   Lower cost
-   Better quality control and supervision of interviewers
-   Better access to some hard-to-reach persons
-   Smaller design effects
-   Smaller interviewer effects
-   Cost-effective method to sample rare populations (use of
    Donnelley tapes with characteristics of persons in the prefix
    area)
-   Use of CATI to control flow of sample, interview, edits,
    and processing
-   Local area surveys from a central location
-   Better use of bilingual interviewers

Disadvantages of Telephone Surveys

-   Lack of visual aids
-   No group interviews
-   Noncoverage of persons without telephones
-   Lower response rates
-   Cost of dual frame surveys
-   Cost of CATI (relative to other telephone surveys)



 



 



Now I would like to turn my attention to where we are in the
development and use of telephone surveys and to some areas that
still need research.

I see six reasons for the emergence of telephone surveys:

1) Better telephone coverage
2) Development of CATI, which led to the development of CAPI
3) Development of better RDD methods
4) Higher costs of face-to-face surveys
5) Slow death of the myth about the maximum length of a telephone
   interview
6) Recognition that telephone data quality is equal to that of
   face-to-face surveys

Considerable progress has been made over the past 15 years in
almost every aspect of telephone surveys, including data quality.
There are, however, several areas where progress has been limited
and more research is needed. These are listed below.

1) Techniques to improve response rates: While response rates
   have improved, much could still be done to adapt face-to-face
   survey procedures to telephone surveys. I recently reviewed a
   paper that used several inducement techniques to dramatically
   improve telephone survey response rates in another country.

2) Validation of data collected by telephone versus face-to-face
   surveys: Most comparative studies have assumed that higher
   levels of reporting are better. For some types of data this
   assumption is questionable, and additional statistical
   validation studies are needed.

3) Research on the collection of sensitive data and other
   specific types of information: We should take full advantage
   of one of the key features of telephone interviewing, the
   autonomy and anonymity of the interview. Some research has
   shown that sensitive data, and responses to questions with
   socially desirable answers, are obtained better over the
   telephone. Some recent unpublished data indicate that smoking
   habits, crimes, and unemployment may be reported at higher
   levels over the telephone. These results should be published
   and validated.

4) Research on difficult questions and questions with multiple
   response categories: Questions requiring flashcards and scaled
   responses are still more problematic over the telephone. CATI
   does, however, offer a way to randomize the order of
   responses.

5) Research on better and cheaper ways to correct for
   noncoverage.
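Item 4 notes that CATI can randomize the order in which response
categories are read. A minimal sketch of one way to do this,
assuming each case has an identifier that can seed the shuffle so
that the order is reproducible and can be stored with the case. The
function name and the choice to keep residual categories such as
"Don't know" in last position are hypothetical illustrations, not
features of any particular CATI system.

```python
import random
from collections.abc import Sequence


def randomized_order(options: Sequence[str], case_id: int,
                     keep_last: Sequence[str] = ("Don't know", "Refused")) -> list[str]:
    """Return response options in a per-case random order.

    Seeding with the case ID means the same case always gets the same
    order, so the order can be recreated at analysis time. Options in
    keep_last (hypothetical residual categories) stay at the end.
    """
    rng = random.Random(case_id)
    movable = [o for o in options if o not in keep_last]
    fixed = [o for o in options if o in keep_last]
    rng.shuffle(movable)
    return movable + fixed


scale = ["Very satisfied", "Somewhat satisfied",
         "Somewhat dissatisfied", "Very dissatisfied", "Don't know"]
print(randomized_order(scale, case_id=1001))
```

For an ordered scale like the one above, a design might instead
choose at random between the forward and reversed order, so the
categories are still read in a meaningful sequence; full shuffling
is better suited to unordered lists.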



 



Finally, I would like to make two other observations. In 1984,
when OMB Working Paper 12 on Telephone Data Collection was
published, telephone surveys were used in the Federal government
primarily to conduct follow-up surveys and follow-up interviews.
Most initial-contact telephone surveys used list frames. This is
still the case today for almost all of the large government
surveys, although greater use of telephone interviewing is being
made.

For those of you who are new to the study of telephone surveys, I
recommend you start with the book Telephone Survey Methodology. It
has several state-of-the-art review papers and an extensive
bibliography. The paper Owen Thornberry and I wrote contains as
many reference tables on telephone coverage as Bob Groves would
allow us to include. I hope many of you will continue to conduct
research on telephone surveys and extend our knowledge of this
very valuable data collection method.



 



 



 



 



 



 



 



 






 



 



 



 



1 For a copy of the latest version, write to Dr. Daniel Kasprzyk,
Chief, SIPP Research and Coordination Staff, Office of the
Director, Bureau of the Census, Washington, DC 20233.

2 By Fritz Scheuren, Director, Statistics of Income Division
(R:S), Internal Revenue Service. Based, in part, on a discussion
of "Rolling Samples and Censuses," by Leslie Kish, to appear in the
June 1990 issue of Survey Methodology. The views expressed in this
paper are those of the author and do not necessarily represent the
position of the Internal Revenue Service.

3 This is a summary of a longer paper that was prepared for the
Seminar on Quality of Federal Data.

4 A bibliography of the staff papers and reports prepared on the
use of administrative records for social data in the Small Area
and Administrative Data Division was recently completed
(Statistics Canada, 1990).

5 Unmarried persons who (a) declare themselves to be single, (b)
are under the age of 30, (c) reside with their parents, and (d)
file a tax return are defined to be "filing children".

6 To minimize T1FF data processing costs, most of the T1FF data in
this paper are based on samples.

7 In processing the 1986 tax file, a somewhat earlier file was used
than in other years. As a result, the coverage was lower than in
other years. Had this not occurred, the coverage in 1986 would have
been higher than 93.7%.

8 The T1 does contain some information on dependent children,
namely, relationship to taxfiler and birthdate. This information
is not, however, captured.

9 The taxfiling rate for the 65+ population increased from 60% in
1985 to 75% in 1987.

10 The SCF is an annual supplement to the Canadian Labour Force
Survey. The SCF is similar to the March supplement to the Current
Population Survey (CPS) in the United States.

11 This table was adapted from Vigder and Leyes (1989).

12 These remarks are attributable to the author and do not
necessarily represent the views of the Census Bureau.

13 This paper is a condensation of Survey Coverage, Statistical
Policy Working Paper 17. Authors are listed in the second
paragraph. The views expressed are those of the authors and do not
necessarily reflect those of their agencies.
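The "filing children" definition in note 5 is a conjunction of four
conditions, so a record qualifies only if all four hold. A minimal
sketch, assuming a hypothetical record layout (the field names below
are illustrations, not T1FF variable names) and reading "under the
age of 30" as an age of 29 or less:

```python
from dataclasses import dataclass


@dataclass
class TaxRecord:
    # Hypothetical fields; not actual T1FF variable names.
    declared_status: str      # marital status declared on the return
    age: int
    lives_with_parents: bool
    filed_return: bool


def is_filing_child(r: TaxRecord) -> bool:
    """True only when all four conditions (a)-(d) of note 5 hold."""
    return (
        r.declared_status == "single"   # (a) declares self single
        and r.age < 30                  # (b) under the age of 30
        and r.lives_with_parents        # (c) resides with parents
        and r.filed_return              # (d) files a tax return
    )
```

Because the conditions are joined by "and", failing any one of them,
such as an age of exactly 30, excludes the record.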



 



 



                 Reports Available in the
                    Statistical Policy
                   Working Paper Series

1. Report on Statistics for Allocation of Funds (Available
   through NTIS Document Sales, PB86-211521/AS)
2. Report on Statistical Disclosure and Disclosure-Avoidance
   Techniques (NTIS Document Sales, PB86-211539/AS)
3. An Error Profile: Employment as Measured by the Current
   Population Survey (NTIS Document Sales, PB86-214269/AS)
4. Glossary of Nonsampling Error Terms: An Illustration of a
   Semantic Problem in Statistics (NTIS Document Sales,
   PB86-211547/AS)
5. Report on Exact and Statistical Matching Techniques (NTIS
   Document Sales, PB86-215829/AS)
6. Report on Statistical Uses of Administrative Records (NTIS
   Document Sales, PB86-214285/AS)
7. An Interagency Review of Time-Series Revision Policies (NTIS
   Document Sales, PB86-232451/AS)
8. Statistical Interagency Agreements (NTIS Document Sales,
   PB86-230570/AS)
9. Contracting for Surveys (NTIS Document Sales, PB83-233148)
10. Approaches to Developing Questionnaires (NTIS Document
   Sales, PB84-105055/AS)
11. A Review of Industry Coding Systems (NTIS Document Sales,
   PB84-135276)
12. The Role of Telephone Data Collection in Federal Statistics
   (NTIS Document Sales, PB85-105971)
13. Federal Longitudinal Surveys (NTIS Document Sales,
   PB86-139730)
14. Workshop on Statistical Uses of Microcomputers in Federal
   Agencies (NTIS Document Sales, PB87-166393)
15. Quality in Establishment Surveys (NTIS Document Sales,
   PB88-232921)
16. A Comparative Study of Reporting Units in Selected Employer
   Data Systems (NTIS Document Sales, PB90-205238)
17. Survey Coverage (NTIS Document Sales, PB90-205246)
18. Data Editing in Federal Statistical Agencies (NTIS Document
   Sales, PB90-205253)
19. Computer Assisted Survey Information Collection (NTIS
   Document Sales, PB90-205261)
20. Seminar on the Quality of Federal Data (NTIS Document Sales,
   PB91-142414)

Copies of these working papers may be ordered from NTIS Document
Sales, 5285 Port Royal Road, Springfield, VA 22161, (703) 487-4650.



 



 



 

Page Last Modified: April 20, 2007