Federal Committee on Statistical Methodology
Office of Management and Budget

 

  Statistical Policy Working Paper 20 - Seminar on Quality of Federal Data - Part 3 of 3



 

 

                          Statistical Policy

                           Working Paper 20





                  Seminar on Quality of Federal Data





                              Part 3 of 3





             Federal Committee on Statistical Methodology



 



                      Statistical Policy Office



           Office of Information and Regulatory Affairs



                    Office of Management and Budget



 



                           March 1991



 



                MEMBERS OF THE FEDERAL COMMITTEE ON 



                       STATISTICAL METHODOLOGY



                          (February 1991)



 



                    Maria E. Gonzalez, Chair



                 Office of Management and Budget



 



Yvonne M. Bishop                  Daniel Kasprzyk



Energy Information                Bureau of the Census



  Administration



                                  Daniel Melnick



Warren L. Buckler                 National Science Foundation



Social Security Administration



                                  Robert P. Parker



Charles E. Caudill                Bureau of Economic Analysis



National Agricultural



  Statistics Service              David A. Pierce



                                  Federal Reserve Board



Cynthia Z.F. Clark



National Agricultural             Thomas J. Plewes



  Statistics Service              Bureau of Labor Statistics



 



Zahava D. Doering                 Wesley L. Schaible



Smithsonian Institution           Bureau of Labor Statistics



 



Robert M. Groves                  Fritz J. Scheuren



Bureau of the Census              Internal Revenue Service



 



Roger A. Herriot                  Monroe G. Sirken



National Center for               National Center for



  Education Statistics              Health Statistics



 



C. Terry Ireland                  Robert D. Tortora



National Computer Security        Bureau of the Census



  Center



 



Charles D. Jones



Bureau of the Census



 



                            PREFACE



 



In 1975, the Office of Management and Budget (OMB) organized the



Federal Committee on Statistical Methodology.  Comprised of



individuals selected by OMB for their expertise and interest in



statistical methods, the committee has during the past 15 years



determined areas that merit investigation and discussion, and



overseen the work of subcommittees organized to study particular



issues.  Since 1978, 19 Statistical Policy Working Papers have been



published under the auspices of the Committee.



 



On May 23-24, 1990, the Council of Professional Associations on



Federal Statistics (COPAFS) hosted a "Seminar on the Quality of



Federal Data."  Developed to capitalize on work undertaken during



the past dozen years by the Federal Committee on Statistical



Methodology and its subcommittees, the seminar focused on a variety



of topics that have been explored thus far in the Statistical



Policy Working Paper series.  The subjects covered at the seminar



included:



 



     Survey Quality Profiles



     Paradigm Shifts Using Administrative Records



     Survey Coverage Evaluation



     Telephone Data Collection



     Data Editing



     Computer Assisted Statistical Surveys



     Quality in Business Surveys



     Cognitive Laboratories



     Employer Reporting Unit Match Study



     Approaches to Developing Questionnaires



     Statistical Disclosure-Avoidance



     Federal Longitudinal Surveys



 



Each of these topics was presented in a two-hour session that



featured formal papers and discussion, followed by informal



dialogue among all speakers and attendees.



 



Statistical Policy Working Paper 20, published in three parts,



presents the proceedings of the "Seminar on the Quality of Federal



Data."  In addition to providing the papers and formal discussions



from each of the twelve sessions, this working paper includes



Robert M. Groves' keynote address, "Towards Quality in a Working



Paper Series on Quality," and comments by Stephen E. Fienberg,



Margaret E. Martin, and Hermann Habermann at the closing session,



"Towards an Agenda for the Future."



 



We are indebted to all of our colleagues who assisted in organizing



the seminar, and to the many individuals who not only presented



papers and discussions but also prepared these materials for



publication.  A special thanks is due to Terry Ireland and his



staff for their work in assembling this working paper.



 



 



                    Table of Contents



 



                    Wednesday, May 23, 1990



 



 



                            Part 1



 



 



                        KEYNOTE ADDRESS



 



 



TOWARDS QUALITY IN A WORKING PAPER SERIES ON QUALITY . . . . . . . . 3



   Robert M. Groves, The University of Michigan and U. S.



   Bureau of the Census



 



 



 



        Session 1 - SURVEY QUALITY PROFILES



 



 



 



THE SIPP QUALITY PROFILE . . . . . . . . . . . . . . . . . . . . . . 19



   Thomas B. Jabine, Statistical Consultant



 



INITIAL REPORT ON THE QUALITY OF AGRICULTURAL SURVEY PROGRAM. . . .  29



   George A. Hanuschak, National Agricultural Statistics



   Service



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40



   Barbara A. Bailar, American Statistical Association



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46



   Nancy A. Mathiowetz, U. S. Bureau of the Census



 



 



 



      Session 2 - PARADIGM SHIFTS USING ADMINISTRATIVE



 



                              RECORDS



 



 



 



PARADIGM SHIFTS:  ADMINISTRATIVE RECORDS AND CENSUS-TAKING . . . . . 53



   Fritz Scheuren, Internal Revenue Service



 



AN ADMINISTRATIVE RECORD PARADIGM:  A CANADIAN EXPERIENCE. . . . . . 66



   John Leyes, Statistics Canada



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . 77



    Gerald Gates, U.S. Bureau of the Census



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83



     Edward J. Spar, Market Statistics                             



 



 



 



                  Session 3 - SURVEY COVERAGE EVALUATION



 



 



 



 



CONTROL, MEASUREMENT, AND IMPROVEMENT OF SURVEY COVERAGE . . . . . 87



     Gary M. Shapiro, U. S. Bureau of the Census; Raymond R.



     Bosecker, National Agricultural Statistics Service



 



QUALITY OF SURVEY FRAMES . . . . . . . . . . . . . . . . . . . . . 100



     Judith T. Lessler, Research Triangle Institute



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108



     Fritz Scheuren, Internal Revenue Service



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114



     Joseph Waksberg, Westat, Inc.



 



 



 



                  Session 4 - TELEPHONE DATA COLLECTION



 



 



 



 



QUALITY IMPROVEMENT IN TELEPHONE SURVEYS . . . . . . . . . . . . . 123



     Leyla Mohadjer, David Morganstein, Westat, Inc.



 



COMPUTER  ASSISTED SURVEY TECHNOLOGIES IN GOVERNMENT:



     AN OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . 137



     Marc Tosiano, National Agricultural Statistics Service



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . .155



     William L. Nicholls II, U. S. Bureau of the Census



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . .161



     James T. Massey, National Center for Health Statistics



 



 



 



 



 



 



 



 






 



                              Part 2



 



 



 



                       Session 5 - DATA EDITING



 



 



 



OVERVIEW OF DATA EDITING IN FEDERAL STATISTICAL AGENCIES . . . . . .167



      David A. Pierce, Federal Reserve Board



 



EDITING SOFTWARE (An excerpt from Chapter IV of Working



    Paper 18) . . . . . . . . . . . . . . . . . . . . . . . . . . . 173



      Mark Pierzchala, National Agricultural Statistics



      Service



 



RESEARCH ON EDITING . . . . . . . . . . . . . . . . . . . . . . . . 180



      Yahia Ahmed, Internal Revenue Service 



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184



      Charles E. Caudill, National Agricultural Statistics



      Service



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . .186



      Richard Bolstein, George Mason University



 



 



 



               Session 6 - COMPUTER ASSISTED STATISTICAL



 



                                 SURVEYS



 



 



 



OVERVIEW OF COMPUTER ASSISTED SURVEY INFORMATION COLLECTION . . . . .191



      Richard L. Clayton, U. S. Bureau of Labor Statistics



 



A COMPARISON BETWEEN CATI AND CAPI . . . . . . . . . . . . . . . . . 197



      Martin Baum, National Center for Health Statistics



 



COMPUTER ASSISTED SELF INTERVIEWING . . . . . . . . . . . . . . . . .202



      Ralph Gillmann, Energy Information Administration 



 



COMPUTER ASSISTED SELF INTERVIEWING:  RIGS AND PEDRO,



      TWO EXAMPLES. . . . . . . . . . . . . . . . . . . . . . . . . .205



      Ann M. Ducca, Energy Information Administration



 



DATA COLLECTION . . . . . . . . . . . . . . . . . . . . . . . . . . .209



      Cathy Mazur, National Agricultural Statistics Service



 






 



 DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . .212



     Robert N. Tinari, U. S. Bureau of the Census



 



 



 DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . .216



      David Morganstein, Westat, Inc.



 



 



 



                          Thursday, May 24, 1990



 



 



                  Session 7 - QUALITY IN BUSINESS SURVEYS



 



 



 



 IMPROVING ESTABLISHMENT SURVEYS AT THE BUREAU OF LABOR



       STATISTICS . . . . . . . . . . . . . . . . . . . . . . . . . .221



       Brian MacDonald, Alan R. Tupek, U. S. Bureau of Labor



       Statistics



 



 A REVIEW OF NONSAMPLING ERRORS IN FEDERAL ESTABLISHMENT



 SURVEYS WITH SOME AGRIBUSINESS EXAMPLES . . . . . . . . . . . . . . 232



       Ron Fecso, National Agricultural Statistics Service



 



 DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . .243



     David A. Binder, Statistics Canada



 



 DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247



       Charles D. Cowan, Opinion Research Corporation



 



 



                    Session 8 - COGNITIVE LABORATORIES



 



 



 



 THE BUREAU OF LABOR STATISTICS' COLLECTION PROCEDURES



 RESEARCH LABORATORY:  ACCOMPLISHMENTS AND FUTURE DIRECTIONS . . . . 253



       Cathryn S. Dippo, Douglas Herrmann, U. S. Bureau of Labor



       Statistics



 



 THE ROLE OF A COGNITIVE LABORATORY IN A STATISTICAL AGENCY . . . . .268



       Monroe G. Sirken, National Center for Health Statistics



 



 DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278



       Elizabeth Martin, U. S. Bureau of the Census



 



 DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . .281



       Murray Aborn, National Science Foundation (retired)



 






 



                                  Part 3



 



              Session 9 - EMPLOYER REPORTING UNIT MATCH



                                   STUDY



 



 



INTERAGENCY AGREEMENTS FOR MICRODATA ACCESS:



     THE ERUMS EXPERIENCE . . . . . . . . . . . . . . . . . . . . . .291



     Thomas B. Petska, Internal Revenue Service; Lois



     Alexander, Social Security Administration



 



SAMPLE SELECTION AND MATCHING PROCEDURES USED IN ERUMS . . . . . . . 301



     John Pinkos, Kenneth LeVasseur, Marlene Einstein,



     U. S. Bureau of Labor Statistics; Joel Packman, Social



     Security Administration



 



RESULTS, FINDINGS AND RECOMMENDATIONS OF THE ERUMS PROJECT . . . . . 309



     Vern Renshaw, Bureau of Economic Analysis; Tom Jabine,



     Statistical Consultant



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318



     W. Joel Richardson, Charles A. Waite, U. S. Bureau of the



     Census



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324



     Thomas J. Plewes, U. S. Bureau of Labor Statistics



 



 



                   Session 10 - APPROACHES TO DEVELOPING



                               QUESTIONNAIRES



 



TOOLS FOR USE IN DEVELOPING QUESTIONS AND TESTING



     QUESTIONNAIRES . . . . . . . . . . . . . . . . . . . . . . . . .331



     Theresa J. DeMaio, U. S. Bureau of the Census



 



TECHNIQUES FOR EVALUATING THE QUESTIONNAIRE DRAFT . . . . . . . . . .340



     Deborah H. Bercini, National Center for Health Statistics



 



DESIGNING QUESTIONNAIRES FOR CATI IN A MIXED MODE



     ENVIRONMENT. . . . . . . . . . . . . . . . . . . . . . . . . . .349



     Gemma Furno, U. S. Bureau of the Census



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360



     Carol C. House, National Agricultural Statistics Service



 






 



             Session 11 - STATISTICAL DISCLOSURE-AVOIDANCE



 



 



 



DISCLOSURE AVOIDANCE PRACTICES AT THE CENSUS BUREAU . . . . . . . . .367



      Brian Greenberg, U. S. Bureau of the Census



 



THE MICRODATA RELEASE PROGRAM OF THE NATIONAL CENTER



FOR HEALTH STATISTICS . . . . . . . . . . . . . . . . . . . . . . . .377



      Robert H. Mugge, National Center for Health Statistics



      (retired)



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385



      George T. Duncan, Carnegie Mellon University



 



 



                   Session 12 - FEDERAL LONGITUDINAL SURVEYS



 



 



 



FEDERAL LONGITUDINAL SURVEYS . . . . . . . . . . . . . . . . . . . . 393



      Daniel Kasprzyk, U. S. Bureau of the Census; Curtis



      Jacobs, U. S. Bureau of Labor Statistics



 



THE ADVANTAGES AND DISADVANTAGES OF LONGITUDINAL SURVEYS . . . . . . 407



      Robert W. Pearson, Social Science Research Council



 



LONGITUDINAL ANALYSIS OF FEDERAL SURVEY DATA . . . . . . . . . . . . 425



      Patricia Ruggles, Joint Economic Committee



 



DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438



      Michael Brick, Westat, Inc.



 



DISCUSSION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .447



      Marilyn E. Manser, U. S. Bureau of Labor Statistics



 



 



                         TOWARDS AN AGENDA FOR THE FUTURE



 



 



 



Stephen E. Fienberg, Carnegie Mellon University . . . . . . . . . . .455



 



Margaret E. Martin . . . . . . . . . . . . . . . . . . . . . . . . . 462



 



Hermann Habermann, Office of Management and Budget . . . . . . . . . 465



 






 



                               Part 3



 



                               Session 9



 



                           EMPLOYER REPORTING



                            UNIT MATCH STUDY



 



 



 



 



 



 



 



 






 



 






 



 



         INTERAGENCY AGREEMENTS FOR MICRODATA ACCESS:



                       THE ERUMS EXPERIENCE



 



                        Thomas B. Petska



                    Internal Revenue Service



 



                          Lois Alexander



                 Social Security Administration



 



 



    The Employer Reporting Unit Match Study (ERUMS) was a pilot



record linkage study carried out under the auspices of the Federal



Committee on Statistical Methodology of the Office of Management



and Budget.  The study linked records of employers and their



reporting units from three agencies: the Bureau of Labor Statistics



(BLS), the Social Security Administration (SSA) and the Internal



Revenue Service (IRS).  The primary linkages involved samples of



the agencies' records for employers in the State of Texas covering



their activities in 1982.



 



    For the ERUMS Workgroup to gain access to the data sets needed



for the study, arrangements had to be developed that would comply



with the confidentiality provisions and statutes of the Federal and



State agencies that controlled these data sets.  This paper gives



an overview of these arrangements and agreements.  In the first



section, background information on the statistical content and



confidentiality provisions of each of the data sets is provided.



In the second section, the actual arrangements for the release of



confidential microdata are described.  The last section provides a



summary of what we have learned about such data sharing



arrangements.



 



 



Background Information



 



     The goal of ERUMS was to demonstrate the feasibility of



matching employer and reporting unit data from different agency



record systems as a means of obtaining more precise information



about the coverage and content of the data in those systems.  A



purpose was to examine and evaluate differences in wage and



employment data at the state and county level as reported to those



agencies.  Despite the many difficulties encountered in



establishing the data access agreements, ERUMS demonstrated that



such data sharing projects can be successful under current laws.



 



 



1.  Data Sets



 



     The ERUMS study was a three-way data linkage study in which



individual microdata records from BLS, SSA, and IRS were matched by



Employer Identification Number (EIN).



 



 






 



 a.  BLS provided a 1982 Unemployment Insurance (UI) Address



      File, which, for each state, consists of data for



      individual employers and their reporting units, which are



      often equivalent to "establishments".  The data for this



      file are submitted to BLS by the State employment



      security agencies that operate the Federal-State UI



      Program.  BLS uses the data submitted by the states as a



      basis for statistical reports on employment and wages and



      uses the UI Address File as a national sampling frame



      for its establishment surveys.



 



  b.  SSA provided an edited file of Form W-3 annual reports



      for 1982 and the Single Unit and Multi-Unit Code Files.



      The Form W-3 file provided data on individual employers



      and, in some cases, for each of their reporting units,



      which are frequently equivalent to establishments.  The



      Single Unit Code File contains a record for most entities



      that have filed an application for an Employer



      Identification Number.  The Multi-Unit Code File contains



      a record for each reporting unit of multi-unit employers



      who are participating in the Establishment Reporting



      Plan, a voluntary program under which employers report



      wage information on Form W-3 separately for each of their



      reporting units.



 



  c.  IRS data used for ERUMS were from a Census-edited file



      based on Forms 941 and 943 for Tax Years 1981-83.  These



      forms are used by employers to report each quarter



      (annually for Form 943) to IRS on income taxes withheld



      from wages and other payments to employees and on taxes



      under the Federal Insurance Contributions Act (FICA)



      under the Social Security system.  Extracts of data from



      these forms are provided annually by IRS to the Census



      Bureau for use in the latter's County Business Patterns



      Program and other statistical programs.  The Census



      Bureau edits the files, particularly the industry codes,



      and imputes certain missing data.  This file was made



      available to the IRS Statistics of Income (SOI) Division



      for use in its business employment and payroll studies



      and was used for ERUMS.  In addition, copies of Form 940,



      Federal Unemployment Tax Return, were obtained for a



      substantial proportion of the ERUMS sample cases.
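The three-way match by EIN described at the start of this section can be illustrated with a minimal Python sketch. It is not the procedure actually used in ERUMS; the record layout (a dictionary with an "ein" key), the function names, and the sample values are all hypothetical and serve only to show how records from three sources might be grouped under a common identifier and how unmatched cases might be flagged.

```python
from collections import defaultdict

SOURCES = ("BLS", "SSA", "IRS")


def link_by_ein(bls, ssa, irs):
    """Group records from the three sources under their shared EIN.

    Each record is assumed (hypothetically) to be a dict with an "ein" key.
    """
    linked = defaultdict(lambda: {source: [] for source in SOURCES})
    for source, records in zip(SOURCES, (bls, ssa, irs)):
        for record in records:
            linked[record["ein"]][source].append(record)
    return dict(linked)


def match_status(linked):
    """For each EIN, report which sources hold at least one record."""
    return {
        ein: tuple(source for source in SOURCES if group[source])
        for ein, group in linked.items()
    }


# Made-up records; the EINs below are placeholders, not real identifiers.
bls = [{"ein": "000000001", "wages": 100}, {"ein": "000000002", "wages": 50}]
ssa = [{"ein": "000000001", "wages": 95}]
irs = [{"ein": "000000001", "wages": 100}, {"ein": "000000003", "wages": 10}]

linked = link_by_ein(bls, ssa, irs)
status = match_status(linked)
```

In this sketch an EIN found in all three sources would be a candidate for comparing reported wages, while an EIN found in only one or two sources would point to a possible coverage difference among the record systems.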



 



 



 



2.  Data Sharing Issues



 



      For the ERUMS Workgroup to gain access to the data sets needed



for the study, it was necessary to develop working arrangements



that complied with the provisions of confidentiality statutes,



regulations, and policies of the Federal and State agencies that



controlled these data sets.



 



 






 



    Although interagency exchange of identifiable microdata was



the key to ERUMS, such data sharing is restricted by Federal



confidentiality laws which generally permit agencies to disclose



statistical information only in summary or other unidentifiable



form.  Since ERUMS was designed to link and compare information



about individual employers collected separately by the different



agencies, the Workgroup had to develop and implement lawful methods



of transferring data on identifiable business units among the



participants.  A related task was to minimize the disclosure of



identifiers in making those transfers and linkages.



 



     The Workgroup was particularly interested in the different



ways an employer may report establishment or multi-unit enterprise



data to various State and Federal agencies.  To examine these



differences, the Workgroup needed to compare employers' reports to



the BLS State UI programs, the SSA FICA reporting, and the IRS



employment tax returns.  Members of the Workgroup included



employees of these agencies, plus employees of the Bureau of



Economic Analysis, Office of Management and Budget, the Bureau of



the Census, and the Committee on National Statistics of the



National Academy of Sciences.



 



     The Workgroup planned to analyze the information that



corresponded to each EIN as it was reported to each agency.  The



analysis and findings would be entirely statistical in nature with



no reference to the individual (identifiable) cases.



Nevertheless, the planning, processing, and analysis phases each



required access to identifiable data.



 



3. Confidentiality of Federal and State Tax Records



 



     In the ERUMS study, the Employer Identification Number (EIN)



was the identifier that was common to all the reporting systems.



It was used to define the sample drawn by BLS and was used as the



basis for retrieving, linking and comparing records containing



information from the SSA and IRS files.  By law, the EIN is a tax



identification number, and even when standing alone is protected by



Internal Revenue Code confidentiality restrictions.



 



     ERUMS required access to data from W-3 records which by law



are Federal tax records that are processed and maintained at SSA in



conjunction with the computation of Social Security retirement



benefits.  Since these are tax records, it was necessary to satisfy



IRS that the selection by SSA of sample cases, SSA's disclosure of



W-3 data to BLS, and the use of employer data by other members of



the Workgroup met the requirements of the Internal Revenue Code



dealing with disclosure of tax information. (See No. 4 below.)



 



     BLS selected Texas as the State whose records it would sample,



and it obtained written permission from the Texas State Employment



Security Agency to use their UI records in the project.  The Texas



 






 



Unemployment Compensation Act requires Texas employers to maintain



records and file reports to the Texas Employment Commission with



detailed information about the business operations and the number



and compensation of employees.  Texas law prohibits disclosure



except for administering the Act, and it makes improper disclosure



punishable by fines or imprisonment.



 



 



4. Other Confidentiality Considerations



 



      Since the Workgroup was composed of employees from several



agencies and organizations, confidentiality laws did not apply to



them uniformly.  In varying degrees, certain laws, regulations, and



policies affected each agency's access to identifiable records from



particular sources and provided differential access to various



individuals in the Workgroup.  A recurring theme was the necessity



at each phase of the process to identify the persons who needed to



use identifiable data and to ensure that no others had access at



that time.



 



      Besides affidavits and other written procedures to protect the



confidentiality of records, certain technical safeguards were



adopted to minimize disclosure risk.  The first of these methods



was to avoid identifying sample cases by EIN to persons who



performed processing in the participating agencies but were not



directly associated with the Workgroup.  This method was adopted to



conform to the Internal Revenue Code requirements for tax



information under the agreement BLS had with the State of Texas.



At BLS this led to a decision not to process the data on the



mainframe computer system at the Department of Labor that is



operated by a private contractor.  Instead, BLS used a mini-



computer which was accessible only to BLS employees who were



members of the Workgroup.



 



      State agencies periodically submit to BLS UI address files



that compile identification data for all reporting units at the



most-detailed level that is available from employers' reports.  BLS



compiles these reports under a pledge of confidentiality that



allows the data to be used only by authorized persons for



statistical purposes.



 



      Once BLS selected the Texas sample, it had to create a finder



list so that SSA could extract corresponding records from its W-3



and related files for employers in the sample.  The technical staff



who performed these operations at SSA have routine access in their



usual jobs to the employer records maintained at SSA.  However,



they did not need to know which of the employers' records comprised



the sample selected by BLS from the Texas UI file.  To avoid



identifying those cases that were actually in the sample, BLS



furnished SSA with a listing of 7 of the 9 digits of sample EINs.



SSA staff then extracted records from the W-3 and related files for



all records in which these 7 digits appeared without knowing which



 






 



 



employers were actually in the BLS sample.  This procedure



effectively masked the identities of sample cases derived from



State UI files, and thus significantly limited the number of SSA



employees who were required to sign BLS non-disclosure affidavits.
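The partial-identifier finder list described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not say which 7 of the 9 digits were supplied, so the sketch assumes the first seven, and the function names, record layout, and EIN values are hypothetical.

```python
def normalize_ein(ein):
    """Strip the hyphen from an EIN and check that nine digits remain."""
    digits = ein.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"not a 9-digit EIN: {ein!r}")
    return digits


def mask_ein(ein, keep=7):
    """Return a partial EIN (assumption: the first `keep` digits)."""
    return normalize_ein(ein)[:keep]


def build_finder_list(sample_eins, keep=7):
    """Step performed by the sampling agency: list only partial EINs."""
    return sorted({mask_ein(ein, keep) for ein in sample_eins})


def extract_candidates(agency_records, finder_list, keep=7):
    """Step performed by the extracting agency: pull every record whose
    partial EIN is on the finder list, without knowing the true sample."""
    wanted = set(finder_list)
    return [r for r in agency_records if mask_ein(r["ein"], keep) in wanted]


def keep_true_sample(candidates, sample_eins):
    """Step performed back at the sampling agency: keep exact matches only."""
    wanted = {normalize_ein(ein) for ein in sample_eins}
    return [r for r in candidates if normalize_ein(r["ein"]) in wanted]


# Made-up identifiers for illustration only.
sample_eins = ["12-3456789"]
agency_records = [
    {"ein": "123456789", "wages": 100},  # in the sample
    {"ein": "123456700", "wages": 40},   # shares the partial EIN, not in sample
    {"ein": "999999999", "wages": 75},   # unrelated employer
]

finder_list = build_finder_list(sample_eins)
candidates = extract_candidates(agency_records, finder_list)
matched = keep_true_sample(candidates, sample_eins)
```

Under the seven-leading-digits assumption, each finder-list entry can correspond to as many as 100 distinct EINs, so the staff performing the extraction see a larger candidate set and cannot tell which employers were actually sampled.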



 



 



Agreements for Interagency Data Sharing



 



    Access by the Workgroup to the data sets needed for the study



was accomplished through three interagency agreements plus an



additional access arrangement.



 



     The Workgroup had originally planned a tripartite arrangement



through interagency agreements of SSA and BLS with IRS.  However,



IRS counsel raised objections that such a multi-party agreement



would be unduly cumbersome, and approval would probably not be



forthcoming.  As an alternative, IRS proposed to contract



exclusively with BLS for the performance by BLS of services that



required access to tax data.  SSA staff would be designated as



special agents of BLS to process the data.  Bilateral BLS/IRS and



BLS/SSA agreements would also have to be drafted under this



arrangement.



 



     The drafting of these arrangements proved to be a delicate



task.  By law, the purposes of IRS participation in the project and



its service contract with BLS had to be related to IRS



administration of the tax laws.  Section 6103(n) of the Internal



Revenue Code (IRC) allows IRS to disclose tax return information to



persons outside of the agency as long as it is for purposes of tax



administration [1].  Specifically, this purpose is to conduct



statistical studies based on return information, which Section



6108(a) of the IRC authorizes IRS to perform [2].  A case was made



that the ERUMS study was one such purpose.



 



 



1. BLS and Texas Agreement



 



     BLS has cooperative agreements with 50 State Employment



Security Agencies to use employment statistics collected by the



states for its labor economics research.  The 1982 data used in the



ERUMS study was furnished to BLS in its ES-202 program by the Texas



State Employment Commission under a cooperative agreement.  It was



necessary for BLS to obtain authorization from the State Commission



to use the microdata for the ERUMS study and to provide access for



the Workgroup members.  Under this cooperative agreement, the



access and use of the data were subject to the confidentiality



requirements of the Texas Unemployment Compensation statute as well



as those set out in the BLS Commissioner's Order No. 2-80.



 



     Each UI program is operated under state law that must conform



to certain minimum federal standards, with reports that enable BLS



to monitor state compliance.  Under the Texas program, each



 






 



employing unit is required to file (and update periodically) a



status report with the Texas Employment Commission, describing the



type of ownership, location, and nature of business.  On a



quarterly basis, employers are required to file detailed reports on



wages and contributions.  Multi-unit employers are asked to file a



voluntary statistical supplement that provides detailed employment,



wage, and contribution reports for each establishment.  The ES-202



reports are compiled by BLS and form the basis for the UI Address



file that BLS maintains.  This is a micro-level employer file that



contains first quarter information for each reporting unit, and the



1982 file provided the Texas sampling frame for the ERUMS sample.



 



    The confidentiality of statistical data collected under the



cooperative agreement is protected by interrelated state and



federal procedures.  At the state level, these UI reports are



collected under the Texas Unemployment Compensation Act which



limits the availability of its UI reports to public employees in



the performance of public duties, except as the Employment



Commission may find necessary in its administration of Texas law.



At the federal level, BLS receives and maintains these confidential



reports under the authority of the BLS Commissioner's Order that



pledges confidentiality and prohibits disclosure except to



authorized persons for statistical purposes.  This Order precludes



any use of identifiable information for non-statistical purposes,



such as investigation or enforcement.



 



    Under this cooperative agreement with the State of Texas, it



was necessary for BLS to obtain permission from the Texas



Commissioner to select employer sample cases and to make



information about them available to BLS and SSA employees in the



ERUMS Workgroup and later to others in the Microdata Access Group.



In addition, BLS procedures establish the confidentiality of the



identities and all information pertaining to employers in the



sample.  Members of the Workgroup who were not BLS employees were



appointed as BLS agents pursuant to another interagency agreement



with BLS.  Like BLS employees, other Workgroup members were



required to sign a Non-Disclosure Affidavit before they would be



given access to the microdata.



 



 



2. IRS And BLS Agreement



 



    The initial draft of the statement of purpose by IRS



representatives was acceptable to IRS counsel since its



justification for sharing of confidential tax information was



defined as for purposes of tax administration, which is permissible



under section 6103(n) of the Internal Revenue Code [1].  However,



the case that was made for IRS tax administration purposes was not



acceptable to other Workgroup participants because they felt that



this did not clearly describe the purposes of the ERUMS project in



general or SSA's role in particular.  In the subsequent draft, care



was taken to define contractual purposes in language that covered



 






 



the statistical purposes of the several participating agencies and



that provided for the exchange of records to create a common pool



of data for a variety of analytical purposes, including those



related to tax administration.



 



    In this agreement, IRS contracted with BLS for the performance



of those parts of the ERUMS project that required access to tax



data, including the wage report information that was to be provided



by SSA.  Under this agreement, SSA staff could be designated as



special agents of BLS to carry out their part of the linkage and



analysis operations.  By law, the purposes of IRS participation in



the project and its service contract with BLS had to be related to



IRS administration of the tax laws.



 



    The terms of the contract between IRS and BLS, which needed to be



acceptable to SSA, enabled BLS to receive tapes containing tax



information from IRS and SSA and to combine them with records in



the UI Address File maintained by BLS.  It imposed strict safeguard



procedures and required BLS to provide IRS with a list of all



persons permitted to see confidential tax return data.  This list



included SSA employees who were required to sign affidavits as



agents of BLS.



 



 



3. BLS and SSA Agreement



 



    The third agreement was a Conditions of Use agreement between



BLS and SSA which enabled SSA to release data from its employer



files to BLS and authorized BLS to link data from these files to



data in the UI Address File and data to be furnished by IRS.  Like



the IRS/BLS agreement, it limited access at each stage of the



project to those persons who needed to use identifiable data, kept



the number of such persons to a minimum, and required adequate



physical security procedures.  This agreement, which needed to be



acceptable to IRS, enabled BLS to use SSA files for the ERUMS



project.  Under this agreement, SSA would furnish BLS with SSA's



Single Unit Code File, Multi Unit Code File, and Employer Report



(W-3) Record.  The agreement authorized BLS to link data from these



statistical files with data in the BLS Unemployment Insurance



Address File and with data to be furnished by IRS, and prohibited



any other linkage.



 



 



4. Microdata Access Group



 



    In the planning and matching stages of the project, the



persons who needed to have access to microdata were those members



of the Workgroup who were performing the record matching and



verification.  At Workgroup meetings, members generally reviewed



data in the form of frequencies and other summaries to track the



progress of the matching operations and to plan future steps.



Occasionally, discrepancies appeared or questions arose concerning



 






 



classification of a particular employer or possible mis-match of



data.  Those matters were usually referred to particular members to



resolve, with access to microdata as needed on an ad hoc basis.



 



     When the matching steps were completed and time came to plan



the analysis, new arrangements were needed to enable a different



group of persons to examine identifiable microdata.  The Microdata



Access Group (MAG) was formed for this purpose.  At this point, IRS



agreed that its contractor, BLS, would be permitted to make



Workgroup members its agents as needed for the analysis stage.



This enabled the Workgroup members who were employees of BEA and



the Committee on National Statistics to become sworn agents who,



like the employees of BLS and SSA, would be permitted to examine



and analyze microdata.  Thus, of the three agencies sharing



microdata (BLS, SSA, and IRS), IRS was the only one that did not



have access to the matched microdata file.  This group met



periodically to plan and perform the analysis, prepare findings,



and report its findings back to the full Workgroup.



 



      Once the terms of all contracts were agreed upon, the



contracts and the conditions of use agreement were signed by



officials of the participating agencies, and the way was cleared



for the data transfers.



 



 



Summary and Conclusions



 



      To say that the process of discussion and negotiation leading



to the signing of the ERUMS access agreements was painstaking,



sensitive, and costly in terms of staff time and delay in the



study's completion is an understatement.  The disclosure aspects of



the study severely tested the will and resolve of the affected



agencies.



 



   In retrospect, the signing of interagency agreements between IRS



and BLS and between SSA and BLS documented a process of negotiation



by which the study plan was adapted to the requirements of the



various confidentiality laws that impinged on it.  In addition, it



summarized a process in which a combination of technical and



procedural safeguards were fitted to meet the requirements of the



Federal and State agencies that were involved in the data sharing.



 



      While the participants in the ERUMS study all feel a certain



degree of accomplishment due to their collective persistence, none



are quite so upbeat about the long duration of the study.  Clearly,



the long incubation period for the interagency data sharing



agreements was a major contributor.  However, it is important to



recognize that the prolonged negotiation for interagency agreements



did not result from lack of cooperation among the participants.  On



the contrary, it reflected the complex mosaic of legal restrictions



on use and interagency dissemination of records.



 



 






 



     Once it became evident that a single multi-party agreement



would be unworkable for the overall project, the plan was broken



down into component steps of disclosure, record linkage, and



analysis.  Each failure to reach an agreement required a step back



to re-examine the study imperatives and to adapt the procedures to



the practical and legal necessities at each stage.



 



     In addition to adding to the overall time and resources



consumed by the project, these delays caused further



problems, including:



 



 1.  Personnel turnover among the project participants due to



     the extended length of the project's schedule



     necessitated slower progress on the technical issues.



 



 2.  The acquisition of IRS Form 940 data was adversely



     affected, since these forms have a 5-year retention period



     and were scheduled for destruction by the time the sample



     EINs were determined.



 



     On the positive side, however, ERUMS demonstrated that such



data sharing projects can be successful under current laws if there



is creativity, flexibility, and most of all, persistence.



 



Notes and References



 



[1] Section 6103(n) of the Internal Revenue Code (IRC) allows for



the provision of confidential tax return information for purposes



of tax administration.  Specifically, it reads:



 



     "Certain Other Persons.  -- Pursuant to regulations



     prescribed by the Secretary, returns and return



     information may be disclosed to any person, including any



     person described in Section 7513 (a), to the extent



     necessary in connection with the processing, storage,



     transmission, and reproduction of such returns and return



     information, and the programming, maintenance, repair,



     testing, and procurement of equipment, for purposes of



     tax administration."



 



[2] Section 6108 of the IRC has three parts which call for the



publication of statistical compilations of tax return information at



regular intervals, but, unlike Section 6103(n), such information



cannot identify a particular taxpayer.  This Section is the primary



"mandate" for IRS' Statistics of Income (SOI) program.



 



 a) Publication or other Disclosure of Statistics of Income.



     -- The Secretary shall prepare and publish not less than



     annually statistics reasonably available with respect to



     the operations of the internal revenue laws, including



     classifications of taxpayers and of income, the amounts



 






 



 



     claimed or allowed as deductions, exemptions, and



     credits, and any other facts deemed pertinent and



     valuable.



 



  b) Special Statistical Studies.  -- The Secretary may, upon



     written request by any party or parties, make special



     statistical studies and compilations involving return



     information (as defined in section 6103 (b)(2)) and



     furnish to such party or parties transcripts of any such



     special statistical study or compilation.  A reasonable



     fee may be prescribed for the cost of the work or



     services performed for such party or parties.



 



  c) Anonymous Form.  -- No publication or other disclosure of



     statistics or other information required or authorized by



     subsection (a) or special statistical study authorized by



     subsection (b) shall in any manner permit the statistics,



     study, or any information so published, furnished, or



     otherwise disclosed to be associated with, or otherwise



     identify, directly or indirectly, a particular taxpayer.



 



     Section 6108(a) has been interpreted as a tax administration



purpose for the Statistics of Income (SOI) Program (unlike 6108(b)



and 6108(c)); hence, if a 6108(a) study requires the use of



"outsiders", then a 6103(n) contract can be initiated as was done



for the ERUMS study.



 



 



 



 



 



 



 



 






 



     SAMPLE SELECTION AND MATCHING PROCEDURES USED IN ERUMS



 



                              John Pinkos



                           Kenneth LeVasseur



                            Marlene Einstein



                   U. S. Bureau of Labor Statistics



 



                              Joel Packman



                  Social Security Administration



 



 Introduction



 



     The first paper in this session described the experience with



 developing interagency agreements, and the third described the findings



 resulting from the study; this one describes the sample



 selection and matching procedures used.



 



     In addition to describing the sample selection and matching



 procedures, the following will explain what the ERUMS Workgroup



 considered when developing the project design.  This paper also



 describes the sampling frames, data, and manual matching conducted



 by the ERUMS Workgroup.



 



     The ERUMS project was a pilot study, designed to develop and



 test procedures for linking and comparing employer and reporting



 unit data from different administrative record systems.  The study



 from its inception was exploratory in nature, and the ERUMS



 Workgroup members hoped to observe and document the similarities



 and differences discovered between the records in the systems being



 studied and, thus, between the systems themselves.



 



     The scope of the project included employer reporting unit data



 from the Bureau of Labor Statistics and Social Security Administration



 employer data files which have similar coverage.  Internal Revenue



 Service data, which were edited by the Bureau of the Census, were



 used to assist in the analysis of the sample.  The ERUMS committee



 members included staff from the Office of Management and Budget (OMB),



 Bureau of Labor Statistics (BLS), Social Security Administration



 (SSA), Bureau of Economic Analysis (BEA), Internal Revenue Service



 (IRS), Census and the Committee on National Statistics (CNS).



 Developing the sample design, selecting the sample, and performing



 the machine and manual match were conducted by SSA and BLS staff



 who were cleared to work with the confidential data.  To conduct



 the final analysis of the data this group was later expanded to  



 include staff from BEA and the CNS.



 



     






 



     There are two reasons for providing an account of the ERUMS



sample selection and matching procedures.  The obvious reason is



that the results, like those of any research study, are dependent



on the procedures used, and anyone interested in the results is



entitled to a full description of how the study was carried out.



The other reason, equally or perhaps more important, is that ERUMS



was a venture into uncharted territory, and we believe that future



projects of this kind will benefit from the availability of a



detailed road map of the procedures that were developed to match



and compare employer and reporting unit records from BLS, SSA, and



IRS for statistical purposes.



 



Sample Design Considerations



 



     A major design consideration affecting the size and scope of



the project was the limited staff time and resources each of the



participating agencies was able to contribute.  The committee



realized from the beginning that the core of the project would be in



the manual review of the reporting units from each of the



administrative record systems.  To keep the workload manageable,



the Workgroup decided to limit the study to one State rather than



several.  It was also decided that this State should be large and



be one which could share its data with federal statistical agencies



for research purposes.  The State selected was Texas.



 



     Probability sampling was used at all stages of selection and



provided two benefits.  It ensured that sample results could be



used to produce unbiased estimates for the study population, and it



made possible estimation of sampling errors.  Additionally, the



Workgroup felt it would be useful for both analytical and



methodological purposes to produce weighted estimates.



Consideration was given to designing a baseline sample where a



sample from one agency (e.g., BLS) would be drawn and then a search



for the selected sample members would be conducted on the other



agency's files (e.g., SSA).  This approach would provide matched



units on both files as well as those on the BLS file but not the



SSA file.  This method, however, would not identify those units on



the SSA file but not on the BLS file.



 



     The baseline sample approach was abandoned and it was decided



that samples would be selected in two stages.  The stage one sample



was an equal probability sample of the population which was then



stratified by match status.  The stage two sample was a systematic



subsampling from these strata.  This method of sampling provided a



means for oversampling selected types of records which were of



more interest to the project and it also resulted in a manageable



sample size.  As a final design consideration, the committee wanted



to ensure that records from both SSA and BLS had an equal chance of



selection.  Additionally, the Committee wanted to develop an



approach that would minimize the number of computer searches



 






 



required to select the sample and relevant data elements from these



large administrative record files.



 



    The sample design used was one that selected separate samples



from the BLS and SSA files using the same set of random pairs of



numbers.  The purpose of this design was to measure overlap between



the two frames and, more importantly, to measure the amount of



nonoverlap between the two frames.  The nonoverlap included those



sample members on one frame but not the other.  This design also



minimized the computer costs and allowed the committee to select



the sample in one pass through each agency's data file.  Once the



sample was selected, the relevant data elements for each sample



member were downloaded to a microcomputer.
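
     The overlap design can be sketched in a few lines of Python.  The
sketch below is illustrative only, not the Workgroup's actual program:
the six digit pairs and the sample EINs are hypothetical, and the
function names are assumptions.  It applies one EIN-based selection
rule (the paper later notes that the pairs were compared with
positions 7 and 8 of the EIN) to both frames and then separates the
overlap from the two kinds of nonoverlap.

```python
# Hypothetical digit pairs; the paper does not list the six pairs actually used.
SELECTED_PAIRS = frozenset({"07", "23", "41", "58", "76", "92"})


def normalize_ein(ein: str) -> str:
    """Strip the optional hyphen so '74-1230075' and '741230075' compare equal."""
    return ein.replace("-", "").strip()


def in_first_stage(ein: str, pairs=SELECTED_PAIRS) -> bool:
    """True if positions 7 and 8 of a 9-digit EIN form one of the selected pairs."""
    digits = normalize_ein(ein)
    return len(digits) == 9 and digits.isdigit() and digits[6:8] in pairs


def select_and_compare(bls_eins, ssa_eins, pairs=SELECTED_PAIRS):
    """Apply the same rule to each frame (one pass each) and split the result."""
    bls_sample = {normalize_ein(e) for e in bls_eins if in_first_stage(e, pairs)}
    ssa_sample = {normalize_ein(e) for e in ssa_eins if in_first_stage(e, pairs)}
    return {
        "both": bls_sample & ssa_sample,      # overlap between the frames
        "bls_only": bls_sample - ssa_sample,  # on the BLS frame only
        "ssa_only": ssa_sample - bls_sample,  # on the SSA frame only
    }


# Made-up EINs for illustration.
bls_frame = ["74-1230075", "751111239", "760000011"]
ssa_frame = ["741230075", "752222588", "760000011"]
result = select_and_compare(bls_frame, ssa_frame)
```

     Because selection depends only on the EIN, an EIN present on both
frames is selected from both or from neither, so the two samples can
be compared without a separate search of the other agency's file.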



 



Sampling Frames



 



    Both the SSA and BLS data files are compilations of



administrative tax records.  The SSA data file includes data from



employer W-2 and W-3 wage reports, whereas the BLS file includes



data from employers' State Unemployment Insurance tax reports.  The



identifying data element common to both the SSA and BLS files and



assigned from a single source is the Employer's Identification



Number, or EIN.  The EIN is a unique 9-digit number assigned to



companies by IRS and is used to track federal tax payments.



 



    When companies pay State Unemployment Insurance taxes, the



State assigns an Unemployment Insurance (UI) Tax number to track



payment.  Since companies are given a federal tax credit for State



UI taxes, they provide their EIN to the State UI tax department.



On an annual basis IRS provides each State UI tax department with



a file of all the EINs registered in the State.  The UI tax



department then reconciles the amount of State UI taxes paid by



each employer against the IRS file of EINs and tax credits claimed



by each employer.  By definition, all companies on the SSA files



should have an EIN reported, because this is what is required for



an employer to be included on the file.  On the BLS State file a few



units did not have an EIN reported since only a State UI tax number



is required for an employer to be included on that file.  The first



quarter 1982 Texas file had EINs reported for 98.7 percent of all



reporting units.



 



    The sampling frame for BLS was all the EINs reported in the



Texas first quarter 1982 U.I. Name and Address File.  The sampling



frame for SSA was all the EINs reported in the Single Unit or Multi



Unit Code file with wage reports for calendar year 1982.  The SSA



files are continuous files linked over time, whereas the BLS file



in 1982 was a snapshot of one calendar quarter.  Effective with



first quarter 1989 data, the BLS began linking data quarterly and



now has a continuous data file.



 



 






 



     The sampling rate was determined by the Workgroup's decision



that 400 EINs would be a manageable sample size and that about



one-half of the sample should have EINs classified as multis, or



companies with multiple locations.  EINs classified as multis were



of particular interest because there is more variation in reporting



practices.



 



     To derive the sampling rate, the committee looked at the first



quarter 1982 Texas file, which had 267,487 EINs classified as



single units and 3,125 EINs classified as multi units.  A sampling



rate of 6 in 100 was selected since it provided approximately 188



EINs that were multi units.
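
     The arithmetic behind the 188 figure can be checked directly.  The
short Python sketch below is an illustration, not part of the study;
the function name is an assumption.  It uses the frame counts given
above and the fact that a rate based on two-digit pairs must be a
whole number of pairs out of 100.

```python
SINGLE_EINS = 267_487  # EINs classified as single units (from the text)
MULTI_EINS = 3_125     # EINs classified as multi units (from the text)


def expected_count(frame_size: int, pairs: int) -> float:
    """Expected first-stage sample size when `pairs` of the 100 possible
    two-digit pairs are selected."""
    return frame_size * pairs / 100


multi_at_6 = expected_count(MULTI_EINS, 6)    # 187.5, i.e. about 188
single_at_6 = expected_count(SINGLE_EINS, 6)  # about 16,049
multi_at_7 = expected_count(MULTI_EINS, 7)    # 218.75
```

     Six pairs give about 188 multi unit EINs and seven pairs about 219,
so six comes closest to the target of roughly 200.  This reading of
the choice is an inference from the figures, not a statement by the
Workgroup.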



 



     As previously mentioned, it was decided to select a two-stage



sample.  The first was an equal probability sample of the



population.  This first-stage sample was selected from all EINs



that had 1 of 6 random pairs of numbers in positions 7 and 8 of the



EIN.  The sampling rate of 6 in 100, when applied to both the BLS



and SSA frames, provided a combined stage one sample of 19,964 EINs.



The stage one sample was then machine matched and each EIN was



assigned a status classification.  The initial status



classifications are shown below:



 



 



     Table A

                                      MATCH STATUS IN:

     Group                        BLS                        SSA

        1                        Single                     Single
        2                        Single                     Inactive
        3                        Inactive                   Single
        4                        Multi                      Single
        5                        Single                     Multi
        6                        Multi                      Inactive
        7                        Inactive                   Multi
        8                        Multi                      Multi



 



 



EINs that were inactive in both systems obviously had no chance of



entering the ERUMS sample.
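
     The classification in Table A amounts to a lookup on the pair of
statuses.  A minimal Python sketch follows; the status strings and the
function name are assumptions made for illustration, and only the
group numbers come from Table A.

```python
# Group numbers from Table A, keyed by (BLS status, SSA status).
TABLE_A = {
    ("single", "single"): 1,
    ("single", "inactive"): 2,
    ("inactive", "single"): 3,
    ("multi", "single"): 4,
    ("single", "multi"): 5,
    ("multi", "inactive"): 6,
    ("inactive", "multi"): 7,
    ("multi", "multi"): 8,
}


def match_group(bls_status: str, ssa_status: str):
    """Return the Table A group, or None for EINs inactive in both
    systems, which could not enter the sample."""
    return TABLE_A.get((bls_status.lower(), ssa_status.lower()))
```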



 



     Another view of the status classifications is shown in



Attachment A, which is a 3x3 grid having the classifications single,



multi, and No Wage Report (NWR) on each scale for both the BLS and



SSA files.  Records with no wage reports on the SSA file were



considered inactive.  The bottom right cell on the grid is not



applicable since these would be records that did not exist on



either file.



 



 



 






 



     Based upon the interest of the Workgroup, three of the basic



classifications or cells were subdivided and are shown as the



shaded sectors on the 3x3 grid (see attachment A).  County and SIC



became matching criteria for those EINs that were single on both



files.  The number of reporting units became a criterion for those



EINs that were multis on the BLS file but were single on the SSA file



and those EINs that were multis on both.



 



     These eleven match status classifications became the strata



used for the second stage sample.  The second stage sample



selection had equal probability within each stratum.  The sampling



rates used varied by stratum, from selecting all to selecting 1 in



173.78.  Given the exploratory nature of ERUMS, the intent of the



Workgroup was to pull a larger sample of EINs classified as multis



and nonmatched records.  These cases were expected to present more



difficulties.  Therefore, the Workgroup wanted to have enough of



these cases to learn what the situations were and to test methods



of dealing with them.  The final sample contained 401 EINs,



including 201 classified as having multi units on either the BLS or



the SSA files.  The remaining 200 EINs were those not classified as



multis on either the BLS or SSA files.
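
     The second-stage selection and the resulting weights can be
sketched as follows.  This is a minimal Python illustration under
stated assumptions, not the Workgroup's program: it assumes a
fractional-interval systematic sample with a random start within each
stratum, and it assumes the usual design weight, the product of the
inverse selection probabilities at the two stages (100/6 at stage one,
the sampling interval at stage two).  The paper does not give the
weighting formula, and the function names are hypothetical.

```python
import random

FIRST_STAGE_WEIGHT = 100 / 6  # inverse of the 6-in-100 first-stage rate


def systematic_sample(units, interval, rng):
    """Take every `interval`-th unit after a random start.

    `interval` may be fractional (for example 173.78); an interval of 1
    or less selects every unit.
    """
    if interval <= 1:
        return list(units)
    picks = []
    position = rng.random() * interval  # random start within the first interval
    while position < len(units):
        picks.append(units[int(position)])
        position += interval
    return picks


def design_weight(interval):
    """Assumed weight: stage-one weight times the stage-two interval."""
    return FIRST_STAGE_WEIGHT * max(interval, 1)


rng = random.Random(1982)
stratum = [f"EIN{i:05d}" for i in range(1000)]  # made-up stratum listing
subsample = systematic_sample(stratum, 173.78, rng)
```

     Under these assumptions, a stratum taken with certainty keeps the
stage-one weight of 100/6, while a stratum sampled at 1 in 173.78
carries a weight of about 2,896 per sample EIN.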



 



     Once the subsample was selected, the Workgroup began the



review and analysis phase, which included labor-intensive manual



matching.



 



     The Workgroup reviewed reported employment, SIC codes, and



geographic codes for each of the 401 EINs.  To assist in this



process, the Workgroup made arrangements to have access to IRS data



for tax years 1981 through 1983.  Data for 385 of the 401 EINs were



made available.



 



     During the review process the Workgroup attempted to uncover



the reasons why records did not match or why records were on one



file but not the other.  In this process of looking very closely at



the actual records from each agency, the Workgroup learned much



about the two systems and found reasons to reclassify some of the



records which affected the final match status.  For example, in the



area of multiunits, the BLS system defines multis as companies with



multiple locations within the same State whereas the SSA system



defines multis as companies that have multiple locations in the



United States.  During the review of the multi-unit records,



employment levels were considered and attempts were made to



reconcile differences in reporting units by aggregating employment



of the individual multi units to the EIN level.  As a result of



this review, the Workgroup decided not to use employment as a match



criterion.  It was also decided that for purposes of this study, a



multi unit EIN would be an EIN that had multiple locations within



the State of Texas.  This reduced the number of SSA multi unit EINs



in the final sample from 120 to 10.  The remaining 110 records were



reclassified as single EINs.
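
     The study's working definition of a multi unit EIN can be expressed
as a count of Texas reporting units per EIN.  The Python sketch below
is illustrative; the record layout (EIN, State code) and the function
name are assumptions, not the SSA or BLS file formats.

```python
from collections import Counter


def classify_eins(reporting_units, state="TX"):
    """Classify each EIN by its number of reporting units in `state`.

    `reporting_units` is an iterable of (ein, state_code) pairs, one per
    reporting unit.  EINs with no unit in `state` are left out.
    """
    in_state = Counter(ein for ein, st in reporting_units if st == state)
    return {ein: ("multi" if n > 1 else "single") for ein, n in in_state.items()}


# Made-up example: the first EIN has units in several States but only one
# in Texas, so it is single under the study definition.
units = [
    ("741230075", "TX"), ("741230075", "NY"), ("741230075", "CA"),
    ("752222588", "TX"), ("752222588", "TX"),
]
status = classify_eins(units)
```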



 



 






 



     As noted, the Workgroup also compared SIC and geographic codes



from both files.  SIC codes were first examined to see why there



were non-matches at the four-digit SIC level.  In some cases, the



nonmatched EINs were assigned SIC codes in related industries; in



other cases, the industry code reflected a larger aggregation of



the reporting unit.  Another, and perhaps more important, factor that



accounted for differences at the 4-digit level was that both BLS and



SSA have policies for SIC coding exceptions.



 



     The BLS in 1982 had 11 exceptions to 4-digit SIC coding which



meant a 3-digit SIC code was assigned in certain industries in lieu



of the 4-digit SIC code.  This represented 43 4-digit industries.



These are industries which either have a significant amount of



overlapping in their industrial activities or are industries that



historically had been difficult to collect sufficient information



from to assign a 4-digit SIC.  The BLS currently has reduced the



number of 4-digit coding exceptions to 6, which represents 17 4-



digit SIC industries.
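
     One way to allow for such coding exceptions when comparing files is
to compare two SIC codes only down to the coarser of the two coding
levels.  The Python sketch below illustrates the idea; it is not the
procedure the Workgroup used, and it assumes that a code assigned at a
coarser level is stored as a shorter digit string (for example, "581"
rather than "5810").

```python
def agreement_level(sic_a: str, sic_b: str) -> int:
    """Number of leading digits on which the two codes agree."""
    level = 0
    for a, b in zip(sic_a, sic_b):
        if a != b:
            break
        level += 1
    return level


def consistent(sic_a: str, sic_b: str) -> bool:
    """True if the codes agree at the coarser of their two coding levels."""
    return agreement_level(sic_a, sic_b) == min(len(sic_a), len(sic_b))
```

     With this rule, a 4-digit code on one file and the corresponding
3-digit code on the other count as consistent, while two different
4-digit codes in the same 3-digit group do not.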



 



     The SSA SIC coding exceptions exist in some agricultural



industries and Public Administration, which are coded to the



1-digit level.  This affected 64 4-digit industries.  Approximately 63



other 4-digit industries were coded at the 3-digit level for one



reason or another, typically insufficient information.  In addition



to reviewing SIC codes, the Workgroup also looked at geographic



codes and tried to explain why some records did not match between



files.  Maps and coding manuals were consulted and the review



showed there was some inherent misreporting of county codes by



employers.  Texas has more than 37 cities with the same name as a



county but these cities are not located in those counties.



Houston, for example, is in Harris County, not Houston County, and



Austin is in Travis County, not Austin County.  Counties named



Houston and Austin are located elsewhere in the State.  In some



cases the reason for non matching records was that the reporting



unit was coded in an adjacent county.  Texas has a very large



number (254) of counties.  For those employers who keep their



records by city or are not familiar with the county names, it is



easy to see the potential for some misreporting.



 



     The Workgroup also looked very closely at the cases having



inactive EINs on either the BLS or SSA files.  Inactive EINs for



the BLS were defined as those that appeared on the SSA file but did



not appear on the BLS file.  Inactive EINs for SSA were defined as



those on the SSA file with no wage reports for 1982.



 



     When reviewing the BLS inactive EINs, the Workgroup used SSA



SIC and employment data to determine if the employer was exempt



from Unemployment Insurance coverage.  They also looked at IRS data



to determine if the employer became active after the first quarter



of 1982 and at the first quarter 1983 Texas file to see if the



employer reported in 1983.



 



 






 



     When reviewing the SSA inactive EINs, the Workgroup was able



to use a more nearly complete SSA wage report file that included



wage reports that were either delinquent when the sample was



selected or were in the process of reconciliation with IRS.  As a



result of these additional data, 44 of the 99 EINs originally



classified as inactive on the SSA file were determined to be



active.  The Workgroup also used the BLS 1982 and 1983 first



quarter Texas files to conduct name searches to see if the same



employer reported under a different EIN.  The Texas files were



also used to see whether zero employment was reported, which might



have indicated no wages were paid.  Additionally, IRS data were



then used to see what level of employment was reported to IRS.



 



     The last step in the review and analysis phase was to



determine the final match status of the 401 EINs.  As a result of



the review, it was decided to collapse the 11 categories shown in



Attachment A down to the basic 8 cells shown in Table A.



 



     As part of the final analysis, committee members worked on



completing the documentation for the project and discovered that an



additional 2,608 EINs that were on the SSA file but not the BLS



file were inadvertently omitted from the first stage sample and,



consequently, from the second stage.  Adding cases to the stage 1



and 2 samples at that point in time would have further delayed



completion of the study, so the Workgroup decided the best way to



deal with this problem was to reweight the sample cases in the two



affected strata and rerun the results tables.
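
     The paper does not describe the reweighting itself.  As a rough
sketch only, a simple ratio adjustment would scale each sample weight
in an affected stratum so that the weights again sum to the full
first-stage count, including the omitted EINs.  The Python below
assumes that form of adjustment; the function name and the stratum
counts in the example are hypothetical (the paper reports only the
combined total of 2,608 omitted EINs).

```python
def reweight(weight: float, included: int, omitted: int) -> float:
    """Scale a weight by (included + omitted) / included for one stratum.

    `included` is the number of first-stage EINs that were in the stratum
    when the subsample was drawn; `omitted` is the number left out.
    """
    if included <= 0:
        raise ValueError("stratum must contain at least one included EIN")
    return weight * (included + omitted) / included


# Hypothetical stratum: 1,000 EINs included, 250 omitted.
adjusted = reweight(10.0, included=1000, omitted=250)
```

     An adjustment of this kind treats the omitted EINs as if they
resembled the sampled cases in the same stratum, which is an
assumption of the sketch rather than a finding of the study.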



 



 



 



 



 



 



 



 






 



 



 



 



                           MATCH STATUS CLASSIFICATIONS



[Attachment A: 3x3 grid of BLS by SSA match status classifications; graphic not reproduced.]



 



 



  KEY: NWR = No Wage Report



       SIC = Standard Industrial Classification



        RU = Reporting Units



 






 



 



  RESULTS, FINDINGS, AND RECOMMENDATIONS OF THE ERUMS PROJECT



 



                           Vern Renshaw



                   Bureau of Economic Analysis



 



                            Tom Jabine



                       Statistical Consultant



 



     The other papers in this session have examined the



administrative arrangements and the sample selection and matching



procedures for the Employer Reporting Unit Match Study (ERUMS).



This paper reviews the study's results, findings, and



recommendations.



 



     The main purpose of the ERUMS project was to provide



information on the technical and administrative feasibility of



interagency record linkages.  However, the ERUMS Workgroup hoped



that the study would also shed some light on at least three areas



of substantive concern.



 



 1)  We hoped that geographic and industry information for



     reporting units contained in the Bureau of Labor



     Statistics (BLS) Unemployment Insurance (UI) Address File



     could help evaluate the potential statistical usefulness



     of a) reporting unit data supplied by multi unit



     employers participating in the Social Security



     Administration (SSA) Establishment Reporting Plan (ERP)



     for forms W-2 and W-3; and b) State data supplied to the



     Internal Revenue Service (IRS) on Form 940.  SSA has been



     concerned about the quality of its reporting unit data



     because resources for maintaining the ERP had been



     inadequate for some time and the State data supplied on



     IRS Form 940 had never been used for statistical



     purposes.



 



 2)  We hoped that information from IRS and SSA files could



     help evaluate the completeness of employer coverage in



     the UI Address File.  The UI Address File leaves out or



     estimates employer information that is not received by



     its statistical deadline, whereas information for late



     reports was generally available in the IRS and SSA files



     used for ERUMS.



 



 3)  We hoped that the analysis of matched records could help



     evaluate the consistency of industry and geographic



     coding in the BLS, IRS, and SSA systems.



 



     The extent to which the ERUMS project could actually shed



light on these areas was limited by several factors.  First, ERUMS



was a pilot study based on a small sample drawn from a single State



(Texas) for a single year (1982).  The results, therefore, could



 






 



not be expected to reflect precisely the status of the data systems



for the entire country or for subsequent years.  (BLS has taken



steps to improve the UI Address File since 1982.)  Second, both the



information content and processing procedures differed somewhat



among the data systems.  The W-2/W-3 data were for calendar 1982,



for example, while the UI Address File that was used contained data



only for the first quarter of 1982.



 



    Finally, a number of unanticipated problems were encountered



in carrying out the study.  The most limiting of these problems



resulted from the slow implementation of ERUMS.  For example, by



the time the final sample of employers was selected, many IRS Form



940s for 1982 had been destroyed.  Therefore, it was not possible



to evaluate the State data contained on the Form 940s.



 



    Another unanticipated difficulty arose because the initial SSA



files used in the matching process omitted some wage reports and



were generally inadequate to determine if employers were actually



reporting multiple units in Texas.  These initial files were later



supplemented with more complete information, but the



supplementation occurred after the final sample had been drawn;



consequently the size of the sample was smaller than intended for



some categories of employers, especially for multi unit employers.



 



    Finally, it proved to be more difficult than had been



anticipated to account for differences in employer coverage among



the data files.  In part, this was because estimated data were not



identified in the UI Address File (a deficiency being corrected)



and because there was no documentation of such phenomena as dates



when employment started for employers (or ended, or was changed by



reorganization, etc.) or dates when forms filed by employers were



received by the processing agencies.



 



    The clearest conclusion to emerge from the ERUMS project



related to the poor quality of SSA's ERP data for multi unit



employers.  It was evident that SSA would need to take steps to



improve quality control if the SSA system were ever to be useful



for developing data by geographic and industry classification.  The



other findings of the ERUMS project were not so stark as those



relating to the poor quality of SSA's establishment data, but the



study could well reinforce the concerns of those who worry about



the inconsistencies in industry coding that occur when employers



are coded independently by different agencies.



 



    In the following sections of the paper the results,



limitations, findings, and recommendations of the ERUMS project are



discussed in somewhat greater detail.  Tables A-1 to A-8, which are



referred to in the next two sections, appear in Chapter III of the



ERUMS final report (Statistical Policy Working Paper 16).  In order



to meet space limitations, we have included only Table A-4 with



this paper.



 






 



 



 Results 



 



     As explained in detail by Pinkos et al. in the second paper of



 this session, the ERUMS sample was a two-phase sample of employers,



 as defined by unique Employer Identification Numbers (EINs).  Most



 of the results presented in this paper are estimates based on the



 Phase II sample of 401 EINs, weighted to account for the



 disproportionate sampling used in the second phase of the sample



 selection.
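
 The weighting described above can be illustrated with a minimal
 sketch.  It assumes simple inverse-probability weights, one sampling
 rate per stratum; the stratum names, sampling rates and cases below
 are hypothetical and are not taken from the ERUMS design, which is
 documented in the paper by Pinkos et al.

```python
from dataclasses import dataclass


@dataclass
class SampleCase:
    stratum: str         # hypothetical Phase II stratum label
    has_attribute: bool  # e.g. "active in both systems"


def weighted_percent(cases, sampling_rate):
    """Estimate the percent of the population having the attribute.

    Each case is weighted by the inverse of its stratum's sampling
    rate, so that over-sampled strata do not dominate the estimate.
    """
    total_weight = 0.0
    attribute_weight = 0.0
    for case in cases:
        weight = 1.0 / sampling_rate[case.stratum]
        total_weight += weight
        if case.has_attribute:
            attribute_weight += weight
    if total_weight == 0.0:
        raise ValueError("no sample cases supplied")
    return 100.0 * attribute_weight / total_weight


# Hypothetical rates: multi unit employers sampled far more heavily.
rates = {"multi_unit": 0.5, "single_unit": 0.01}
cases = [
    SampleCase("multi_unit", True),
    SampleCase("multi_unit", False),
    SampleCase("single_unit", True),
    SampleCase("single_unit", True),
    SampleCase("single_unit", False),
]
estimate = weighted_percent(cases, rates)
```

 In this made-up example the unweighted share is 3 of 5 cases, but
 the weighted estimate is pulled toward the single unit stratum
 because each of its cases stands for many more employers.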



 



     Of the Texas EINs that were active in 1982 in the BLS or SSA



 systems, 67.1 percent were active in both systems, 27.6 percent



 were active only in the SSA system and 5.3 percent were active only



 in the BLS system (Table A-1).  Only about 1.0 percent of all



 active EINs were classified as multi unit in one or both systems,



 and most of these were classified as multi unit only in the BLS



 system (Table A-4).
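
 The three match categories reported above amount to a partition of
 the EINs found in either system.  The following is a minimal sketch
 of that classification, assuming each file can be reduced to a
 collection of EIN strings; the function names and the sample EINs
 are hypothetical, and the sketch ignores the sampling weights that
 the published estimates use.

```python
def match_status(bls_eins, ssa_eins):
    """Partition EINs by the system(s) in which they are active."""
    bls, ssa = set(bls_eins), set(ssa_eins)
    return {
        "both": bls & ssa,      # active in both systems
        "ssa_only": ssa - bls,  # active only in the SSA system
        "bls_only": bls - ssa,  # active only in the BLS system
    }


def percent_by_status(groups):
    """Unweighted percent of EINs falling in each match category."""
    total = sum(len(eins) for eins in groups.values())
    if total == 0:
        raise ValueError("no EINs supplied")
    return {name: 100.0 * len(eins) / total for name, eins in groups.items()}


# Hypothetical identifiers standing in for EINs.
groups = match_status(["A", "B", "C"], ["B", "C", "D", "E"])
shares = percent_by_status(groups)
```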



 



     For the matched single unit EINs, i.e., those that were active



 in both systems, an estimated 81.6 percent had the same State and



 county codes in both systems.  The remaining cases were about



 equally distributed in three categories:  same State, different



 county; same State with no county code in the SSA file; and



 different State (Table A-5).  An estimated 70.2 percent of the



 matched single unit cases had the same two-digit industry codes.



 About half of the remaining cases were not classified by industry



 in the SSA system (Table A-5).  When matched against the



 IRS/Census-edited Form 941/943 file, about three-fourths of the



 matched single units from both the BLS and SSA files had two-digit



 industry codes that agreed with those in the IRS/Census file.



 However, when the SSA unclassified cases were excluded from this



 comparison, the proportion of SSA cases that agreed with the



 IRS/Census two-digit code was somewhat greater than the



 corresponding proportion for the BLS matched single unit cases



 (Table A-8).
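
 The effect of excluding unclassified cases from an agreement rate
 can be shown with a small sketch.  It assumes each matched case is
 a pair of two-digit industry codes, with None standing for a case
 that has no code in one system; the function name and the codes
 used are hypothetical illustrations, not ERUMS data.

```python
def agreement_rate(pairs, exclude_unclassified=False):
    """Percent of cases whose code agrees with the reference code.

    pairs: list of (code, reference_code) tuples, where code is None
    when the case is unclassified in the system being evaluated.
    """
    if exclude_unclassified:
        pairs = [(code, ref) for code, ref in pairs if code is not None]
    if not pairs:
        raise ValueError("no cases to compare")
    agree = sum(1 for code, ref in pairs if code is not None and code == ref)
    return 100.0 * agree / len(pairs)


# Hypothetical two-digit codes; one case is unclassified.
pairs = [("50", "50"), ("50", "51"), (None, "73"), ("73", "73")]
rate_all = agreement_rate(pairs)
rate_classified = agreement_rate(pairs, exclude_unclassified=True)
```

 With unclassified cases counted as disagreements the rate is lower;
 dropping them from the denominator raises it, which is the pattern
 described for the SSA cases above.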



 



      Only a few EINs (nine sample cases) were classified as multi



 unit in both the BLS and SSA systems.  Matching individual



 reporting units for these cases proved to be difficult.  Overall,



 the nine sample employers had 105 Texas reporting units in the BLS



 system and 60 in the SSA system for 1982.



 



      Of the active SSA EINs not found in BLS's first quarter 1982



 UI Address File, it was estimated that 69.2 percent had reported no



 first quarter employment to IRS on Form 941 and therefore would not



 normally be expected to appear in the BLS system (Table A-6).  For



 another 10 percent of these employers, the analysis suggested that



 they may not have met requirements for UI coverage in Texas either



 because they had no operations in Texas, because of nonprofit



 status or because their payrolls were too small.  For the remaining



 20 percent, the reasons for their absence are not always clear, but



 






 



it may have resulted in part from lags in incorporating new



 employers in the UI State agency and BLS files.



 



     Most of the employers who were included in the 1982 UI Address



 File but did not file 1982 W-2/W-3 wage reports (22 sample cases)



 appeared to have ceased hiring employees, gone out of business, or



 gone through other changes that altered their reporting to IRS and



 SSA.  Half of the employers in this group reported no employment in



 the 1982 UI Address File.  Many of the remainder had filed their



 final Form 941 with IRS (at least for the period 1981-1983) for a



 quarter in 1981.



 



      An analysis of the sample EINs that appeared in SSA's Multi



 Unit Code File provided some indication of the extent to which



 multi unit employers were participating in SSA's Establishment



 Reporting Plan (ERP) in 1982 (Table A-7).  An estimated 35.9



 percent of these EINs had been incorrectly added to the Multi Unit



 Code File as the result of a processing error that has since been



 corrected.  Most of the remaining employers had initially agreed to



 participate in the ERP, but more than half of this group did not



 provide separate data for each reporting unit in their W-3 wage



 reports for 1982.



 



 Limitations



 



      Several factors limit the broad applicability of the ERUMS



 findings.  The results reflect the reporting requirements and



 operating procedures associated with the agency record systems in



 1982.  There have been significant changes since then.  In



 particular, BLS has taken several steps to improve the timeliness



 and the completeness and accuracy of data in its UI Address File.



 



      The study was based on data for a single State, Texas, and on



 a small sample of employers and reporting units.  The UI system



 gives the States some latitude in their record-keeping practices,



 so indications of the coverage of employers in the record systems



 of the Texas State Employment Agency in 1982 should not be assumed



 to apply fully to the UI systems of other States at that time.  The



 small sample size means that estimates based on the Phase II sample



 are subject to relatively large sampling errors.  Because of



 limited resources and the complexity of the Phase II sample design,



 we were able to compute sampling errors only for a few key



 estimates (see Table A-4).



 



      The analysis of the results was complicated by differences in



 concepts and coverage in the record systems used in the study.



 These differences occurred in the basic filing requirements for the



 UI and SSA/IRS systems, the time reference of the basic BLS and SSA



 files used for matching, the definition of reporting units in the



 BLS and the SSA/ERP systems, and the structures of the BLS and SSA



 industry classification systems.  In addition, certain file



 






 



deficiencies and operational problems made the analyses more



difficult.  About 1.3 percent of the records in the 1982 UI Address



File for Texas did not have EINs and therefore were not included in



the Phase I sample of EINs from that file.  In the SSA files, a



significant proportion of employers lacked county and industry



codes.  The most serious problem was that a high proportion of



multi unit employers were not reporting separately in 1982 for each



reporting unit, so that we were unable to do a thorough comparison



of reporting units for multi unit employers active in both the BLS



and SSA systems.



 



 



     Although these differences and file deficiencies made the



analyses more difficult, the fact that we succeeded in identifying



and documenting them is an indication  that the ERUMS project



succeeded in its main goal, which was to demonstrate the



feasibility of doing matching studies as a means of evaluating the



suitability of administrative record systems for statistical uses.



 



 



     The data on amounts of employment and payroll available from



SSA, BLS and IRS files were used in reviewing the unmatched sample



cases and trying to understand why they were not present in both



SSA and BLS files.  However, the employment and payroll data were



not added to the data file for the 401 sample EINs that were used



to develop the estimates presented in this report.  Therefore, all



of the results shown are estimates of numbers of employers or



reporting units classified by attributes such as match status, and



geographic and industry codes in the different systems included in



the study.  We did not attempt to estimate what proportions of



aggregate employment or payroll were accounted for by employers who



were unmatched or had different geographic or industry codes.



 



 



Findings



 



     The detailed analyses of the ERUMS data did not suggest that



large numbers of employers who report wages in one of the payroll



tax systems were failing to report in the other system when they



should have been.  They do, however, suggest that late reports and



different procedures for processing the reports in the two systems



created potential problems for using both of the systems' data



files for statistical purposes.



 



     Perhaps the clearest finding was that it is not possible to



maintain a usable establishment reporting unit plan for multi unit



employers in the absence of systematic procedures for monitoring



employer reporting and updating files for changes in the number,



location and industry of each employer's reporting units.  SSA's



Establishment Reporting Plan clearly lacked the necessary resources



to do this in 1982 and there is no reason to think that the



situation has improved since then.



 






 



      There was a moderately high but by no means perfect



correspondence between county and two-digit industry codes for



single unit employers included in both the BLS and SSA systems.  A



substantial proportion of the differences arose from the absence of



county or industry codes in the SSA system.  Comparisons of



industry codes at the three and four-digit level were not attempted



because of the differences in the industry classification systems



used by the two agencies.



 



      With some qualifications, we were successful in matching the



records of employers, as defined by their EINs, in different



systems.  However, we were not successful in matching BLS and SSA



records for reporting units, the main reason being the



incompleteness of SSA's data for reporting units provided under the



voluntary ERP.  Other reasons were the lack of a common identifier,



analogous to the EIN at the employer level, for reporting units and



the slight differences in the reporting unit definitions used by



BLS and SSA.



 



      We learned what we believe are some important lessons for



others who may wish to match business records from different agency



sources, whether for research or operational purposes.  First, the



plans and the necessary interagency agreements should be developed



well ahead of the earliest date at which the files to be linked are



expected to be available.  In particular, the development of



interagency agreements for the exchange of identifiable records is



a painstaking process and considerable time may be needed for their



completion and approval.



 



      Second, successful matching requires in-depth knowledge of all



of the record systems involved and of the specific files that exist



within those systems.  An interagency team approach, with full



exchange of information, is essential because there is unlikely to



be a single individual who has all of the necessary information,



even for the files of a single agency.



 



      Finally, whenever possible, it is essential to pretest



matching procedures before embarking on large-scale operational



applications.



 



 



Recommendations



 



      ERUMS was designed primarily as a demonstration project and



was therefore limited in its coverage and scope.  Nevertheless, the



Workgroup believes that the study results, along with other



information acquired in the course of the study, justified the



inclusion in its report of five formal recommendations addressed



specifically to the BLS and SSA record systems for employers and



reporting units.  These recommendations were:



 



 



 






 



1. SSA should undertake a full review of the current status



    and uses of the Establishment Reporting Plan and decide



    either to continue it with adequate resources for



    maintenance and improvement of quality or to discontinue



    it entirely.



 



    (Note: Such a review was begun by SSA prior to the



    completion of the ERUMS project.  As a result of that



    review, SSA is taking steps to prepare for the



    termination of the ERP.)



 



 2. BLS should review the State Employment Security Agencies'



    procedures for identifying employer births (including



    those resulting from mergers and changes of organization)



    and seek ways of reducing the apparent lag between filing



    of applications for EINs and inclusion of new employers



    on State Agency and BLS lists used as frames for



    statistical surveys and reports.



 



 3. Data in the UI Address File on employment and wages paid



    should be labelled to distinguish imputed data from data



    reported by employers.



 



 4. The EIN should be identified as a key item in the UI



    Address File and efforts should be made to achieve 100



    percent reporting initially and current reporting of



    changes in EINs.



 



 5. BLS and SSA (if it continues the Establishment Reporting



    Plan) should strive to obtain data from employers for



    their establishments as defined in the 1987 Standard



    Industrial Classification (SIC) Manual.  Both agencies



    should code industry for all establishments, without



    exception, at the 4-digit SIC level of detail.  Whether



    or not the Establishment Reporting Plan is continued, SSA



    should code all employers identified on Forms SS-4 at the



    4-digit level of detail.



 



    (See parenthetical note following recommendation 1



    concerning the current status of the ERP.)



 



    In a broader context, the ERUMS Workgroup concluded that



current efforts to collect economic data at the establishment level



are dispersed among Federal and State agencies, are poorly



coordinated, and place unnecessary burden on employers.  The



Workgroup believes that further, more intensive and extensive



interagency matching studies have an important role to play in



resolving these problems and in determining the possible effects on



statistical programs of prospective major changes in administrative



reporting systems for employers.  We therefore recommend that:



 






 



6. Further matching studies should be directed at acquiring



    information that will support the eventual development of



    a mandatory reporting system to meet the needs of all



    Federal and State statistical programs for establishment



    lists, including SIC codes.  An interim goal should be



    that all agencies requiring or requesting employers to



    provide data at the establishment or reporting unit level



    adopt common definitions of units and data items to be



    submitted for these units.



 



    Three agencies -- the BLS, the Census Bureau and the National



Agricultural Statistics Service -- play a dominant role in the



direct collection of establishment-level economic data.  Recent



initiatives of these agencies, under the general guidance of OMB's



Statistical Policy Office, have been directed at greater



coordination of their respective list-building and maintenance



activities.  Further integration of business lists will require



fuller understanding of the similarities and differences of the



three systems, based on matching of individual establishments and



reporting units in the different systems.



 






 



   



     



 



[Table A-4 appeared here as a graphic in the original and is not reproduced.]



 



 



 



1/Numbers in parentheses are standard errors of the percents.



* Indicates a standard error of less than 0.05 percent.



 






 



 



                    DISCUSSION



 



                  W. Joel Richardson



                   Charles A. Waite



             U. S. Bureau of the Census



 



 



Introductory Comments on ERUMS



 



     First of all, I would like to thank the many people who have



been involved with ERUMS.  Their commitment and resourcefulness



have helped to make the ERUMS project a success.  As Vernon has



detailed, several recommendations were presented that undoubtedly



will improve the business files of the Bureau of Labor Statistics



(BLS) and the Social Security Administration (SSA).  But more



importantly, the ERUMS study provided valuable experience in the



technical aspects of matching interagency data sets.  I am hopeful



that this experience will help to further the efforts of data-



exchange initiatives among federal statistical agencies in the



coming years.



 



     When the preliminary planning for ERUMS began in 1983, the



Census Bureau expected to be one of the participating agencies.



Our business employer files were to be matched along with those of



the BLS, SSA, and the Internal Revenue Service (IRS).  However,



there were significant problems concerning the release of our



confidential data.  Though we realized the importance of ERUMS, we



could not resolve these data-access problems soon enough to allow



us to be an active participant.  As an alternative, the Census



Bureau obtained observer status, which enabled us to closely follow



the progress of ERUMS.



 



     Before critiquing the three papers, I'd like to expound on the



value of the ERUMS study to the federal statistical community.



Warren stated that a major goal of ERUMS was to test the



feasibility of matching employer records from the business lists of



different government agencies.  This goal was accomplished in



ERUMS, and the results showed that the matching of the two distinct



data files is possible.   Additionally, the ERUMS evaluation



revealed problems associated with matching the interagency data



files.  I expect that these findings will be valuable in future



matching studies.



 



     A matching study should be the first step in any data-sharing



proposal -- before a data sharing proposal is accepted by the



participating agencies, it is essential to confirm the



comparability of the data sets and to resolve any conceptual and



definitional differences.  In my view, the ERUMS project showed



that the BLS and SSA data sets are comparable, and that an



effective matching operation is possible.



 



 






 



    Although there are obvious discrepancies between the data sets



  -- only 67.1 percent of the EIN records were active in both systems



  -- significant benefits could be realized through data sharing.



First, greater consistency in the industrial classification codes,



geographic location indicators, and related data values could be



achieved by sharing the data for matched records.  Second,



unmatched records could be researched in an effort to ensure the



completeness of each of the employer universes.  Though numerous



issues would need to be explored and settled, such a data-sharing



plan could result in greater comparability among the data series.



 



    Currently, the administration has a legislative proposal in



Congress that would permit limited data sharing between the Census



Bureau and the Bureau of Economic Analysis (BEA).  The primary



purpose of the proposal is to provide BEA with confidential access



to the Census Bureau's establishment information.  This information



will augment and improve the data on foreign direct investment that



BEA collects and publishes.



 



    There are other versions of the legislative proposal in



Congress to share Census and BEA data -- not only with each other,



but, in at least one version, with the General Accounting Office



(GAO) and the Committee on Foreign Investment in the U.S. (CFIUS).



We are concerned that response rates may decline if our microdata



are made available to such policy-making organizations as GAO and



CFIUS.  For this reason, the Census Bureau does not support this



legislative proposal.



 



    The BEA collects foreign-investment data at the enterprise



level.  The Census Bureau conducted a feasibility study that showed



BEA enterprise-level data could be linked successfully with Census



Bureau establishment data.  By integrating our establishment-level



data with BEA enterprise data, BEA will be able to present foreign



direct investment statistics at a much finer industry and



geographic level.  This is one of many possible data-sharing plans



that could provide significant cost and qualitative benefits to



Federal statistical programs.



 



     I would like to believe that the administration's legislative



initiative, together with successful match studies such as ERUMS,



will provide the impetus for increased data sharing among Federal



statistical agencies in the future.



 



 



Interagency Agreements for Microdata Access: the ERUMS Experience



 



     Tom Petska's presentation focused on the interagency



agreements required to comply with the confidentiality provisions



that govern the three sets of data.  Clearly, the matching of



individual records in the ERUMS project could not take place until



these confidentiality issues were resolved.



 



 






 



    Tom has presented thoroughly the problems associated with



sharing the individual records from different agencies.  It is



apparent that these legal agreements represented a major barrier in



the ERUMS project.  To their credit, the ERUMS workgroup was able



to overcome the confidentiality problems and to formulate a



workable plan -- IRS contracted with BLS to perform the match, and



SSA staff were designated as special agents of BLS to process the



data.  The IRS is permitted to disclose tax information to outside



contractors as long as it is for purposes of tax administration,



and the ERUMS study was considered to be a statistical study



related to the administration of IRS tax laws.  Unfortunately,



considerable time was spent in determining this solution and in



drafting the required legal agreements.  This added considerably to



the length of the ERUMS study.  Future matching studies may face



similar obstacles in gaining access to confidential data.



 



    As an example, the Census Bureau obtains the EIN and related



data values for many small employer businesses from the IRS.  Any



future studies undoubtedly will rely on the EIN to match the



records, because the EIN is the one key identifier common to U.S.



data systems.  But as Tom has pointed out, the EIN itself is



protected by Internal Revenue Code confidentiality provisions.  For



this reason, the EIN and related data that the Census Bureau



obtains from the IRS cannot be released to other statistical



agencies such as the BLS.  Only those business records whose EIN



and related data have been confirmed through direct respondent



contact would be eligible for release.  This would impact on the



completeness of any matching studies between the BLS and Census



Bureau data sets, because a portion of our business universe has



not been directly canvassed.



 



    The BLS was permitted access to IRS records in the ERUMS



project because of tax-administration purposes.  Although



additional studies possibly could be conducted using similar



arrangements, it would require the support of the IRS and other



agencies that furnish the administrative data.  Otherwise, future



studies may require changes to relevant statutes and regulations



before microdata access is authorized.  Such changes are difficult



to obtain.



 



    I do have one minor point on the paper concerning the



confidentiality provisions of the BLS data.  The ERUMS study used



matched BLS records from only one state -- the state of Texas.



Although Tom outlined the disclosure provisions associated with the



data records from Texas, it was unclear whether these provisions



were typical of the other 49 states.  We understand that BLS



affords each state certain latitude as to the collection of



the unemployment data.  If the states also have different



confidentiality provisions -- specifically, provisions that



strictly prohibit the release of data to Federal agencies other



than BLS -- the ERUMS project may not have been possible using



records from these states.



 






 



     One of the goals of ERUMS was to gain experience in the



procedure of obtaining access to the confidential data of the



various data sets.  To this end, the ERUMS study was a success.



The study revealed the problems associated with obtaining the



access to the microdata for matching purposes, and also determined



a workable solution that overcame these problems.  However, I



expect that disclosure problems will continue to be a major



obstacle in future matching initiatives.



 



 



Sample Selection and Matching Procedures for ERUMS



 



     John Pinkos's presentation focused on the sample selection and



matching procedures in ERUMS.  As John has pointed out, a major



constraint affecting the sample size was the limited staff time and



resources.  Because considerable analysis was inevitable for the



sampled records, the ERUMS members agreed to select a relatively



small sample.  As it turned out, 401 cases were selected.



 



     By limiting the sample to one state, and oversampling from



certain categories of records that were of particular interest,



ERUMS was able to create a manageable set of sample records that



were sufficient to meet the study's objectives.  I expect that



future matching studies will benefit from the details of the



procedures used in ERUMS.



 



     Three sources of data were used in the study -- BLS data, SSA



data, and IRS data.  Cases were selected first from the BLS data



files and then independently from the SSA data file.  Using this



technique -- specifically, by selecting independently based on



certain digits of the EIN -- the ERUMS sample included records that



were present in only one of the two data systems, as well as



records that were present in both systems.  Records present in only



one of the data systems were a critical part of the study, as these



represented potential differences in employer coverage between the



two data files.
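
     A minimal sketch of why independent selection on EIN digits
yields both matched and one-system-only records follows.  The rule
used here (keeping EINs whose last two digits fall in a chosen set)
is a hypothetical stand-in; the actual digits and files used in
ERUMS are described in John's paper, and the EINs below are invented.

```python
def select_sample(eins, terminal_digits):
    """Keep EINs whose last two digits are in the chosen set.

    The choice of the last two digits is an assumption made for
    illustration only.
    """
    return {ein for ein in eins if ein[-2:] in terminal_digits}


# Invented nine-digit EINs for two files with partial overlap.
bls_file = ["741234507", "751111112", "760000007"]
ssa_file = ["741234507", "752222207", "763333399"]
digits = {"07"}

# The same rule is applied to each file independently.
bls_sample = select_sample(bls_file, digits)
ssa_sample = select_sample(ssa_file, digits)

in_both = bls_sample & ssa_sample   # present in both systems
bls_only = bls_sample - ssa_sample  # present only in the BLS file
ssa_only = ssa_sample - bls_sample  # present only in the SSA file
```

     Because the rule depends only on the EIN, an employer present
in both files is selected from both or from neither, so comparing
the two samples shows which selected employers appear in only one
system.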



 



     The ERUMS study, however, did not sample from the IRS data



set.  The IRS data were used only to help analyze the BLS/SSA cases



selected in the sample.  The IRS file was not included in the



sample selection because of the difficulties in gaining access for



such a purpose.  Although this decision was unavoidable, it may



have compromised the results of ERUMS somewhat.



 



     The IRS data file represents a complete universe of business



employers in 1982 -- all employers who filed payroll tax returns in



1982, with no exclusions as to the size of the business or the



nonprofit status of an organization, were included on the IRS file.



Without this complete file of businesses, ERUMS was left to compare



records from the BLS and SSA data sets.  Although differences were



identified and quantified, the study could not make valid estimates



 



 






 



on the completeness of the two data sets as compared to the



universe of businesses on the IRS file.



 



      A similar point exists for the matching of multiunit records



from the BLS and SSA data sets.  The ERUMS study showed that about



1 percent of all active EINs were classified as multi unit in one



or both systems.  Most of these were classified as multi unit only



in the BLS system.  One of the findings of the study was that the



SSA multiunit file is deficient, and steps should be taken to



either improve its quality or discontinue it entirely.  Because



of the obvious deficiency in SSA's multiunit file, no legitimate



conclusions could be reached on the accuracy of the BLS multiunit



file.



 



      One last point on John's paper: he discussed briefly the



comparison of industry classification and geographic location from



the BLS and SSA files.  I would have liked to see some general



table that presented these results.  Even if the results were



presented at broad industry and geographic levels, it would have



provided some general information on the comparability of these



critical data elements.



 



Results, Findings and Recommendations of the ERUMS Project



 



      The agencies involved in the ERUMS project have gained



valuable experience in the technical aspects of linking data files



and in the administrative requirements for gaining access to the



data.  For this reason alone, the ERUMS project should be



considered a success.  In addition to the experience gained, the



ERUMS project presented several recommendations that will help to



improve the business files of the BLS and SSA.  I understand that



the BLS has already taken several measures to improve the



timeliness, completeness, and accuracy of the data in its



Unemployment Insurance Address File.



 



      Vernon's presentation detailed the recommendations that were



identified in the ERUMS study.  In one of the recommendations, he



stated that BLS should review the procedures for identifying births



in an effort to improve the timeliness of including new employers



in the BLS lists.  I suggest that the BLS review procedures for



identifying deaths as well.  Up-to-date operational status is a



critical element of business employer records.



 



      The final recommendation in Vernon's presentation covered the



need for additional matching studies to acquire information that



will support the eventual development of a reporting system to meet



the needs of all Federal and State statistical programs.  Because



of certain legislative barriers -- for example, Title 26 strictly



prohibits the release of IRS data to other statistical agencies --



and significant operational problems, such a far-reaching goal may



not be plausible in the foreseeable future.



 






 



    The Census Bureau supports a more achievable goal of data-



sharing among Federal statistical agencies, and would welcome the



opportunity to conduct additional matching studies in an effort to



further data-sharing initiatives.  Before proposing the Census/BEA



data-sharing initiative, we conducted a matching study that



confirmed the feasibility and value of linking our establishment-



level data with BEA's enterprise-level data.  This preliminary



study was a necessary step in the Census/BEA data-sharing



initiative.  Additional matching studies may promote other data-



sharing initiatives in the Federal Government.



 



    The ERUMS project, which effectively matched interagency data



files, may help provide the impetus for increased data sharing in



the coming years.  With the necessary legislative changes,



pertinent data from each of the employer files could be shared



among statistical agencies.  Such a data-sharing plan would provide



major advantages, including greater comparability among economic



data series, less respondent burden on the business community, and



a reduction in overall Government costs.



 



Summary



 



    Comparisons between data sources are beneficial because they



highlight conceptual differences and identify the limitations and



strengths of the data sets.  The ERUMS project successfully met



both of these objectives.  In addition, ERUMS provided valuable



experience in the technical aspects of matching interagency data



sets.



 



    Our current mission should be to use this experience to



further the efforts of data sharing in the Federal Government.



Data sharing offers major advantages to Federal statistical



agencies.  By supplementing business data sets with applicable



information from the data sets of other agencies, the Federal



statistical system will attain greater comparability in related



economic data series.  The ERUMS project showed that interagency



data sharing is a viable option.  I would like to congratulate the



many people who have been involved with ERUMS for a job well done.



 



 






 



                        DISCUSSION



 



                       Thomas J. Plewes



                U.S. Bureau of Labor Statistics



 



 



     I appreciate the opportunity to appear at this public



unveiling of the Employer Reporting Unit Match Study (ERUMS)



report.  This is an event that has been long-awaited by all of



those who have been involved in this multi-agency, multi-year, and



multi-faceted project.  I expect that no participant has awaited



this day more anxiously than Warren Buckler, who, along with the



folks here at the speaker's table and many in today's audience, has



spent a great deal of time over the past few years in conceiving,



giving birth to, and nurturing this little study.  Indeed, to carry



the metaphor further, it is hard to figure out where we stand now on



the continuum from project conception to death.  Is this session a



commencement ceremony, or is it a eulogy?  As my commentary will



soon indicate, I hope that we are gathered for a commencement



ceremony, for the statistical community has learned important



lessons about sharing and about the basic quality of two major



business lists in this project at some significant cost.  It would



be a shame if the lessons learned were not put to use in



implementing critically needed program improvements.



 



     I would like to accomplish two objectives in the short time I



have allotted as a discussant.  First, I want to step back to



examine the environmental framework in which this study took place



and contemplate the arena into which the report now has been



thrust.  My second goal is to draw specific conclusions from the



exercise and suggest specific steps that should be taken as a



result of the work that has been done.



 



     What is the environment in which we must consider this study?



It is a complex environment, characterized by:



 



  1. Little sharing of business directory information between



     Federal government agencies, but a growing pressure to



     develop procedures for sharing so as to reduce the burden



     on respondents.  These pressures are building to the



     extent that I believe sharing will surely be mandated.



     That mandate may come in the form of legislative action,



     a fiat from the Office of Management and Budget using its



     authority under the Paperwork Reduction Act, or, of most



     profound consequence, through a centralization of the



     statistical agencies.



 



  2. A reliance on lists characterized by their primary usage



     as administrative data sources which focus on the support of



     the administration of the law or function.  We have built



     our elaborate business directory programs and constructed



     our business survey frames on databases that have been



     developed with only a distant secondary concern for the



     statistical uses of the data.



 



  3. Difficulty in separating statistical from enforcement



     purposes.  If we, as statistical agencies, make the data



     better and create an environment for comparing lists, we



     enhance their use for enforcement and administrative



     purposes also.  This aspect will be particularly



     troublesome when we involve, as we eventually must, the



     Internal Revenue Service in sharing schemes.  The



     participation of the IRS in the ERUMS process gave us an



     indication of the lengths to which IRS will go to protect



     the tax data, and of the difficulties this injected in



     the ERUMS process.



 



  4. A growing concern over confidentiality of establishment



     records.



 



  5. A lack of consistency of definitions and coding that



     extends throughout the statistical system, but has a most



     profound impact on sharing of administratively-derived



     lists.  Administrative differences in the programs lead



     to inconsistent definitions of even the most simple of



     terms, such as "employment", "address", "wages" and the



     like.



 



  6. An expanding recognition that errors and omissions in the



     business lists are a significant source of error in the



     survey process.  The Federal Committee on Statistical



     Methodology's Working Paper 15, "Quality in Establishment



     Surveys," documented this, and the Tupek-MacDonald paper



     this morning discussed the effect that the Bureau of



     Labor Statistics' Business Establishment List improvement



     project will have on BLS survey quality.



 



     These environmental elements pose formidable challenges to



statistical agencies that want to improve the efficiency of their



operations and reduce burden on their reporters.  For example, in



terms of frames for surveys of nonagricultural businesses, there



are at present two major government lists -- the Census Bureau's



Standard Statistical Establishment List (SSEL) and the BLS Business



Establishment List (BEL) -- and one major private sector list --



the Dun & Bradstreet file -- with a myriad of lesser known and more



specialized lists for more limited purposes.  We can look at the



SSEL as a representation of the SSA/IRS administrative data



files with considerable value added by the Census Bureau.



Likewise, the BEL may be seen as a representation of the State



unemployment insurance files with considerable BLS value added.



If these Federal government files do not match, and we suspect they



do not through analysis of the macrodata, the problem can be with



the basic administrative data files, with the value added, or both.



Over the years, Fritz Scheuren's various administrative database



comparison projects have documented the systemic differences in the



files very well.  They must be borne in mind.  Fixing the files



once we have identified the root difficulties is quite another



matter.  The statistical agencies do not own them, and they are



exceedingly expensive to change (in terms of budget and response



burden).  Indeed, quite often only a revision in law or nationwide



program practice will do the trick.



 



     Fixing the "value added" portion is somewhat more possible,



but it too is expensive in terms of budget and people.  Often there



are good reasons for not fixing the way we add our value, such as



the need to assure the continuity of historical data series.



 



     Definitions are another challenge.  If we want to share lists,



we must think in terms of three types of problem.  In some cases,



repair is relatively simple.  We heard today, for example, that our



definitions of multi-unit employers are already in close proximity.



The EIN and SIC systems are also bedrock.  Our challenge in those



instances where there is close concordance between the files is to



maintain the definitional base in a standardized, current and



relevant manner.



 



     In other areas, we must change the way we do business but, if



we are willing, our task will be reasonably easy.  One match



problem that ERUMS identified was that the project was comparing



annual SSA reporters with 1st Quarter UI reporters.  This is one of



the problems that we can fix with time and resources, because the



data are there.



 



     In a few important other cases, however, we are quite limited



in our ability to bridge definitional gaps.  For example, when



coverages are based on Federal laws, State laws, and judicial



precedent regulating the administrative database, we would be



forced to justify a change in the insurance or tax program on



statistical grounds.



 



     Certainly, confidentiality concerns have a presence in the



equation.  We glimpse in the Petska-Alexander paper the importance



that necessary confidentiality protection schemes had in this



project, and the price those schemes exacted in terms of time and



precision.  That's one of the reasons I like the Petska-Alexander



paper so much.  It outlines the practical implications of



maintaining a pledge of confidentiality when cooperating on a



project of importance to the statistical agencies.  Everything, as



they so well point out, had to be invented.  There are no textbook



examples of interagency agreements on confidentiality.  The



solutions which the project team developed were carefully crafted



to stay within the very restrictive IRS law and were implemented



with an eye toward the reality of the environment.  Thus, there are



really two stories in the Petska-Alexander paper.  One story is



about the difficulties that the team encountered in sharing



confidential data.  The other, written between the lines, is about



the sense of cooperation and dedication that allowed the cumbersome



solutions to move forward.



 



    The Petska-Alexander paper starkly reminds us that the role of



confidentiality policy is important but little understood.  We may



be hopeful that the current situation will be short-lived.  The



National Academy of Sciences' Committee on National Statistics has



taken on these issues with the formation of an expert panel.  Until



we are able to benefit from that report, however, we are left with



the fact that understanding of confidentiality of business records



has not progressed very far as either science or practice.  Only



recently has a literature on the subject of confidentiality begun



to emerge, but most of it addresses the more emotional topic of



confidentiality of information about individuals.  The literature



pays little attention to issues surrounding confidentiality of



business records.  Without such a foundation, the statistical



agencies have mostly assumed that the issues of confidentiality of



business records are the same as those for individuals.  This



assumption has played an important role in justifying past limits



on sharing between the Federal agencies.



 



    The second paper, by Einstein, Levasseur, Packman, and Pinkos,



also attempts to stand back with benefit of hindsight and make some



sense out of what was a convoluted process.  Since 3 of the 4



authors work with me, these comments may not be as critical as



others may have rendered, for all along the way I "bought-in" to



the approaches taken and the effort expended.  Nonetheless, I view



the documentation that this paper offers in a somewhat different



light than the authors, and draw slightly different conclusions.



 



    The matching process, as described, makes a good deal of



statistical sense.  The team selected a two-stage sample selection



process, stratified into 9 groups.  The second phase, a subset of



about 400 cases of the first selected on a probability basis,



provides for detailed analysis.  Some of the specific steps in the



process were to meet the confidentiality restrictions, but not all.
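
     To make the two-phase design concrete, here is a minimal sketch in Python. It is not the ERUMS selection procedure: the equal allocation per stratum, the sample sizes, the function name, and the toy frame are all assumptions chosen for illustration. It shows only the general idea of stratifying a frame, drawing a first-phase sample within each stratum, and then drawing a second-phase probability subsample from the first for detailed analysis.

```python
import random


def two_phase_sample(frame, stratum_of, n_first, n_second, seed=0):
    """Draw a stratified first-phase sample, then subsample it for detailed review."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Group the frame into strata.
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)

    first, second = {}, {}
    for name, units in sorted(strata.items()):
        # Phase 1: simple random sample without replacement within the stratum.
        first[name] = rng.sample(units, min(n_first, len(units)))
        # Phase 2: probability subsample of the phase-1 cases.
        second[name] = rng.sample(first[name], min(n_second, len(first[name])))
    return first, second


# Toy frame of 900 units assigned to 9 hypothetical strata.
frame = list(range(900))
first, second = two_phase_sample(frame, lambda u: u % 9, n_first=50, n_second=5)
print(len(first), sum(len(v) for v in second.values()))
```

     Because every second-phase case is drawn from the first-phase sample by a known probability mechanism, findings from the detailed review can in principle be weighted back to the first phase.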



 



     The process that the team established should serve as a first



step toward developing an on-going statistical process control



system, if and when sharing does take place.  Many of these same



activities should be continued in a recurring program to meet the



objectives of total quality management.  Thus, the work of the team



has long-term, permanent implications.  The authors seemed to



recognize this when they stated that "we believe future projects of



this kind will benefit from the availability of this detailed road



map".  Probably so, but I speculate that future researchers will



look at the road map and decide against making the journey.  That



is why I would take pains to separate the enduring aspects that



should be the foundation of a quality management system from those



that were necessary to meet more bureaucratic objectives.



 



 






 



     The contribution of the Renshaw-Jabine paper is to yield some



hope, in that it reminds us how close we are to an ability to



share, while providing some sober reflection about some major tasks



still lying ahead if we are to share.  Their bottom line is that



the systems are reasonably close in coverage -- eventually most



employers emerged in the systems.  There were troublesome



differences in multi-unit identification, in county coding, and in



industrial classification at the 2-digit level, but I would label



these of moderate concern.  Indeed, under the BEL initiative, BLS



has taken steps to correct many of the inadequacies in its data,



investing with the States in improving SIC coding, interpretation



of SICs, and, more recently, in fixing the multi-establishment



identification problems.  Unfortunately, with lack of resources,



the Social Security Administration has not been able to make the



same investment, so many of the difficulties in the SSA file may



have multiplied.



 



     In summary, we ought not let this expensive experience lie on



the shelf.  We have learned a great deal about two files -- lessons



that should be extended to files maintained by the Bureau of the



Census.  And we need to get on with fixing some of the obvious



flaws in the administrative data.  Most importantly, we have



learned that maintaining confidentiality is possible, that matching



is feasible, and that the will is present at the staff level in the



agencies to make it all come together.  Now it is time for



leadership.  As Senator Bennett Johnston said in an argument before



Congress, "There's a time to stop talking the talk and start



walking the walk."  We have the map.  Let's start walking.



 



 






 



 



                         Session 10



 



                   APPROACHES TO DEVELOPING



                         QUESTIONNAIRES



 



 



 



 



 



 



 



 






 



 



TOOLS FOR USE IN DEVELOPING QUESTIONS AND TESTING QUESTIONNAIRES



 



                         Theresa J. DeMaio



                     U.S. Bureau of the Census



 



    As the collection of information through surveys becomes more



prevalent in our society, increasing numbers of people find



themselves in a position to develop questionnaires.  Writing a



questionnaire seems like such a simple task -- many people think



that anyone without training or experience can do it.  But



developing a good questionnaire -- one that can obtain good-quality



information that meets the objectives of the survey -- is not as



easy as it looks.  Many different kinds of abilities, including



subject matter expertise, writing capabilities, and knowledge of



social psychological principles are necessary to develop a simple,



cohesive questionnaire in which the questions are clearly worded.



Developing a good questionnaire is not a solitary task -- simply a



matter of sitting down at your desk for a few minutes or even a few



hours.  There are a number of procedures that can be used to



involve potential respondents in content or question development,



and to test and evaluate questionnaire drafts before they are



finalized.



 



     The purpose of Statistical Policy Working Paper #10,



Approaches to Developing Questionnaires, is to provide practical



information about these methods.  The report contains descriptions



of 11 different techniques, which can be used at various stages of



questionnaire development.  The report is structured in three



parts:  tools to develop questions, procedures for testing the



questionnaire draft, and techniques used to evaluate the



questionnaire draft.  This structure was somewhat artificially



imposed for ease of presentation in the report.  In fact, there is



no one ideal way to go about the process of developing a



questionnaire.  Depending on a number of factors, such as whether



you're working from scratch or from an existing questionnaire, how



much time and funds are available for survey development, these



techniques can be used in many different combinations.  In terms of



improving the content of a survey questionnaire before it goes out



into the field, the important thing is that testing and



developmental work be conducted, not necessarily that it be done



according to the structure presented in the report.



 



     Having made this disclaimer, I am nevertheless going to



discuss the techniques that are presented in the first two sections



of the report -- that is, tools for developing questions and



techniques for testing the questionnaire draft.  I'm going to



generally describe the methods contained in the report, and mention



some additional techniques as well.



 



 






 



Developing Questionnaires



 



    Part I of the report describes three tools for developing



questions.  The report presents these methods as useful in



developing new questionnaires.  I'd like to expand on this a little



and suggest that these techniques can be used in the early stages



of questionnaire development of any survey.  Most surveys are



conducted more than once; subsequent rounds of data collection



begin with an existing questionnaire draft that is subject to



revision.  These later rounds each have early stages of



questionnaire development, complete with an existing questionnaire



draft.  In these cases too, the methods described in Part I of the



report may be appropriate.



 



Unstructured individual interviews



 



     Unstructured individual interviews are one-on-one



conversations between a researcher and a member of the population



for the survey or proposed survey.  I use the term "conversations"



because the discussion is unstructured; rather than having a set of



specific questions, the researcher uses a topic outline that



collects information on various aspects of these topics in whatever



order, and using whatever terminology the respondent suggests.



Respondents may also bring up additional issues related to the



general topic, which might be incorporated into the topic outline



for later interviews.  The goal is an unstructured setting in which



the researcher finds out how the respondent perceives the topic of



interest, what terminology the respondent uses to talk about the



topic, whether the respondent is knowledgeable and able to provide



information on the topic.  By working from a blank slate, the



researcher is not constrained by the content and terminology of an



existing questionnaire, and the true frame of mind of the



respondent is more likely to surface.



 



Qualitative Group Interviews



 



    Many of you may be familiar with qualitative group interviews



under a different name, such as focus group interviews, group depth



interviews, or focussed discussion groups.  Essentially these are



unstructured interviews with a group of respondents rather than a



single respondent, led by a group moderator.  About 8 to 12 people



participate in a group, and the moderator uses a topic outline to



guide the discussion.  Qualitative group interviews are used for



many research purposes other than questionnaire development.  When 



used to assist in questionnaire construction, the goal is the same



as the goal of unstructured individual interviews -- to elicit the



terminology used by respondents in thinking about the topic in



question, to determine aspects of the topic that respondents



consider important, and to get a reading on how respondents react



to aspects of the topic that survey planners consider important.



The difference between qualitative group interviews and



unstructured individual interviews is, obviously, the group setting:



the diversity of opinions held by group members may stimulate



interaction among them that elicits more information than could be



obtained through interviews with each member separately.  In order



for these groups to be successful, however, the ability of the



moderator is an important consideration.  The idea is to stimulate



discussion among all the participants and to avoid domination of



the discussion by some people who may be more vocal than others.



 



 



Participant Observation



 



    Participant observation is a technique that is used as an



independent method of data collection, as well as a tool for



questionnaire development.  It has been extensively used around the



world.  The basic elements of the technique are suitable for



questionnaire design purposes, especially in developing



questionnaires for use by members of other cultures or subcultures



living within our own country.  For example, the homeless



population is a subculture that is currently the object of much



interest, and for which the use of participant observation



techniques is relevant.  Indeed, these techniques have been



successfully used in research on homelessness being conducted at



the Census Bureau.



 



    There are several distinguishing characteristics of



participant observation research.  First, the researcher must speak



the respondents' language.  This is not limited to English as



opposed to a foreign language, but also refers to dialects, slang,



or professional jargon.  Second, the researcher associates with the



members of the community he or she studies and engages in their



activities.  Ideally the researcher lives among the respondents; at



a minimum, he or she develops contacts in the community over a long



period of time.  The participant observer may also use the



ethnographic interview technique during the course of his or her



research.  This involves using unstructured interviews (the



methodology I previously described) with "key informants."  These



are members of the community who are willing to talk at length with



the researcher or introduce the researcher to other community



members.



 



     From this brief description, it should be obvious that



participant observation is not a methodology that a person can



"pick up" by reading an introductory textbook.  The expertise



required in the use of this technique dictates the involvement of



trained ethnographers.  While that may limit its use somewhat among



U.S. statistical agencies, there are several ways it can be



incorporated in a project.  First, participant observation can be



conducted as part of a project by trained anthropologists hired to



serve on the project staff.  In the homeless project I referred to



a moment ago, we hired an anthropologist to work with a survey



methodologist, and this combination has worked out very well.  A



second way to make use of this technique is to consult with



ethnographers who have prior experience among the culture of



interest, and take advantage of this previous experience rather



than conducting original fieldwork.  This could be done either by



hiring the person on staff or doing it on a consultant basis.



 



 



Think Aloud Interviews



 



    Another technique suitable for the early stages of



questionnaire development has gained in popularity since the



Working Paper was completed in 1983.  This is the think aloud



interview.  Also referred to as protocol analysis, this method is



an extremely valuable source of information about how respondents



understand the survey questions put to them, and how they go about



answering the questions.  The purpose of the technique is to get



respondents to talk out loud and verbalize their thoughts as they



respond to questionnaire items.  The data of interest here are



respondents' reactions to the items, their thoughts as they



formulate answers to the items, and what decisions they make in



answering the questions.



 



    Use of the technique requires a questionnaire draft.  Since



the results of these interviews are crucial to the questionnaire



development process, the person doing the interviewing is generally



a researcher or questionnaire designer.  For interviewer-



administered surveys, the questioner first explains to the



respondent that rather than just answering the questions, he or she



should actually think out loud -- that is, say what he/she is



thinking as he/she answers each question.  Respondents differ in



their ability to verbalize their thoughts, and some may require a



bit of probing to uncover how they arrive at the answer to a



question.  At times it may take skillful questioning to probe



completely what is on a respondent's mind.  The interviews are



generally tape-recorded (with the respondent's permission), since



it is difficult to take notes and concentrate on probing the



respondent's answers at the same time.



 



    This technique can also be adapted for self-administered



interviews.  In this case, the questioner is basically an observer.



The respondent is instructed to complete the questionnaire, reading



the questions and instructions out loud as well as verbalizing the



responses.  I've done quite a few of these interviews, and they



really are quite helpful in detecting layout problems (not noticing



skip instructions, etc.) in addition to uncovering problems with



the questions.



 



    This technique is used with relatively small numbers of



respondents.  Ten or fewer think aloud interviews provide large



amounts of information and can uncover systematic



misinterpretations or other problems.  Use of the technique is an



iterative process -- once the questionnaire designer conducts five



to ten think aloud interviews, problem areas will generally



surface.  Then, after revisions to the questionnaire are made,



additional interviews can be conducted to detect problems with the



revisions.  Or alternatively, some other method can be used for the



next round of questionnaire development.



 



 



Testing Questionnaires



 



     Whatever methods are used to develop a questionnaire draft, it



must be subjected to testing before it can be used in the field.



There are a number of ways that this can be done, involving various



levels of time and effort.  Part II of the report mentions three



techniques:  informal testing, pilot studies, and split sample



testing.  I'll describe each of these briefly and also add another



selection to the menu.



 



 



Testing Multiple Questionnaires



 



     In the questionnaire testing phase, the content of the



questionnaire may be pretty much set except for fine tuning, or



substantive questions may remain about how best to ask about a



topic.  When the latter is the case, options that involve the



testing of alternative questionnaires should be considered.



 



 



Experimental Group Session



 



     The experimental group session is a small-scale method of



testing alternative questionnaire versions, applicable only to the



development of self-administered questionnaires.  It may be



conducted with respondents who are selected for their demographic



characteristics and are not representative of any larger



population.



 



     In an experimental group session, respondents come to a



central location (usually a large room containing tables or desks)



for the purpose of completing a questionnaire.  A group session is



held, and 20-30 respondents participate at one time.  The session



is experimental, since more than one questionnaire version is



randomly administered.  A moderator conducts the session, and



questionnaires are randomly distributed to the participants.  After



the questionnaire has been completed, a debriefing form may be



administered to collect additional information about how the



respondent interpreted specific questions.  Multiple sessions are



conducted until the total number of respondents is large enough



(about 500 or so) to facilitate statistical comparisons of the



responses to the alternative questionnaires.



 






 



      This methodology does not duplicate the response situation of



a self-administered questionnaire where the respondent receives a



form in the mail and returns it that way as well.  For one thing,



the respondent in the group session does not have access to other



household members or to personal records, which may be necessary to



answer some items.  For another thing, once a respondent has joined



the group, he or she generally completes the questionnaire and



turns it in, while at home the questionnaire might remain



unanswered.  Despite these limitations, however, the methodology



has definite advantages in the early stages of questionnaire



development.  It takes a relatively short time to arrange and



conduct the sessions.  If the statistical analysis is conducted



quickly, it can provide rapid feedback about large differences in



response to the alternative questionnaires, for use in later



revisions of the survey instrument.



 



Split Sample Testing



 



      In some situations, the questionnaire designer needs a large



sample of respondents and a more formal test of different question



wordings, question concepts, or methods of categorizing responses.



This is particularly important in developing a major new survey



instrument (such as SIPP when it was introduced several years ago),



or in revising an existing questionnaire such as the decennial



census form.  When the nature of the survey requires large-scale



testing of different versions of a questionnaire, the vehicle of



choice is the split sample test.



 



      Split sample testing, also referred to as split ballot or



split panel testing, involves the use of multiple questionnaire



variants, each administered to a portion of the sample.  The entire



questionnaire need not be different, but the alternative



questionnaires should contain different versions of the items that



are the focus of the test.  In fact, the questionnaires should not



contain too many differences, since all the variations in a



questionnaire can affect response.  One way to deal with this issue



is to limit the number of questions that are tested.  Another might



be to use automated data collection methods such as CATI or CAPI,



which provide the means to randomize several experimental series of



questions with respect to each other.



 



      Once the content of the questionnaires is established, the



alternative questionnaires are randomly distributed among the



sample population, to decrease bias due to factors other than those



being tested.  The procedures for data collection are basically the



same as for a survey containing a single questionnaire with one



exception:  control procedures must be established to ensure that



each sample case is assigned to the proper treatment group.
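The random distribution and the control procedure described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not a procedure taken from the report:  the function names, the case identifiers, and the two version labels "A" and "B" are all hypothetical.

```python
import random
from collections import Counter


def assign_versions(case_ids, versions, seed=None):
    """Randomly assign each sample case to one questionnaire version.

    Cases are shuffled and then dealt out in rotation, so the group
    sizes differ by at most one.  The returned dict serves as the
    control record of which version each case should receive.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return {case: versions[i % len(versions)] for i, case in enumerate(shuffled)}


def check_assignment(assignment, fielded):
    """Return the case IDs whose fielded version differs from the assigned one."""
    return sorted(
        case for case, version in fielded.items()
        if assignment.get(case) != version
    )


# Hypothetical sample of ten cases split between two versions.
cases = [f"case-{n:03d}" for n in range(1, 11)]
assignment = assign_versions(cases, ["A", "B"], seed=1991)
group_sizes = Counter(assignment.values())
```

Fixing the seed makes the assignment reproducible, so the control record can be regenerated and compared against the versions actually fielded.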



 



      In a split sample test, the responses to the question variants



that are the focus of the test are of keen interest.  Thus,



 






 



statistical analysis of the data is an important aspect of the



evaluation.  In addition, observation of interviewers and



interviewer debriefing can be used for a personal interview survey,



and information gained from these methods can help inform some of



the statistical results of the analysis.
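As one illustration of the kind of statistical analysis involved, the sketch below compares the proportion of "yes" answers to a single item under two questionnaire versions, using a pooled two-proportion z test.  The choice of test, the function name, and the counts are assumptions made for this example.  The test also treats the two groups as simple random samples, which a complex survey design would not satisfy without design-based variance estimates.

```python
import math


def two_proportion_z(yes_a, n_a, yes_b, n_b):
    """Pooled two-proportion z test for a difference between versions.

    Returns the z statistic and the two-sided p-value taken from the
    standard normal distribution.
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 120 of 250 "yes" under version A, 95 of 250 under B.
z, p_value = two_proportion_z(120, 250, 95, 250)
```

A small p-value here would suggest that the wording difference, rather than chance, accounts for the gap in responses.  Deciding which version measures the concept better still depends on the observational and debriefing information.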



 



Testing Single Questionnaires



 



    The final two techniques I would like to mention involve



testing of questionnaires that are ready for "fine tuning."  That



is, major uncertainties about the content of a questionnaire do not



exist, although changes in the instrument may be recommended based



on the results of a field test.



 



Informal Testing



 



    Informal testing, as its name implies, is a relatively casual



method of evaluating a questionnaire.  It is relatively small in



scope, involving between 50 and 300 interviews.  The cases for



interview are selected purposively, rather than through any kind of



systematic sampling methods.  This may be accomplished by selecting



participants from a broad range of subpopulation groups, in the



case of a test for a national survey, or limiting the participants



to narrow population segments, if for example, a survey of food



stamps or social security recipiency is being tested.



 



    The informal nature of the test also carries over into the



evaluation system.  While some basic, quantitative information is



calculated from the questionnaire responses, such as item



nonresponse rates and the number of "don't know" responses, most



of the evaluative information is based on observational feedback.



There are several ways of obtaining this feedback.  Observers,



including the questionnaire designers, can accompany interviewers



in the field or, for telephone interviews, the interviews can be



tape-recorded.  Specially-designed evaluation forms can be



completed by interviewers and/or observers.  Also, interviewers and



observers can be debriefed after the interviewing is completed.



Most of the information collected through these methods is



subjective, based on the impressions of the staff present at the



interviews.  The informal testing procedure also allows



unstructured discussion with the respondent at the end of the



interview.  In response to probing by the interviewer or observer,



the respondent can provide information about his or her problems



with the questionnaire, the meaning of specific items on the



questionnaire, or other items of information.  Observers who are



involved in the survey as questionnaire designers, subject matter



specialists, or sponsors may use their background knowledge to



guide their probing and obtain useful information for evaluating



the questionnaire.
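The basic quantitative information mentioned above, item nonresponse rates and counts of "don't know" responses, can be tabulated very simply.  The sketch below assumes a hypothetical coding in which a missing answer is stored as None and a "don't know" answer as the string "DK".  The item names and data are invented for illustration.

```python
def item_summary(responses, items, dont_know="DK"):
    """Per-item nonresponse rate and count of "don't know" answers.

    `responses` is a list of dicts, one per completed interview.
    An item that is absent or set to None counts as nonresponse.
    """
    n = len(responses)
    summary = {}
    for item in items:
        answers = [r.get(item) for r in responses]
        n_missing = sum(1 for a in answers if a is None)
        n_dk = sum(1 for a in answers if a == dont_know)
        summary[item] = {
            "nonresponse_rate": n_missing / n if n else 0.0,
            "dont_know": n_dk,
        }
    return summary


# Hypothetical results from four informal test interviews.
responses = [
    {"q1": "yes", "q2": 3, "q3": "DK"},
    {"q1": "no", "q2": None, "q3": "DK"},
    {"q1": "yes", "q2": None, "q3": "no"},
    {"q1": None, "q2": 1, "q3": "yes"},
]
summary = item_summary(responses, ["q1", "q2", "q3"])
```

With so few purposively selected cases, figures like these only point to items worth discussing in the debriefing.  They are not estimates for any population.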



 






 



 



 



Pilot Studies



 



     In contrast to an informal test, pilot studies are much more



formal and are conducted on a larger scale.  Pilot studies are



generally conducted further along in the questionnaire development



cycle, and the goal is to duplicate the final survey design from



beginning to end.  This includes data collection from a larger



sample, scientifically selected to represent the survey universe,



and execution of data processing and perhaps tabulation procedures



as well.  Needless to say, this is a lot more time-consuming than



an informal test, and is not attempted until the questionnaire is



in a final or nearly final state.  Where the informal test seeks to



uncover problems with terminology and question interpretation, at



this point the questionnaire design issues of concern relate to how



well the survey instrument performs in conjunction with the other



aspects of the survey -- for example, errors in key codes or



problems with response range categories for numerical data.



 



     Evaluation of the results of a pilot study is much more



quantitative than the analysis of an informal test.  In a pilot



study, the data capture, editing, and imputation programs are



performed and, to the maximum extent possible, the data analysis



plan is executed.  This tests all the software developed for the



survey and checks to see that the various stages of data processing



are properly coordinated.  Frequently, time constraints limit the



amount of analysis conducted on pilot study data; however, the more



effort is expended at this stage, the less likely you will be to



find surprises when the survey is actually fielded.  In addition to



this formal evaluation of a pilot study, some less rigorous



evaluative tools are also used.  Observation of interviewer



training sessions is generally conducted and modifications to the



training are suggested, as necessary.  Also, observers may



accompany interviewers for a personal visit survey, and both



interviewers and observers are debriefed.



 



     Another use of a pilot study might be to phase in a new



questionnaire in a continuing survey.  Rather than adopting the new



questionnaire wholesale, overlapping samples can be designed, in



which a portion of the respondents receive the new questionnaire



and the rest receive the old one.  The purpose here is not to test



a questionnaire, but to collect information about the alternative



questions and measurement strategies.  The goal is to calibrate the



old and the new questionnaires, to provide quantitative information



about differences in response, which might affect the time series



for the survey.



 



Discussion



 



     The descriptions I've presented demonstrate the wide range of



options available in the questionnaire development process.  The



intent of the report was not to suggest that each of these should



 






 



be used in the development of a single questionnaire.  Rather, we



wanted to familiarize questionnaire designers with the techniques,



to encourage their use, and to promote the value of testing



questionnaires in general.



 



      As I said at the beginning of this paper, there is no one



ideal procedure to follow in preparing to field a survey



instrument.  It is generally best to start by talking to



respondents, with or without a draft questionnaire, to find out if



the vocabulary and the intent of questions are understood.  Think



aloud interviews, unstructured individual interviews, or other



techniques that involve in-depth one-on-one discussions with



respondents are extremely helpful here.



 



      These techniques are not limited to the earliest stages of



questionnaire development, however.  A draft questionnaire can be



revised based on think aloud interviews, used in a field test, and



the revised version also used in additional think aloud interviews.



It is an iterative process that can continue as long as you find



problems that need fixing.



 



      Similarly, there is no magic formula for field testing an



instrument.  An informal test followed by a pilot study might be



warranted based on the characteristics of the survey.  Perhaps a



series of informal tests might be considered for some complicated



surveys.  Or informal tests might be followed by a split sample



test.



 



      Depending on the circumstances of a particular survey and the



time and budget allowed for survey development, many possibilities



are available.  The important point is that testing facilitates



problem detection, and fixing problems in a questionnaire will



improve the quality of the data that is obtained.



 



 






 



 



     TECHNIQUES FOR EVALUATING THE QUESTIONNAIRE DRAFT



 



                       Deborah H. Bercini



              National Center for Health Statistics



 



    This paper reviews the section of the report, Approaches to



Developing Questionnaires, called "Techniques for Evaluating the



Questionnaire Draft."  How do these techniques differ from "Tools



for Developing Questionnaires" and "Procedures for Testing the



Questionnaire Draft" described in earlier sections of the report?



In many cases, they do not.  When there is a lot of time before a



survey goes into the field, a particular method might be described



as a tool for developing the questionnaire.  When time has run out



or the survey is already in the field, the same method would be



referred to as a technique for evaluating the questionnaire draft.



It doesn't really matter.  In fact, the beauty of some of the



techniques covered in this paper is that they can be adapted for



use anywhere in the questionnaire design process.



 



     What are these techniques?  At first glance, the chapter



headings shown in Figure 1 represent an apparently unrelated



assortment of methods.  However, there is a common thread.  What



links these techniques is that each uses an external source of



information to evaluate the performance of the questionnaire.  In



this case, "external" refers to data that originates outside of the



answers to the questionnaire items themselves.  The first three



techniques rely on the insights of the survey participants, that is,



respondents and interviewers.  Next, observer evaluations are



provided by an outsider who is not part of the question-response



process.  The last technique, record checks, steps even further



away from the interview or data collection situation by comparing



questionnaire responses to an independent criterion, usually



administrative records.



 



     Approaches to Developing Questionnaires predated the emergence



of laboratory or cognitive evaluation methods.  The cognitive



approach has provided a theoretical framework for understanding and



reducing many kinds of response errors.  Although this framework



was not in place in the early 1980's, some of the techniques that



follow are similar, if not identical, to those used in today's



labs.



 



 






 



 



                          Figure 1



 



            Technique                      Source



 



     Frame-of-Reference Probing          Respondents



 



     Response Analysis Surveys           Respondents



 



     Interviewer Debriefings and         Interviewers



     Interviewer Questionnaires



 



     Observation and Monitoring          Survey designers, etc.



 



     Record Check Studies                Administrative records



 



Frame-of-Reference Probing



 



     Frame-of-reference probing is such a technique.  This method



evaluates the questionnaire by probing for how respondents



understand key concepts, terms, definitions and instructions.  The



probes may be in the form of structured questions developed before



the interview, or ad hoc, spontaneous questioning by an



interviewer.  Although this technique could be applied at any stage



of questionnaire development, the report deals with it primarily in



the context of field testing and the survey itself.  The probes can



be inserted after selected survey questions or they can be grouped



at the end of the interview.



 



     When probing is unstructured, it is usually done by the survey



researchers or questionnaire designers because of their greater



insight into question objectives.  Standardized, structured probes



can be administered by a field interviewer as part of the data



collection process.



 



     The use of frame-of-reference probing requires some planning.



The first decision concerns when in the development process to



probe.  While probing will yield useful information at any time,



clearly it will have the most impact when it is done with early



drafts.  If there are problems with fundamental concepts and key



terms, it makes sense to detect them soon enough to work on a



solution.  If it turns out that respondents have difficulty with an



entire topic or questionnaire approach, then modifications might be



needed in the data objectives, not just question wording.



 



     Structured probes, of course, need to be developed in advance.



These may range from general, all purpose questions such as "What



does so and so mean to you?" to more specific individualized



probes.  Even when probing is going to be of the unstructured or ad



hoc variety, a plan for which questions and which terms to probe is



advisable.  In a field setting, respondents' time is limited and so



is the number of probes that can be asked.  Based on previous



 






 



 



testing, researchers are likely to have some notion of potential



trouble spots to cover.  This information can be used to develop a



protocol which specifies the criteria for probing.



 



     No single evaluation technique is comprehensive or perfect.



Each has its strengths and limitations.  The particular strength of



probing is that it can identify problems in the questionnaire that



are often missed by methods that rely on respondents giving overt



indications of difficulty.  Consider the question, "During the past



year, have you had pain in the abdomen?"  This question was tested



in the Questionnaire Design Research Laboratory at the National



Center for Health Statistics.  Most laboratory respondents answered



it readily with a "yes" or a "no."  It was not until interviewers



probed for how respondents interpreted the term "abdomen" that it



was discovered that very few respondents knew exactly where their



abdomens were [1].



 



     Probing also has the potential for identifying the underlying



causes of response problems, not just the fact that a problem



exists.  Returning to the example, the problem was variable



interpretation of a key term.  The underlying cause was lack of



knowledge.  When the cause of a question problem is understood, the



solution is likely to suggest itself.  In this case, the solution



was a respondent flash card that showed an outline of the torso



with the abdominal area shaded in.



 



     Skilled probing is cost-effective.  It can unearth quantities



of information on how the questionnaire is working in a relatively



short time.  However, exclusive reliance on standardized probes



tends not to produce very useful insights.  Specialized probes



require more time to develop but yield more valuable results.



Also, if no ad hoc probes are used, unanticipated problems will be



missed.  The results of unstructured probing are, of course,



subjective and anecdotal and require some skill to interpret.



 



     Today, probing is one of the primary tools of the cognitive or



laboratory approach to questionnaire evaluation.  But its use is



not limited to comprehension issues.  Question flaws related to



vague concepts or unfamiliar terms are certainly common, but they



are by no means the only sources of response error.  Probing



techniques can be used effectively to detect question problems that



affect many components of the response process.  These include



recall, estimation, judgement, decision making and motivational



factors [2].



 



     Probing and other intensive interviewing methods have now



evolved into a major and separate phase of the questionnaire design



process that usually precedes the field testing phase.  This



approach gives questionnaire designers freedom to explore the



response process in depth.  Constraints on probing are significant



when it is done during a field interview that was designed for



another purpose.



 






 



 



Response Analysis Surveys



 



     The response analysis survey (RAS), like frame-of-reference



probing, evaluates the questionnaire from the respondent's



 



perspective.  The report describes it as a technique used to



evaluate mail surveys, especially mail surveys of establishments.



In effect, the response analysis survey is a survey about a survey



in which personal interviews are conducted with a sample of mail



survey respondents.  Interviewers administer a structured



questionnaire which asks questions about how respondents would go



about answering the mail survey questions.



 



     The typical RAS would collect data on how establishment



records are maintained, what kinds of information they contain, how



difficult it is to retrieve this information and so on.  It also



attempts to find out if respondents can understand what is being



asked of them, their willingness to provide the information and



other aspects of respondent burden.



 



     If the RAS is being conducted to evaluate an on-going survey



or to prepare for the next cycle of a periodic survey, researchers



use available data on response errors as a guide when developing



questions for the RAS.  Data collection and analysis proceed as in



a regular survey.  The results are interpreted and then used to



redesign the questionnaire for the next mail survey.



 



     The strengths and limitations of the RAS parallel those of



frame-of-reference probing.  The RAS asks the respondent to analyze



the response task, and, in doing so, both techniques are capable of



detecting covert response problems and their underlying causes.



And the RAS, like probing, loses some of its potential when only



structured probes are used.  On the other hand, a formal response



analysis survey will produce valuable, objective data that will



reliably indicate where questionnaire revisions are needed.



 



     Variations or adaptations of the RAS concept can be used to



evaluate any self-administered questionnaire, not just mail



questionnaires for establishment surveys.  Laboratories at several



government agencies test self-administered questionnaires using a



combination of observation, think-aloud, and structured and



unstructured probing methods.  The General Accounting Office,



for example, uses an interesting combination of observation and



laboratory-style probing.  They watch respondents complete the



questionnaire, noting all kinds of non-verbal behavior such as



sighs, grunts, head shaking and other signs of impatience, skipped



questions, and so on.  Afterwards, the interviewer goes back to



each of the questions that provoked the reaction and asks the



respondent to elaborate.



 






 



 



Learning from Interviewers: Interviewer Debriefings and



      Questionnaires



 



      No assessment of interview survey questionnaires is complete



 without the interviewer's input.  The report chapter, "Learning



 from Interviewers," presents two techniques for gathering



 information from interviewers - interviewer debriefings and



 structured post-interview evaluations.  The latter are



 questionnaires completed by interviewers about various features of



 the survey questionnaire.  Either method can be employed with



 pretest questionnaires or during an on-going survey.



 



      The interviewer debriefing session is a forum in which



 interviewers can relate their experiences in administering a



 questionnaire or data collection procedure.  The scope and



 formality of debriefings can vary with the scope and formality of the testing



 operation that they accompany.  In large field tests involving many



 interviewers, a single comprehensive debriefing is usually held



 when interviewing is completed.  With more informal testing, it is



 possible to conduct multiple debriefings throughout the pretest



 period.  Questionnaires can be revised on the spot, tested, and



 then revised again.



 



      Interviewer questionnaires can take several forms also.  They



 can be directed to a specific issue or problem, such as



 nonresponse.  Or they can consist of questions designed to get at



 suspected difficulties with particular survey items.  Another



 format is a questionnaire made up of standardized ratings that



 interviewers apply to each survey item.



 



      Any survey planner who ignores what interviewers have to say



 about the questionnaire is taking a great risk.  Interviewers are



 in the best position to comment on how the questionnaire and other



 survey procedures affect respondent cooperation.  For the most



 part, interviewer performance is judged on response rates and



 completion rates, so interviewers will naturally be sensitive to



 factors that affect performance in these areas.  Interviewers also



 have excellent insights into the logistics of questionnaire



 administration and are quick to spot things that impede the



 efficient flow of the interview.  If survey planners, for whatever



 reason, do not heed interviewers' major objections, they will pay



 a price.  A questionnaire that interviewers find unnecessarily



 difficult to administer will lead to poor interviewer performance,



 and therefore, lower data quality.



 



      There are also limitations to what can be learned from



 interviewers.  Interviewers are often more adept at making bad



 questions work than they are at finding flaws, unless the question



 is so bad that the interviewer can't figure out how to ask it and



 the respondent can't or won't answer it.  The interviewers' job is



 to get a response and they are good at it.  They are less likely to



 






 



notice subtle problems with question wording, interpretation, and



so on, as long as the respondent gives a codable response.



 



     Reliance on post field test interviewer debriefings to detect



question problems is a poor evaluation strategy.  Only those



problems that visibly disrupt the interview will be mentioned.  And



as in any group situation, the most vocal will dominate.  It can be



difficult to achieve consensus as to what the problems are because



interviewers can have varied experiences depending upon the sample



cases they have interviewed.  Using interviewer evaluation



questionnaires to supplement the debriefing can compensate for some



of these drawbacks.



 



     Getting interviewer input need not be confined to the field



situation.  It is possible to ask interviewers to evaluate draft



questionnaires before they are field tested.  This can be done in



a laboratory setting with "real" respondents or with researcher



respondents.  Although subject to some of the same sorts of



limitations mentioned above, it may be possible for interviewers to



identify some flaws in this way.  It could be especially useful to



ask interviewers to try out questionnaires that have been adapted



to new data collection modes, such as CAPI, for example.



 



Observation and Monitoring



 



     Observation of face to face interviews or monitoring telephone



interviews evaluates the questionnaire from the perspective of a



third party.  Observers are usually people involved in the survey



planning process, from sponsors and subject matter experts to



questionnaire designers and data analysts.  At NCHS, observation



usually takes place during field pretests, but the same methods



could be used to evaluate on-going surveys.  Most often, an



observation program provides a qualitative, subjective assessment



of questionnaire performance and related communications.  An



infrequently used but more objective approach is known as



behavior or interaction coding, in which standardized codes are used



to evaluate question performance.



 



     In all cases, observers need some preparation for their task.



Attending the interviewer training helps, as does some coaching



on specific situations or problems to watch out for.  Observer



forms can be useful, provided observers are not so busy recording



minor detail that they miss more significant interactions.  At the



end of the testing period, observers may submit a written report



summarizing their experiences, participate in a debriefing session,



or both.



 



     Sometimes observers can become active participants if they are



also there as frame-of-reference probers.  At the other extreme are



the behavior coders who usually work from taped interviews.



 






 



 



    The Survey Research Center at the University of Michigan has



recently completed a study on pretesting techniques, among them,



behavior coding.  The goal of the study was to test techniques that



would enhance the usefulness of the traditional field pretest.  The



Michigan study used the coding scheme shown in Figure 2 to identify



interviewer and respondent behaviors that are symptomatic of



problem questions.  Trained coders listen to taped interviews and



apply the appropriate codes to each question.  The numbers and



types of codes for each question are then tallied.  The benefit of



this technique, according to the study, is that it can provide



objective indicators of flawed questions [3].



 



 



                        Figure 2



 



  Interviewer Behavior              Respondent Behavior



 



Reads question with slight          Interrupts question



changes                             with answer



 



Reads questions with major          Requests clarification or



changes or does not complete        repeat of question



it



 



                                    Gives qualified but adequate



                                    answer



 



                                    Gives inadequate answer



 



                                    Gives "Don't Know" answer



 



                                    Refuses to answer
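The tallying step described above can be sketched as follows.  The two-letter code labels, the function names, and the 15 percent flagging threshold are assumptions made for this illustration.  The Michigan study's actual code labels and decision rules are not given here.

```python
from collections import Counter, defaultdict

# Hypothetical short labels for the behaviors listed in Figure 2.
CODES = {
    "SC": "Interviewer reads question with slight changes",
    "MC": "Interviewer reads question with major changes or does not complete it",
    "IN": "Respondent interrupts question with answer",
    "CL": "Respondent requests clarification or repeat of question",
    "QA": "Respondent gives qualified but adequate answer",
    "IA": "Respondent gives inadequate answer",
    "DK": 'Respondent gives "Don\'t Know" answer',
    "RF": "Respondent refuses to answer",
}


def tally_codes(coded_events):
    """Count how often each behavior code was applied to each question."""
    tallies = defaultdict(Counter)
    for question, code in coded_events:
        if code not in CODES:
            raise ValueError(f"unknown behavior code: {code!r}")
        tallies[question][code] += 1
    return tallies


def flag_questions(tallies, n_interviews, threshold=0.15):
    """Return questions where a code's occurrences per interview exceed the threshold."""
    flagged = {}
    for question, counts in tallies.items():
        high = {
            code: n / n_interviews
            for code, n in counts.items()
            if n / n_interviews > threshold
        }
        if high:
            flagged[question] = high
    return flagged


# Hypothetical (question, code) pairs recorded from ten taped interviews.
events = [
    ("Q1", "SC"), ("Q1", "CL"), ("Q1", "CL"), ("Q1", "CL"),
    ("Q2", "IN"),
    ("Q3", "DK"), ("Q3", "IA"), ("Q3", "DK"),
]
tallies = tally_codes(events)
flagged = flag_questions(tallies, n_interviews=10)
```

The threshold is a judgment call.  A lower value flags more questions for review, and the flagged codes indicate only that a problem exists, not its cause.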



 



 



     What are the strengths of the observation technique?  It can



readily be incorporated into existing pretest plans.  Third party



observers can interpret the interviewer-respondent dynamic in a way



that the participants cannot.  And for many survey sponsors and



planners, actually seeing how the questionnaire performs in the



field is the most convincing evidence that changes need to be made.



 



     The limitation of the observation technique is evident from



the name - it can only detect observable questionnaire problems.



And when a problem is observed, the underlying cause may not be



obvious.  Individual observers will have limited experience with



the questionnaire unless they have observed a great many



interviews.  Therefore, agreement on what the problems are may be



difficult to achieve.



 



 






 



 



Record Check Studies



 



    Record check studies are used not so much to evaluate question



wording, but to evaluate the validity of the data that is produced



by the questionnaire as a whole.  The accuracy of responses on a



particular topic is checked against an independent criterion,



usually administrative records.  For example, data from a health



survey that asks questions on doctor visits could be compared to



respondents' records maintained by their health care providers.  In



these studies, it is assumed that the administrative record



represents the "truth."  Respondent reports that do not correspond



to record data are counted as errors, and a high error rate would



indicate that something is wrong with the questionnaire or the approach to



data collection.
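The error-rate calculation described above might look like the sketch below.  It follows the stated assumption that the record represents the "truth."  The function name, the case identifiers, and the counts of doctor visits are hypothetical.

```python
def record_check(survey_reports, records):
    """Compare survey reports with record values for matched cases.

    Both arguments map a case ID to a reported count (for example,
    doctor visits).  A matched case counts as an error when the survey
    report differs from the record value.
    """
    matched = sorted(set(survey_reports) & set(records))
    unmatched = sorted(set(survey_reports) - set(records))
    errors = [cid for cid in matched if survey_reports[cid] != records[cid]]
    net_difference = sum(survey_reports[cid] - records[cid] for cid in matched)
    return {
        "matched": len(matched),
        "unmatched": unmatched,
        "error_rate": len(errors) / len(matched) if matched else None,
        "net_difference": net_difference,
    }


# Hypothetical reports of doctor visits and the corresponding provider records.
survey = {"r1": 2, "r2": 0, "r3": 5, "r4": 1, "r5": 3}
provider_records = {"r1": 2, "r2": 1, "r3": 4, "r4": 1}
result = record_check(survey, provider_records)
```

In this invented example one respondent underreports and another overreports by the same amount, so the net difference is zero even though half of the matched cases are in error.  This is one reason to examine case-level discrepancies and not only aggregate totals.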



 



    Record check studies pose numerous logistical challenges.  One



needs to obtain the cooperation of a records source.  Preservation



of confidentiality is often a problem.  The structure and quality



of the record system need to be studied.  Is it adequate?



Matching criteria must be developed.  Does the record system



support the level of matching that is desired?  Matching



questionnaire data to record data is invariably more difficult than



anticipated, and many discrepancies have to be resolved.  Finally,



the results require thoughtful interpretation.  What are the



implications for the questionnaire?  What is it about the



questionnaire or other aspects of the survey design that is



contributing to response errors?



 



     Record check studies can provide objective evidence that a



questionnaire is collecting the information it is designed to



collect.  However, they can only be used to evaluate questionnaires



on topics for which independent records are available.  Clearly,



there are many types of human behavior that interest researchers



for which no records exist.  Compared to the other evaluation



methods described above, record checks are relatively time



consuming and costly.  But the costs have to be weighed against the



benefits.  For large, expensive surveys where data precision is



critical, evaluation by record check would make sense.



 



     Several variations of the record check study are possible.



Seeding the pretest sample with cases known to possess a target



characteristic is a scaled down version of a record check study.



It is not too difficult to implement if the characteristic is a



simple one and if it is not highly sensitive, having arthritis



versus having AIDS, for example.  Methodological studies can use



other validation sources besides administrative records.  Some



possibilities are respondent diaries, data collected from other



family members on the same topic, biochemical markers, medical



exams and so forth.



 






 



 



Conclusions



 



     When there are so many ways to find out if a questionnaire is



performing as intended, there is no good reason not to do it.



Several of the techniques in the report are conducted in



conjunction with field testing, so that time and cost factors are



marginal.  Probing techniques can be applied in so many different



ways and at different levels of intensity that the technique can be



adapted to almost any evaluation objective or questionnaire type.



Laboratory facilities are advantageous, but not essential.



 



     It should be evident that no single technique will tell you



all you need to know about the adequacy of your questionnaire.  An



evaluation program that includes several different sources of



information on question performance will be the most successful.



 



References



 



[1] Bercini, Deborah H. "Pretesting Questionnaires in the



Laboratory:  An Alternative Approach." (accepted for publication)



Toxicology and Public Health.



 



[2] Royston, Patricia N. "Using Intensive Interviews to Evaluate



Questions."  Proceedings of the Fifth Conference on Health Survey



Methods Research, Keystone, Colorado, 1989.



 



[3] Cannell, Charles et al.  "New Techniques for Pretesting



Survey Questions."  NCHSR #HS 05616, Survey Research Center,



University of Michigan, 1989.



 



 



 






 



 



DESIGNING QUESTIONNAIRES FOR CATI IN A MIXED MODE ENVIRONMENT



 



                             Gemma Furno



                      U. S. Bureau of the Census



 



 



1. Introduction



 



     The use of computer assisted data collection by Federal



statistical agencies has increased dramatically since Approaches to



Developing Questionnaires (Statistical Policy Working Paper No. 10)



was written in the early 1980's.  Utilization of computer assisted



telephone interviewing, or CATI, is now commonplace in many



agencies.



 



     CATI is an interactive system whereby the questions appear on



a terminal screen and the interviewer keys the answers directly



into the computer.  Branching paths are programmed into the system



and the next appropriate question is automatically presented.



Range and consistency edits can be programmed to allow for on-line



editing of data.  These telephone interviews are conducted from one



or more centralized locations.



 



     At the Census Bureau, CATI is often used for demographic



surveys in conjunction with field personal visit and decentralized



telephone interviews which use a paper and pencil questionnaire.



Typically, some portion of households interviewed previously are



assigned to CATI in this mixed mode environment.  How many are



assigned depends on several factors, such as the sample design



itself and optimum workloads for both the field and the centralized



telephone facility.  Personal interviews are reserved for first



time contacts where a visit to establish rapport has been found



beneficial, and to follow up cases, such as unable to contact and



refusals, that could not be completed on CATI.  Telephone



interviewing from the interviewer's home is often used for



returning cases not assigned to CATI.



 



     This description fits the current usage of CATI in the



national sample of the American Housing Survey, known as AHS.  The



American Housing Survey is conducted every two years.  CATI was



first introduced in 1987 and its use was expanded in 1989, when



approximately 25 percent of the sample was initially assigned to



CATI.  The AHS questionnaire is lengthy and complex, containing



over 125 main items, in addition to the household roster.  The



average interview time is approximately 30 minutes.



 



     This paper describes our experiences collecting data on the



American Housing Survey using both CATI and a paper and pencil



questionnaire from the perspective of CATI questionnaire design.



A summary of several data quality issues will be presented,



followed by a discussion of issues encountered in designing an AHS



 



 






 



CATI questionnaire that was comparable to the paper version and



which minimized any problems when the two data sets were merged.



 



 



2. Issues of Data Quality



 



     Computer assisted data collection holds the promise of



improving data quality in several areas.  Those cited in the



literature [2,5,6,8,9,12] that are most directly related to



questionnaire design include the ability to:



 



 1.  control branching paths, thus helping to ensure that the



     correct questions are asked;



 



 2.  tailor question wording to the specific situation, thus



     relieving the interviewer from the burden of choosing



     alternative wordings, and helping to ensure that the



     questions are always worded correctly; and



 



 3.  evaluate the answers given for appropriateness and take



     corrective action through the use of on-line range and



     consistency edits, scripted probing, and dependent or



     reconciliation interviewing (using answers obtained in



     previous interviews to improve present answers).



 



     Intuitively one would think that these capabilities should



improve data quality.  But much research still needs to be done to



prove that they actually do, and to quantify the improvement [6,9].



It clearly has been shown that controlling branching paths does



ensure that the appropriate questions are presented on the screen.



But this alone does not guarantee that the interviewer actually



reads the question as worded, receives an acceptable answer or



enters it correctly [1,5,6,9].  In reality, entries of "don't know"



or "refused" are allowed for most items, range and consistency



edits cannot catch all respondent or interviewer errors, and



scripted probing and dependent or reconciliation interviewing have



practical limitations.



 



     Data from the AHS preedit reject operation illustrate these



points.  In AHS, the field preedit operation is designed to



identify and correct certain clerical, keying and consistency



errors in order to improve control of the sample and the quality of



survey data before it goes through the regular range, consistency



and blanking edits.  Approximately half of these reject reasons



involve consistency checks within the household roster.  The CATI



data was put through the preedit program to help evaluate and



ensure its quality.  The results of the preedit operation for 1989



show that 11.6 percent of the 8,794 completed CATI cases rejected



with an average of 1.20 rejects per case, while 42.6 percent of the



49,279 cases completed on the paper and pencil form rejected with



an average of 1.76 rejects per case.
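
The arithmetic behind these percentages can be worked through in a
short Python sketch.  It assumes that "rejects per case" is an average
taken over rejected cases only; the text does not say so explicitly,
and the function name is ours.

```python
# Worked arithmetic on the 1989 preedit figures quoted in the text.
# Assumption: "rejects per case" is an average over rejected cases only.

def preedit_summary(completed, pct_rejected, rejects_per_rejected_case):
    """Return (rejected cases, total rejects, rejects per completed case)."""
    rejected_cases = completed * pct_rejected / 100.0
    total_rejects = rejected_cases * rejects_per_rejected_case
    rejects_per_completed = total_rejects / completed
    return rejected_cases, total_rejects, rejects_per_completed


cati = preedit_summary(8794, 11.6, 1.20)
paper = preedit_summary(49279, 42.6, 1.76)

for mode, (cases, rejects, per_completed) in (("CATI", cati), ("Paper", paper)):
    print(f"{mode}: about {cases:,.0f} rejected cases, "
          f"{rejects:,.0f} rejects, {per_completed:.2f} per completed case")
```

Under that reading, the figures imply roughly 1,020 rejected CATI
cases and about 20,990 rejected paper cases, or about 0.14 versus 0.75
rejects per completed case.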



 



 






 



    In addition to vastly improving the control of the sample (no



rejects for duplicate records, mistakes in the control number,



sample status etc.), results for CATI also show that most reject



reasons related to the questionnaire were reduced or eliminated.



For example, missing data on several key items such as type of



living quarters, roster line number, relationship code, reference



person, and heating equipment were eliminated due to the automatic



branching feature.  Where consistency checks were programmed into



the CATI questionnaire to identify roster errors such as



inconsistencies between birthdate and age, two spouses recorded for



the same person, unmarried person has a spouse, etc., the



corresponding preedit reject was greatly reduced.  However, where



roster consistency checks were not in place, some errors remained.



Time constraints and the size of the CATI questionnaire have



prevented programming all the appropriate checks.  The practicality



of adding more of these checks will be investigated.
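
The roster checks named above can be sketched in Python as follows.
This is an illustration only: the field names, the one-year tolerance
on age, and the roster layout are assumptions, not the checks actually
programmed in the AHS CATI questionnaire.

```python
# Hypothetical sketch of on-line roster consistency checks of the kind
# named in the text.  Field names and the one-year age tolerance are assumed.
from datetime import date


def roster_errors(roster, interview_date):
    """Return a list of (line_number, message) for inconsistent roster lines."""
    errors = []
    spouse_counts = {}
    for person in roster:
        line = person["line"]
        born = person["birthdate"]
        # Age implied by the birthdate should agree with the reported age.
        had_birthday = (interview_date.month, interview_date.day) >= (
            born.month, born.day)
        implied = interview_date.year - born.year - (0 if had_birthday else 1)
        if abs(implied - person["age"]) > 1:
            errors.append((line, "age inconsistent with birthdate"))
        spouse = person.get("spouse_line")
        if spouse is not None:
            # An unmarried person should not have a spouse recorded.
            if person["marital_status"] != "married":
                errors.append((line, "unmarried person has a spouse"))
            spouse_counts[spouse] = spouse_counts.get(spouse, 0) + 1
    # Nobody should be recorded as the spouse of two different people.
    for spouse, count in spouse_counts.items():
        if count > 1:
            errors.append((spouse, "two spouses recorded for the same person"))
    return errors


roster = [
    {"line": 1, "birthdate": date(1950, 5, 1), "age": 39,
     "marital_status": "married", "spouse_line": 2},
    {"line": 2, "birthdate": date(1952, 3, 10), "age": 30,
     "marital_status": "married", "spouse_line": 1},
    {"line": 3, "birthdate": date(1970, 1, 1), "age": 19,
     "marital_status": "never married", "spouse_line": 1},
]
found = roster_errors(roster, date(1989, 9, 1))
for line, message in found:
    print(line, message)
```

In an on-line setting each message would prompt the interviewer to
verify the entry while the respondent is still on the telephone,
rather than surfacing later as a preedit reject.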



 



     For other reject reasons not showing improvement, we found a



number of explanations.  For example, preliminary review indicates



that for one reject the CATI interviewers accepted "don't know"



entries at a much higher rate than the field interviewers.  The



item asked for the number of units in a multiunit building.  This



may well be related to telephone interviewing in general rather



than CATI per se.  However, now that the problem has been



discovered, better interviewer training and/or a scripted probe for



a "don't know" answer could be added to CATI for this item.  In



another instance, a high rate for a reject reason disclosed a flaw



in the CATI questionnaire, which will be corrected.



 



     Adding more roster checks and a careful review of the other



reject reasons should lead to improvements that will further lower



the number of CATI cases rejecting in the preedit operation,



although some interviewer and respondent errors will inevitably



remain.



 



     Another indicator of quality in the AHS CATI data involves a



reconciliation study conducted for selected items.  In the 1987



interview, these items were tenure, type of basement, number of



bedrooms and bathrooms, heating fuel, heating equipment, rent and



home value.  If the 1987 CATI response failed certain tolerance



limits, compared to the 1985 response, then the answers were probed



at the end of the interview to discover the reason for the



discrepancy.
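
A minimal Python sketch of the comparison step follows.  The tolerance
values and item names are invented for illustration; the actual AHS
tolerance limits are not given in this paper.

```python
# Hypothetical sketch of the comparison step in a reconciliation study.
# The tolerance values below are invented; the actual AHS limits are not
# given in the text.

# Categorical items must match exactly; numeric items may differ by a
# relative tolerance before a reconciliation probe is triggered.
EXACT_ITEMS = ("tenure", "heating_fuel")
RELATIVE_TOLERANCE = {"rent": 0.25, "home_value": 0.30}


def items_to_reconcile(previous, current):
    """Return the items whose current answer falls outside tolerance."""
    flagged = []
    for item in EXACT_ITEMS:
        if previous[item] != current[item]:
            flagged.append(item)
    for item, tolerance in RELATIVE_TOLERANCE.items():
        old, new = previous[item], current[item]
        if old is None or new is None:
            continue  # item does not apply to this unit (e.g. rent for owners)
        if old > 0 and abs(new - old) / old > tolerance:
            flagged.append(item)
    return flagged


answers_1985 = {"tenure": "rent", "heating_fuel": "gas",
                "rent": 400, "home_value": None}
answers_1987 = {"tenure": "rent", "heating_fuel": "oil",
                "rent": 450, "home_value": None}
flagged = items_to_reconcile(answers_1985, answers_1987)
print(flagged)
```

Only the flagged items would be probed at the end of the interview; in
this example the change in heating fuel is flagged, while the rent
increase stays within the assumed tolerance.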



 



     Of the 6,432 cases completed on CATI in 1987, 54.8 percent



failed the comparison on at least one of the items, triggering the



reconciliation questions [11,13].  For all the items reconciled, an



average of 49% of the respondents reported some plausible



explanation for the discrepancies between the two survey periods.



For example, a half bath was converted to a full bath, a different



type of heating equipment has been installed, local real estate



conditions affected rent or house values, etc.



 






 



    However, that left an average of 51 percent of the respondents



reporting that the response was incorrect in either 1985 or 1987,



thus the change between survey years was spurious. (One caveat is



that some of these cases may actually represent real status changes



that were incorrectly classified due to response error in the



reconciliation question itself.)  An interesting result is that



respondents were almost as likely to point the finger at the answer



they had just given in the 1987 interview as the 1985 one.



Forty-nine percent said the 1987 answer was wrong compared to



fifty-one percent for the 1985 answer.  But the reconciliation



questions did not attempt to ascertain why the 1987 response was



wrong - did the interviewer read the question incorrectly, did the



respondent or interviewer misunderstand or did the interviewer



enter it incorrectly?  This result offers another reminder that a



CATI questionnaire may offer the potential to improve data quality



in some areas, but it is not a panacea.  Future AHS reconciliation



studies will try to better ascertain the cause of these errors.



 



3. Issues Encountered When Designing the AHS CATI Questionnaire



 



    When designing a CATI questionnaire to be used in a mixed mode



environment with a paper and pencil questionnaire, the paper form



and its associated processing system have usually been in use for a



number of years.  In such applications, the CATI questionnaire



generally is expected to conform to it, so as to yield comparable data and



expeditious processing of data from both modes.



 



    In this situation, the CATI questionnaire has to serve "two



masters".  First, it should satisfy the basic objectives of CATI



questionnaire design.  For example, House and Nicholls stress that



a CATI questionnaire must conform to the generally accepted standards



of questionnaire design while functioning as a complex computer



program [7,8,10].  The program must ensure that the questions work



correctly under all circumstances and that minimum demands are made



on hardware resources while maintaining rapid response times.  But



secondly, in a mixed mode environment, a CATI questionnaire must



meet these requirements while providing comparability with an



existing paper and pencil version and minimizing any problems



encountered when the data is processed.  Usually the CATI data is



reformatted, then merged at some point with the data collected on



the paper form and processed through the existing system.  A single



processing system saves time and money and ensures that any complex



edit and data imputation/allocation procedures are consistently



applied, regardless of collection method.



 



A.  Numbering of CATI Questions and System Commands/Instructions



 



    A basic issue of CATI questionnaire design is the numbering



scheme used for the questions and system commands/instructions.



This can have important implications when a complex questionnaire



 






 



is used in a mixed mode environment.  Two possibilities are to



utilize the actual question numbers, or if different, any



processing code numbers.



 



    The AHS questionnaire contains several sections of duplicate



or parallel items because there are different sections of the



questionnaire for renters and owners.  Within both of these two



major subsections there are several further subsets of parallel



items based on type of housing unit.  All questions on the paper



form have a unique item number, but a duplicate or parallel item



shares the same processing code as only one version of the item



could be asked in an interview.



 



    The question arose as to how to handle these duplicate or



parallel sets of questions in CATI - should the basic question be



programmed only once and the system programmed to alter the



question wording and its universe as appropriate, or should the



design of the CATI questionnaire follow the paper form as closely



as possible?  We chose the latter course for AHS, that is, to



follow the paper form as closely as possible, and thus utilize the



item numbers rather than the processing codes.  There were two



reasons for this.  First, the universes for the duplicate or



parallel items are extremely complicated in AHS.  Entering the



basic question only once and programming the system to alter the



wording and universes as needed would not have saved the CATI



author any time as the system instructions and documentation would



have just become more complicated, prone to error and difficult to



test out.  Secondly, with both CATI and paper questionnaires in use



simultaneously, and separate training materials to be written, our



goal was to move easily from one questionnaire and set of materials



to the other without confusion.



 



    This compatibility between questionnaires proved especially



helpful when it came to writing the specifications and programs to



reformat the CATI output to be merged with the paper and pencil



data.
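
One way to picture why keeping the paper item numbers eases the
reformat step is a simple lookup from item number to processing code.
The Python sketch below is hypothetical: the item numbers and
processing codes are invented and do not come from the AHS form.

```python
# Hypothetical sketch: parallel items keep unique item numbers in CATI and
# are mapped to the shared processing code when the output is reformatted.
# The item numbers and codes below are invented for illustration.

ITEM_TO_PROCESSING_CODE = {
    "45a": "P101",  # version asked of owners
    "72a": "P101",  # parallel version asked of renters
    "46": "P102",
    "73": "P102",
}


def reformat_case(cati_answers):
    """Convert {item_number: answer} into {processing_code: answer}."""
    record = {}
    for item, answer in cati_answers.items():
        code = ITEM_TO_PROCESSING_CODE[item]
        if code in record:
            # Only one version of a parallel item can be asked per interview.
            raise ValueError(f"processing code {code} answered twice")
        record[code] = answer
    return record


renter_case = {"72a": 3, "73": 1}
print(reformat_case(renter_case))
```

Because each CATI item carries the same number as its paper
counterpart, the table can be written directly from the paper form's
processing specifications.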



 



 



B. Question Wording, Fills and Answer Categories



 



     We encountered little difficulty in transferring the actual



question wording to CATI.  The paper and pencil form had already



been adapted for telephone use in the field several years before.



However, a few problems had to be dealt with.



 



     When collecting data under both modes, it becomes difficult to



change the wording close to the start of the survey if you want to



keep the questions as comparable as possible.  For example, the



CATI interviewers found the wording of one question particularly



awkward but it could not be easily changed because the paper form



was already printed.  The sponsor did not feel comfortable changing



the wording on only the CATI questionnaire.  A revision had to wait



 






 



until the next time the survey was conducted when it was made to



both.



 



     Other situations involved question "fills."  That is, using



information previously obtained to tailor the exact wording to the



situation.  This is one of the jobs that CATI does best, but to



display the question correctly, the system obviously must be



programmed to distinguish among the wording choices.  This



sometimes required that answer categories be expanded.  For



example in Figure 1 below, the paper and pencil version of item



120g (on means of transportation) groups "cars, truck and van" into



a single answer category.  The interviewer substitutes the specific



response in the following question, 120h.  In CATI, three separate



categories are required if subsequent questions are to use the



answer given.



 



 



 



                         Figure 1



     Q120g - 120h On The Paper & Pencil Questionnaire



 



[Graphic not reproduced.]



 



 



 



 



 



 



 



 



    Figure 1 also illustrates the situation where a subquestion on



the paper form is embedded in the middle of the main question with



the answer categories numbered as if it is one continuous question.



Figure 2 below shows what this series looks like in CATI.  In CATI,



the subquestion appears on a separate screen but the answer



categories are numbered 1 and 2 instead of 2 and 3.  CATI



interviewers are used to seeing the categories in numerical order,



 






 



starting with one.  It would have been confusing to present the



question on a separate screen with the categories numbered any



other way.



 



 



 



 



 



 



 



 






 



                    Figure 2



   Q120g - 120h In CATI (Answered For Mary Smith)



 



 



 



 



 



  >Q120g< How did MARY SMITH usually get to work last week?



 



(MARK ITEM THAT ACCOUNTED FOR GREATEST DISTANCE TO LOCATION



OF JOB AT WHICH PERSON WORKED MOST HOURS LAST WEEK.)



 



             <1>  Car



             <2>  Truck



             <3>  Van



             <4>  Bus or streetcar



             <5>  Subway or elevated



             <6>  Railroad



             <7>  Taxicab



             <8>  Motorcycle



             <9>  Bicycle



             <10> Other vehicle



             <11> Walked only



             <12> Works at home



 



             ===> 3    (System is programmed to



                        display Q120g1 if 1-3



                        is answered here.)



 



 



  >Q120g1< Did MARY SMITH drive alone or go with others?



 



             <1> Alone



             <2> Go with others



 



             ===> 2      (System is programmed to



                          display Q120h if 2 is



                          answered here.)



 



 



  >Q120h< How many people including MARY SMITH usually ride in the



van?



 



               <01-97> 1-97



                  <98> 98 or more



 



                ===>7
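
The routing and fill shown in Figure 2 can be sketched in Python as
follows.  The function and variable names are assumptions; only the
question numbers, the answer categories, and the two skip rules are
taken from the figure.  What the system displays after leaving this
series is not shown in the figure, so the sketch simply stops.

```python
# Hypothetical sketch of the routing and fill logic shown in Figure 2.
# The function and variable names are assumptions; only the question
# numbers, categories and skip rules come from the figure.

VEHICLE_FILL = {1: "car", 2: "truck", 3: "van"}


def next_question(current, answer):
    """Return the next question in the series, or None to leave it."""
    if current == "Q120g":
        return "Q120g1" if answer in (1, 2, 3) else None
    if current == "Q120g1":
        return "Q120h" if answer == 2 else None
    return None


def q120h_text(name, q120g_answer):
    """Fill the person's name and the vehicle chosen at Q120g."""
    vehicle = VEHICLE_FILL[q120g_answer]
    return f"How many people including {name} usually ride in the {vehicle}?"


# Replay the answers shown in Figure 2 for Mary Smith.
answers = {"Q120g": 3, "Q120g1": 2}
path = ["Q120g"]
while path[-1] in answers:
    following = next_question(path[-1], answers[path[-1]])
    if following is None:
        break
    path.append(following)

print(path)
print(q120h_text("MARY SMITH", answers["Q120g"]))
```

The fill for Q120h depends on knowing which of the three vehicle
categories was chosen, which is why the single paper category had to
be split in CATI.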



 



 



 



 



 



 



 



 






 



C. The Reformatting Stage



 



    Our processing goal for the CATI questionnaire was to produce



output that could easily be reformatted to look exactly like the



data keyed from the paper and pencil form.  This allowed for



merging the two data sets at the earliest possible opportunity and



running the combined file through the current processing system.



We encountered several situations that affected the reformatting



operation.



 



1. Reformatting the CATI Data



 



    When the CATI question and answer categories closely



corresponded to those on the paper form, reformatting was



straightforward.  However, if they did not, then more complicated reformat



specifications had to be followed.  Some simple examples of more



complicated situations include questions where, as seen in Figures



1 and 2, expanded CATI answer categories had to be collapsed back



or separate CATI questions had to be combined to match the compact



style and embedded questions of the paper form.  Although some of



this reformatting was performed in the CATI questionnaire itself,



most was completed in batch mode after data collection.  Performing



most of these tasks later reduced the number of variables in the



CATI questionnaire, an important consideration since the



questionnaire was quite large.  While the Census CATI system does



allow for very large and complex questionnaires, there are system



limits which the AHS questionnaire reached.
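
A batch-mode collapse of the kind described for Figures 1 and 2 might
look like the Python sketch below.  The paper layout used here is an
assumption: code 1 for the combined car, truck or van category, with
the embedded subquestion coded 2 and 3 as the text indicates.  Paper
codes for the remaining categories are not given in this paper, so the
sketch takes them from a caller-supplied mapping.

```python
# Hypothetical sketch of collapsing expanded CATI categories in batch mode.
# Assumed paper layout: code 1 = "car, truck or van", with the embedded
# drive-alone subquestion coded 2 (alone) and 3 (with others).  Codes for
# the remaining transportation categories are not given in the text, so
# they are left to a caller-supplied mapping.

CATI_VEHICLE_CODES = {1, 2, 3}        # car, truck, van on the CATI screen
SUBQUESTION_TO_PAPER = {1: 2, 2: 3}   # CATI Q120g1 -> paper numbering


def collapse_q120g(cati_q120g, cati_q120g1, other_codes):
    """Return (paper_main_code, paper_subquestion_code) for one person."""
    if cati_q120g in CATI_VEHICLE_CODES:
        return 1, SUBQUESTION_TO_PAPER[cati_q120g1]
    # Non-vehicle answers skip the subquestion on either form.
    return other_codes[cati_q120g], None


# Mary Smith in Figure 2: van (3), goes with others (2).
print(collapse_q120g(3, 2, {}))
```

Doing this after data collection, rather than inside the instrument,
keeps the extra derived variables out of the CATI questionnaire
itself.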



 



    Table and grid formats also frequently required reformatting.



The CATI author had to ensure that these questions were programmed



in such a way that the data could be successfully reformatted



later.



 



2. Adding Special Data Items Not Needed for CATI Questionnaire



 



    Not all items on the paper and pencil questionnaire are needed



or are relevant to the CATI questionnaire.  Some are standard



housekeeping input items, such as sample designations or geography



codes that can't be changed during the interview.  Others are



output items that summarize respondent or housing unit



characteristics from previously answered items.  The question



arises as to whether such items should be carried on the CATI input



or output files or merely added in batch mode when the CATI data



are reformatted.



 



    In deciding which course to follow, we gave consideration to



balancing the size and efficiency of the CATI questionnaire with



the difficulty of adding these items later.  The system size



constraint and the ease of taking care of these items dictated that



 






 



we add them during reformatting while taking special care not to



overlook relevant items.



 



     As can be seen from these examples, complex questionnaires and



processing needs often require special effort to ensure



comparability of the CATI data to that collected on the paper form.



 



4. Summary



 



     While great improvement in data quality was seen as a result



of automatic branching and on-line consistency checks, there is



still room for improvement in the AHS CATI questionnaire.  It must



also be remembered that the questionnaire cannot shoulder all the



burden for improving data quality.  Respondent and interviewer



actions over the telephone, whether in a centralized or



decentralized environment, must also be considered.



 



     Our experiences on AHS showed a large, complex paper and



pencil questionnaire successfully transferred to CATI.  Constant



effort by a number of people, including the specifications writer,



the CATI author, and other programmers was needed to accomplish the



task.



 



     In AHS, it was the CATI questionnaire that was expected to



conform to the paper and pencil form and the current processing



system.  This does not always allow CATI to be used to its fullest



potential but it is a common situation that must be faced when CATI



is added to an already existing paper and pencil survey.  As new



surveys are developed with a mixed mode of data collection planned



from the beginning, different experiences will result.



 



References



 



1. Catlin, G. and Ingram, S. (1988), "The Effect of CATI on Data



Quality:  A Comparison of CATI and Paper Methods", Proceedings of



the Fourth Annual Research Conference, U. S. Bureau of the Census,



pp. 291-299.



 



2. Dillman, D. and Tarnai, J. (1988), "Administrative Issues in



Mixed Mode Surveys," in Telephone Survey Methodology, Robert M.



Groves et al. (editors), John Wiley & Sons, New York, NY.



 



3. Ferrari, P. (1984), "Preliminary Results from the Evaluation of



the CATI Test for the 1982 National Survey of Natural Scientists



and Engineers", unpublished research report, U.S. Bureau of the



Census.



 



4. Ferrari, P. (1986), "An Evaluation of Computer-Assisted



Telephone Interviewing Used During the 1982 Census of Agriculture",



 






 



unpublished report, Agriculture Division, U. S. Bureau of the



Census.



 



5. Groves, R. and Mathiowetz, N. (1984), "Computer-Assisted



Interviewing:  Effects on Interviewers and Respondents", Public



Opinion Quarterly, 48(1B), pp. 356-369.



 



6. Groves, R., and Nicholls, W. (1986), "The Status of



Computer-Assisted Telephone Interviewing:  Part II - Data Quality



Issues", Journal of Official Statistics, 2(2), pp. 117-134.



 



7. House, C. (1985), "Questionnaire Design With Computer-Assisted



Telephone Interviewing", Journal of Official Statistics, 1(2), pp.



209-219.



 



8. House, C. and Nicholls, W. (1988), "Questionnaire Design for



CATI:  Objectives and Methods", in Telephone Survey Methodology,



Robert M. Groves et al. (editors), John Wiley & Sons, New York, NY.



 



9. Nicholls, W. and Groves, R. (1986), "The Status of



Computer-Assisted Telephone Interviewing:  Part I - Introduction



and Impact on Cost and Timeliness of Survey Data", Journal of



Official Statistics, 2(2), pp. 93-115.



 



10. Nicholls, W. and House, C. (1987), "Designing Questionnaires



for Computer-Assisted Interviewing:  A Focus on Program



Correctness", Proceedings of the Third Annual Research Conference,



U.S. Bureau of the Census, pp. 95-111.



 



11. Nicholls, W. (1989), "The Impact of High Technology on Data



Collection", CATI Research Report No. Gen-1, Computer-Assisted



Interviewing Central Planning Committee, U. S. Bureau of the Census.



 



12. Van Bastelaer, A., Kerssemakers, F. and Sikkel, D. (1988),



"Data Collection With Hand-Held Computers:  Contributions to



Questionnaire Design", Journal of Official Statistics, 4(2), pp.



141-154.



 



13. Schwanz, D., Montfort, E. and Cannon, J. (1988), "Analysis of



Operational Issues: 1987 AHS-CATI", CATI Research Report No.



AHS-1, Computer-Assisted Interviewing Central Planning Committee,



CATI Research and Analysis Sub-Committee, U. S. Bureau of the



Census.



 



 



 



 



 



 



 



 






 



                          DISCUSSION



 



                         Carol C. House



            National Agricultural Statistics Service



 



    Dillman and Tarnai (1988) define a mixed mode survey as one



that uses two or more methods to collect data for a single data set



which will be analyzed as a unit.  Familiar examples include face-



to-face first wave interviews followed by telephone or mail on



subsequent waves; and telephone follow-up to a mailed



questionnaire.  Most large Federal statistical agencies routinely



use mixed mode surveys to collect data.  The Furno paper focuses on



the 1987 American Housing Survey which uses a combination of face-



to-face, decentralized telephone, and CATI interviews in its mixed



mode design.  The National Agricultural Statistics Service (NASS)



sometimes incorporates mail, centralized (non-CATI) telephone,



CATI, face-to-face, and decentralized telephone interviews in a



single survey.



 



    Why are survey organizations choosing mixed mode designs over



simpler single mode surveys?  Their objectives appear to be to



reduce survey costs, improve timeliness, and to take advantage of



the relative strengths of different modes of collection.  At the



same time they want to preserve data comparability and integrate



the mixed design into the data collection, data handling and data



manipulation processes that are already in-place in the



organization.  This oversimplifies the decision-making process,



but it fits most large Federal agencies.



 



    I will use this set of objectives as a basis to evaluate the



design of the CATI questionnaire discussed in the Furno paper.



This paper describes adding CATI to an existing mixed mode survey



featuring face-to-face and decentralized telephone interviewing.



The CATI questionnaire was designed to work effectively in that



specific environment.  The author discusses issues related to



question wording, editing and data processing because a CATI



questionnaire design impacts all of these areas.



 



Cost Reductions and Improvements in Timeliness



 



    The Census Bureau probably achieved most of the gains in these



areas when they originally mixed face-to-face interviewing with



decentralized telephoning.  They may see additional cost savings by



adding CATI to this mix, but any such gains are likely to be



minimal.  However, a specific discussion of costs was beyond the



scope of this paper.



 



    The literature usually asserts that timeliness can be improved



by CATI because the data is immediately entered into a computer



without a separate data entry step.  Secondly, and more



 






 



importantly, data is edited during the interview and is "clean" by



 the time the data collection is over.  But what actually happens to



 CATI data from the American Housing Survey?  Furno reports that,



 "Our processing goal for the CATI questionnaire was to produce



 output that could be easily reformatted to look exactly like data



 keyed from the paper and pencil form... and [to] run the combined



 file through the current processing system."  Thus the "clean" data



 from CATI is dumped in with "dirty" data from other data collection



 modes and the whole file is run through the standard batch editing



 programs.  This results in no improvements in timeliness.



 



     Her approach is not uncommon.  The goal of easy assimilation



 frequently takes precedence over improvements in timeliness as well



 as several other objectives of the mixed mode design.  This



 decision may be necessary during early experimental uses of CATI,



 but as CATI (and soon CAPI) become ongoing parts of a survey



 organization, we need to find ways to integrate CATI data into data



 processing programs with less duplication of effort.



 



 



 Tapping the Strengths of Different Data Collection Modes



 



     The American Housing Survey's instrument incorporates a number



 of CATI features to improve data quality.  These include



 controlling the branching paths, tailoring question wording to the



 individual respondent, and online editing.  Furno measures the



 gains in some of these areas through two different comparisons:  by



 counting the number of rejects in a subsequent, "pre-edit" program;



 and by conducting a reconciliation study.



 



     The pre-edit program is designed to make simple checks for



 clerical and keypunch errors prior to the data entering more



 sophisticated and complex editing programs.  Furno measures 43%



 rejects on the paper versions and 12% rejects on CATI.  This



 demonstrates substantial improvements using CATI.  However, one



 wonders why there should be any of these very simple errors in the



 CATI data.  The author indicates that not all of the checks were



 added to the CATI instrument, so this is an area for possible



 improvement in the questionnaire.



 



      The reconciliation study was conducted at the end of the



 interview for selected items, comparing responses with



 corresponding answers obtained during the 1985 survey.  These items



 (such as number of bedrooms in the house) were expected to be



 fairly constant over the two years.  This study uncovered reporting



 errors on both the 1985 and 1987 surveys, with approximately the



 same number of errors occurring each year.  This fact indicates



 that the CATI questionnaire (new in 1987) did not significantly



 improve the data quality in some areas.  More detailed studies of



 this type will possibly uncover the causes of these errors and lead



 to improvements in both the CATI and paper questionnaires.



 



 






 



    The reconciliation study was kept completely independent of



the main part of the questionnaire.  The original responses on the



survey were not changed based on reconciliation, although CATI



technology would have made it easy to do so.  This brings up a



broader issue to consider for panel surveys:  is it appropriate to



use previously collected data to edit or influence current



responses?  Is this practice any less appropriate on mixed mode



surveys where certain modes (CATI) would use this earlier



information and other modes would not?  It is unclear whether using



previously collected information would improve overall data quality



or merely heighten inconsistency and variability in the error



structure of a mixed mode survey.  NASS is struggling with these



issues and we would appreciate reaction and experiences from other



groups.



 



Keeping Data Comparable Across Modes



 



    This was one of the primary objectives of the designers of the 



American Housing Survey's CATI instrument and the Furno paper



concentrates on these issues.  Discussions include the ways to



program questions on CATI that appear in tabular form on a paper



form; handling fills; using consistent answer codes; and handling



last minute questionnaire changes.



 



Integrating A Mixed Mode Design Into Existing Survey Processes



 



    When new technologies or new modes of data collection are



added to an on-going survey it is important to cause as little



disruption to the routine as possible.  This was the situation with



the CATI test on the American Housing Survey, and the Furno paper



describes the efforts to which the designers went to make CATI fit



into the existing design.  The CATI version of the questionnaire



was always made to conform to the paper version, and the CATI



processing and editing to conform to, and go through, the existing



batch programs.  The disadvantage of this approach is that some



(much?) of the advantages from CATI were lost in the mixed mode



design.



 



    CATI is here to stay in telephone surveys and CAPI is just



arriving.  These technologies will be used routinely in mixed mode



designs.  How do we handle the integration of CATI and paper once



the testing phase is over?  Although it may be reasonable to make



a CATI questionnaire conform to a paper version when early testing



is going on, it is not reasonable to retain that unbalanced



relationship later after 75% to 80% of the contacts are made with



the CATI version.  This situation can and does happen, because the



test version is implemented operationally with minimal revisions.



It is time to re-evaluate CATI/CAPI technology in mixed mode



designs.  The modes must fit together into a single survey



operation and produce compatible data.  However, we need to look



 






 



 



for better ways of integrating existing technologies with the new



so that total quality is optimized.



 



Reference



 



Dillman, Don A. and Tarnai, John, "Administrative Issues in Mixed



Mode Surveys," in Robert M. Groves (ed.), Telephone Survey



Methodology, New York, Wiley, 1988.



 



 



 



 



 



 



 



 






 



 



 



 






 



 



                               Session 11



                     STATISTICAL DISCLOSURE AVOIDANCE



 



 



 



 



 



 



 



 






 



 



 



 






 



 



                 DISCLOSURE AVOIDANCE PRACTICES



                       AT THE CENSUS BUREAU



 



                          Brian Greenberg



                    U. S. Bureau of the Census



 



 



I. Introduction



 



    The Census Bureau, as well as other statistical agencies,



collects information about the Nation's population and institutions



and releases this information to the public.  The information is



typically collected under pledges of confidentiality and agencies



are required to release data in such a manner so as not to violate



guarantees of non-disclosure either through design or neglect.  At



the same time, data collection agencies have the responsibility to



make statistical information available for a wide range of uses



that include policy decision making, program analysis, economic



modeling, and many others.  A data collection agency has the



obligation to release as much information to the public as possible



while adhering to pledges of confidentiality given to respondents.



Broadly speaking, the objective is to release as much information



as possible consistent with the requirement that the risk of



disclosure is acceptably low.



 



     There is no known way to quantify the amount of information



released or to quantify level of risk of disclosure.  Finding



methods to relate the levels of information and levels of risk is



an area of very active research at the Census Bureau and at other



statistical agencies in this country and abroad.  In a recent paper



and talk at the Census Bureau 1990 Annual Research Conference



(Greenberg 1990), I discussed disclosure avoidance research



activities at the Census Bureau.  The report focused on the work to



develop data release strategies through the use of tools of



operations research, mathematics, and statistics.  We discussed



research efforts here at the Bureau, studies conducted under Joint



Statistical Agreements, and other cooperative efforts with



researchers on this topic.  In that paper we describe the



mathematical programming methods to design controlled rounding and



suppression routines, the statistical techniques for data



perturbation, and the more probabilistic analysis to attempt to



evaluate risk.  That paper contains an extensive bibliography and



should be regarded as a companion to this one for the understanding



of the underlying mathematics and methods.  Although there will be



some inevitable overlap between this report and the Annual Research



Conference paper, the focus here will be on practical



considerations in the design of a product for data release and a



description of current programs, planned products, and options



which are available.



 



     The overall theme of this Seminar is Quality of Federal



Statistics.  In addition to the notion of accuracy, other aspects



 






 



of quality are timeliness and completeness.  From the perspective



of disclosure avoidance activities, we address the issues of accuracy



and completeness.  We cannot release full and accurate detail on a



public use file because that would exceed any reasonable level of



disclosure risk.  By taking measures to have acceptably low levels



of risk we compromise completeness and/or accuracy.  In designing



a data release strategy we must evaluate the trade-off between



completeness and accuracy and between completeness for one data



attribute at the expense of completeness for another.  To reduce



levels of disclosure risk, one either suppresses information and



collapses categories or introduces noise.  Both these actions can



be thought of as data masking.  Under the first option we reduce



completeness while under the second we reduce accuracy.



 



    Earlier, we introduced the idea of "amount of information"



versus "level of risk" and indicated the need to optimize the amount of



information while maintaining an acceptably low level of risk.  We



can think in terms of accuracy and completeness as components of



level of information and evaluate the trade-off with acceptable



levels of risk (which is much harder to characterize).  This theme



will run throughout the paper and, made explicit or not, this



theme pervades the design of any data release strategy.



 



    In Section II we discuss tabular data, including tables of



amounts which are based on our economic surveys and censuses and



tables of frequency counts which appear in the Summary Tape Files



(STF's) from the Decennial Censuses.  In Section III, we discuss



public use microdata.  Public use microdata files are released as



standard products from virtually all demographic programs and they



are extensively used by researchers in many areas.  In fact, the



public use microdata files for the Survey of Income and Program



Participation form the major data product from that survey.



Section IV consists of a brief summary.



 



 



II. Tabular Data



 



A. Frequency Counts of Demographic Characteristics



 



    Cross-classified tables of frequency counts of demographic and



housing characteristics constitute one of the major formats for



release of data from the Decennial Censuses.  For example, one such



cross-classification can look like Table 1 below.



 



 






 



          



 



[Graphic for Table 1 not reproduced in this text version.]



 



            Table 1. Block Group 1 - Age by Sex



 



 



    The major disclosure risk in the release of such tables occurs



when a small value appears in a marginal position.  If an



investigator examines a table and knows the identity of the person



or persons having marginal characteristics as indicated, the



investigator could infer other characteristics of the respondent



through the cross-classification.  In so doing, the investigator



would learn of information provided to the Census Bureau in



confidence.  The way to reduce this disclosure risk is to suppress



cells with low marginal values or introduce uncertainty into cell



counts.



 



    Suppression was used for frequency counts from the 1980 Census



and for earlier Censuses.  If a marginal value was below a



specified cut-off, all cells summing to that marginal were



suppressed.  That is, if the cut-off were 10, Table 2 would have



become Table 3.
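The marginal cut-off rule just described can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name, the table layout and the counts are hypothetical and are not taken from Table 2 or from any Census Bureau system.

```python
from typing import Dict, List, Optional


def suppress_small_marginals(
    rows: Dict[str, List[int]], cutoff: int
) -> Dict[str, List[Optional[int]]]:
    """Suppress (set to None) every cell in a row whose marginal is below the cutoff."""
    result = {}
    for label, cells in rows.items():
        if sum(cells) < cutoff:
            # Primary suppression: all cells summing to the small marginal are withheld.
            result[label] = [None] * len(cells)
        else:
            result[label] = list(cells)
    return result


# Hypothetical counts (male, female) by age group.
table = {"under 18": [12, 15], "18 to 64": [40, 38], "65 and over": [3, 4]}
suppressed = suppress_small_marginals(table, cutoff=10)
```

With a cut-off of 10, only the third row (marginal of 7) is withheld; the other rows are published unchanged.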



 



 



 



[Tables not reproduced in this text version.]



 



 



 



 



             



 



 



    In order to prevent deriving the third row in Table 3 by



subtracting the non-suppressed rows from the totals row, at least



one more row must be suppressed.  Suppressed values in Row 3 are



 



 






 



primary suppressions.  Row 1 was chosen for the complementary



suppressions, denoted by "C", to protect the primary as shown in



Table 4.  There were two problems with this method for disclosure



avoidance.  On one hand, due to the need for complementary



suppressions, there were sometimes large values suppressed as



complementary cells to protect small primary cells.  This was



considered a major draw-back for data users.  The other problem



with this procedure was that it was often difficult to guarantee



geographic complementary suppressions.  For example, if one or



more data cells is suppressed for exactly one county in a state,



then the suppressed value can be derived exactly by subtracting the



value of all other counties from the state total.  To avoid this



from occurring, geographic complementary suppressions are required.



It was clear that procedures to ensure complete complementary



geographic suppressions would also take their toll in suppression



of even more information.  This realization led to a recognized



need to develop disclosure avoidance procedures for tables of



cross-classified frequency counts along the lines of data



distortion in order to introduce uncertainty into the data.
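The subtraction problem that motivates complementary suppression can be shown with a short Python sketch.  The county names, the values and the function name are hypothetical; the point is only that a single suppressed part is recovered exactly from a published total, while two suppressed parts are not.

```python
from typing import Dict, Optional


def recover_single_suppression(
    total: int, parts: Dict[str, Optional[int]]
) -> Dict[str, int]:
    """If exactly one part is suppressed (None), recover it by subtraction."""
    hidden = [name for name, value in parts.items() if value is None]
    if len(hidden) != 1:
        # Zero or several suppressions: subtraction alone gives no exact value.
        return {}
    published = sum(value for value in parts.values() if value is not None)
    return {hidden[0]: total - published}


state_total = 1200  # hypothetical state total
one_hidden = {"County A": 500, "County B": 450, "County C": None}
two_hidden = {"County A": 500, "County B": None, "County C": None}

exposed = recover_single_suppression(state_total, one_hidden)
protected = recover_single_suppression(state_total, two_hidden)
```

In the second case the two hidden counties are known only to sum to 700, which is why at least one complementary suppression is needed.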



 



     The method to be used as a disclosure avoidance measure on



1990 Census frequency count tables introduces uncertainty into the



tables by changing some values.  The basic idea is as follows and



I quote, virtually verbatim, from (Greenberg 1990).  For a subset



of records, field values on a record will be replaced with field



values on a different record having the same control



characteristics so that the newly created records will be different



on potentially all characteristics except the controls.  This



method has been called the Confidentiality Edit because of the use



of a hot-deck similar in spirit to the hot-deck used in edit and



imputation procedures.  Given a target record on which some changes



are to be made, based on specified control characteristics the



system matches the target record to another record and "hot-decks"



the remaining non-control variables.  To be a little more specific,



I paraphrase from (Griffin, Navarro and Flores-Baez 1989).  The



Confidentiality Edit selects a small sample of census household



records from the internal census data files and interchanges their



data with other households which have identical characteristics on



a set of selected key variables but are in different geographic



locations.  The matching and interchanging operations are



controlled on the key variables of number of persons in household;



population characteristics of race, Hispanic origin and age; and on



housing characteristics of units in building, rent/value and



tenure.
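A minimal sketch of this kind of interchange, written in Python, is given below.  It is not the production Confidentiality Edit: the key variables are a hypothetical subset of the controls listed above, the field names ("block", "income") are invented for illustration, and the sampling rule is a simple assumption.

```python
import random

# Hypothetical subset of the control (key) variables.
KEYS = ("persons", "race", "tenure")


def confidentiality_swap(households, rate, seed=0):
    """Interchange non-key data between matched households in different blocks.

    `rate` is the share of records chosen as targets; each successful swap
    changes two records (the target and its partner).
    """
    rng = random.Random(seed)
    records = [dict(h) for h in households]  # work on copies
    used = set()
    order = list(range(len(records)))
    rng.shuffle(order)
    targets_wanted = int(rate * len(records))
    swaps_done = 0
    for i in order:
        if swaps_done >= targets_wanted:
            break
        if i in used:
            continue
        key = tuple(records[i][k] for k in KEYS)
        partners = [
            j for j in range(len(records))
            if j != i and j not in used
            and records[j]["block"] != records[i]["block"]
            and tuple(records[j][k] for k in KEYS) == key
        ]
        if not partners:
            continue  # no match on the controls: leave the record alone
        j = rng.choice(partners)
        for field in records[i]:
            if field not in KEYS and field != "block":
                records[i][field], records[j][field] = (
                    records[j][field], records[i][field])
        used.update((i, j))
        swaps_done += 1
    return records


households = [
    {"block": "A", "persons": 2, "race": "x", "tenure": "own", "income": 10},
    {"block": "B", "persons": 2, "race": "x", "tenure": "own", "income": 20},
    {"block": "A", "persons": 3, "race": "y", "tenure": "rent", "income": 30},
    {"block": "B", "persons": 3, "race": "y", "tenure": "rent", "income": 40},
]
swapped = confidentiality_swap(households, rate=0.5)
```

Because only non-key fields move, counts by block for every key variable are identical before and after the interchange, which is the property the controls are meant to guarantee.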



 



     Because of the controls described above, census counts for



total persons, and totals by race, Hispanic origin and age, 18 and



above will not be affected.  These counts provide information required for voting rights



as outlined in Public Law 94-171.  In addition, housing counts by



tenure will not be affected by the Confidentiality Edit.  The



interchange of information on records will be accomplished on the



detail file of records.  The revised records will be used to



 






 



generate all tables so that there will be no inconsistencies



between tables and the revised records will also be used to produce



other Census products.  Three advantages of the Confidentiality Edit



include:  (1) this procedure needs to be implemented only once on



internal files to obtain protection for all Summary Tape File data



products, (2) all data cells can be shown on Summary Tape Files so



there is no interference with data aggregation by users, and (3)



more data values will be available than in 1980.  These procedures



have been evaluated for their impact on data products and details



of the analysis are contained in (Griffin and Thompson 1987) and



(Navarro, Flores-Baez, and Thompson 1988).



 



    For tables of frequency counts for the 1980 Census and



earlier, there was a reduction in completeness through the use of



suppressions to achieve an acceptable level of disclosure risk.



Due to the need for complementary suppressions, the overall effect



of a suppression pattern caused more loss in completeness than



desirable.  For the 1990 Census, by interchanging values on the



detail record file, there will be a loss of accuracy through the



interchange of information between records.  The papers cited above



contain studies to show that loss of data utility due to this



reduction of accuracy is not significant.



 



 



B. Aggregate Economic Data



 



    The primary method for releasing data from Census Bureau



establishment surveys or censuses is in the form of cross-



classified tables of amounts.  For example, in a given state the



total value of shipments may be cross-classified by SIC and by



county.  A cell is regarded as sensitive (i.e., having an



unacceptably high disclosure risk) if the (N,K)-rule is violated,



that is, if N or fewer respondents account for at least K% of the



total cell value.  Such cells are regarded as primary suppressions



and they are not released.  If only primary cells are suppressed,



their values often can be derived exactly, or closely estimated



through linear analysis using marginal totals.  To prevent this,



complementary suppressions are introduced, and one seeks a set of



complementary suppressions which protects the sensitive cells yet



suppresses as little additional information as possible.
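As an illustration of the (N,K)-rule stated above, the following Python sketch tests a single cell.  The function name and the sample contributions are hypothetical, the values of N and K used here are arbitrary, and respondent contributions are assumed to be non-negative.

```python
def violates_nk_rule(contributions, n, k):
    """True if the n largest respondents account for at least k percent of the cell total.

    Checking the n largest contributors also covers the "or fewer" case,
    because adding non-negative contributors never lowers the share.
    """
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    # Compare 100 * top_n with k * total to stay in integer arithmetic.
    return 100 * top_n >= k * total


dominated_cell = [90, 5, 5]        # one respondent holds 90% of the value
balanced_cell = [30, 30, 20, 20]   # the two largest hold 60%
```

Here `violates_nk_rule(dominated_cell, 1, 80)` flags the cell as a primary suppression, while `violates_nk_rule(balanced_cell, 2, 80)` does not.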



 



     We illustrate these ideas with a few (artificially) simple



examples.  Consider Table 5 in which cell (2,2) is considered



sensitive because it failed the (N,K)-rule.  We place a "P" in



position (2,2) to indicate a primary suppression, and introduce a



set of complementary suppressions, for example, as in Table 6.



 



 






 



 



[Tables not reproduced in this text version.]



 



   Given a suppression pattern in a table, the values of all



suppressed cells (primary or complementary) can be estimated.  To



indicate how this is done, we return to Table 6 which we rewrite as



Table 6'.



 



From Table 6', we have the system of equations:



 



[Equations and tables not reproduced in this text version.]



 



 



 



 



 



 



   Note that Table 8 and Table 9 both display patterns of



complementary suppressions.  In Table 8 three complementary



suppressions were introduced while in Table 9 four complementary



suppressions have been introduced.  The sum of complementary



suppressed values in Table 8 is 295 and the sum of complementary



suppressed values in Table 9 is 135.  For the 1982 and 1977



Economic Censuses, the criterion for selecting a set of



complementary suppressions was to suppress as few complementary



 






 



cells as possible to protect the primary suppressions.  For the



1987 Economic Censuses, we have implemented the criterion of



suppressing the least total value.  Thus, given Table 8 and Table



9 above, the preferred complementary suppression pattern under our



current criterion will be as in Table 9 since less total value



would be suppressed.  In 1982 and before, the preferred pattern



would have been as in Table 8 since fewer cells are suppressed.
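The difference between the two selection criteria can be sketched in Python.  The numbers of cells (three and four) and the suppressed totals (295 and 135) are those quoted above for Table 8 and Table 9; the individual cell values are hypothetical, since the tables themselves are not reproduced here, and both patterns are assumed to protect the primary suppression adequately.

```python
def pick_pattern(candidates, criterion):
    """Choose among valid complementary suppression patterns.

    `candidates` maps a pattern name to the list of values it would suppress.
    """
    if criterion == "fewest_cells":      # criterion described for 1977 and 1982
        score = len
    elif criterion == "least_value":     # criterion described for 1987
        score = sum
    else:
        raise ValueError("unknown criterion: " + criterion)
    return min(candidates, key=lambda name: score(candidates[name]))


# Cell values are invented; only their counts and sums follow the text.
candidates = {
    "Table 8": [100, 95, 100],      # 3 cells, total 295
    "Table 9": [30, 35, 40, 30],    # 4 cells, total 135
}
```

Under "fewest_cells" the sketch selects the Table 8 pattern, and under "least_value" it selects the Table 9 pattern, matching the preferences described in the text.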



 



    The basic disclosure avoidance method for the release of



cross-classified aggregate economic data at the Census Bureau is



cell suppression.  That is, reduction in completeness.  This method



seems to work well, especially as users can estimate the value of



suppressed cells within acceptable limits.  Whether we employ the



criterion of minimizing the number of complementary suppressions or



the total value that was suppressed constitutes a selection of



methods within an overall strategy.  We are currently investigating



how procedures for finding complementary suppressions can be



improved for the 1992 Economic Censuses.
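To indicate how a user might estimate a suppressed cell within limits, the Python sketch below treats the simplest case: four suppressed cells forming a 2 x 2 block, with non-negative values.  The "remainders" are the published row and column totals minus the published cells; the function name and the numbers are hypothetical, and larger patterns would in general need linear programming rather than this closed form.

```python
def bounds_for_cell(row_remainders, col_remainders):
    """Feasible range for the suppressed cell in row 1, column 1 of a 2 x 2 block.

    With cells a, b (row 1) and c, d (row 2):
        a + b = r1,  c + d = r2,  a + c = c1,  b + d = c2,  all >= 0.
    """
    r1, r2 = row_remainders
    c1, c2 = col_remainders
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column remainders must have the same sum")
    lower = max(0, r1 - c2)   # from d = c2 - r1 + a >= 0
    upper = min(r1, c1)       # from b = r1 - a >= 0 and c = c1 - a >= 0
    return lower, upper


# Hypothetical remainders after subtracting all published cells.
low, high = bounds_for_cell((50, 70), (80, 40))
```

For these remainders the hidden cell can only be said to lie between 10 and 50; a wider interval means stronger protection for the respondent and a less precise estimate for the user.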



 



 



III.  Microdata



 



    Microdata records are data records at the respondent level and



the risk in the release of a microdata file is that someone may be



able to discover the identity of a respondent.  The risk can arise



from the presence of highly visible and unique characteristics, or



it may stem from the threat of matching public use microdata files



to other files either privately or publicly held.  For the latter



threat of linking two files, some of the issues are:  what data are



available on both files, how comparably reported are the data, how



up-to-date are they, and how easily accessed are the records?  In



particular, one must ask the cost to an investigator to carry out



such a project.  All these factors contribute to a picture of



overall risk.



 



    The basic strategy for the release of general purpose public



use microdata files at the Census Bureau is to reduce completeness



by restricting the level of detail on the file.  Instead of



releasing exact date of birth, we can release month, or quarter, or



year.  Percentages can be grouped into deciles or quantiles.



Income can be recoded into intervals of size, for example, $4,000



for income up to $100,000 and all income in excess of $100,000 can



be topcoded to read as "$100,000 or more".  Virtually all



quantitative variables on public use microdata files are topcoded



to obscure high visibility respondents and to reduce the likelihood



of successful computer matching by removing outliers.  In



considering reduction in completeness or reduction in accuracy as



disclosure avoidance practices, the Census Bureau tends to strongly



favor reduction of completeness for the release of microdata files.



By so doing, we are better able to maintain a broad range of



utility for the files.
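The income recoding mentioned above can be sketched as follows in Python.  The $4,000 interval width and the $100,000 topcode are the example figures from the text; the function name and label format are hypothetical, and incomes are assumed to be non-negative whole dollars.

```python
def recode_income(income, width=4000, topcode=100_000):
    """Replace an exact income by an interval label, topcoding large values."""
    if income >= topcode:
        return f"${topcode:,} or more"
    low = (income // width) * width   # lower edge of the interval
    high = low + width - 1
    return f"${low:,}-${high:,}"


labels = [recode_income(x) for x in (37_500, 99_999, 250_000)]
```

The released file would then carry only the interval label, so two respondents with incomes of $36,200 and $39,900 become indistinguishable on this variable.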



 



 






 



    For some special purpose microdata files noise has been added



to variables (in addition to topcoding and using categories) in



order to further frustrate the ability for successful computer



matching, see (Greenberg 1990) for a further discussion.  Such



files can be created when we know in advance intended uses so we



can design a noise introduction strategy to suit specified needs.



 



    One of the most important fields on a microdata record is the



geographic identifier.  Geography is the single identifier which



cuts across all public use microdata files and is a field in which



there is little error.  Under current Census Bureau procedures, no



area having fewer than 100,000 persons in the sample frame can be



identified on a microdata record.  This minimum can be raised for



surveys which have a presumed greater disclosure risk.  This was



the case for the Survey of Income and Program Participation (SIPP)



whose geographic cut-off was set to 250,000 by the Microdata Review



Panel because of the fine level of detail on SIPP and the



longitudinal nature of the survey.
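The geographic cut-off can be expressed as a simple filter.  The Python sketch below uses the 100,000 and 250,000 minimums quoted above; the area names, populations and function name are hypothetical.

```python
def identifiable_areas(populations, minimum=100_000):
    """Return the areas large enough to be identified on a microdata record."""
    return {area for area, persons in populations.items() if persons >= minimum}


# Hypothetical populations in the sample frame.
populations = {"Area 1": 80_000, "Area 2": 150_000, "Area 3": 400_000}

general_file = identifiable_areas(populations)                 # 100,000 minimum
sipp_like_file = identifiable_areas(populations, 250_000)      # raised minimum
```

Areas below the minimum would have to be combined with others or left unidentified before release.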



 



    Prior to 1981, each operating division had responsibility for



the confidentiality of any public use microdata sample released by



the division.  At that time, no geographic area could be shown



having fewer than 250,000 residents in the sampling frame.  The



Microdata Review Panel was established in 1981 to review all



proposed new microdata files for release.  No new microdata file



can be released by the Census Bureau without Panel approval.  At



that time, the geographic minimum was reduced to 100,000.  The



Panel is composed of representatives from Data Users Services



Division, Program and Policy Development Office, Demographic



Surveys Division, and representatives from the Associate Directors



for Economic Fields, Demographic Fields, and Statistical Standards



and Methodology.  This Panel make-up reflects broad Census Bureau



concern.



 



    As part of the review process, survey staff seeking release



approval must fill out a disclosure checklist which asks about



identifiable geography, matching potential, topcodes, etc.  The



Panel typically meets with survey staff to discuss problems to seek



a resolution.  The Panel may request additional topcodes, deletion



or recoding of some variables, and other actions to reduce



disclosure risk.  At times the Panel will request cross-tabulation



frequency counts to observe if there are outlying combinations of



values.  The Panel may recommend changes; however, it is more



typical for the Panel to point out problems and leave it to the



survey staff to find solutions based on their understanding of



intended uses of the file.  Survey sponsors attend Panel meetings



to discuss options and assist in the determination of risk and



resolution options.



 



    There are often a number of options available to reduce risk



on a file.  For example, typically one of several variables can be



recoded to reduce the possibility of matching to external files.



 






 



At times, and depending on perceived user needs, geographic



specificity can be reduced.  That is, one can provide more



potentially identifying demographic characteristics on a national



file (i.e., no subnational geography) than on a file that



identifies a relatively small geographic locale.  By and large, one



must think in terms of trade-offs between the various data items



and their relative completeness.  In a public use microdata file,



it is not possible to provide a very complete and accurate file due



to an unacceptably high level of risk.  Survey sponsors and data



users must contribute to the decision making process in identifying



areas in which some completeness and/or accuracy can be sacrificed



while attempting to maintain as much data quality as possible.



 



    Below we list some options which are currently available to



enhance data utility with no increase of risk.  If the topcode on



some item, say income, is $100,000, replace all values over the



topcode by the mean (or median, etc.) of the topcoded values.



Thus, if the mean of the topcoded values were $130,000, replace any



value in excess of $100,000 by $130,000.  This is in contrast to



the current practice of replacing topcoded values by the cut-off



(in this case $100,000).  In fact, one can actually provide the



exact distribution of all topcoded values.
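The option of replacing topcoded values by their mean can be sketched in Python.  The $100,000 cut-off and the $130,000 mean follow the example in the text; the individual incomes and the function name are hypothetical.

```python
def topcode_with_mean(values, cutoff):
    """Replace every value above the cutoff by the mean of those values."""
    tail = [v for v in values if v > cutoff]
    if not tail:
        return list(values)
    tail_mean = sum(tail) / len(tail)
    return [tail_mean if v > cutoff else v for v in values]


incomes = [40_000, 110_000, 150_000]           # hypothetical
released = topcode_with_mean(incomes, 100_000)
```

Unlike replacement by the cut-off itself, this choice leaves the overall total (and hence the overall mean) of the variable unchanged, while still hiding the individual high values.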



 



    Another option we have is for local topcodes.  For the



Metropolitan Sample of major cities from the American Housing



Survey, each city has a different topcode for "home value" based on



(roughly) a three percent upper tail cut-off for that city.  Would



state-level topcodes for such items as income, housing costs, etc.



be desirable for other files?  Would such a strategy provide more



useful data?
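A local topcode of this kind can be computed separately for each area.  The Python sketch below takes roughly the upper three percent of each city's values as the tail; the city names, the values and the exact cut-off rule are assumptions for illustration and are not the American Housing Survey procedure.

```python
def local_topcodes(values_by_city, tail_percent=3):
    """Return, for each city, a cut-off leaving roughly tail_percent of values at or above it."""
    cutoffs = {}
    for city, values in values_by_city.items():
        if not values:
            continue  # nothing to topcode for an empty city
        ordered = sorted(values)
        # Integer arithmetic avoids floating-point surprises in the index.
        index = min(len(ordered) - 1, len(ordered) * (100 - tail_percent) // 100)
        cutoffs[city] = ordered[index]
    return cutoffs


# Hypothetical home values (in thousands of dollars) for two cities.
home_values = {
    "City X": list(range(1, 101)),
    "City Y": [10 * i for i in range(1, 101)],
}
cutoffs = local_topcodes(home_values)
```

Each city receives a topcode suited to its own distribution, so a value that is ordinary in a high-cost city is not hidden merely because it would be extreme elsewhere.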



 



    The Census Bureau is currently planning for the Public Use



Microdata Samples (PUMS) for the 1990 Decennial Census.  Current



plans call for a "standard" 1% file and 5% file as were produced



for the 1980 PUMS.  In addition, we are considering another file



having only national geography but containing far more detail than



the other files.  For example, we are considering adding tract



characteristics to each record.  That is, we append to each record



information about the tract of residence; information such as



unemployment rate, percentage of minorities, median home value,



etc.  Such local detail would not be acceptable on a file with more



specific geography, for fear one may be able to identify tract of



residence based on tract characteristics.  In addition, there is no



reason, a priori, that income topcodes, and other topcodes as well,



cannot be raised to allow more detail for the respective variables



on a national file.  This also represents a trade-off between



various kinds of reduction of completeness -- geographic detail



versus demographic detail -- in which we provide less of the former



to obtain more of the latter.



 



     It is important that users of public release microdata files



contribute to the discussion of methods for the design of such



 






 



files.  Often options and choices are available, and to the extent



that user priorities are known, efforts can be made to accommodate



them.



 



IV. Summary



 



    In this report, we describe methods used by the Census Bureau



to reduce disclosure risk in the release of data products.  We



discuss tabular data and microdata for which the issues are



somewhat different.  In a related paper (Greenberg 1990) we provide



a detailed discussion of Census Bureau research efforts in the area



of disclosure avoidance.



 



     In the design of a data release strategy many options are



typically available.  The trade-off between loss of completeness



and loss of accuracy is a theme that runs through much of the



discussion.  Plans are being made for the Public Use Microdata



Samples from the 1990 Census.  It is important that data users



contribute to the planning process by contributing to the



discussion of options and choices by indicating both needs and



preferences.



 



References



 



Greenberg, B. (1990), "Disclosure Avoidance Research at the Census



Bureau," Proceedings of the 1990 Annual Research Conference, Bureau



of the Census, Washington, D.C. (to appear).



 



Griffin, R.A., Flores-Baez, L. and Navarro, A. (1989), "Disclosure



Avoidance for the 1990 Census," Proceedings of the Section on



Survey Research Methods, American Statistical Association,



Washington, D.C., (to appear).



 



Griffin, R.A. and Thompson, J. (1987), "Confidentiality Techniques



for the 1990 Census," presented at the Fall meeting of the American



Statistical Association and Population Statistics Census Advisory



Committees.



 



Navarro, A., Flores-Baez, L. and Thompson, J. (1988), "Results of



Data Switching Simulation," presented at the Spring meeting of the



American Statistical Association and Population Statistics Census



Advisory Committees.



 



 






 



 



                THE MICRODATA RELEASE PROGRAM



          OF THE NATIONAL CENTER FOR HEALTH STATISTICS



 



                       Robert H Mugge, PhD



         National Center for Health Statistics (retired)



 



 



    My presentation will be in three parts:  First I shall



describe the microdata release program of the National Center for



Health Statistics (NCHS, or "the Center"); secondly I'll explain



the rules and procedures followed by NCHS in attempting to insure



the confidentiality of the subjects of our data; finally, I shall



discuss some concerns I have for confidentiality protection for



these NCHS data and some suggestions for meeting the problems that



I see.



 



     Let me make clear that I am not speaking as a staff member of



NCHS, but rather as one who retired from that staff nearly eight



months ago after working in the confidentiality program of NCHS



for quite a few years.  So I am now speaking only for myself and



not on behalf of the Center.  I am told that there have been no



important changes in the Center's data security program since I



left, and Mr. Israel, Deputy Director of the Center, has kindly



reviewed this paper for current accuracy.  But all opinions and



commentary are strictly my own and not necessarily those of the



Center.



 



 



The NCHS Microdata Release Program.



 



     The primary function of the National Center for Health



Statistics is to develop and make available statistical information



on the health of the U.S. population, on the vital statistics of



the U.S., and related matters.  This is clearly stated in the law



authorizing the work of the Center (3).  The Director of the Center



decided many years ago that, carrying this mandate to its proper



conclusion, the Center would make available its statistics in as



full detail as possible for the use of scholars who wish to analyze



these data.  The covering policy statement is this:  "Within



prevailing ethical, legal, technical, technological, and economic



restrictions, it is the policy of the National Center for Health



Statistics to augment its programs of collection, analysis, and



publication of statistical information with procedures for making



available, at cost, transcripts of data for individual elementary



units -- persons or establishments -- in a form that will not in



any way compromise the confidentiality guaranteed the respondent



(6)."



 



 



     Implementing this policy, NCHS has now for a long time, and



with only rare exceptions, made available quite detailed data sets,



known as Public Use Data Tapes, on all of its finished surveys and



data reporting programs, together with full printed documentation



 






 



 



(2).  These systems include the National Health Interview Survey;



the National Health and Nutrition Examination Survey; the National



Hospital Discharge Survey; the National Ambulatory Medical Care



Survey; the National Nursing Home Surveys; the National Survey of



Family Growth; several follow-up surveys; annual vital statistics



on births, deaths, fetal deaths, marriages, and divorces; and



various others (4).  Many years ago these files took the form of



boxes of punched cards; now for a long time they have been on one



or more reels of magnetic tape; recently they have been made



available on tape cassettes; and now the Center is moving into a



program of producing the files on CD-ROMs.  However, the material



form of the data file is not relevant to the principles involved in



data release.



 



 



Confidentiality Protection.



 



     As noted in the policy statement, the Center is very concerned



that the confidentiality of data subjects in its surveys and



reports be maintained.  From its study of the problem NCHS has



devised a set of rules for protecting the confidentiality of



subjects -- persons and establishments -- whose information is



included; these rules are stated in the NCHS Staff Manual on



Confidentiality (5).  The Center has a Confidentiality Committee,



made up of high level staff, which reviews needs for policy changes



and makes recommendations regarding them to the Director.



 



     The rules followed in the Center for protecting



confidentiality are of two kinds.  One set of rules relates to data



published in tabular form and provides limitations on the contents



of published statistical tables; the other set of rules covers what



may be included in the public use microdata files (5).  But tables



published by the Center are generally limited to what may be



tabulated from the public use files, and, when this is the case,



the former set of rules may be ignored if the rules on the



microdata files are first met.



 



     The rules regarding tabular production of data are designed to



ensure that no single cross-tabulation, or combination of cross-



tabulations, may permit disclosure of a confidential characteristic



of any identifiable individual or establishment.  This possibility



may be substantially dismissed in the cases of the Center's large



scale surveys of persons, involving samples representing usually



far less than one one-thousandth of the relevant population,



provided that data are not presented separately for any small



areas, in which a unique individual might stand out.  But special



care must be taken in the reporting of establishments, since these



often involve large proportionate samples, and reporting data for



larger areas -- perhaps even census regions -- may serve to



disclose data on particular institutions.  But in any event, if the



necessary care has been taken in restricting the contents of the



microdata set, and only this microdata set is used in building



 






 



tables, it follows that there should be no disclosures resulting



from the publication of these tables.



 



    The rules for protecting confidentiality in public use



microdata sets, as set forth in the Manual, are as follows (5, p. 19):



 



 1) All direct personal or establishment identifiers, such as



    name, social security number, or address are purged from



    the file.



 



 2) The file must not contain any other detailed information



    about the subject that could facilitate identification



    and that is not essential for research purposes (such as



    the exact date of the person's birth).  It is often found



    necessary to give certain numerical information -- such as



    income, nursing home size, or costs and charges of



    institutions -- only in broad class intervals in order to



    avoid disclosure.



 



 3) Geographic places that have fewer than 100,000 people are



    not to be identified in the file.  (In practice much



    larger places often cannot be identified, such as when a



    State is known to have primary sampling units totalling



    less than 100,000.)



 



 4) Characteristics of an area are not to appear in the file



    if they would identify an area of less than 100,000



    people.



 



 5) Information on the drawing of the sample which might



    assist in identifying a data subject must not be released



    outside the Center.  Thus the identities of primary



    sampling units are not to be made available outside the



    Center.  (I must say in all candor that NCHS seems to



    have lost control of that one, as it turned out that for



    several reasons the PSU identifications have unavoidably



    been made public.)



 



 6) Before any new or revised microdata files are published,



    they, together with their full documentation, must be



    approved by the Director or Deputy Director.  (When I was



    there this responsibility was delegated to me, and I



    reviewed all plans for public use microdata files, to



    make sure they complied with the governing rules and that



    no other data had crept in which might compromise



    confidentiality.  In the Census Bureau there is a high-



    level Microdata Review Panel that reviews plans for each



    public use microdata set release [1]; NCHS did not feel



    that such an expenditure of staff time was necessary in



    its particular situation.)



 






 



 7) Finally, NCHS requires that before anyone outside the



     Center may be provided a public use microdata file, that



     person must sign a statement called a



     "Data Use Agreement."  This statement points out the



     legal requirement that no data obtained by NCHS under its



     mandate may be used for any purpose other than the



     purpose for which it was obtained, i.e., for statistical



     purposes (3, Section 308[d]).  It notes that all



     appropriate precautions have been taken to keep the data



     safe from disclosure, but that there may still be a way



     that the data could inadvertently be used for identifica-



     tion.  In signing the agreement, the user states that



     he/she understands this and gives assurance that the data



     set he/she receives will not be misused, the security of



     the data will be protected, and no attempt will be made



     to reveal identities of data subjects, and, further, that



     if any subject is accidentally identified the user will



     work with the Center to make sure that this



     identification is not used, and procedures will be taken



     to assure that the identification cannot be repeated (2).



 



     NCHS requires signing of the Data Use Agreement, even



     though it may have no force in law, because of its



     information value and its assumed effect in raising the



     sensitivity of data users on the importance of protecting



     the files.
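
     As an illustration only, rules 1 through 3 above can be sketched
in a few lines of Python.  The field names, the income class
intervals, and the reduction of an exact birth date to a birth year
are assumptions made for this sketch; the Manual does not prescribe
a record layout or particular intervals.

```python
# Hypothetical field names; not taken from any actual NCHS file layout.
DIRECT_IDENTIFIERS = {"name", "ssn", "address"}

# Population threshold stated in rules 3 and 4.
MIN_PLACE_POPULATION = 100_000

# Illustrative class intervals (lower bound, label); assumed for this sketch.
INCOME_CLASSES = [
    (0, "under 10,000"),
    (10_000, "10,000-24,999"),
    (25_000, "25,000-49,999"),
    (50_000, "50,000 or more"),
]


def income_class(income):
    """Return the label of the broad class interval containing `income`."""
    label = INCOME_CLASSES[0][1]
    for lower, name in INCOME_CLASSES:
        if income >= lower:
            label = name
    return label


def to_public_use(record, place_population):
    """Return a copy of `record` with rules 1-3 applied.

    `place_population` maps a place name to its population; a place
    missing from the mapping is treated as too small to identify.
    """
    # Rule 1: purge direct identifiers.
    public = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Rule 2: drop detail that aids identification (assumed here:
    # keep only the year of an ISO-format birth date, and report
    # income only as a broad class interval).
    if "birth_date" in public:
        public["birth_year"] = public.pop("birth_date")[:4]
    if "income" in public:
        public["income_class"] = income_class(public.pop("income"))

    # Rule 3: do not identify places with fewer than 100,000 people.
    place = public.get("place")
    if place is not None and place_population.get(place, 0) < MIN_PLACE_POPULATION:
        public["place"] = None

    return public
```

     A sketch like this covers only the mechanical rules.  Rule 4
(area characteristics that indirectly identify a small area) and the
review required by rule 6 call for judgment about the whole file and
are not captured here.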



 



     NCHS does not doctor the data in other ways to avoid



disclosure.  It does not substitute any false data, nor does it do



anything like data swapping.  It has been determined that any such



procedure would lessen the value of the files for purposes of



research, and it is felt that this is most undesirable to do, as



the nation has an important stake in getting the best research



possible using these health-related data.  But primarily it has



been considered unnecessary to doctor the data.  The files contain



many errors already -- normal errors in data collection and



processing -- although the Center tries hard to minimize them.  So



the user cannot have absolute confidence in the information,



especially as it relates to individual subjects.  The Center is



reluctant to add additional errors.



 



     Is the system working?  It seems to be.  In all the years it's



been operating the Center has never heard of a case in which a



disclosure has been made through one of the survey files.  That is



comforting, but the comfort is mitigated by the knowledge that the



Center wouldn't necessarily ever hear about it if a survey file



were compromised.



 



     There is also another piece of evidence as to the effect of



the confidentiality program.  I don't think the Center has ever



received a complaint from the public that it isn't protecting



 



 






 



confidentiality adequately in the data files program.  But there



have been many complaints from researchers that the Center isn't



releasing enough information to them.  Since we felt that, if



anything, the Center erred in being too liberal in releasing data,



the researchers' complaints encouraged me to feel that the Center's



balance may have been about right.



 



 



Commentary



 



     Is this confidentiality-protection program, then, good enough?



There cannot be perfect confidentiality protection if any microdata



are to be released.  So each organization must seek to find the



proper balance between the public's needs for data and the



appropriate measures for protecting confidentiality (7, pp. 1-2).



I have described the compromise position reached by NCHS.



 



     In the large surveys it conducts NCHS obtains a great deal of



information, which may be considered independent or dependent



variables for data analysis, on each individual subject (4).  This



mass of data may constitute a fingerprint about an individual;



there may be no one else with this particular set of characteris-



tics.  So if one had another source of such information about



individuals (or establishments) then it would be easy to match them



up and disclose all the new information in the survey on the



identified individual.  Fortunately, no such files exist about



individuals on any mass basis.  There may be similar files in other



sample surveys, but the chance of overlap is so small that the



likelihood can be dismissed.  This is not true about



establishments; there are lists of nursing homes, hospitals, and



clinics that could be used to identify them if the file contains



the right kinds of characteristics.  So great care must be taken in



determining what can be published about characteristics of



establishments included in data files; we've made some last minute



discoveries on certain file-release plans which we hope enabled us



to make these files disclosure-proof.
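
     The matching risk described above for establishments can be
made concrete with a short sketch.  It assumes a released file and
an outside list that share a few characteristics; the field names
and values are hypothetical and are not drawn from any NCHS file.
A record is treated as linked when its combination of
characteristics occurs exactly once in each source.

```python
from collections import defaultdict


def key_of(record, key_fields):
    """Combination of characteristics used to compare the two sources."""
    return tuple(record[f] for f in key_fields)


def unique_links(released, external, key_fields, id_field="name"):
    """Pair each external entry with a released record when their
    shared key occurs exactly once in both sources.

    Returns a list of (external identifier, released record) pairs.
    """
    released_by_key = defaultdict(list)
    for rec in released:
        released_by_key[key_of(rec, key_fields)].append(rec)

    external_by_key = defaultdict(list)
    for rec in external:
        external_by_key[key_of(rec, key_fields)].append(rec)

    links = []
    for key, ext_recs in external_by_key.items():
        rel_recs = released_by_key.get(key, [])
        if len(ext_recs) == 1 and len(rel_recs) == 1:
            links.append((ext_recs[0][id_field], rel_recs[0]))
    return links
```

     Run against a draft file before release, a check of this kind
shows which combinations of characteristics single out an
establishment.  A unique pairing is only a candidate identification,
since the outside list may be incomplete, but coarsening or dropping
the fields in the key reduces the number of such pairings.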



 



     For those concerned about inadvertent disclosures there is



this consolation:  the vast majority of information in our data



files is quite innocuous.  Much of it is obvious, at least within



certain limits, or already well known by individuals' associates,



such as the person's sex, age, and weight, and various obvious



health conditions.  If you found a friend's file in one of the



surveys you are not very likely to learn through it something you



didn't already know, or, if you did learn something new it would



probably be quite harmless.  So chances are that data subjects are



not likely to sustain any harm or embarrassment from having their



data disclosed.  This, however, would certainly not excuse an



agency from doing its utmost to keep its promise of



confidentiality to the data subject.



 



 



 






 



      Historically, however, there have always been some sensitive



items in the data files -- information that could cause harm and



embarrassment.  There have been the early and late effects of



venereal disease; there have been some diseases which at times have



carried stigmas, such as leprosy and cancer; and there have been



surveys on social behavior, such as sexual practices, which if the



information got out could cause considerable harm.  And now it



seems that society needs to obtain new information which may carry



threats of individual harm beyond that brought by any data in the



past.  I have two particular examples:  one is AIDS or its



precursor, the HIV virus; the other is information on the sexual



practices of unmarried teen-age girls, as obtained in the latest



cycle of the Family Growth Survey.  (The latter information is



obtained with the approval of parents, but the parents are not



given the information obtained from the girls.)



 



      There is so much concern about protection of the AIDS or HIV



information that, so far, surveys to obtain it are only being done



using procedures which guarantee the anonymity of subjects, even



from the data collecting agency.



 



      Now, with all of NCHS's efforts to protect the confidentiality



of the data, there is one scenario that haunts me.  That is this:



If someone knows a survey subject well and knows the person was in



a particular survey and at some time has access to that survey



file, then he/she could easily use his/her knowledge of the person



to locate the person's file in the data set.  Then all the



information about that person obtained in the survey will be laid



out before them.



 



      I don't know of any reasonable way to avoid this possibility,



beyond what the Center is already doing, especially through the



Data Use Agreement.  I wouldn't like the idea of warning data



subjects not to tell their friends and relatives of their being in



the survey sample, when the Center is trying to get good publicity



for the survey and urging people to cooperate.  Of course, though,



if it is found that a data subject is advertising being in the



survey, that person is out.  (Last year a college professor wrote



in to tell the Center staff how interested she was in having been



included in the Family Growth Survey.  She had told her class all



about it.  The Center wrote back to say that her interest was



appreciated, but her record was being removed from the survey



sample.)



 



      There is, however, one policy response I think the Center



should make to the scary scenario I alluded to.  I think the Center



should lean over backwards to assure that there is as little



sensitive information as possible in the public use data files.  I



think that NCHS should, for example, make sure that there is no



AIDS or HIV-positive information in any survey files published



where there is any possibility of individual disclosures.  By the



same token, the Center should not release in microdata files the



 






 



Family Growth Survey data on sexual practices of unmarried women;



I personally feel that that entire survey is too sensitive to



justify any microdata release from it.  That will make the



researchers angry with the Center, and the Center should arrange to



do the special tabulations and analyses that the outside



researchers want, within limits of practicality.



 



     Eternal vigilance is the price of good data confidentiality



protection.  But a prime needed ingredient in a successful



confidentiality protection program is clout!  If the head of any



agency producing statistical files is not keeping a close eye on



protective procedures and lending her/his authority to maintaining



a strict program of protection, that statistical program -- along



with all the people depending on it -- could be in big trouble.  I



think the protection program in NCHS has been successful, and that



success owes much to the continuing concern and support it has



received from the Center's Directors.



 



 



References



 



(1) Gates, Gerald W., "Census Bureau Microdata: Providing Useful



Research Data while Protecting the Anonymity of Respondents."  U.S.



Bureau of the Census, Program and Policy Development Office.  Paper



prepared for presentation at the annual meetings of the American



Statistical Association in New Orleans, August 1988.



 



(2) National Center for Health Statistics, Catalogue of NCHS Public



Use Data Tapes.  DHHS Publication No. (PHS) 88-1213.  U.S.



Department of Health and Human Services, Public Health Service,



Centers for Disease Control.  Hyattsville, MD, July 1988.



 



(3)____, Current Legislative Authorities Enacted as of December



1989:  Sections 304, 306, 307, and 308 of the Public Health Service



Act.  U.S. Department of Health and Human Services, Public Health



Service, Centers for Disease Control.  Hyattsville, MD, April 1990.



 



(4)____, Data Systems of the National Center for Health Statistics:



Programs and Collection Procedures, Series 1. No. 16.  U.S.



Department of Health and Human Services, Public Health Service,



Office of Health Research, Statistics, and Technology.



Hyattsville, MD, December 1981.  More recent versions may be



available in unpublished form from NCHS.



 



(5)____, NCHS Staff Manual on Confidentiality.  DHHS Publication



No. (PHS) 84-1244.  U.S. Department of Health and Human Services,



Public Health Service.  Hyattsville, MD, September 1984.



 



(6)____, Policy Statement on Release of Data for Individual



Elementary Units and Special Tabulations.  DHEW Publication No.



(PHS) 78-1212.  U.S. Department of Health, Education, and Welfare,



Public Health Service.  Hyattsville, MD, May 1978.



 






 



(7) Subcommittee on Disclosure-Avoidance Techniques, Federal



Committee on Statistical Methodology, Statistical Policy Working



Paper 2:  Report on Statistical Disclosure and Disclosure-Avoidance



Techniques.  U.S. Department of Commerce, Office of Federal



Statistical Policy and Standards.  Washington, D.C., May 1978.



 



 



 



 



 



 



 



 






 



                           DISCUSSION



 



                          George T. Duncan



                     Carnegie Mellon University



 



 



      Serving the public, federal statistical agencies must balance



the respondent's need for privacy and the researcher's need for



information.  Ultimately, how the balance is struck should be the



result of the political process, which in the U.S. is complex,



indeed.  What the agencies can contribute to this is the development



of procedures, both administrative and statistical, that for a



given level of privacy protection maximize access to data and that



for a given level of access maximize privacy protection.



 



      Brian Greenberg, of the U.S. Bureau of the Census, and Robert



Mugge, retired from the National Center for Health Statistics, have



given us a clear perspective on the data dissemination policies and



practices of two major federal statistical agencies.  As chair of



the Panel on Confidentiality and Data Access that is co-sponsored



by the Committee on National Statistics and the Social Science



Research Council, I find their work of special importance -- both



to the panel and to all federal data users and providers.



 



      Brian Greenberg, both here and in a recent paper (Greenberg,



1990) describes some disclosure limitation practices at the Census



Bureau.  He properly emphasizes the need in data dissemination for



a tradeoff between "amount of information" and "level of disclosure



risk".  He identifies the open research problem of quantifying each



to be meaningful for disclosure-limited data dissemination.  While



mathematical arguments are useful in this quantification,



essentially the task is a decision-theoretic one that incorporates



the motivations of the stakeholders.  Since these stakeholders are



various -- including individual respondents, other government



agencies, academic researchers, market researchers, commercial



planners, the media, and lobbying groups -- the measures developed



should be multivariate in nature.



 



      He notes in Greenberg (1990) that



 



      The first general purpose public use microdata file



      released by the Census Bureau was the 1 in 1,000 sample



      from the 1960 Census of Population and Housing.  This file



      was released in 1963.  A few years later a public use



      microdata file from the Current Population Survey was



      released.  At present, public use microdata files are



      released as standard products from virtually all



      demographic surveys, and they are extensively used by



      researchers in many areas.  In fact, the public use



      microdata files for the Survey of Income and Program



      Participation form the major data product from the



      survey.  The Public Use Microdata Samples from the



 






 



    Decennial Censuses are becoming increasingly important to



     users, especially researchers in the social sciences, and



     these files are gradually replacing the Summary Tape



     Files for many research applications.



 



    Against this history, his paper focuses on statistical



procedures that the Census Bureau uses to limit disclosure risk,



rather than procedures -- whether statistical or administrative --



for expanding research access.  While the disclosure limitation



aspect is important, I would have liked to have seen more attention



paid to the ways the Census Bureau has actively tried to make



information available.



 



    Much more than Robert Mugge, Brian Greenberg emphasizes



masking procedures, both for disclosure limitation in tabular data



and for microdata.  He draws a nice conceptual distinction between



the effect of disclosure limitation on data utility through loss of



accuracy and loss of completeness.



 



    For tabular data disclosure limitation, he shows how cell



suppression and the technique of a Confidentiality Edit can be



employed.  For microdata disclosure limitation, he stresses grouping



of quantitative variables, including topcoding.  In masking some



special purpose files, noise is also added to lower the likelihood



of a successful computer match with publicly available files having



identifiers.
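
     The three microdata masking ideas just mentioned (topcoding,
grouping, and added noise) can be sketched briefly.  This is an
illustration under assumed parameters; the caps, interval widths,
and noise distributions the Census Bureau actually uses are not
given here, and the function names are hypothetical.

```python
import random


def topcode(value, cap):
    """Report any value above `cap` as the cap itself."""
    return min(value, cap)


def group(value, width):
    """Replace an integer value by the (low, high) interval of the
    given width that contains it."""
    low = (value // width) * width
    return (low, low + width - 1)


def add_noise(value, noise_sd, rng):
    """Add zero-mean Gaussian noise with standard deviation
    `noise_sd`, drawn from the supplied random.Random instance."""
    return value + rng.gauss(0.0, noise_sd)


# Example use with assumed parameters.
rng = random.Random(1990)                  # seeded only for repeatability
income = topcode(250_000, cap=100_000)     # reported as 100,000
age_band = group(37, width=5)              # reported as (35, 39)
noisy_income = add_noise(income, noise_sd=2_000.0, rng=rng)
```

     Topcoding and grouping reduce completeness, since detail is
withheld, while added noise reduces accuracy, since reported values
differ from collected ones.  That corresponds to the distinction
between the two kinds of utility loss drawn above.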



 



    Greenberg describes the work of the Microdata Review Panel,



which is broadly representative of key components of the Census



Bureau.  Commendably, the Census Bureau, through work of Gerald



Gates and the Microdata Review Panel, has been open to suggestions



of ways of expanding researcher access to data.



 



    In thinking about Greenberg's paper, certain questions nagged



me.



 



    How useful will researchers find the data after it has been



massaged by the various disclosure limitation procedures?  A



research effort is needed on appropriate ways of analyzing masked



data.



 



    What do researchers need to be concerned about in analyzing



data that has been "Confidentiality Edited"?  Some researchers will



ignore the fact that the data has been altered, and hence produce



misleading conclusions from their analysis.  Researchers need to be



carefully informed about the limitations of standard analyses of



the edited data.  This requires a study of how researchers in



practice respond to various caveats attached to the released data.



 



    Do the thresholds on geography (100,000 persons or 250,000 for



SIPP) have any basis in theory or do they just "feel right" to the



Microdata Review Panel?  In another paper to be presented at the



 






 



American Statistical Association meetings in August, Brian



Greenberg and Laura Voshell relate the size of geographical units



to the percentage of unit records.  This is a good start but the



direct tie to disclosure risk is not yet made.



 



     Are any special disclosure-limiting procedures used for



longitudinal data?



 



     How can data users best be brought into the decision making



process?  Specifically, how can agencies ensure that data users help



identify what in data accuracy can best be spent to buy disclosure



limitation?



 



     How do respondents view the level of disclosure limitation



provided by these procedures?



 



     Robert Mugge has described what is by all accounts a



successful program in microdata release in an important area for



public policy.  As with Brian Greenberg's paper, I would like to



highlight certain aspects of why I think the program is successful



and then focus on some specific concerns that I have for the



future.



 



     I believe the program's success stems from the basic policy



statement of the National Center for Health Statistics:



 



     Within prevailing ethical, legal, technical,



     technological, and economic restrictions, it is the



     policy of the National Center for Health Statistics to



     augment its programs of collection, analysis, and



     publication of statistical information with procedures



     for making available, at cost, transcripts of data for



     individual elementary units -- persons or establishments



     -- in a form that will not in any way compromise the



     confidentiality guaranteed the respondent.



 



     The three reasons I see are these:



 



          First, NCHS has taken seriously its mandate to make



     microdata available to researchers.  Its focus is not



     predominately on data collection but equitably on data



     dissemination.  This balance ensures good stewardship, not



     the hoarding of quality data but rather its investment in



     the work of researchers who can advance the public good.



     In thinking about how disclosure limitation relates to



     researchers, I should point out that researchers come not



     just from academia, as faculty members at Carnegie



     Mellon, but also from the media, as reporters for the New



     York Times, and lobbying groups, as analysts for the



     American Association of Retired Persons, say.



 



 



 






 



          Second, NCHS explicitly recognizes that there are



     constraints on microdata dissemination that have ethical,



     legal, technical, and economic dimensions.  This cues



     where to look for potential problems.



 



          Third, NCHS does not guarantee that identifiability



     is impossible but instead links to the implicit contract



     that it has established with the respondent.  This is both



     realistic and responsible.



 



     In implementing this policy, NCHS has made Public Use Data



Tapes of key surveys available -- in accord with the directions of



a Confidentiality Committee and under rules well stated in the NCHS



Staff Manual on Confidentiality.  So NCHS does in fact deliver the



data, but makes sure that there is well-identified administrative



oversight and that the policies exist in written form for reference



by agency staff, researchers, and the interested public.



 



     Further, through the "Data Use Agreement" NCHS encourages the



receiver of the data to assume some responsibility for proper use



of the data.  This agreement, while not legally binding, educates



the researcher on the restriction to statistical use of the data,



provides that no attempt will be made to identify data subjects,



and provides for appropriate action in the case of accidental



identification.



 



     In pointing to the future, I would like to fix on a few



concerns:



 



     Why should NCHS not emulate the Census Bureau's Microdata Review Panel?



And indeed go further by including representatives of both the



respondent and the user communities?  Through internal



representation from various areas within NCHS it might further



consistency of application of the policies.  Through external



representation it might foster responsible interaction with both



respondents and researchers.  It might help ensure that respondents



got meaningful information about agency practices and intentions so



that in authorizing how their responses are to be used, they would



be properly informed.  It might help ensure that researchers' needs



were addressed and that data quality was not unduly sacrificed in



the name of confidentiality protection.



 



     Why not have the Data Use Agreement be more binding on the



researcher?  Possible mechanisms for this might be legal



requirements (such as legal sanctions or use of binding contracts),



economic incentives (such as use of bonds or returnable license



fees), or administrative practices (such as restrictions on further



access).



 



     Why not use data masking in certain cases where the data might



not be releasable otherwise?  This requires that enough information



be provided to the researcher that an appropriate analysis can be



 






 



carried out.  It also requires that suitable techniques be developed



for the analysis of masked data.



 



    Might not advances in computer technology increase the



prospect of linkage of a record with an identifier with a released



record, even when only a sample is released?  After all, in a



recent JASA article, Bethlehem, Keller, and Pannekoek note that in



a certain region of the Netherlands having 23,485 households



composed of a father, mother, and two children with just a six-



item key of ages (in years) and gender, 16,008 of the households



were unique.  Presumably, a plausible model could be constructed in



which -- with a bit more detail -- a data intruder who matched a



record in the sample uniquely would also place high probability



that it is a unique match in the population.
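
The kind of uniqueness count quoted above can be illustrated with a minimal sketch.  The records and field names below are invented for illustration and are not the Netherlands data; the function simply counts how many records carry a key combination shared by no other record.

```python
from collections import Counter

def count_unique(records, key_fields):
    """Count records whose combination of key values occurs exactly once."""
    keys = [tuple(rec[f] for f in key_fields) for rec in records]
    freq = Counter(keys)
    return sum(1 for k in keys if freq[k] == 1)

# Hypothetical four-person households keyed on ages in years
# (gender is omitted here to keep the example short).
households = [
    {"father_age": 41, "mother_age": 39, "child1_age": 12, "child2_age": 9},
    {"father_age": 41, "mother_age": 39, "child1_age": 12, "child2_age": 9},
    {"father_age": 35, "mother_age": 33, "child1_age": 6, "child2_age": 4},
    {"father_age": 52, "mother_age": 50, "child1_age": 17, "child2_age": 15},
]
key = ["father_age", "mother_age", "child1_age", "child2_age"]
n_unique = count_unique(households, key)  # two households are unique on this key
```

Adding fields to the key can only keep or raise the number of unique records, which is why a "bit more detail" matters to a data intruder.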



 



    Should not special administrative procedures be developed for



establishments -- like nursing homes and hospitals -- that cannot



be reasonably assured that they would not be identifiable?  For



example, large hospitals might be asked for authorization to



include their data in a public use file.



 



    Are many variables really innocuous?  Marital status, age, and



weight under certain circumstances are sensitive, let alone the



details of sexual practices required in AIDS-related surveys.



 



     Can an agency depend on naturally-occurring errors to provide



confidentiality protection?  Introducing noise with a distribution



known to the user may be effective.  Research is ongoing in this



area.
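
A minimal sketch of why a known noise distribution helps the user, assuming independent zero-mean Gaussian noise whose standard deviation is published with the file.  The function names, the simulated values, and the choice of Gaussian noise are assumptions made for illustration, not an agency procedure.

```python
import random
import statistics

def mask_with_noise(values, noise_sd, seed=0):
    """Add independent zero-mean Gaussian noise with a published standard deviation."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]

def corrected_variance(masked, noise_sd):
    """Variance of the unmasked values implied by the masked data.

    Independent noise inflates the variance by noise_sd**2, so a user
    who knows noise_sd can subtract it back out.
    """
    return statistics.variance(masked) - noise_sd ** 2

# Hypothetical data: 20,000 simulated values with standard deviation 10.
rng = random.Random(1)
true_values = [rng.gauss(50.0, 10.0) for _ in range(20000)]
masked = mask_with_noise(true_values, noise_sd=5.0)
estimate = corrected_variance(masked, noise_sd=5.0)
```

Here `estimate` should lie close to the variance of `true_values` even though no individual masked value equals its original.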



 



     To sum up, I think that, with the attention of professionals such as



Robert Mugge and Brian Greenberg, the growing tension between



privacy concerns and demand for data access can better be mediated.



To take a cue from Fritz Scheuren, we need DANTOTSU, Japanese for



"choosing the best of the best".  The federal statistical agencies



can then be better stewards of the data our citizens provide to



further the public interest.



 



References



 



Bethlehem, Jelke G., Keller, Wouter J., and Pannekoek, Jeroen



(1990) Disclosure Control of Microdata.  Journal of the American



Statistical Association 85, 38-45.



 



Greenberg, Brian (1990) Disclosure Avoidance Research at the Census



Bureau. 1990 Annual Research Conference, Bureau of the Census,



Arlington, VA, March 18-21.



 



 



 



 






 



 



 



 



 






 



 



                           Session 12



                     FEDERAL LONGITUDINAL SURVEYS



 



 



 



 



 



 



 



 






 



 



 



 



 






 



 



                   FEDERAL LONGITUDINAL SURVEYS



 



                           Daniel Kasprzyk



                      U. S. Bureau of the Census



 



                            Curtis Jacobs



                  U. S. Bureau of Labor Statistics



 



 



I. Introduction



 



      During the 1960's and 1970's, panel surveys -- surveys in



which similar measurements are made on the same sample at different



points in time -- became a popular tool for social science and



policy research.  Boruch and Pearson (1985) indicate 64 national



surveys of this kind were carried out during that period of time.



The apparent popularity of such survey designs prompted the Office



of Management and Budget's Federal Committee on Statistical



Methodology (FCSM) to form a subcommittee on "federal longitudinal



surveys" during the Spring of 1983 under the chairmanship of



Barbara Bailar and Daniel Kasprzyk.  Maria Gonzalez, chair of the



FCSM, provided organizational and staff support to the



subcommittee.  The subcommittee's goals were very general -- to



identify the strengths and limitations of longitudinal surveys, and



to propose some guidelines for using them more effectively.



 



      The Subcommittee on Federal Longitudinal Surveys was composed



of the following members:  Barbara Bailar (co-chair, Bureau of the



Census), Daniel Kasprzyk (co-chair, Bureau of the Census), Barry



Bye (Social Security Administration), Dennis Carroll (National



Center for Education Statistics), Robert Casady (Bureau of Labor



Statistics), Steven B. Cohen (Agency for Health Care Policy and



Research), Lawrence Ernst (Bureau of the Census), Maria Gonzalez



(Office of Management and Budget), Catherine Hines (Bureau of the



Census), Curtis Jacobs (Bureau of Labor Statistics), Inderjit



Kundra (Energy Information Administration), and Bruce Taylor



(Bureau of Justice Statistics).



 



      This paper follows the general outline of the working paper



developed by the OMB subcommittee.  We discuss the advantages of



longitudinal surveys, managing longitudinal surveys, some



activities related to longitudinal survey operations, estimation,



some persistent issues in longitudinal surveys, and data user



issues.



 



 



II. Definitions



 



      Terminology in this area of social science research has not



been standardized.  Kish (1987) describes longitudinal studies as



a generic term referring to a wide variety of studies done over



time.  Duncan and Kalton (1987) prefer to use the word



"longitudinal" in the context of data, thus permitting



longitudinal data to be collected in either a panel or cross-



sectional (retrospective) survey.



 



    The subcommittee chose to combine two components, design and



data, into the definition of longitudinal survey adopted for the



report.  The distinguishing features of a longitudinal survey are:



1) repeated data collection for a sample of observational units



over time; 2) the linkage of data records for different time



periods to create a longitudinal record for each observational



unit; and 3) a principal analysis based on the data



collected over time.



 



    The subcommittee's definition is more restrictive than that



adopted by Duncan and Kalton or Kish, since longitudinal surveys



are those in which the sample unit is followed, microdata



assembled, and longitudinal analysis included as part of the



estimation plan.



 



 



III.  Advantages of Longitudinal Surveys



 



    A longitudinal survey is usually needed to measure and study



micro-level dynamics -- changes in attitudes, changes in prices,



changes in economic well-being, for example -- or to improve the



measurement of certain important concepts (Pearson, 1989).  Some



advantages for obtaining repeated measurements on the same sample



unit over time are: 1) multiple interviews of the same sample unit



reduce sampling variability on estimates of changes; 2) a matched



longitudinal data set provides a better measure of components of



individual change; that is, measures of gross change for the unit



at two points in time; 3) a longitudinal survey is capable of



obtaining a wider range of variables from each sampled element than



is possible from a repeated survey of cross-sections; 4)



longitudinal surveys with relatively short reference periods may



reduce telescoping errors that occur when respondents misplace the



timing of the occurrence of events; 5) longitudinal surveys with



relatively short reference periods can be used to produce



aggregated data for a longer time period -- a year, for example.



While longitudinal surveys are advantageous, they do not solve all



data collection problems.  In fact, they create some additional



problems which will be discussed later.
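
The first advantage listed above can be made concrete with a textbook variance formula.  The sketch assumes simple random samples of equal size at both time points and ignores finite population corrections and attrition; the function name and the numbers are illustrative only.

```python
def var_of_change(sd1, sd2, n, rho=0.0):
    """Variance of (mean at time 2 - mean at time 1) for samples of size n.

    rho is the correlation between the two measurements on the same unit:
    use 0 for two independent cross-sections, and a positive value when
    the same units are interviewed at both times.
    """
    return (sd1 ** 2 + sd2 ** 2 - 2.0 * rho * sd1 * sd2) / n

independent = var_of_change(10.0, 10.0, 1000)         # two separate samples
panel = var_of_change(10.0, 10.0, 1000, rho=0.8)      # same sample both times
```

With a positive correlation between waves, the panel variance is smaller than the variance from two independent samples of the same size.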



 



 



IV. Managing Longitudinal Surveys



 



    Managing large, complex longitudinal surveys has much in



common with managing large, complex cross-sectional surveys.



Successful project management techniques and the issues surrounding



the successful execution of a project should not be related to the



design of the project.  There are, however, nuances in the case of



longitudinal surveys that are important to recognize.  They are:



 






 



 



1) inordinately high expectations for the project; 2) budget



planning; 3) content, procedural, and methodological innovations;



and 4) changes in the data collection organization.



 



     Expectations associated with longitudinal data collections



typically run high.  The set of analysts interested in the data set



as a vehicle for answering their own research questions is often



broad and diffuse.  The sum of these expectations, as well as the



project staff's expectations, almost by definition must exceed what



is achievable in the short run.  Grasso and Kohen (1978) make this



point concerning The National Longitudinal Surveys (NLS);



similarly, Duncan and Morgan (1984) admit that judged by the



expectations for the Panel Study of Income Dynamics (PSID) the



investment in the study could not have been profitable.



 



     Long range planning and budget planning play an important role



in the development of a longitudinal survey.  A long range planning



document laying out the budget, analysis plan, instrument



development plans, staffing plans, survey procedures, and



anticipated products is one way to assist senior agency officials



in understanding the need for the project; it also provides a



baseline document for the survey.



 



     Another aspect of the management of longitudinal survey



operations is the persistent tension in maintaining the status quo



versus making corrections and alterations to the instrument and



processing system.  A serious analysis of the trade-offs from the



cost as well as analytic point of view ought to be made before



making a change.



 



     Long-term longitudinal surveys, such as the NLS and the



surveys sponsored by the National Center for Education Statistics,



can be spread over a decade or more.  These surveys, when contracted



to private sector survey research organizations, usually have



periodic recompetition for the contract.  However, a change in data



collection organization can be very traumatic to the longitudinal



survey project if not properly planned for.  A very detailed level



of documentation of methods is required to ease the transition, if



it should be necessary.



 



 



V. Longitudinal Survey Operations



 



     The differences between field and processing operations in



one-time cross-sectional surveys and longitudinal surveys are created



by the time dimension.  For example, time enters in the selection



of new units into the sample, in identifying and matching the same



sample unit from round to round of the survey, in following sample



units from one interview to the next, and in the way longitudinal



products are released.  We discuss these below.



 



 



 






 



 



A. Maintaining the Composition of the Sample



 



    The composition of the sample may be expected to change across



waves for many reasons.  Respondents may refuse to participate,



they may not be home, they may die or may be institutionalized, or



may go abroad.  To reduce the effects of these problems, some



continuing panel surveys routinely introduce new sample units at



certain points in time within a panel.  These designs are called



rotating panel designs.
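
A minimal sketch of one possible rotation pattern, assuming that one new rotation group enters at every wave and that each group stays in sample for a fixed number of consecutive waves.  Actual surveys use a variety of patterns; the function and its parameters are hypothetical.

```python
def rotation_schedule(n_waves, stay):
    """Map each wave number to the rotation groups in sample at that wave.

    Group g enters at wave g and is interviewed for `stay` consecutive waves.
    """
    return {
        wave: [g for g in range(1, n_waves + 1) if g <= wave < g + stay]
        for wave in range(1, n_waves + 1)
    }

schedule = rotation_schedule(n_waves=5, stay=3)
# Once the design is fully phased in, each wave holds `stay` groups,
# one of which is new to the sample.
```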



 



    Other longitudinal surveys, such as the Panel Study of Income



Dynamics (Duncan, Juster, and Morgan, 1986), argue that the



representativeness of the sample of the entire population of



families and individuals can be maintained over time through rules



that allow families and individuals to enter the sample with known



selection probabilities.



 



B. Following Individuals Over Time



 



    The issue of whom to follow in a longitudinal survey and the



intensity at which one follows individuals over time is directly



related to the analytic uses of the data, the amount of time



between interview rounds, and the budget of the survey.  Analytic



uses should drive the operational decisions of whom and how far to



follow an individual.  If the basic sampling unit and unit of



analysis is the individual, then the following rules consist of



following all individuals originally selected into sample.  These



are generally called cohort studies.



 



    Another design, labelled by Cox and Cohen (1985) as a



longitudinal household design, consists of the individual as the



basic sampling unit.  The dwelling unit is sampled and all



individuals living in the dwelling unit are selected into sample at



the first round of interviewing and are interviewed in subsequent



rounds whether or not they reside at the original sample address.



In order to develop household and family estimates for the dwelling



units, data are obtained from all individuals living at the address



of the person originally identified as a sample individual.



 



    Tracing is directed toward obtaining the current address of



the survey respondent.  Some people move great distances and are



difficult to trace; others may not want to be traced.  The



operations of the survey organization must establish a set of



information sources that are capable of providing current address



information for individuals who move.  Several surveys obtain



information from the respondent at the end of each interview on the



name and phone number of a person who will always know the sample



person's whereabouts.



 



    Other information sources can be developed by the interviewer



through the sample person's friends, relatives, and other contacts



 






 



established through the respondent, such as neighbors, employers,



and directories (Burgess, 1989).  The mode of interview may affect



the types of tracking techniques used.  Personal visit surveys will



use mail or telephone tracking as well as place heavy reliance on



the interviewer for creative solutions to finding respondents.



Telephone surveys are likely to rely on the telephone for tracking,



but rarely send staff into the field (Cantor, 1989).



 



     An operational concern in tracing respondents is the



additional costs incurred by field staff.  White and Huang (1982)



have estimated that during a one year time period (4 interviews) of



the Income Survey Development Program (ISDP) Panel, the number of



interviewing hours increased by 7% and the number of miles charged



by the interviewer increased by 22%, due to the cost of following



movers and interviewing additional households.  However, NCES found



that per-unit tracing costs for NCES' High School and Beyond Survey



were approximately 20% less than the cost of base year sampling,



indicating the potential economies of longitudinal surveys (Office



of Management and Budget, 1986).



 



 



C. Linking Analysis Units Between Waves



 



     What is the point in conducting a longitudinal survey if data



from successive interview rounds cannot be successfully brought



together for analysis?  Obviously, linking or matching variables



must be created to permit the merging of data over time.



Complications arise when consideration must be given to multiple



units of analysis.



 



     Surveys which are intended to follow individuals, regardless



of their association with the sampled household location or address,



simply assign an independent and unique person identification



number to each individual.
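
A minimal sketch of linking on a person identification number, assuming each wave file is held as a mapping from identifier to record.  The identifier format and the record fields are hypothetical and do not reproduce any survey's actual scheme.

```python
def link_waves(waves):
    """Build one longitudinal record per person from a list of wave files.

    waves: list of dicts, each mapping person_id -> that wave's record.
    Returns person_id -> list of records in wave order, with None where
    the person has no record for a wave.
    """
    person_ids = sorted(set().union(*[w.keys() for w in waves]))
    return {pid: [w.get(pid) for w in waves] for pid in person_ids}

# Hypothetical wave files: person 101-01 moves between waves but keeps
# the same identifier; person 101-02 is missing at wave 2.
wave1 = {
    "101-01": {"address": "A", "income": 1200},
    "101-02": {"address": "A", "income": 900},
}
wave2 = {
    "101-01": {"address": "B", "income": 1300},
}
linked = link_waves([wave1, wave2])
```

Because the identifier is independent of address, the change of address for person 101-01 does not break the link between waves.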



 



     The Survey of Income and Program Participation (SIPP) is



illustrative of surveys requiring linkage variables to allow



analysis at various levels -- household, family, person, and event.



The SIPP has a complicated variable as described by Jean and



McArthur (1984, 1987) which ensures that the identification number



remains constant regardless of changes in address and household



composition.



 



     The National Medical Care Expenditure Survey (NMCES) took an



altogether different approach by using the identification variable



for internally matching the rounds of interviews and providing the



public with matched rounds of data.  As a consequence, round-to-



round matching was unnecessary for the public.



 



 



 



 



 






 



 



D. Operational Changes over Time



 



      Changes in the administration and operation of longitudinal



surveys seem to be inevitable; these changes, however, are likely



to make comparisons difficult to assess and interpret.  One needs



to recognize that aspects of the survey design and data collection



that change during the course of the survey may influence results.



This is not to say that one should never make any changes; rather,



one needs to be aware of the consequences of actions taken and



attempt to measure the effects of such changes.



 



E. Operational Variations in Longitudinal Data Products



 



      As with any complex study, many variations are possible in the



processing and development of longitudinal data products.  Three



illustrations give a sense of the options available.  The main



Panel Study of Income Dynamics (PSID) data files contain



information gathered since the beginning of the study in 1968 and



are updated on an annual basis.  Thus, each wave of data is



released together with the data previously made available.



 



      Because the length of the survey is predetermined, the



National Medical Expenditure Survey (NMES) prefers to wait until the



conclusion of the panel to develop its longitudinal products.  This



survey program uses the multiple interviews as a vehicle to revise



data or fill in data not completely reported in earlier interviews.



 



      The SIPP, on the other hand, first releases individual wave



public data files to allow researchers the opportunity to analyze



each wave as a separate cross-sectional data set and to provide the



ability to develop their own multiple interview data set.  A



longitudinal data file for the entire panel (32 months) is released



as a separate product only after all the individual wave products



are released.



 



VI. Estimation



 



      Three issues stand out in considering estimation from the



longitudinal point of view:  1) defining the longitudinal



universe; 2) defining longitudinal unit concepts; and 3) the



treatment of missing data.



 



A. Defining the Longitudinal Universe



 



      The target population of a longitudinal survey must deal with



the consequences of birth, death, and mobility during the life of



the study.  Unlike cross-sectional studies that fix the population



at a specific point in time and make inferences only about the time



the sample was drawn, some longitudinal studies may be concerned



 






 



with drawing inferences about a nonstationary target population



whose composition is changing over time.



 



     Judkins et al. (1984) describe three methods for defining a



longitudinal universe.  One method selects a specific time during



the course of the study as the point that defines the universe.  If



the universe is defined at the time of sample selection, it is



called a "cohort" study.



 



     A second method of defining a longitudinal universe looks at



more than one point in time.  Several time points are selected,



each one defining a universe.  The entire set of units defined by



these different cross-sectional universes is included in the



longitudinal universe.



 



     A third method of defining a longitudinal universe includes



only units common to all selected time periods; that is, in this



approach one includes only those elements which were members of all



cross-sectional universes.  This universe contains only those units



which did not enter or exit the survey universe and as a



consequence is a static universe.
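
The three definitions can be sketched as set operations on the cross-sectional universes.  The function names and the unit labels are hypothetical; each cross-sectional universe is represented as the set of units eligible at that time point.

```python
def cohort_universe(universes, t=0):
    """Method 1: the universe as defined at one chosen time point."""
    return set(universes[t])

def union_universe(universes):
    """Method 2: every unit belonging to at least one cross-sectional universe."""
    return set().union(*universes)

def static_universe(universes):
    """Method 3: only units belonging to every cross-sectional universe."""
    return set.intersection(*map(set, universes))

# Hypothetical universes at three time points; units enter and leave.
universes = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}]
cohort = cohort_universe(universes)    # units present at the first time point
union = union_universe(universes)      # all five units
static = static_universe(universes)    # only unit "c"
```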



 



B. Defining Longitudinal Unit Concepts



 



     Some longitudinal surveys, the prime examples being the NMCES



and the SIPP, have undertaken the task of conceptualizing annual



units of analysis using subannual data.  Longitudinal analyses of



a sample of households, families, or establishments must deal with



the problems brought on by changes in the composition of these



units.  When a household or family splits up as a result of a



divorce or separation, which of the two units is the same as the



original unit?  Dicker and Casady (1982) and McMillen and Herriot



(1985) discuss this topic for the NMCES and SIPP respectively.



 



     Statistical estimation of longitudinal concepts is discussed



by Ernst (1989) and Folsom, LaVange, and Williams (1989).  Note



that the acceptance of such concepts is not universal.  Duncan and



Hill (1985) argue that defining these concepts is unnecessary since



all relevant analysis can be done at the individual level.



 



C. The Treatment of Missing Wave Data



 



     In cross-sectional surveys, nonresponse is categorized in two



ways:  unit (total) nonresponse and item nonresponse.  In



longitudinal surveys, however, a third type of nonresponse exists --



wave nonresponse.  Wave nonresponse occurs when a sample unit does



not respond in one or more waves of a longitudinal survey.  In this



situation, considerably more data are missing compared to the item



nonresponse situation; however, considerably more data are



available for use in nonresponse compensation strategies.



 






 



Solutions to the issue are not clear cut.  Weighting and



imputation, the methods used to compensate for nonresponse, have



their own advantages and drawbacks (Kalton, 1986; Lepkowski, 1989).
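
A minimal sketch of how response patterns across waves separate total nonresponse from wave nonresponse.  The representation (one boolean per wave) and the category labels are assumptions for illustration; item nonresponse within a completed interview is not modeled here.

```python
def classify_unit(responded):
    """Classify a sample unit from its per-wave response flags."""
    if all(responded):
        return "complete"
    if not any(responded):
        return "total nonresponse"
    return "wave nonresponse"

def missed_waves(responded):
    """Return the 1-based wave numbers at which the unit did not respond."""
    return [i + 1 for i, r in enumerate(responded) if not r]

# Hypothetical four-wave patterns.
pattern = [True, True, False, True]
category = classify_unit(pattern)   # responded to some waves but not all
gaps = missed_waves(pattern)        # wave 3 is missing
```

For a unit like this one, the three completed waves supply the additional data that a weighting or imputation strategy can draw on.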



 



     Cox and Cohen (1985), Kalton and Miller (1986), and Mulvihill



and Lawes (1980) have conducted empirical investigations into the



relative quality of imputation and weighting as nonresponse



compensation procedures in panel surveys.  They found little



difference when cross-sectional estimates were of interest.  They



show, however, that some forms of imputation are clearly inferior



when longitudinal analysis is of interest.  Singh, Huggins, and



Kasprzyk (1990) advocate imputation for a restricted set of missing



data patterns.  This point of view is consistent with that



expressed by Lepkowski (1989) where consideration is given to



combined strategies of imputation and weighting.



 



 



VII.  Persistent Issues in Longitudinal Surveys



 



     Longitudinal surveys theoretically offer the opportunity to



measure change at the individual level as well as the opportunity



to improve the overall measurement of data that are difficult to



collect.  In practice, two kinds of nonsampling error issues arise



that play a significant role in longitudinal surveys: nonresponse



and conditioning.  A third issue, the role of nonsampling error in



the measurement of gross flows, remains a complex and persistent



problem for longitudinal surveys.



 



 



A. Attrition



 



     A major concern as a longitudinal survey ages is the loss of



representativeness of the sample due to nonresponse.  Typically,



the largest nonresponse occurs in the first several interviews with



the wave-to-wave change in sample loss decreasing during the panel.



Frequently it is not clear what the nature and character of sample



loss is.  See Kalton, Kasprzyk, and McMillen (1989) for



illustrations of sample loss in selected longitudinal surveys.



 



     The picture of nonresponse that we typically see is varied and



likely dependent on factors such as the frequency of interviews,



difficulty in following or tracing respondents, sample composition,



the length of the longitudinal survey, quality of the field staff,



content of the questionnaire, and the efforts made to retain the



sample.  In general, nonresponse rates in longitudinal surveys



increase over time as one would expect, but the rate of increase



declines or stabilizes over time.  However, we would not minimize



the importance of the observation that cumulative overall



nonresponse rates can be substantial over the length of a panel.
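
A minimal sketch of how modest wave-to-wave losses accumulate.  It assumes each wave's loss rate applies to the units still in sample and that lost units do not return; the rates are invented for illustration and are not taken from any survey.

```python
def cumulative_nonresponse(wave_loss_rates):
    """Cumulative nonresponse after each wave, given per-wave loss rates.

    Each rate is the share of the remaining sample lost at that wave.
    """
    retained = 1.0
    cumulative = []
    for rate in wave_loss_rates:
        retained *= 1.0 - rate
        cumulative.append(1.0 - retained)
    return cumulative

# Hypothetical pattern: losses are largest early and then decline.
rates = [0.05, 0.03, 0.02]
cumulative = cumulative_nonresponse(rates)
```

Even though the per-wave rate falls from wave to wave, the cumulative rate keeps rising.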



 



     Analytic difficulties can occur if the nature of the



nonresponse problem is not well understood.  Too often, little is



 






 



done to describe the problem.  Descriptive studies such as those



done by McArthur (1988) and McArthur and Short (1985) for the SIPP



provide some insight in understanding differences between



respondents who participate in all waves and those who miss one or



more interviews.



 



    Other studies aim to assess whether the current wave sample



differs systematically from the original sample.  See Rhoton



(1986) for example.  Another approach to understanding whether



responses in later waves have a potential bias is to compare



distributions of responses of subsequent respondents and



nonrespondents to responses to questions asked earlier in the



panel.  This approach was taken by Petroni and King (1988) to study



the effect of SIPP's cross-sectional nonresponse adjustment



variables in accounting for attrition in later waves.  Finally,



another approach was taken by the PSID.  Becketti et al (1983)



identified a particular analysis (e.g., regression analysis) of data



obtained from earlier waves and included variables indicating



subsequent response status to provide evidence that nonresponse



bias was not present in the PSID.



 



 



B. Time-In-Sample Bias



 



     Time-in-sample bias refers to the concept that individuals'



responses to the survey instrument may change due to the length of



time an individual has been in the survey (that is, the number of



times interviewed).  Evidence of this bias has been found in



estimates of unemployment, where higher rates of unemployment are



observed among individuals in sample for the first or second time



(Bailar, 1989).  Other surveys have observed a time-in-sample



phenomenon.  See Neter and Waksberg (1964) and Woltman and Bushery



(1975).  In essence, this effect may occur because the early



interviews in a longitudinal survey change either the respondents'



behaviors or the way they answer the questions.  Similarly, the



interviewers' behavior and approach to the respondents may change.



 



     In practice, this bias is difficult to estimate because of a



variety of changes taking place between rounds of a longitudinal



survey, especially attrition.  Unfortunately, even documentation of



the existence of this bias is very difficult, requiring either a



rotating panel design in which fresh replicate samples are added to



the panel or an independent replicate sample implemented



specifically to address this issue.  Bailar (1989) and Kalton,



Kasprzyk, and McMillen (1989) review several studies.
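
One descriptive device for a rotating panel is an index that compares each group's estimate with the average over all groups.  The sketch below assumes the groups are equivalent samples of the same population, so that departures from 100 point to a time-in-sample effect; the rates are invented for illustration.

```python
def rotation_group_index(estimates):
    """Express each group's estimate as a percentage of the all-group average."""
    average = sum(estimates) / len(estimates)
    return [100.0 * e / average for e in estimates]

# Hypothetical unemployment rates (percent) by number of times in sample.
rates_by_time_in_sample = [7.4, 7.1, 6.9, 6.8]
index = rotation_group_index(rates_by_time_in_sample)
# Values above 100 for the earliest interviews mirror the pattern
# described in the text.
```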



 



 



C. Measurement Error



 



     One of the presumed benefits of longitudinal surveys is their



theoretical ability to measure change at the individual level.  The



difficulty, however, is that change measures are very sensitive to



 






 



individual measurement errors.  Kalton, Kasprzyk, and McMillen



(1989) identify aspects of panel surveys that may lead to



measurement error: 1) simple response variability; 2) wave-to-wave



changes in respondents; 3) changes in data collection mode; 4)



wave-to-wave changes in interviewers; 5) wave-to-wave changes in



questionnaires; 6) changes in the interpretation of questions over



time; 7) wave-to-wave changes in coders; 8) imputation; 9)



time-in-sample bias; and 10) matching interviews across time.



 



     Any one of the above or all in some combination may make the



measurement of gross change problematic.  A reporting error in the



data at one point in time, corrected at another point in time, can



lead to spurious measurements of change.  Analytical difficulties



in this type of analysis can be mitigated somewhat by sensitivity



of the data collection organization to the problems; for example,



detailed field edits, and proper documentation of imputations and



identification of nonrespondents can help analysts in understanding



their results.
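
A minimal sketch of how response error inflates measured gross change, assuming a two-category status and classification errors that are symmetric and independent between waves.  These assumptions, the function name, and the rates are illustrative only.

```python
def observed_change_rate(true_change_rate, error_rate):
    """Expected share of units observed to change status between two waves.

    A unit with no true change appears to change when exactly one of its
    two reports is wrong; a unit with a true change appears unchanged in
    that same case.
    """
    flip = 2.0 * error_rate * (1.0 - error_rate)
    return true_change_rate * (1.0 - flip) + (1.0 - true_change_rate) * flip

# Hypothetical rates: 2% of units truly change; each report is wrong 5% of the time.
observed = observed_change_rate(0.02, 0.05)
```

Under these assumptions, most of the observed change is spurious, because the error-driven flips among the many units that did not change outnumber the true changes.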



 



 



VIII.  Data User Issues



 



    As discussed above, many statistical and measurement issues



occur in the development of large-scale national surveys.  In



particular, many of these issues are exacerbated in longitudinal



surveys in which repeated observations are taken on the same unit



at several points in time.  We believe the myriad of issues and



their consequences places responsibility on the sponsors of such



activities to provide substantially more documentation and guidance



on the nature and extent of errors, both sampling and nonsampling



errors.



 



    It is impossible to control and determine the effects of all



the various sources of error; nonetheless, most of us, ourselves



included, can make greater efforts at conducting evaluation of the



quality of survey data and documenting indications of nonsampling



error that are likely to make a difference in longitudinal data



analysis.  Developing a "quality profile" summarizing in a



convenient form what is known about the sources and magnitudes of



errors in estimates should be done periodically for large



multipurpose longitudinal surveys.  See, for example, the SIPP



Quality Profile (Jabine, King, and Petroni, 1990).  Similarly,



knowledge of the existence of both methodological and substantive



research should be made available to the user community.  The NLS



has done this by publishing a bibliography of known research



(Center for Human Resource Research, 1989).



 



    Sponsors of longitudinal surveys should make available data



quality evaluations in whatever form deemed appropriate.  Duncan



and Hill (1989), in a formal refereed article, assessed the



representativeness of the PSID sample and compared their survey



measures with program aggregates and aggregates from the Current



 






 



Population Survey.  The SIPP now includes evaluations in the



technical documentation of the file, if such evaluations are



available when the file is released; otherwise, the evaluations are



issued as "User Notes" after the release of the file.



 



    Finally, several years ago the Social Science Research Council



(SSRC) sponsored a research conference whose aim was to foster



substantive analyses while providing information on the comparative



strengths and weaknesses of several longitudinal data sets (U.S.



Bureau of the Census, 1990).  Conferences of this type, where



program sponsors encourage comparative analysis of their data set,



help engage the policy and research community to fully appreciate



the strengths and weaknesses of each data base.  One hopes that



improved understanding of the data will result in better analysis.



Sponsors of large complex surveys ought to be encouraged to foster



more of these kinds of exchanges.



 



 



Endnote



 



    The Subcommittee enjoyed the benefits of the discussion of



individuals who played active roles in several large Federal



longitudinal surveys.  Through these discussions, OMB Statistical



Policy Working Paper 13 emerged.  The paper we developed for the



seminar on the "Quality of Federal Data" summarized in a rather



lengthy fashion Statistical Policy Working Paper 13.  Because of



page constraints, the paper above is a considerably condensed



version of the paper prepared for the seminar.  The long version of



the manuscript is available from Daniel Kasprzyk, Statistical



Methods Division, U.S. Bureau of the Census, Washington, D.C.



20233.



 



 



References



 



Bailar, B.A. (1989), "Information Needs, Surveys, and Measurement



Errors," in Panel Surveys (D.  Kasprzyk, G.J. Duncan, G. Kalton,



M.P. Singh, eds.), John Wiley and Sons: New York, 1-24.



 



Becketti, S., W. Gould, L. Lillard, and F. Welch (1983), Attrition



from the PSID, Santa Monica, California:  Unicon Research Corp.



 



Boruch, R.F. and R.W. Pearson (1985), The Comparative Evaluation of



Longitudinal Surveys, Social Science Research Council, New York.



 



Burgess, R.D. (1989), "Major Issues and Implications of Tracing,"



in Panel Surveys (D. Kasprzyk, G.J. Duncan, G. Kalton, M.P. Singh,



eds.), John Wiley and Sons:  New York, 52-74.



 



Cantor, D. (1989), "Substantive Implications of Longitudinal Design



Features: The National Crime Survey as a Case Study," in Panel



 



 






 



Surveys (D. Kasprzyk, G.J. Duncan, G. Kalton, M.P. Singh, eds.),



John Wiley and Sons: New York, 25-51.



 



Center for Human Resource Research (1989), NLS Annotated



Bibliography: 1968-1989, Center for Human Resource Research, The



Ohio State University, Columbus, Ohio.



 



Cox, B.G. and S.B. Cohen (1985), Methodological Issues for Health



Care Surveys, Marcel Dekker, New York.



 



Dicker, M. and R. Casady (1982), "A Reciprocal Rule Model for



Defining Longitudinal Families for the Analysis of Panel Survey



Data," Proceedings of the Social Statistics Section, American



Statistical Association, 532-537.



 



Duncan, G.J. and D.H. Hill (1989), "Assessing the Quality of



Household Panel Data:  The Case of the Panel Study of Income



Dynamics," Journal of Business and Economic Statistics, 7, 441-452.



 



Duncan, G.J. and M. Hill (1985), "Conceptions of Longitudinal



Household: Fertile or Futile," Journal of Economic and Social



Measurement, 13, 361-375.



 



Duncan, G.J. and G. Kalton (1987), "Issues of Design and Analysis



of Surveys Across Time," International Statistical Review, 55,



97-117.



 



Duncan, G.J., F.T. Juster, and J.N. Morgan (1986), "The Role of



Panel Studies in a World of Scarce Research Resources" in Survey



Research Designs:  Toward a Better Understanding of Their Costs



and Benefits (R.F. Boruch and R.W. Pearson, eds.), Lecture Notes



in Statistics No. 38, Springer Verlag, New York, 94-129.



 



Duncan, G.J. and J.N. Morgan (1984), "Behavioral Research with the



Panel Study of Income Dynamics in Retrospect and Prospect,"



Vierteljahrshefte zur Wirtschaftsforschung, Dunker and Humblot,



Berlin, 415-427.



 



Ernst, L.R. (1989), "Weighting Issues for Longitudinal Household



and Family Estimates," in Panel Surveys (D.  Kasprzyk, G.J. Duncan,



G. Kalton, and M.P. Singh, eds.), John Wiley and Sons:  New York,



139-159.



 



Folsom, R., L. LaVange, and R.L. Williams (1989), "A Probability



Sampling Perspective on Panel Data Analysis," in Panel Surveys (D.



Kasprzyk, G.J. Duncan, G. Kalton, M.P. Singh, eds.), John Wiley and



Sons:  New York, 108-138.



 



Grasso, J. and A. Kohen (1978), "The National Longitudinal Surveys



Data Processing Systems," in The Survey of Income and Program



Participation:  Proceedings of the Workshop on Data Processing (D.



 






 



Kasprzyk, ed.), Office of the Assistant Secretary for Planning and



Evaluation, Department of Health and Human Services, II-33-II-53.



 



Jabine, T., K. King, and R. Petroni (1990), Survey of Income



and Program Participation:  Quality Profile, U.S. Bureau of the



Census, Washington, DC.



 



Jean, A. and E. McArthur (1987), "Tracking Persons Over Time," SIPP



Working Paper No. 8701, U.S. Bureau of the Census.



 



Jean, A.C. and E. K. McArthur (1984), "Some Data Collection Issues



for Panel Surveys with Application to the Survey of Income and



Program Participation," Proceedings of the Section on Survey



Research Methods, American Statistical Association, 745-750.



 



Judkins, D.R., D.L. Hubble, J.A. Dorsch, D.B. McMillen, and L.R.



Ernst (1984), "Weighting of Persons for SIPP Longitudinal



Tabulations," Proceedings of the Section on Survey Research



Methods, American Statistical Association, 676-681.



 



Kalton, G. (1986), "Handling Wave Nonresponse in Panel Surveys,"



Journal of Official Statistics, 2, 303-314.



 



Kalton, G., D. Kasprzyk, and D.B. McMillen (1989), "Nonsampling



Errors in Panel Surveys," in Panel Surveys (D.  Kasprzyk, G.J.



Duncan, G. Kalton, M.P. Singh, eds.), John Wiley and Sons:  New



York, 249-270.



 



Kalton, G., and M. Miller (1986), "Effects of Adjustments for Wave



Nonresponse on Panel Survey Estimates," Proceedings of the Section



on Survey Research Methods, American Statistical Association, 194-



199.



 



Kish, L. (1987), Statistical Design for Research, John Wiley and



Sons:  New York.



 



Lepkowski, J. (1989), "Treatment of Wave Nonresponse in Panel



Surveys," in Panel Surveys (D. Kasprzyk, G.J. Duncan, G. Kalton,



M.P. Singh, eds.), John Wiley and Sons:  New York, 348-374.



 



McArthur, E. (1988), "Measurement of Attrition through the



Completed SIPP 1984 Panel:  Preliminary Results," Internal Bureau



of the Census memorandum to D. Kasprzyk, March 4, 1988.



 



McArthur, E. and K. Short (1985), "Characteristics of Sample



Attrition in the SIPP," Proceedings of the Section on Survey



Research Methods, American Statistical Association, 366-369.



 



McMillen, D.B. and R. Herriot (1985), "Toward a Longitudinal



Definition of Households," Journal of Economic and Social



Measurement, 13, 504-509.



 



 






 



Mulvihill, J. and M. Lawes (1980), "Imputation Procedures for LFS



Longitudinal Files," Statistics Canada Internal Memorandum.



 



Neter, J. and J. Waksberg (1964), "A Study of Response Errors in



Expenditure Data from Household Surveys," Journal of the American



Statistical Association, 59, 18-55.



 



Office of Management and Budget (1986), Federal Longitudinal



Surveys (Statistical Policy Working Paper No. 13), National



Technical Information Service, PB86-139730.



 



Pearson, R. (1989), "The Advantages and Disadvantages of



Longitudinal Surveys," Research in the Sociology of Education and



Socialization, Vol. 8, 177-199.



 



Petroni, R.J. and K.E. King (1988), "Evaluation of the Survey of



Income and Program Participation's Cross-Sectional Noninterview



Adjustment Methods," Proceedings of the Section on Survey Research



Methods, American Statistical Association, 342-347.



 



Rhoton, P. (1986), "Attrition and the National Longitudinal Surveys



of Labor Force Behavior:  Avoidance, Control, and Correction,"



IASSIST Quarterly, 10(2).



 



Singh, R., V. Huggins, and D. Kasprzyk (1990), "Handling Wave



Nonresponse in Panel Surveys," paper presented at the Conference on



"Survey Design, Methodology, and Analysis," University of Essex,



Colchester, England, July 4-7, 1990.



 



U.S. Bureau of the Census (1990), Individuals and Families in



Transition:  Understanding Change Through Longitudinal Data.  Papers



presented at the Social Science Research Council Conference in



Annapolis, Maryland, March 16-18, 1988.  U.S. Bureau of the Census.



 



White, G.D. and H. Huang (1982), "Mover Follow-Up Costs for the



Income Survey Development Program," Proceedings of the Section on



Survey Research Methods, American Statistical Association, 376-381.



 



Woltman, H. and J. Bushery (1975), "A Panel Bias Study in the



National Crime Survey," Proceedings of the Social Statistics



Section, American Statistical Association, 159-167.



 



 



 



 



 



 



 



 






 



    THE ADVANTAGES AND DISADVANTAGES OF LONGITUDINAL SURVEYS



 



                         Robert W. Pearson 



                  Social Science Research Council



 



 



Introduction



 



     Longitudinal surveys have existed for some time in the social



sciences.  A quick scan of research would find them employed at



least as early as 1928, when Stuart Rice studied the changing



presidential preferences of Dartmouth College students (Rice 1928).



Perhaps more readily recalled are Theodore Newcomb's classic



studies of the effects of a liberal environment at Bennington



College on young women from conservative families (Newcomb 1943).



Panel designs were further extended when Paul Lazarsfeld and



colleagues studied the 1940 U.S. presidential campaign through a



stratified random sample of about 2,400 Erie County, Ohio, citizens



(Lazarsfeld, Berelson, and Gaudet 1944).



 



     Longitudinal studies became especially prominent in the 1960s



and 1970s in the United States as the federal government turned its



attention and resources to a domestic public agenda in which



research and evaluation played an increasing part.  As the



technology of data collection, storage, and analysis developed, so



too did the call for and subsequent investment in longitudinal



surveys.  In the United States, for example, some 13 national



longitudinal surveys were conducted in the 1950s while 64 surveys



of this kind were carried out in the following two decades (Taeuber



and Rockwell 1982).  Panel studies quite simply permitted the study



of change that other study designs (principally, cross-sectional



surveys) could not.  These surveys were asked to evaluate the



effects of social programs and to unravel the processes by which



individuals change.  The surveys facilitated the development of



several fields of inquiry, including -- but not limited to -- labor



economics, developmental psychology, voting behavior, and



evaluation research.  Conversely, theoretical and conceptual



developments within these fields called for the use of longitudinal



surveys.



 



 






 



     The love affair with longitudinal data appears to have been



short lived, however.  This earlier affection has been replaced



with an increasing appreciation of the limits of longitudinal



surveys.  For example, the editors of a volume on longitudinal



analysis of labor market data would begin the volume provocatively



by saying,



 



     Longitudinal data are widely and uncritically regarded as



     a panacea.  Given the substantial cost of collecting such



     data, it is surprising that so little attention has been



     devoted to justifying the expense.  The conventional



     wisdom in social science equates "longitudinal" with



     "good," and discussion of the issue rarely rises above



     this level (Heckman and Singer 1985, p. xi).



 



     Similar questioning can be found in other fields of research.



For example, Hirschi and Gottfredson assert in their review of



research on the relationship between age and crime that "Funding



agencies seem convinced by researchers that the longitudinal study



is necessary for the proper study of crime" (Hirschi and



Gottfredson 1983, p. 582).  They argue instead that the causes of



crime are similar across age cohorts and that cross-sectional



designs are likely to produce more knowledge per dollar of research



than are longitudinal designs, which Hirschi and Gottfredson



believe to be relatively more costly to conduct (Greenberg 1985;



Hirschi and Gottfredson 1983, 1985; Murray and Erickson 1987).



 



     The recent concern with longitudinal or panel surveys stems in



part from the substantial investment in such data made during the



past 20 years.  There is also a suspicion that several important



panel studies have reached or have gone beyond their maximum



usefulness.  Members of the policy and research communities now



discuss these limitations as well as their comparative advantages



in the reflective mood that was catalyzed by reductions in the data



collection and social science budgets of the early part of the



Reagan administration (Pearson 1985).



 



     The purpose of this chapter is to review several of the



strengths and weaknesses of longitudinal surveys that have emerged



from these discussions.  The chapter will make special note of the



manner in which these research designs have been oversold on one



hand and underused on another.  The chapter will discuss the



several advantages and disadvantages of these instruments of social



observation and draw attention to several claims about these data



collection strategies that appear to be not well established, even



if widely believed.



 



     The principal point of the chapter is a relatively simple one



-- which survey design is most appropriate for a particular purpose



is a complicated function of a large number of factors.  These



include, but are not limited to, the use to which the data are put;



the cognitive capacities and interests of respondents; legal and



 






 



ethical restraints on the study of human subjects; the nature and



quality of theories or assumptions about social processes and



behavior; and the inferential abilities of different research



designs.  Unfortunately, these simple points are too often ignored.



The research literature and the decisions concerning the choice of



research designs appear to have become increasingly interested in



choosing one rather than another design.  Too little attention is



paid to their fruitful combination, both within the survey research



tradition -- the focus of this chapter -- and between this



tradition and more qualitative research approaches.



 



 



The Advantages and Disadvantages of Longitudinal Surveys



 



     Discussions of the advantages and disadvantages of



particular research designs are difficult to conduct in the



abstract.  This is so for several reasons.  First, the discussion



needs to be framed in a comparative perspective.  Is the question



one of the relative advantages and disadvantages of one



longitudinal panel vs. another?  Is it the relative merits of



longitudinal vs. other designs?  The former question faces



secondary analysts of existing survey data.  The latter question --



the principal focus of this chapter -- confronts those who sponsor



and design research.



 



     These are often two distinct, though overlapping, levels of



concern.  Users of such data may find several surveys that are



ostensibly relevant to a given topic, but have few tools for



judging the equivalence of their measures.  It is difficult to



confirm, validate, and replicate research results across surveys.



Paralleling the users' concerns, those who fund or design surveys



must consider which studies to initiate, maintain, or terminate,



and for what reasons.  What combination of ongoing data collection



programs will meet the present and future needs of research and



public policy?  Clearly, legitimate replication may be hard to



distinguish from unnecessary redundancy.  Although reliance on a



single data source invites biased or inconclusive results,



investments in similar or equivalent data series are likely to



yield diminishing returns.  Put briefly, each additional instrument



may not lead to an equally valuable increment in knowledge.



 



     Second, discussions of the advantages and disadvantages of



longitudinal surveys (and other research designs as well) are



difficult because their evaluation depends on a variety of



conditions.  These conditions include:



 



o      The questions one wishes to answer.



 



o      The skills and analytic competences of the investigator



       or the "user friendliness" of the data.



 



 






 



   



o      The sample size, target population, substantive content,



       and design of the survey.



       



o      The timeliness of the survey.



 



o      The quality of the information.



 



o      The documentation and dissemination of the data. 



 



     The evaluation of longitudinal surveys as well as other survey



research designs also depends on subtler factors.  For example,



there are substantial costs associated with gaining a working



knowledge of the structure (and anomalies) of a large data set,



costs that are not entirely transferable to another survey.  These



impediments to use are frequently confronted by analysts because



many data collection programs do not devote resources to the



creation of adequate documentation, data-based management



structures, or the creation and distribution of users' access



utility programs or constructed variables (David 1980, 1985).



 



     Many analysts use several different longitudinal data sets in



their research.  But when they do, they are often aided by



students, research assistants, and computational facilities that



minimize the costs of doing so.  That is to say, the use (and



usefulness) of these and other relatively large data sets or



instruments cannot be considered apart from a wider set of



instrumentalities which include students, assistants, training



programs, computational and analytical technologies, instructional



materials, and the availability of research funds for secondary



analysis.



 



     Standards or guidelines for the conduct of longitudinal



surveys exist (Bailar and Lanphier 1978; Boruch and Pearson 1988).



These guidelines cannot be used, however, a priori to compare one



longitudinal survey to another because such evaluation relies



heavily on the uses to which the results or findings of the studies



are to be put.  If the findings of a study are known before the



data are collected, there would be little need to conduct the



study. (For similar conclusions concerning the intractability of



judging the relative value of different data, see David and Peskin



1984.)



 



     Comparisons of longitudinal with other research designs are



difficult because many of the advantages and disadvantages of panel



designs are shared by other designs.  Nonresponse, confidentiality,



and data access are problems or concerns that face each (Boruch and



Cecil 1979).  Moreover, some disadvantages or difficulties posed by



longitudinal surveys are also part of their strength.  For example,



how one defines and measures such ever-changing phenomena as the



"family" is a problem that accompanies the increased ability to



conceptualize and measure these dynamic phenomena (Koo 1985; Citro



and Watts 1985; Citro, Hernandez, and Moorman 1986).



 






 



      Equally important, some problems or disadvantages can be



avoided or minimized if anticipated and if appropriate quality



control mechanisms are built into the technology.  For example,



sample attrition of panel members can be reduced if sufficient



attention and resources are devoted to collecting information from



sample-respondents about friends or relatives who are likely to 



know where a respondent may move between waves of an interview.



The effects of attrition can be monitored and, through imputational



or weighting algorithms, compensated for during the analysis of the



data.
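
One simple form that such a weighting adjustment can take is a weighting-class adjustment, in which the first-wave weights of the respondents who remain in the panel are inflated by the inverse of the weighted retention rate within their class.  The sketch below is illustrative only: the function name, the record keys, the classes, and the figures are all hypothetical, and it is not offered as the procedure of any particular survey.

```python
from collections import defaultdict

def attrition_adjusted_weights(sample):
    """Weighting-class adjustment for panel attrition (illustrative sketch).

    `sample` is a list of dicts with hypothetical keys:
      'cls'      - weighting class (e.g., a tenure or age group)
      'weight'   - base weight at the first wave
      'retained' - True if the person was interviewed at the later wave
    Returns adjusted weights for the retained cases, in input order.
    """
    total = defaultdict(float)  # sum of base weights per class
    kept = defaultdict(float)   # sum of base weights of retained cases
    for person in sample:
        total[person["cls"]] += person["weight"]
        if person["retained"]:
            kept[person["cls"]] += person["weight"]

    adjusted = []
    for person in sample:
        if person["retained"]:
            # Inverse of the weighted retention rate in this class
            factor = total[person["cls"]] / kept[person["cls"]]
            adjusted.append(person["weight"] * factor)
    return adjusted

# Hypothetical sample: half of the renters are lost, no owners are lost
sample = [
    {"cls": "renter", "weight": 100.0, "retained": True},
    {"cls": "renter", "weight": 100.0, "retained": False},
    {"cls": "owner", "weight": 100.0, "retained": True},
    {"cls": "owner", "weight": 100.0, "retained": True},
]
weights = attrition_adjusted_weights(sample)
```

In this made-up sample the remaining renter carries twice the original weight, so the adjusted weights still sum to the first-wave total.  The adjustment removes attrition bias only to the extent that those who leave resemble those who stay within the same class, which is an assumption rather than a guarantee.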



 



      Comparisons of different survey research designs are often



inappropriate because they tend to criticize one research design



while more or less explicitly extolling the virtues of an



alternative, as if their discussion was part of a debate in which



it was important that one type of research design "win", while



others "lose".  We should instead begin by agreeing that different



designs can in principle be combined to take advantage of their



relative merits and to overcome their relative disadvantages.  One



ought to ask what combination is most effective or efficient for



answering one's questions rather than which one research design is



best.



 



      Although comparisons of the relative advantages of



longitudinal surveys should be made cautiously, research and



experience suggest that longitudinal surveys have several generic



advantages and disadvantages that are relatively well established.



 



      The advantages of longitudinal designs include, for example:



 



o      The development of reliable measures of individual



       change.  (Retrospectively collected data are subject to



       telescoping, memory decay, etc.)  Similarly, these



       designs permit the measurement of subjective phenomena as



       current states rather than as recalled states.  (Consider



       the difficulty of asking a respondent to rate his or her



       health or happiness four years ago.)



 



o      The development of concepts that are characteristically



       dynamic rather than static.  (The burgeoning



       multidisciplinary research on life-course perspectives



       owes part of its vitality to the creation and



       distribution of panel studies.  See, for example, Baltes



       1979; 1983.)



 



o      Better descriptions of the dynamics of change.  (The



       typical episode of family poverty or welfare receipt has



       been shown through panel data to be considerably briefer



       than was assumed in studies using repeated cross-



       sectional surveys.  See, for example, Duncan et al. 1984;



       Corcoran et al. 1985; and Duncan, Hill, and Hoffman



       1988.)  Similarly, longitudinal designs permit the



 



 






 



       estimation of individual levels and rates of transition



       between states or conditions for which cross-sectional



       data may only provide gross or aggregate measures of



       group change.



 



o      The ability to conduct analyses that control for



       unmeasured attributes of individuals, thus improving the



       ability to distinguish between the influence of enduring



       individual differences (e.g., race and gender) and the



       influence of having previously experienced the condition



       that is under investigation (e.g., previous unemployment



       leading to current unemployment).
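
The contrast drawn above between aggregate measures of group change and individual rates of transition can be made concrete with a small sketch.  The data below are invented for illustration, and the function names are hypothetical.  Two cross-sections yield only the net change in the share of people in a state, while a panel, which observes the same people at both waves, also yields the transition rates between states.

```python
from collections import Counter

def net_change(wave1, wave2, state):
    """Change in the share of people in `state` between two waves.

    This is all that two independent cross-sections can show.
    """
    return wave2.count(state) / len(wave2) - wave1.count(state) / len(wave1)

def transition_rates(wave1, wave2):
    """Individual transition rates between states.

    Requires the same people, listed in the same order, at both waves.
    """
    pairs = Counter(zip(wave1, wave2))
    origin = Counter(wave1)
    return {(a, b): n / origin[a] for (a, b), n in pairs.items()}

# Hypothetical poverty status of the same ten people at two waves
wave1 = ["poor"] * 3 + ["not poor"] * 7
wave2 = ["not poor"] * 2 + ["poor"] * 1 + ["not poor"] * 5 + ["poor"] * 2

change = net_change(wave1, wave2, "poor")
rates = transition_rates(wave1, wave2)
```

In this invented example the share of poor people is 30 percent at both waves, so the net change is zero, yet two of the three people who were poor at the first wave have left poverty by the second.  Only the panel view reveals that turnover.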



 



       The disadvantages of panel designs include:



 



 o     Nonresponse bias (especially through panel attrition) may



       be high and analytically troublesome.  (Respondents for



       whom subsequent interviews cannot be completed may differ



       in analytically important ways from those who remain in



       the survey.)



 



 o     Response and learning effects (i.e., "panel effects") may



       prejudice responses.  (People who are interviewed about



       their voting behavior tend to vote more frequently



       thereafter.)



 



 o     Errors in the measurement of variables (and the



       correlation of these errors) and changes in the accuracy,



       reliability, and validity of such measures may spuriously



       create the appearance of change.



 



 o     Panel data, unless regularly refreshed or augmented, may



       provide useful or accurate estimates of the population



       from which the original sample was drawn, but not of



       the current population, which may be of interest.



 



 o     Panels always involve a moving target.  Panel surveys of



       families, for example, must cope with movement into and



       out of families, the formation of new ones, and the



       dissolution of old.



 



      Let us consider in more detail the first of the listed



advantages and discuss several features often included in such



lists that are not well established. (For several discussions of



the strengths and weaknesses of longitudinal surveys see, for



example, Ashenfelter and Solon 1982; Boruch and Pearson 1988;



Duncan, Juster, and Morgan 1984; Duncan and Kalton 1985; Fienberg



and Tanur 1987a; and Subcommittee on Federal Longitudinal Surveys



1986.)



 



      The limits of retrospection.  The repeated observations of



longitudinal surveys permit an investigation of change in phenomena



 






 



that can be measured in the present.  They rely less than, say, a



single cross-sectional survey design on the memory of respondents'



prior conditions.  This principal limitation of the ability of



cross-sectional research designs to assess individual change is one



of the major relative advantages of longitudinally designed



studies.



 



    Increasing evidence and recent theoretical developments in



cognitive psychology and survey methodology question, in more



sophisticated ways than in the past, the trustworthiness of



retrospective -- or memory-based -- responses to survey questions.



Some research has found that certain kinds of memory-based data are



flawed not only by temporal confusion and forgetting, but are



systematically influenced by the respondent's current emotional



state and beliefs about life and self.



 



    Memory is basically reconstructive (cf.  Bartlett 1932).  And



this reconstruction often involves the "top down" processing of the



past that includes the development or use of scripts and narratives



about the self or society, as well as the organization of details



about the past.  These scripts, schemata, self narratives, or



stereotypes define more or less coherent sets of beliefs around



which more detailed images are actively (although not necessarily



consciously) organized or distorted.  If by virtue of sharing a



common culture, the respondent's schemata or theories of self or



society are the same as those of the questioner (e.g., that adult



mental distress follows from childhood problems), then research



that relies on retrospective questioning techniques typically found



in cross-sectional surveys may be systematically biased in the



direction the questioner expects.  The resulting "theory



validation" of retrospective studies may simply be the result of



widely -- even if only implicitly -- shared cultural stories, 



narratives, stereotypes, or folklore whose accuracy is unknown and



unprovable (Dawes and Pearson 1987).



 



    Several studies substantiate this conclusion.  In two separate



but similar experiments, for example, Conway and Ross (1984)



examined randomly selected participants in a program designed to



improve study skills and a control group of nonparticipants who



indicated a desire to participate in the program but who were



placed on a waiting list.  Participants and control group members



were questioned both before the beginning of the study skills



program and at its conclusion.  At both times, they were asked to



assess their own study skills (e.g., how much of their study time



was well spent, how satisfactory were their note taking skills,



etc.) and the amount of time they studied.  At the second interview



they were also asked to recall what they reported during the first



session concerning skills and study time.



 



     At the initial interview, participants and control group



members did not significantly differ on any measure of skill, study



time, or on additional information about grades on a psychology



 






 



examination taken prior to the study skills program.  Nor did these



two groups differ in their recall of hours spent studying; there



was a slight tendency for subjects in both conditions to recall



studying less than they initially reported.



 



    Recall of skills produced marked differences, however.



Program participants recalled their study skills as being



significantly worse than they initially reported.  On the average,



waiting list subjects recalled their study skills as being



approximately the same as those they reported initially (p. 743).



Participants in the study skills program appeared to exaggerate



their improvement in a direction consistent with their theories of



what ought to be -- taking a course should improve skills -- but



they did so by retrospectively derogating their initial status.



They did not exaggerate their current skills, but reconstructed



their memory of the past to combine: (1) a theory that they should



have improved because of the instruction and (2) a relatively



accurate assessment of their current level of skills.  In both



studies, the study skills program did not have a significant effect



on academic performance, as measured by subsequent psychology



examinations or average grades for the semester.  The recall of



past events and conditions was in error, and these errors were in



a direction that was consistent with what the students thought that



the past should have been as a result of their current conditions



and prior participation in a study skills program.



 



    Survey research has become increasingly aware of the



distortions and misrepresentations of the past that are engendered



by retrospective questions (cf., Turner and Martin 1984, p. 296;



Sudman and Bradburn 1982, pp. 43-51; Schuman and Kalton 1986, pp.



644-647).  In a recent validation study of employment-related



information, for example, Mathiowetz (1986) and Duncan and



Mathiowetz (1985) found that when a firm's employees were asked (in



July 1983) whether they had been unemployed at any time during 1981



and 1982, 15 percent were in error concerning 1981 and seven



percent were in error concerning 1982.  Validation studies of



retrospective reports have also observed substantial error in the



recall of hospitalizations (Cannell, Fisher, and Baker 1965) and of



victimizations (Turner 1972).



 



    Obviously, longitudinal research designs themselves may rely



upon recall, as well as other cognitive processes.  Their



dependence on retrospective accounts is in part a function of the



length of time between waves of a panel study and the need to



measure experiences prior to the first interview.  Panel designs



have the advantage of providing opportunities to employ bounded



recall techniques (asking respondents to recall events since the



last interview) and to use information provided in a previous



interview to reinstate prior context and to provide cues to



facilitate their recall.  The relative advantage of longitudinal



surveys in this regard is often grounds for choosing this design



over cross-sectional surveys.



 






 



    Costs: A red herring.  The list of strengths and shortcomings



of longitudinal research provided above is not exhaustive.  But it



excludes their cost as a relative disadvantage.  One can find



numerous references to the expensiveness of longitudinal designs



(cf., Murray and Erickson 1987, p. 109), a belief that appears to be



widespread.  Unfortunately, this belief is not well established.  And



the limited attempts to empirically assess the relative costs of



longitudinal and cross-sectional surveys have shown under certain



assumptions that longitudinal surveys may be less expensive than



repeated cross-sectional surveys (Duncan, Juster, and Morgan 1984).



No one would dispute that surveys such as the PSID, HS&B, and NLS72 are



relatively expensive instruments to create and maintain.  But these



costs are largely a function of the number or special character of



sample members required by the study, not necessarily of their



longitudinal design.



 



    Surely, longitudinal surveys require the added expenses of



tracing and tracking respondents as they move between waves of



interviews, costs that are unique to a research design that follows



subjects through time.  Locating and securing the cooperation of



sample respondents during an initial interview, however, reduces



the costs associated with drawing new sampling frames or screening



households for the desired universe of sample members.  These



features also permit the use of relatively less expensive modes of



administering subsequent waves of the survey (e.g., phone, mail



back questionnaires) than may be required in cross-sectional



national samples.



 



    Evaluating the relative costs of longitudinal surveys depends



a great deal on what one chooses to compare them to.  In this



regard, we are faced with the difficulties posed by the proverbial



comparison of apples and oranges.  It is only suggestive -- but



nonetheless in opposition to the belief about their expense -- that



the average field cost of completed interviews of the 1987 General



Social Survey of NORC was $400.  The average cost



of each completed interview of the 10th wave of the Youth Cohort of the



National Longitudinal Survey of Labor Market Experience



was $333 (Carter 1987).  Similarly, the total cost of the first year of



interviews of the National Post Secondary Student Aid Study of 1987



was $7.2 million; its first year follow-up is currently estimated



to cost $3.0 million (Carroll 1987).



 



 






 



     The comparative costs of different survey designs compound the



difficulty of simultaneously weighing their advantages and



disadvantages.  A relatively ambiguous attitude toward panel



surveys in assessing the effects of job training programs, as



suggested for example by Heckman and Robb (1985), could be turned



on its head by altering assumptions about the comparative costs of



these different designs.  Indeed, if one assumes relatively equal



expenses, or cost advantages to panel designs, it would be prudent



to select panel rather than cross-sectional designs (holding a



great many other factors constant) because panel designs permit the



use of a wider variety of statistical and theoretical assumptions.



The application of a wider range of assumptions provides a useful



means of testing how sensitive conclusions are to different



assumptions.  That is to say, one can analyze a panel study as if



it was a repeated cross-sectional design, but not vice versa.
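
The asymmetry can be sketched in a few lines of Python.  The records below are hypothetical and are not drawn from any survey named in this paper.  Discarding the person identifiers yields the wave-by-wave rates that a repeated cross-sectional design would supply, whereas the individual transitions can be recovered only from the linked panel.

```python
from collections import Counter

# Hypothetical two-wave panel: person id -> (status in wave 1, status in wave 2).
panel = {
    1: ("employed", "employed"),
    2: ("employed", "unemployed"),
    3: ("unemployed", "employed"),
    4: ("unemployed", "unemployed"),
    5: ("employed", "employed"),
}

def cross_sectional_rates(panel, wave):
    """Treat one wave as a cross-section: share of persons in each status."""
    counts = Counter(statuses[wave] for statuses in panel.values())
    n = len(panel)
    return {status: count / n for status, count in counts.items()}

def transition_counts(panel):
    """Use the panel linkage: count each (wave 1 status, wave 2 status) pair."""
    return Counter(panel.values())

wave1 = cross_sectional_rates(panel, 0)
wave2 = cross_sectional_rates(panel, 1)
moves = transition_counts(panel)
```

In this illustration the two cross-sectional distributions are identical, so a repeated cross-sectional analysis would report no change, while the transition counts show that two of the five persons changed status between waves.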



 



     The largest costs of such studies lie in the creation and



maintenance of the organization that is required to collect the



data and in the burden which such surveys impose on their



respondents.  This commitment of resources is largely fixed and



shared with other large survey-based designs, whether panel,



experimental, or cross-sectional.  Ongoing instruments of data



collection are more likely to present opportunities for linkage or



augmentation with side studies, experiments, and topical modules than



are "one-shot" data collection programs, which single cross-



sectional surveys can often be, an advantage to which we return



below.



 



     Causal Inference: An oversold advantage.  Longitudinal survey



designs do not, however, as is often incorrectly claimed, permit



unequivocal inferences about causation.  Surely, the temporal



dimension of longitudinal surveys provides strong priors for



assuming that a leads to or causes b if a is observed to occur



before b.  But there are several dangers in making such strong



causal inferences from panel designs.  One's anticipation of future



events can influence current behavior, for example.  And selection



biases (i.e., people found in a program often differ in



unmeasurable or unmeasured ways from nonparticipants) invariably



trouble the estimation of program effects.



 



     Heckman and Robb (1985), for example, examined three survey



designs and associated econometric techniques to determine whether



one was "better" than the others in assessing the consequences of



a public policy intervention, e.g., a youth employment training



program.  They compared (1) single retrospective cross-sectional,



(2) repeated cross-sectional, and (3) panel designs and their



corresponding analytical techniques.  Heckman and Robb showed that



each design and corresponding analytical technique requires



untestable assumptions in evaluating the earnings effects of



participation in training programs.  Their research argued that



many of the assumptions of cross-sectional analytical techniques



were no more or less justifiable than those upon which the panel



 






 



designs were based, although some assumptions could be -- but



too infrequently are -- the object of independent study.



 



    Although panel studies permit one to trace spells and



transitions and to order conditions in sequences that suggest



causation, longitudinal surveys cannot do so without the aid of



assumptions.  This point is forcefully illustrated by Lord's



paradox (1967, 1968, 1973) and its discussion by Holland and Rubin



(1986).
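
A small numerical sketch may help convey the flavor of the paradox.  The measurements below are hypothetical and were constructed so that the arithmetic is exact; they are not Lord's data.  Two analysts examine the same before-and-after measurements on two preexisting groups.  The first compares mean gains; the second compares the second measurement after adjusting for the first with a pooled within-group slope, as in an analysis of covariance.

```python
# Hypothetical (pre, post) measurements for two preexisting groups.
group_a = [(50, 55), (60, 60), (70, 65)]
group_b = [(70, 75), (80, 80), (90, 85)]

def mean(values):
    return sum(values) / len(values)

def mean_gain(group):
    """Analyst 1: average change from the first to the second measurement."""
    return mean([post - pre for pre, post in group])

def pooled_within_slope(groups):
    """Slope of post on pre, pooled within groups (as in analysis of covariance)."""
    sxy = sxx = 0.0
    for group in groups:
        pre_mean = mean([pre for pre, _ in group])
        post_mean = mean([post for _, post in group])
        for pre, post in group:
            sxy += (pre - pre_mean) * (post - post_mean)
            sxx += (pre - pre_mean) ** 2
    return sxy / sxx

def adjusted_difference(a, b):
    """Analyst 2: group difference in post after adjusting for pre."""
    slope = pooled_within_slope([a, b])
    post_gap = mean([post for _, post in b]) - mean([post for _, post in a])
    pre_gap = mean([pre for pre, _ in b]) - mean([pre for pre, _ in a])
    return post_gap - slope * pre_gap

gain_gap = mean_gain(group_b) - mean_gain(group_a)
adjusted_gap = adjusted_difference(group_a, group_b)
```

With these numbers the mean gains of the two groups are equal (a gap of 0), while the covariance-adjusted gap is 10.  Both calculations are correct descriptions of the same data; which of them, if either, bears on causation depends on assumptions that the data themselves cannot check.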



 



     Attempts to understand how different models generate the same



data, or how similar models can generate or represent different



data, have produced a greater sensitivity to the problems of making



causal inferences from the research designs of panel studies (and



other observational designs such as matched comparison groups and



cross-sectional surveys).  Fraker and Maynard (1985, 1987), for



example, analyzed data from several sources to compare the



estimated earnings effects from participating in an employment and



training program.  They compared estimates of training effects



derived from (1) control groups of the National Supported Work



Demonstration program that were selected in accordance with



experimental research designs and (2) comparison groups constructed



from the Current Population Survey.  Matched comparison designs



involve the creation of samples of respondents (typically drawn



from such surveys as the Current Population Survey) who are similar



in important respects to the participants in a program that one



seeks to evaluate.  These research designs are common in program



evaluations in part because information about program participants



is regularly collected at their enrollment or discharge from the



program.  Experiments in which a number of eligible individuals are



randomly precluded from participating in a program (and later



compared to those who are allowed to enroll) on the other hand are



at times difficult to conduct or proscribed by ethical and legal



considerations.



 



     Fraker and Maynard's comparisons of experimental versus



nonexperimental estimates of training program effects on annual



earnings showed that comparison group procedures and analytical



models produced estimates of large negative effects on the earnings



of youth both during the program's employment period and after.



The experimental design revealed estimates of program earnings for



youth that were modestly positive during the program, and



negligible thereafter.  Comparisons of the effects of training on



AFDC recipients revealed similar positive effects between the



experimental and matched comparison designs.  The differences in



results between unemployed youth and AFDC recipients suggest that



the greater earnings and employment variability of youth may result



in more biased selection into the employment program, which in turn



makes the task of defining a comparison group and an analytical



model more difficult.  Corroborative evidence to this work can be



found in LaLonde (1986).



 






 



    The implication of these marked differences in results is that



the longitudinal and cross-sectional designs (or other



nonexperimental designs) alone do not permit one to unravel the



many causes and consequences of social and economic change or of



program interventions.  Perhaps more disturbing to those who must



rely on such data, these research designs may produce the wrong



answer when the behavior of the population under study is



undergoing considerable change (as are the employment activities of



youth).  Experimental designs are superior to panel designs in



making causal inferences.



 



    Rarely is the use of or investment in data made "on purely



statistical grounds" alone, however.  In addition to costs, choices



are constrained by legal, ethical, and administrative



considerations (Riecken et al. 1974).  Considerable experience



(much from studies of states and municipalities) has produced a



greater appreciation of many of the difficulties of importing



laboratory-oriented experimental designs into the field.  It is



often difficult, for example, to sustain the separation of



treatment and control groups in the field.  Moreover, some of the



problems associated with the design and implementation of panel



studies, such as attrition, apply equally to experimental designs



(Betsey, Hollister, and Papagiorgiou 1985).  Experiments are useful



for assessing the relative differences among program variations on



a common set of outcome variables.  But experimental designs have



their own scientific and administrative shortcomings.  For example,



treatments are often limited to a narrow set of variables and to



specialized samples, and so their results may be of limited



generalizability.  Moreover, they are often difficult to administer



and require substantial managerial skills to conduct.  These



limitations, among others, have retarded the use of experimental



designs in the social sciences and in the evaluation of government



programs.  On the other hand, the design, implementation, and



analysis of field experiments are possible, and some evidence exists



of a renewed interest in them (cf. Maynard 1987; Bloom, Borus, and



Orr 1987; and Cottingham and Rodriguez 1987).



 



    Coupling experimental and longitudinal designs:  a not fully



realized potential.  Longitudinal surveys are often an appropriate



technology for describing the timing, duration, and sequence of



individual change.  And they are often better in this regard than



alternative nonexperimental observational research designs because



of the problems these alternatives confront when relying on



retrospective measures of past conditions.  Under certain



conditions, longitudinal surveys appear to be no more expensive to



conduct than repeated cross-sectional surveys.  But their ability



to support causal inferences has in general been overstated.  Although



temporal order provides prima facie evidence for causation, it is



insufficient.



 



    Increasingly, the research community is considering the fusion



of longitudinal and experimental survey designs in which randomly



 






 



assigned treatments or interventions are given to some members of



an ongoing longitudinal survey.  Coupling experiments and



longitudinal surveys capitalizes on the strongest merits of each



design.  That is, one obtains both the information produced by



national probability samples -- often conducted over a considerable



length of time -- and the information produced by smaller



comparative experiments in which causal inferences are more



appropriately deduced.  Insofar as the experiments can be adjoined



systematically, their generalizability will be enhanced.



 



    Joining experiments to ongoing longitudinal surveys also



permits one to use the experiments to calibrate estimates of



program effects that are derived entirely from the longitudinal



survey.  That is, the biases engendered by using estimates that are



based on longitudinal data can be assessed, and periodically



corrected, through controlled experiments.  Thus, longitudinal



studies are likely to be more policy-relevant and less ambiguous



with respect to biases in estimating program effects.  Experiments



are likely to benefit from their greater generalizability, lower



costs, and more manageable administration.
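
One simple reading of "calibration" is an additive correction, sketched below.  The paper does not specify a procedure, so the function names, the additive form, and all dollar figures are assumptions made purely for illustration.

```python
def estimated_bias(survey_estimate, experimental_estimate):
    """Bias of the survey-based estimate, taking the experiment as the benchmark."""
    return survey_estimate - experimental_estimate

def calibrate(survey_estimate, bias):
    """Apply an additive correction learned from an earlier benchmark experiment."""
    return survey_estimate - bias

# Hypothetical annual earnings effects (dollars) for a population where both
# a randomized experiment and a longitudinal-survey estimate are available.
bias = estimated_bias(survey_estimate=-1200, experimental_estimate=300)

# Hypothetical later survey-based estimate for the same population, corrected
# under the strong assumption that the bias has not changed in the meantime.
corrected = calibrate(survey_estimate=-900, bias=bias)
```

Because the bias may differ from one population to another and may drift over time, a correction of this kind would need to be re-estimated periodically and separately for each population of interest.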



 



    There is no doubt about the need for social experiments in



understanding change (Berk et al. 1985).  The National Academy of



Sciences' Committee on Youth Employment Programs, for example,



examined major studies to understand whether one could draw firm



conclusions about program effects from earlier research.  The



committee concluded, among other things, that longitudinal surveys



are no substitute for randomized experiments when the object is to



estimate the effectiveness of youth employment programs.  Moreover,



they urged the use of randomized experiments for this purpose



(Betsey, Hollister, and Papagiorgiou 1985).



 



    Coupling randomized designs to longitudinal surveys can also



be traced to a technical advisory committee for employment program



evaluation appointed by the U.S. Department of Labor.  The DOL



sought to learn whether analyses of manpower programs based on



conventional longitudinal surveys lead to adequate estimates of



program effects.  Adequacy was assessed, for example, by comparing



estimates of effects based on longitudinal surveys against



estimates based on randomized trials.  The conclusion of this



exercise was that the two estimates are not always in accord.



Indeed, they differ remarkably depending on what population is the



subject of inquiry (Fraker and Maynard 1985, 1987).



 



    Obviously, changes in standard practices that are suggested



here would introduce costs and difficulties, at least until their



implementation permitted organizations to identify and remedy the



problems that naturally arise with any new technology.  And surely,



there are a number of programs that could not be evaluated through



such a coupling of designs because of the nature of the



intervention or the limited number or location of possible



respondents even in relatively large national longitudinal surveys.



 






 



Randomly varying policy responses to violent domestic disputes



 could not be comfortably grafted onto High School and Beyond



 (HS&B), for example.  Unfortunately, the development and



 application of this general strategy have yet to be adequately tested.



 



 Summary and Conclusion



 



      Longitudinal surveys are an important technology for the



 measurement of individual change and development.  Considerable



 resources have been devoted during the last two decades to their



 creation and maintenance.  These instruments of social observation



 have contributed a great deal to the development of several fields



 of inquiry, and promise to continue to do so.



 



      Recent years have seen a growing restlessness with these



 research designs, however.  Their limitations -- especially those



 related to causal inference -- are increasingly recognized,



 although their relative strengths have continued to argue for their



 use as important new instruments of data collection.  Their support



 and criticism is the healthy consequence of the continual scrutiny



 that a principal tool of social analysis should undergo.



 



      Their relative strengths, however, have not yet been



 systematically and regularly coupled with the strengths of another



 research design -- experiments.  The promise of combining these



 methods and of moving beyond the discussion of the strengths and



 weaknesses of a particular research design still lies before us.



 



 References



 



 Ashenfelter, O. and G. Solon. 1982.  Longitudinal labor market



 data:  Sources, uses, and limitations.  A paper presented at a



 conference sponsored by the National Council on Employment Policy,



 An Assessment of Labor Force Measurements for Policy Formulation,



 Washington, D.C. (June).



 



 Bailar, B. A. and C. M. Lanphier.  1978.  Development of Survey



 Methods to Assist Survey Practices.  Washington, D.C.:  American



 Statistical Association.



 



 Baltes, P. B.  1979.  Life-span developmental psychology:  Some



 converging observations on history and theory.  In Life-Span



 Development and Behavior.  Vol. 2. eds.  P. B. Baltes and O. G.



 Brim, Jr. New York: Academic Press.



 






 



 






 



Bartlett, F. C. 1932.  Remembering:  A Study in Experimental and



 Social Psychology.  Cambridge:  Cambridge University Press.



 



 Berk, R. A. et al. 1985.  Social policy experimentation.



 Evaluation Review 9:387-429.



 



 Betsey, C., R. Hollister, and M. Papagiorgiou.  1985.  Report of



 the Committee on Youth Employment Programs.  Washington, D.C.:



 National Research Council.



 



 Bloom H. S., M. E. Borus, and L. L. Orr.  1987.  Using random



 assignment to evaluate an ongoing program:  The National JTPA



 Evaluation.  A paper presented at the annual meeting of the



 American Statistical Association, San Francisco (August 17-20).



 



 Boruch, R. F. 1975.  Coupling randomized experiments and



 approximations to experiments in social program evaluation.



 Sociological Methods and Research. 4:31-53.



 



 Boruch, R. F. and H. W. Riecken. eds. 1975.  Experimental Testing



 of Public Policy.  Boulder, Colorado:  Westview Press.



 



 Boruch, R. F. and J. S. Cecil. 1979.  Assuring the Confidentiality



 of Social Research Data.  Philadelphia, PA:  University of



 Pennsylvania Press.



 



 Boruch, R. F. and R. W. Pearson. 1988.  Assessing the Quality of



 Longitudinal Surveys.  Evaluation Review, in press.



 



 Cannell, C. F., G. Fisher, and T. Bakker. 1965.  Reporting of



 hospitalization in the Health Interview Survey.  Vital and Health



 Statistics, Series 2, No. 6. Washington, D.C.:  U.S. Government



 Printing Office.



 



 Carroll, D. 1987.  Personal communication, December 8, 1987.



 



 Carter, W. 1987.  Personal communication, December 7, 1987.



 



 Citro, C.F., and H. W. Watts. 1985.  Patterns of Household



 Composition and Family Status Change.  Paper presented to the



 American Economic Association, New York, New York.



 



 Citro, C. F., D. J. Hernandez, and J. E. Moorman.  1986.



 Longitudinal household concepts in SIPP.  Paper presented to the



 American Statistical Association, Chicago, Illinois, May 30.



 



 Conway, M. and M. Ross. 1984.  Getting what you want by revising



 what you had.  Journal of Personality and Social Psychology



 47:738-748.



 



 



 



 






 



Corcoran, M. E., G. J. Duncan, G. Gurin, and P. Gurin.  1985.  Myth



 and reality:  The causes of persistence of poverty.  Journal of



 Policy Analysis and Management.



 



 Cottingham, P. and A. Rodriguez.  1987.  The experimental testing



 of the Minority Female Single Parents Program.  A paper presented



 at the annual meeting of the American Statistical Association, San



 Francisco (August 17-20).



 



 David, E. L. and H. M. Peskin.  1984.  Theory of an optimal



 database.  Review of Public Data Use 12:45-53.



 



 David, M. 1980.  Access to data:  The frustration and utopia of



 the researcher.  Review of Public Data Use 8:327-337.



 






 



 Dawes, R. M. and R. W. Pearson. 1987.  The effect of the present



 on retrospective data: Measuring the then, now.  New York:  Social



 Science Research Council, mimeo.



 



 Duncan, G. J., F. T. Juster, and J. N. Morgan. 1984.  The role of



 panel studies in a world of scarce research resources.  In The



 Collection and Analysis of Economic and Consumer Behavior Data: In



 Memory of Robert Ferber, eds.  S. Sudman and M. A. Spaeth.



 Champaign, Illinois:  Bureau of Economic and Business Research.



 Duncan, G. J., R. Coe, M. E. Corcoran, M. S. Hill, S. D. Hoffman, and



 J. N. Morgan.  1984.  Years of Poverty, Years of Plenty.  Ann Arbor:



 Survey Research Center, University of Michigan.



 



 Duncan, G. J. and G. Kalton. 1985.  Issues of design and analysis



 of surveys across time.  A paper presented at the centenary session



 of the International Statistical Institute, Amsterdam.



 



 Duncan, G. J. and N. A. Mathiowetz. 1985.  A validation study of



 economic survey data.  Ann Arbor, MI: Institute for Social



 Research, mimeo.



 



 Duncan, G. J., M. S. Hill, and S. D. Hoffman.  1988.  Welfare



 dependence within and across generations.  Science. 239:467-471.



 Fienberg, S. E. and J. Tanur.  1986.  From the inside out and the



 outside in:  Combining experimental and sampling structures.



 Technical Report No. 373, Carnegie Mellon University (December).



 ____.  1987a.  The design and analysis of longitudinal surveys:



 Controversies and issues of costs and continuity.  In Designing



 Research With Scarce Resources, eds.  R. F. Boruch and R. W.



 Pearson.  New York:  Springer-Verlag.



 






 



____.  1987b.  Experimental and sampling structures:  Parallels



diverging and meeting.  International Statistical Review 55:75-96.



 



Fraker, T. and R. Maynard.  1985.  The use of comparison group



designs in evaluation of employment related programs.  Princeton,



N. J.:  Mathematica Policy Research, mimeo.



 



____.  1987.  The adequacy of comparison group designs for



evaluations of employment-related programs.  The Journal of Human



Resources 22: 194-227.



 



Greenberg, D. F. 1985.  Age, crime and social explanation.



American Journal of Sociology 91, 1-21.



 



Heckman, J. J. and R. Robb, Jr. 1985.  Alternative methods for



evaluating the impact of interventions.  In Longitudinal Analysis



of Labor Market Data, eds.  J. J. Heckman and B. Singer, 156-246.



New York:  Cambridge University Press.



 



Heckman, J. J. and B. Singer. eds. 1985.  Longitudinal Analysis



of Labor Market Data.  New York:  Cambridge University Press.



 



Hirschi, T. and M. Gottfredson.  1983.  Age and the explanation of



crime.  American Journal of Sociology 91, 359-374.



 



____.  1985.  Age and crime, logic and scholarship:  Comment on



Greenberg.  American Journal of Sociology 91, 22-27.



 



Holland, P. W. and D. B. Rubin. 1986.  Research designs and causal



inferences:  On Lord's Paradox.  In Survey Research Designs:



Towards a Better Understanding of Their Costs and Benefits. eds.



R. W. Pearson and R. F. Boruch, 7-37.  New York: Springer-Verlag.



 



Koo, H. 1985.  Short-term change in household and family



structure.  Paper presented to the American Statistical



Association, Las Vegas, Nevada.



 



LaLonde, R. 1986.  Evaluating the econometric evaluations of



training programs with experimental data.  American Economic Review



76(4):604-620.



 



Lazarsfeld, P. F., B. Berelson, and H. Gaudet. (1944) 1960.  The



People's Choice:  How the Voter Makes Up His Mind in a Presidential



Campaign.  2nd ed.  New York:  Columbia University Press.



 



Lord, F. M. 1967.  A paradox in the interpretation of group



comparisons.  Psychological Bulletin 68:304-305.



 



____.  1968.  Statistical adjustments when comparing preexisting



groups.  Psychological Bulletin 72:336-337.



 



 



 






 



 ____.  1973.  Lord's paradox.  In Encyclopedia of Educational



 Evaluation.  Anderson, S. B. et al.  San Francisco:  Jossey-Bass.



 



 Mathiowetz, N. A.  1986.  Episodic recall and estimation:



 Applicability of cognitive theories to survey data.  Paper



 presented at a Seminar on the Effects of Theory-Based Schemas on



 Retrospective Data, June 26-28, New York:  Social Science Research



 Council.



 



 Murray, G. F. and P. G. Erickson.  1987.  Cross-sectional versus



 longitudinal research:  An empirical comparison of projected and



 subsequent criminality.  Social Science Research 16, 107-118.



 



 Newcomb, T. M. (1943) 1957.  Personality and Social Change:



 Attitude Formation in a Student Community.  New York:  Dryden.



 



 Pearson, R. W.  1985.  The changing fortunes of the U.S.



 statistical system, 1980-1985.  Review of Public Data Use



 12:245-269.



 



 Rice, S. A. 1928.  Quantitative Methods in Politics.  New York:



 Knopf.



 



 Riecken, H. W. et al. 1974.  Social Experimentation.  New York:



 Academic.



 



 Schuman, H. and G. Kalton. 1986.  Survey methods.  In The Handbook



 of Social Psychology, 3rd ed. eds.  G. Lindzey and E. Aronson,



 635-697.  Reading, MA:  Addison-Wesley.



 



 Subcommittee on Federal Longitudinal Surveys, Federal Committee on



 Statistical Methodology. 1986.  Federal Longitudinal Surveys.



 Washington, D.C.:  Office of Management and Budget.



 



 Sudman, S. and N. M. Bradburn.  1982.  Asking Questions.  San



 Francisco:  Jossey-Bass.



 



 Taeuber, R. and R. C. Rockwell. 1982.  National social data



 series: A compendium of brief descriptions.  Review of Public Data



 Use 10:23-111.



 



 Turner, A. G. 1972.  The San Jose methods test of known crime



 victims.  Washington, D.C.:  National Criminal Justice Information



 and Statistics Service, Law Enforcement Assistance Administration,



 U.S. Department of Justice.



 



 Turner, C. F. and E. Martin. eds.  1984.  Surveying Subjective



 Phenomena, Volume 1. New York:  Russell Sage Foundation.



 



 



 



 



 






 



          LONGITUDINAL ANALYSIS OF FEDERAL SURVEY DATA



 



                             Patricia Ruggles



                         Joint Economic Committee



 



 



I. Introduction



 



    Longitudinal panel data provide a unique opportunity to



examine patterns and sources of economic and demographic change at



the individual and family level.  These data are relevant to a host



of policy issues, from the assessment of welfare program



participation to an understanding of patterns of health care usage



or of the determinants of retirement.  Many policy issues require



some understanding of the factors that lead up to a particular



event, or of the consequences that stem from it.  Without repeated



observations of the individuals concerned, however, such factors



and consequences can only be inferred.  Thus, our increasing store



of longitudinal panel data holds the potential for major



breakthroughs in our understanding of the basic determinants of



economic and demographic change as they affect individuals and



families over time.



 



    Unfortunately, however, many of our longitudinal data sets



have been somewhat under-used by researchers so far, especially



compared to similar cross-sectional surveys.  To some extent this



under-usage may simply stem from the fact that many of these data



sets are still fairly new -- researchers need a chance to become



familiar with the opportunities offered by these new sources of



information.  A more fundamental problem, however, is that to an



analyst whose primary research experience is with cross-sectional



microdata, a longitudinal panel of microdata on families and



individuals can be rather intimidating.



 



    The purpose of this paper is to provide some guidance to users



and potential users of longitudinal data sets who are trying to



sort out appropriate approaches to the problems of analyzing



longitudinal panel data.  This paper does not attempt to offer any



new insights into the methodologies available to estimate the



determinants of change (or stability) in a given variable or set of



variables over time, nor are the theoretical issues underlying



these methodologies addressed in any detail.  Instead, the paper is



designed to be a much more basic "how to" guide, focusing on the



most fundamental choices that must be made by the analyst in



undertaking a project involving the use of longitudinal data to



examine the economic circumstances of families and individuals.



 



    The major focus of this paper is on specific methods of making



comparisons across time, with emphasis on matching the outcome



measures and statistical techniques chosen to the basic research



 



 



 






 



question being asked.  For many policy issues



fairly simple outcome measures may be perfectly appropriate, but it



is important to understand the measurement implications of alternative 



choices in order to avoid misinterpreting one's results.



 



 



II. Making Comparisons Across Time



 



    The major purpose of a longitudinal research file is of course



to facilitate the analysis of change over time.  There are three



major types of time-related analysis that are commonly carried out



with such files, and there are some specific methodological issues



that pertain to each.



 



 



Comparing Two Points in Time



 



    The simplest type of time-related analysis -- the comparison



of data from two discrete points in time -- does not actually



require a complete longitudinal data file at all.  The major



advantage of this type of analysis is that it is relatively simple



to implement and can often yield a great deal of useful



information, particularly for questions that focus on rates of



turnover in a specific variable.  This method is very commonly used



with many different longitudinal data sets -- several examples of



such analyses can be found for PSID data in the Institute for



Social Research's volume of PSID research results entitled Years of



Poverty, Years of Plenty, for example.  Other examples include Alan



Fox's study using RHS data which examined income changes at



retirement, and the SIPP-based study produced by Jack McNeil and



his colleagues at the Census Bureau that considered how many of



those poor in 1984 were still poor in 1985.



 



    The major drawback of this method of making comparisons across



time is that the outcome variables are sometimes quite sensitive to



the specific time periods chosen for analysis, and there is no way



for the analyst to determine this if only two points in time are



examined.  Further, such comparisons are valid measures of change



among those who already have a given characteristic, but cannot be



used to determine the distribution of durations of a particular



state among all those who enter it.



 



 



 



 



         






 



     For example, using this method we can tell what the total



remarriage rate for all divorced women is over a given period of



time, but we cannot determine the average amount of time that women



spend between marriages, because we do not know when those who were



already divorced at the time of the first observation got divorced,



and we have no distribution of remarriage probabilities by duration



of divorce to use in forecasting future remarriage rates for those



who have not yet remarried.  Indeed, we cannot even determine if



the remarriage rate is sensitive to the amount of time that has



elapsed since the divorce.  In other words, to the extent that the



determinants of changes in state are themselves time-related, they



may be difficult to observe if one must rely on simple "before and



after" comparisons.
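
A two-point comparison of this kind can be sketched as follows.  The person records and field names are hypothetical; they merely mimic the question of how many of those poor in one year were still poor in the next.

```python
# Hypothetical person records observed at two points in time.
# True means the person was poor in that year.
records = [
    {"id": 1, "poor_1984": True,  "poor_1985": True},
    {"id": 2, "poor_1984": True,  "poor_1985": False},
    {"id": 3, "poor_1984": True,  "poor_1985": True},
    {"id": 4, "poor_1984": False, "poor_1985": True},
    {"id": 5, "poor_1984": False, "poor_1985": False},
    {"id": 6, "poor_1984": True,  "poor_1985": True},
]

def persistence_rate(records, first, second):
    """Share of those in the state at the first point who remain in it at the second."""
    at_risk = [r for r in records if r[first]]
    if not at_risk:
        raise ValueError("no one is in the state at the first observation")
    still_in = sum(1 for r in at_risk if r[second])
    return still_in / len(at_risk)

def entry_rate(records, first, second):
    """Share of those outside the state at the first point who are in it at the second."""
    outside = [r for r in records if not r[first]]
    if not outside:
        raise ValueError("everyone is in the state at the first observation")
    return sum(1 for r in outside if r[second]) / len(outside)

still_poor = persistence_rate(records, "poor_1984", "poor_1985")
newly_poor = entry_rate(records, "poor_1984", "poor_1985")
```

Nothing in these records indicates when those poor in 1984 first became poor, which is why turnover rates can be computed from two observations but the distribution of spell durations cannot.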



 



 



Examining Transition Events



 



     A second approach to making comparisons across time,



therefore, is to examine transitions between two states directly.



By focusing on the transition itself one can more closely examine



its association with other factors that may not be observable in a



simple before and after comparison.  This is helpful both in



considering the effects of the transition on other variables and in



estimating a causative model of the determinants of the transition



itself.



 



     To illustrate this point, let us reconsider the analysis of



divorce discussed briefly above.  If the analyst is interested not



only in the determinants of the divorce transition, but also in its



impacts, a simple comparison of two points in time may be doubly



misleading.  For example, family income may dip temporarily at the



time of divorce as the family changes from one household to two.



Eventually, however, as the two households make post-divorce



adjustments in employment and living arrangements, income is likely



to recover at least somewhat.  Estimates of the impact of the



divorce on income and poverty status for the various family members



may be quite sensitive to both the unit definition used to compute



income (as discussed in the last section) and to the specific



timing of two income observations compared to the divorce itself.



 



     In a case like this, examination of income or poverty status



over a longer period leading up to and then following the



transition will give a better picture of its actual impacts.  For



this type of examination it is necessary to have a longitudinally



linked file with the transition flagged, but if such a file is



available a descriptive analysis of this type is quite



straightforward to perform.  Similarly, the transition flags



 



 



      



 






 



themselves can be used as explanatory variables in a larger model



 of change over time as it affects some other variable.  The recent



 paper by Suzanne Bianchi and Edie McArthur on the impacts of



 marital disruptions on children's economic status illustrates a



 transition analysis of this type.



 



       Considering the determinants of a given transition is also



 facilitated by the availability of a linked longitudinal file.  For



 example, probit-type regression models can be used to examine the



 probability that a given transition will take place, subject to the



 various other characteristics of the cases in question.  In



 analyzing divorce, for example, one might want to consider the



 impacts of the spouses' employment statuses in the period before



 the divorce on the probability that they will become divorced.  In



 other cases, a broader set of dependent variables may be necessary



 -- those leaving a given state may have more than one alternative



 option.  The work by Alan Gustman and Thomas Steinmeier on



 retirement probabilities as observed in the RHS offers a good



 example of a fairly complex application of this type of transition



 analysis.



 



       With a linked longitudinal file, the conditional probability



 of a given event such as divorce or retirement can be calculated



 fairly easily for specific population subgroups, and/or conditioned



 on specific events, using readily available software packages such



 as SAS.  Again, however, such an approach can be misleading if the



 determinants of the transition in question are themselves



 time-related -- if, for example, the previous duration of the marriage or



 even the length of the unemployment spell are important



 determinants of the probability of divorce.
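       The subgroup calculation described above does not depend on any particular package.  A minimal sketch in Python follows, assuming a hypothetical linked file with one record per married couple observed in two periods.  The field names spouse_unemployed and divorced_next_period are illustrative only and are not variables from any actual survey file.

```python
from collections import defaultdict


def transition_rates(records, group_key, flag_key):
    """Return, for each subgroup, the share of records with the transition flag set."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [transitions, total records]
    for rec in records:
        tally = counts[rec[group_key]]
        if rec[flag_key]:
            tally[0] += 1
        tally[1] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


# Made-up linked records (assumption: one row per couple, flag set if the
# couple is observed to divorce by the second period).
couples = [
    {"spouse_unemployed": True, "divorced_next_period": True},
    {"spouse_unemployed": True, "divorced_next_period": False},
    {"spouse_unemployed": False, "divorced_next_period": False},
    {"spouse_unemployed": False, "divorced_next_period": False},
    {"spouse_unemployed": False, "divorced_next_period": True},
    {"spouse_unemployed": False, "divorced_next_period": False},
]

rates = transition_rates(couples, "spouse_unemployed", "divorced_next_period")
for group, rate in sorted(rates.items()):
    print(f"spouse_unemployed={group}: divorce rate {rate:.2f}")
```

       With these made-up records the rate is 0.50 for couples with an unemployed spouse and 0.25 for the rest.  A table of this kind says nothing about how long the marriage or the unemployment spell had already lasted, which is the limitation noted in the text.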



 



       These duration-related issues, then, are potentially



 problematic with either a straightforward comparison of data from



 two points in time or with a more sophisticated analysis of



 specific transitions.  Although it is sometimes possible to



 shoehorn duration-related information into one's transition



 analysis - one could create separate dummy variables for short and



 long unemployment spells in the above example, for instance - this



 is a rather ad hoc approach that is likely to leave many unanswered



 questions.  In addition, in many cases one is interested not only



 



 



             






 



in the transition event itself, or even in its impact on other



events, but also in the expected duration of the new state that it



creates.  One wishes to know, for example, how long someone who



enters poverty may be expected to remain poor, or how long someone



who loses a job may be expected to remain unemployed.  Questions of



this type require some type of duration analysis.



 



 



Analyzing Data on Duration



 



    There are many possible approaches to questions of duration,



and alternative approaches can produce quite different and even



seemingly contradictory statistics.  The confusion generally



results from differences in the population to which the duration



estimate applies.  The two major possibilities are cohort-based



estimates, which typically apply to all those observed in a given



state at a point in time, and spell estimates, which apply to all



those observed to enter the state within a given span of time.



 



    To illustrate these possibilities, consider the case of



welfare program participation.  A point-in-time or cohort-based



estimate of welfare durations will ask a question like "How long



have those who are currently receiving welfare been on the



program?"  This question has been phrased retrospectively, but it



can also be put in a prospective form:  "How long are those



currently on the program likely to remain on in the future?"  In



either case, the base population being considered is all those on



the program at a given point in time.  Such estimates are therefore



relatively easy to line up with cross-sectional estimates of the



total population on welfare, which are of necessity also point-in-



time estimates.  Estimates of this type are very useful for a



number of purposes -- for example, estimating the future costs of



the current welfare caseload (although obviously to get total costs



one would also have to account for new welfare entrants).



 



     One useful way to think about estimates of this type is as an



examination of the experiences of a particular cohort -- a group



that all happened to be in a given state at a given point in time.



The NLS, for example, is designed with just such applications in



mind.  It is possible to use these data to examine the subsequent



experiences of several distinct demographic cohorts selected at



specific points in time -- teenagers, men nearing retirement, women



in their middle years.  It is even possible, with the new youth



cohort, to link up families across generations, and to relate young



women's experiences to those of their mothers, as Peter Gottschalk



has done recently for welfare recipients, for example.  A similar



type of application using PSID data is Frank Levy's path-breaking



1977 paper on the "underclass," which traced the subsequent



experiences of a cohort of those in poverty in 1967.



 



 



          



 






 



        Cohort-type analyses are very useful for many policy



  questions, but it is important to be aware of their limitations in



  applying them to policy analyses.  Specifically, because they apply



  only to those in the state at a given time, such analyses are



  sometimes difficult to generalize to the population as a whole, or



  even to the experience of all those who may pass through the state



  over a period of time.



 



        What a point-in-time estimate cannot do, in other words, is



  answer questions like "How long will a typical person entering



  welfare stay on the program?"  Such a question refers not to the



  population on the program at a point in time, but rather to the



  population entering the program.  Although that may seem like a



  subtle distinction, in fact these two populations are likely to be



  very different if there is any significant variation at all in



  spell durations within the population as a whole.  Those who are on



  welfare at a point in time are likely to have much longer spell



  durations, on average, than the typical entrant, because those with



  longer spell durations are more likely to be in the welfare



  population at any particular point in time.



 



        To see this point, consider a very simple example.  Suppose



  the population of interest consists of 13 people, one of whom is in



  the state under consideration for one year, and twelve of whom are



  in that state for one month each.  Further suppose that these



  twelve one-month spells are distributed so that one occurs in every



  month of the year.  At any given point in time, therefore, the



  total population in the state being considered will consist of two



  people, one who is in a one-month spell, and one who is in a



  twelve-month spell.  A point-in-time analysis conducted any time after the



  first month will therefore conclude that 50 percent of the



  observable population reports a spell of more than one month.  An



  analysis based on all entrants observed during the year, however,



  will find that only one-thirteenth of the population reports a



  spell of more than one month.  Clearly, if the reasons for these



  differences in estimates are not well understood, they could lead



  to very different conclusions about the prevalence of long spells.
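        The arithmetic of this example can be checked directly.  The short Python sketch below encodes the thirteen spells as (first month, last month) pairs and compares the entrant-based share of long spells with the share seen in each monthly snapshot.  It assumes, for illustration, that time is measured in whole months and that each spell is classified by its completed length.

```python
# One twelve-month spell plus twelve one-month spells, one starting in each month.
spells = [(1, 12)] + [(month, month) for month in range(1, 13)]


def duration(spell):
    """Completed length of a spell in months."""
    first, last = spell
    return last - first + 1


def share_long(spell_list):
    """Fraction of the listed spells that last more than one month."""
    return sum(duration(s) > 1 for s in spell_list) / len(spell_list)


def in_progress(spell_list, month):
    """Spells that would be observed in a snapshot taken in the given month."""
    return [s for s in spell_list if s[0] <= month <= s[1]]


# Entrant-based estimate: every spell that begins during the year counts once.
entrant_share = share_long(spells)

# Point-in-time estimates: one snapshot per month of the year.
point_in_time_shares = [share_long(in_progress(spells, m)) for m in range(1, 13)]

print(f"entrant-based share of long spells: {entrant_share:.3f}")
print(f"point-in-time shares by month: {point_in_time_shares}")
```

        Every monthly snapshot contains exactly two spells, one of which is the twelve-month spell, so the point-in-time share is one-half in each month, while the entrant-based share is one in thirteen.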



 



        Many of the most useful and interesting questions that can be



  addressed using a longitudinal database are questions that relate



  to duration.  In any type of duration analysis, however, it is



 



 



              



 






 



necessary to be sensitive to the issue of censoring.  Inevitably,



there will be some spells that start before the beginning of the



observation period or that end after the panel has come to an end.



Further, there will be some cases that join the panel with a spell



already in progress or leave the panel before one has ended.  These



spells cannot simply be ignored, since of course longer spells are



more likely than short ones to be censored and ignoring this



problem will therefore produce biased estimates. 



 



     An alternative approach that unfortunately is fairly often



used by analysts who have not completely thought through the



problem of spell censoring is to mix together all one's



observations over a given span of time, whether they apply to



completed spells or to those that are only partially observed.



This produces results that are confusing and even potentially



misleading, since it is easy to misclassify spells that are only



partially observed as short spells, producing misleading estimates



of average spell durations.



 



     The measure of the "persistently poor" produced by Duncan et



al. using PSID data is an example of this approach, and illustrates



some of its problems.  In this study, the base 



population was defined as all those in the population during the ten



year observation period -- not just those in poverty in a particular



year, as in Levy's study.  Duncan et al. then defined the



"persistently poor" as those poor for at least eight out of the ten



years.  They went on to calculate the proportion of the total



population that was "persistently poor" simply by dividing the



number of people observed in poverty for at least eight years by



the total population observed.



 



     The problem with this approach is that some people who are



poor for less than eight years during the observation period are



nevertheless in the midst of spells of poverty that will total



eight or more years -- but unfortunately some of those years happen



to fall outside the observation period.  Thus the true number of



individuals in the sample who were actually poor for at least eight



out of ten years (at least some of which fell in the sample period)



cannot be estimated using these data.  Estimates of the proportion



of those observed who experience long poverty spells will be



understated, because some spells that appear short are in fact



longer, but they simply haven't been completely measured.  At the



same time, however, because these estimates mix together people who



were poor in different years, they also cannot be used to predict,



say, what proportion of those poor in a given year will still be



poor eight years later.
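     The undercount described above can be illustrated with a small Python sketch.  The eight-of-ten-years threshold comes from the study as described in the text; the calendar years and the two poverty histories are invented for illustration and do not come from the PSID.

```python
def years_poor_in_window(poor_years, window_start, window_end):
    """Count the years of poverty that fall inside the observation window."""
    return sum(1 for year in poor_years if window_start <= year <= window_end)


def observed_persistently_poor(poor_years, window_start, window_end, threshold=8):
    """Apply the 'poor in at least `threshold` observed years' rule."""
    return years_poor_in_window(poor_years, window_start, window_end) >= threshold


# Hypothetical ten-year observation window.
WINDOW = (1969, 1978)

# Hypothetical person A: poor for eight years, all inside the window.
person_a = list(range(1970, 1978))  # 1970 through 1977
# Hypothetical person B: poor for ten years, but only four fall inside the window.
person_b = list(range(1975, 1985))  # 1975 through 1984

a_flag = observed_persistently_poor(person_a, *WINDOW)
b_flag = observed_persistently_poor(person_b, *WINDOW)
b_observed = years_poor_in_window(person_b, *WINDOW)

print(f"A: {len(person_a)} years poor in total, counted as persistent: {a_flag}")
print(f"B: {len(person_b)} years poor in total, {b_observed} observed, "
      f"counted as persistent: {b_flag}")
```

     Person B has the longer spell of the two but is not counted as persistently poor, because six of the ten years fall after the window closes.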



 



     A preferable approach to the problem of estimating spell



durations when some observations are censored is to use some sort



 



 



       



 






 



of survival analysis technique.  Under this methodology, a survival



 function for a given type of spell is estimated based on the



 cumulative distribution of observed spell durations.  In other



 words, in order to compute the probability that a spell of welfare



 participation, for example, will end in its sixth month,



 conditional on its having lasted for the first five months, one



 must include all cases known to have lasted at least five full



 months, whether or not their eventual disposition is known.
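     A discrete-time life table is one simple way to carry out the calculation just described.  The Python sketch below is an illustration under stated assumptions, not the specific estimator used in any of the studies cited: spells are measured in whole months, a censored spell is one last observed still in progress, and a spell counts as at risk in a month only if it was actually observed in that month.  The sample spells are invented.

```python
def life_table(spells):
    """Build a discrete-time life table.

    spells: list of (months_observed, ended) pairs.  ended is True if the spell
    was seen to end in its last observed month, False if it was censored.
    Returns rows of (month, at_risk, exits, hazard, survival).
    """
    max_month = max(months for months, _ in spells)
    survival = 1.0
    rows = []
    for month in range(1, max_month + 1):
        # Everyone observed in this month is at risk, censored or not.
        at_risk = sum(1 for months, _ in spells if months >= month)
        exits = sum(1 for months, ended in spells if months == month and ended)
        hazard = exits / at_risk if at_risk else 0.0
        survival *= 1.0 - hazard
        rows.append((month, at_risk, exits, hazard, survival))
    return rows


# Invented spells: three observed endings and three censored spells.
sample_spells = [(1, True), (2, True), (2, False), (3, True), (4, False), (4, False)]

rows = life_table(sample_spells)
for month, at_risk, exits, hazard, survival in rows:
    print(f"month {month}: at risk {at_risk}, exits {exits}, "
          f"hazard {hazard:.3f}, survival {survival:.3f}")
```

     The censored spells stay in the denominator for every month in which they were observed, so they are not treated as short completed spells.  In this invented sample the estimated probability of a spell lasting beyond three months is 4/9, even though only three of the six endings were observed.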



 



 



[Graphic not reproduced.]



 



 



 



     By including all spells -- even those whose endings will



 eventually be unobserved -- for as long as information on their



 status is available, systematic biases related to spell duration



 will be minimized.  At the same time, censored spells are



 essentially treated as if they had the same distribution of



 durations as spells with otherwise similar characteristics whose



 endings are observed.  Under this methodology, censored spells do



 not pull down the estimated median spell duration, for example, as



 they do when the problem of censoring is not recognized.  It is



 worth noting, however, that this approach assumes that censored



 spells are not systematically different from uncensored spells



 (except in ways fully captured in the X vector of explanatory



 variables), and that spells that occur at the beginning of the



 observation period are not systematically different from those



 starting nearer the end.  To the extent that external events -- for



 example, legislative changes or changes in the state of the economy



 -- affect spell durations over time, analysis techniques that pool



 spell observations across the period as a whole may be misleading.



 



 



 



 






 



    This approach does allow the contribution of a variety of



factors -- either fixed (e.g., sex and race) or time-varying (e.g.,



employment status) -- to the conditional probability of exit (or of



survival) to be estimated -- these factors are simply included in the



X vector of explanatory variables described above.  This approach



is very popular as a general method of analyzing spell durations



and their determinants, and models of this type can be implemented



in SAS as well as in other easily-obtained statistical packages



(although typically the analyst is required to assume some specific



underlying form for the distribution of exit probabilities).  Only



data sets that provide a reasonably continuous record for a



reasonably large sample of individuals entering the state being



examined can be used with this approach, however, which limits its



usefulness with smaller or less focused data sets or those in which



data have been collected in an intermittent pattern.



 



 



III.  Conclusions



 



     In summary, the many new sources of longitudinal data on



incomes and family structures that have become available in the



last decade offer exciting research opportunities to the policy



analyst, but they bring with them their own unique measurement



problems.  Because these data sources are both more complex and



less familiar than are cross-sectional databases covering such



topics, analyzing them can present some challenges.  For analysts



willing to address these challenges, however, there are useful



solutions, and these data can be used to provide important new



insights into the processes underlying economic and demographic



change.



 



     Indeed, as discussed briefly in the various examples of



measurement problems and their solutions given throughout the



paper, important applications of longitudinal analysis to policy



issues have already been carried out in many areas.  A few examples



include Bane and Ellwood's analysis of poverty spells and of AFDC



participation using the PSID; the work by Bruce Vavrichek and Ralph



Smith of the Congressional Budget Office on spells of unemployment



insurance recipiency as observed in the SIPP; several Social



Security Administration-sponsored studies on retirement behavior as



observed in the RHS; and Peter Gottschalk's work on



intergenerational transmission of dependency as observed in the



NLS.  Projects are now underway to address a whole host of



additional issues, including patterns of health insurance coverage,



multiple program participation for low-income beneficiaries, and



earnings and employment patterns for the working poor.



 



     The work that has been done so far and the work that is now



underway represent major advances in our understanding of these



issues, but there is much further analysis that could be done with



our existing longitudinal survey data.  To some extent, this



expansion will simply take time -- analysts need to become more



 






 



familiar both with the surveys themselves and with appropriate



techniques for analyzing and interpreting these data.  Already,



however, there is beginning to be a large literature on the



applications of duration analysis, in particular, to economic and



demographic data, and this literature can only be expected to grow



over the next several years as additional data become available and



additional issues are explored.



 



     What can statistical agencies, and data producers in



particular, do to help the analyst undertaking this type of study?



In my view, these agencies could support longitudinal analysis



efforts in two major ways.



 



     First, data producers do not always produce files that are



highly amenable to longitudinal analysis, even when such analysis



is the primary mission of a particular data-collection effort.



Understandably, when a new survey such as the SIPP comes out a



great deal of effort is devoted to the early cross-sectional files,



since analysts are anxious to see how these new data line up with



data from familiar cross-sectional surveys.  In addition, the early



waves of any survey will be ready for analysis long before the



survey itself has been completed and edited longitudinally, and



data producers are understandably anxious to get these first



products to the users as fast as possible.



 



     Once a survey has been in regular production for some period



of time, however, it would make sense to lessen the emphasis on



cross-sectional files and to increase efforts to produce reasonable



longitudinal data in a reasonably timely fashion.  We already have



excellent cross-sectional data on family incomes and labor force



status, and unless the survey in question is clearly adding to our



store of available cross-sectional data on a particular topic,



cross-sectional applications should receive less attention.  In



particular, the level of effort devoted to activities such as



cross-sectional imputation that have no application in the



longitudinal context should be reduced.  Instead, greater research



efforts should be devoted to continuing problems like longitudinal



editing and the development of reasonable longitudinal imputation



procedures.



 



     The second way in which statistical agencies could support



longitudinal analysis would be to undertake more of it themselves.



Data producers typically publish at least some cross-sectional



information from the files they produce, and in some cases -- the



CPS publications in the Census P-60 series, for example, come to



mind -- these tables themselves provide important information on



which policy-makers come to rely.  It ought to be possible for the



Bureau of the Census and other data producers to publish similar



information, but of a longitudinal nature, using the longitudinal



databases that they now produce.



 



 






 



    The assumptions underlying survival analyses might be



difficult to explain in such a context, but basic information on



the experience of a given cohort, for example, is fairly easy to



explain and to interpret.  For instance, one could look at how many



of those becoming unemployed in a given period were still



unemployed one, two, or more months later; how many of those on



welfare or in poverty at a given point in time were still in that



state x months (or years) later; and so forth.



 



     Similarly, one could examine the transitions between states



more directly, along with the characteristics of those experiencing



the transitions.  One could ask, for example, what proportion of



those leaving unemployment in a given year find jobs, and what



proportion leave the labor force?  Does it differ for men and



women, blacks and whites, old and young workers?  For that matter,



one could ask who becomes unemployed, and how does the incidence



differ by demographic characteristics?  Or, for example, what about



those who enter welfare programs in a given year -- what is the



incidence of entry for those in different categories?  What happens



to those who leave welfare in that year?  Do they get married?  Do



they get jobs?  How many of those gaining jobs are still employed



six months later, or a year later?  Similar questions could be



asked about the incidence and impacts of many other transitions,



from divorce to retirement to the birth of a child.



 



     The longitudinal analysis issues outlined above represent only



a small proportion of those that could be undertaken -- but the



point here is that there is a great deal of fairly straightforward



longitudinal analysis that would be very helpful to policy-makers,



and that is not now being done in any systematic way.



 



     Some very useful reports have been issued, of course -- for



example, the Census Bureau's P-70 series includes some longitudinal



analysis from the SIPP, although so far such applications have been



relatively limited in both quantity and scope.  Again, many of



these surveys, especially the SIPP and the NMCES, are still fairly



new, so perhaps it is not surprising that their producers have not



yet developed a complete, systematic schedule of reports examining



basic longitudinal issues.  Nevertheless, devoting more attention



to their own longitudinal analyses would probably be the most



important step data producers could take to support this type of



research, and could also increase substantially the useful



information that we are able to obtain from these surveys.



 



 



 



References



 



Allison, P.D. "Discrete-Time Methods for the Analysis of Event



Histories," in S. Leinhardt (ed.), Sociological Methodology 1982,



San Francisco:  Jossey-Bass, 1982.



 



 






 



Bane, Mary Jo and David T. Ellwood.  "The Dynamics of Dependence:



the Routes to Self-Sufficiency."  Report prepared for the U.S.



Department of Health and Human Services.  Cambridge, Mass.:



Harvard University, 1983.



 



Bane, Mary Jo and David T. Ellwood.  "Slipping Into and Out of



Poverty:  The Dynamics of Spells." Journal of Human Resources,



Winter 1986, 21(1), pp. 1-23.



 



Bianchi, Suzanne, and Edith McArthur.  "Family Disruption and



Economic Hardship:  The Short-Run Picture for Children."  Paper



presented at the annual meeting of the Population Association of



America, May 1989.



 



Blank, Rebecca.  "How Important is Welfare Dependence?"  Working



Paper No. 2026.  Cambridge, Mass.:  National Bureau of Economic



Research, Sept. 1986.



 



Citro, Constance F., Donald J. Hernandez, and Roger A. Herriot.



"Longitudinal Household Concepts in SIPP:  Preliminary Results."



SIPP Working Paper Series No. 8611.  Washington D.C.:  U. S. Bureau



of the Census, 1986.



 



Cox, B. and S. Cohen.  Methodological Issues for Health Care



Surveys.  New York:  Marcel Dekker, 1985.



 



Duncan, Greg J. (ed.). Years of Poverty, Years of Plenty.  Ann



Arbor, Mich.:  Institute for Social Research, 1984.



 



Duncan, Greg J., Richard D. Coe, and Martha S. Hill.  "The Dynamics



of Poverty," in G. Duncan, ed., (op. cit.) 1984, pp. 33-70.



 



Ernst, L., D. Hubble, and D. Judkins.  "Longitudinal Family and



Household Estimation in SIPP."  Proceedings of the Survey Research



Methods Section.  Washington D.C.:  American Statistical



Association, 1984.



 



Fox, Alan.  "Work Status and Income Change, 1968-72:  Retirement



History Study Preview."  Social Security Bulletin, 1976.



 



Gottschalk, Peter.  "The Intergenerational Transmission of Welfare



Participation:  Facts and Possible Causes."  Paper presented at the



annual meeting of the Association for Public Policy Analysis and



Management, November 1989.



 



Gustman, Alan L. and Thomas L. Steinmeier.  "A Structural



Retirement Model."  Econometrica, May 1986, pp. 555-584.



 



Levy, Frank.  "How Big is the American Underclass?"  Working Paper



0090-1.  Washington, D.C.:  The Urban Institute, 1977.



 



 






 



McMillen, David B. and Roger A. Herriot.  "Toward a Longitudinal



Definition of Households."  SIPP Working Paper Series No. 8402.



Washington DC:  U.S. Bureau of the Census, 1984.



 



McNeil, John, Enrique Lamas and Cynthia Harpine.  "Moving Into and



Out of Poverty: Data from the First SIPP Panel File."  Proceedings of



the Social Statistics Section.  Washington DC:  American



Statistical Association, 1988.



 



Office of Management and Budget, Statistical Policy Office.



Federal Longitudinal Surveys.  Statistical Policy Working Paper No.



13. Washington DC:  OMB, May 1986.



 



Ruggles, Patricia.  Drawing the Line:  Alternative Poverty Measures



and Their Implications for Public Policy.  Washington DC:  Urban



Institute Press, 1990.



 



Ruggles, Patricia.  "Welfare Dependency and Its Causes:



Determinants of the Duration of Welfare Spells."  Paper presented



at the annual meeting of the American Economic Association, Dec.



1988.



 



Ruggles, Patricia and Roberton Williams.  "Longitudinal Measures of



Poverty:  Accounting for Income and Assets Over Time."  Review of



Income and Wealth, Sept. 1989, 35(3), pp. 225-244.



 



Ruggles, Patricia and Roberton Williams.  "Transitions In and Out



of Poverty."  Paper presented at the annual meeting of the American



Economic Association, Dec. 1986.



 



Short, Pamela Farley, Joel C. Cantor, and Alan Monheit.  "Dynamics



of Medicaid Enrollment."  Inquiry, Winter 1984, 25(4), pp. 504-516.



 



Tuma, Nancy B. and Michael T. Hannan.  Social Dynamics:  Models



and Methods.  New York:  Academic Press, 1984.



 



Vavrichek, Bruce and Ralph E. Smith.  Family Incomes of



Unemployment Insurance Recipients and the Implications for



Extending Benefits.  Washington DC:  Congressional Budget Office,



1990.



 



Williams, Roberton.  "Poverty Rates and Program Participation in



the SIPP and the CPS."  Paper presented at the annual meeting of



the American Statistical Association, August 1986.



 



Williams, Roberton and Patricia Ruggles.  "Determinants of Changes



in Income Status and Welfare Program Participation."  Paper



presented at the annual meeting of the American Statistical



Association, August 1987.



 






 



                          DISCUSSION



 



                         Michael Brick



                          Westat, Inc.



 



 



Pearson



 



  Pearson's paper is an excellent guide to federal agencies on



the merits of choosing between various alternatives in designing a



survey to meet specific policy-relevant objectives.  He argues



quite persuasively that the important design question is not



whether cross-sectional or longitudinal is better, but which



combination of designs is most effective to answer the policy



questions.  Another important issue that Pearson raises is the



underutilization of experimentation in longitudinal surveys.  I



strongly agree with him that experiments are needed if causal



modelling is a goal.



 



  Along these same lines, longitudinal surveys offer a rich



environment for experimenting with a wide variety of other issues



such as memory and recall.  Some items could be collected in the



baseline of a longitudinal study, and the respondent could then be



asked to recall this information in a later followup.  Some examples that



might be interesting are income from previous years, grades while



in school, even opinions and attitudes.  These types of experiments



might help support some of the cognitive research theories or open



the door to new and more realistic theories.



 



  Pearson's listing of the advantages and disadvantages of



longitudinal files is very useful and can and should be used to



help improve design decisions.  I have a few quibbles about the



list that may offer a slightly different perspective.  The first



issue is the placement of nonresponse as a disadvantage.  Although



it is true that attrition is typically a bigger concern in



longitudinal files, the availability of additional covariates to



reduce nonresponse bias may partially offset this disadvantage.



However, until we devise and implement these methods effectively in



large-scale longitudinal files, the nonresponse problem will remain



a disadvantage.



 



   In many ways comparing cross-sectional and longitudinal



nonresponse problems is fraught with many of the same difficulties



associated with cost comparisons.  If you are accomplishing



something that cannot be done reliably in any other way, then you



do have an "apples and oranges" comparison.  When the cost of



a survey or the problem associated with nonresponse is discussed, the



alternatives that satisfy the same objectives must be clearly



specified.  Pearson is correct that general statements or



conventional wisdom can lead to poor design decisions.  More



complete models of the errors and costs for longitudinal



 



 






 



alternatives to cross-sectional surveys are needed to help unravel



these questions.



 



  My main complaint with the list of advantages and



disadvantages is that the discussion of response errors is too



limited.  If response errors create spurious estimates of change,



then the major advantages (the first three of the four advantages



he lists) of longitudinal files are reduced or eliminated.  I'll



return to this point after commenting on the paper by Kasprzyk and



Jacobs.



 



 



Kasprzyk and Jacobs



 



   The paper by Kasprzyk and Jacobs is a welcome insight into the



many and varied issues that are peculiar to longitudinal survey



design, operations and analysis.  Their even-handed treatment of



the differences that are encountered in large-scale longitudinal



surveys obviously reflects many hours of wrestling with the real



problems in this setting.



 



   In their discussion of the advantages and disadvantages of



longitudinal surveys, they mention that the net change can be



estimated more precisely because of the positive correlation that



can often be expected in the variables over time.  While this is



true, the practice in many federal longitudinal surveys has not



taken advantage of this correlation properly.  In some cases only



cross-sectional estimates of variances are ever computed.  In other



cases, correlations are estimated only for a very few statistics



and then a generalized correlation is proposed for all other



variables.



 



   Since the more precise estimation of net change is



probably the greatest advantage that a longitudinal survey has,



this practice needs to be re-examined.  If generalized correlations



are to be used, then it is important to put greater efforts into



their production and distribution.  For example, in a recent survey



that Westat conducted for the National Science Foundation, estimated



correlations over a two-year period ranged from -0.10 to



+0.65.  Sampling errors for estimates of net change are not



difficult to measure and should be included as a routine product in



a federal longitudinal survey.
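
   To make the role of the correlation concrete, the following is a minimal sketch of the standard variance formula for a difference of two correlated estimates. The function name and the unit variances are illustrative assumptions, not values from any survey; only the correlation range (-0.10 to +0.65) comes from the text above.

```python
import math


def net_change_variance(var_t1, var_t2, rho):
    """Variance of (estimate at time 2 - estimate at time 1).

    var_t1, var_t2: sampling variances of the two cross-sectional estimates
    rho: correlation between the two estimates over time
    """
    return var_t1 + var_t2 - 2.0 * rho * math.sqrt(var_t1 * var_t2)


# Hypothetical unit variances at both times, chosen only for illustration.
var_t1 = var_t2 = 1.0

for rho in (-0.10, 0.0, 0.65):
    v = net_change_variance(var_t1, var_t2, rho)
    print(f"rho = {rho:+.2f}  variance of net change = {v:.2f}")
```

   With equal variances the variance of net change is proportional to (1 - rho), so a single generalized correlation applied to all variables can misstate the sampling error of net change considerably when the true correlations vary this widely.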



 



   On a different issue, Kasprzyk and Jacobs note that in some



longitudinal surveys efforts are made to avoid presenting data



containing obvious errors.  While this seems like a reasonable



objective, it can actually result in poorer quality data.  For



example, if an error is made in the baseline period and all later



data are verified against it for consistency, then new problems



could be created.  "Correcting" the errors in an edit program could



be simply a way to suppress the problem so that users do not "see"



it.  It is still a real problem.  The presence of earlier data may



 






 



encourage "over-editing" longitudinal survey data, creating false



impressions of data quality, and increasing errors in computed



statistics.



 



   In the discussion of longitudinal weighting and imputation,



Kasprzyk and Jacobs review a number of important statistical



issues.  As they note in their discussion, there is no universal



agreement on these issues.  Reiterating a previous comment, I think



that imputation should play a much larger role than weighting in



estimation from longitudinal files.  The information obtained in



different data collection waves should be used for more efficient



estimation than is possible from simple weighting adjustments.  Of



course, the imputation of longitudinal files is also much more



complex, and methods for handling imputation in large longitudinal



surveys are not very advanced.  This is a challenge for producers



and analysts of longitudinal files.



 



 



 



[Graphic not reproduced.]



 



 



 



 



The Importance of Measurement Error in Longitudinal Surveys



 



   There are four concerns about measurement errors that I think



are very important to designers and analysts of longitudinal



surveys.  These concerns are:



 



-    Measurement errors are the most crucial problem facing



   longitudinal surveys



 



-    Measurement errors result in biased estimates of gross



   change and the ability to measure gross change is a prime



   goal in many longitudinal surveys



 



-    Measurement errors are a much greater problem in



   longitudinal surveys than in cross-sectional surveys



 



-    Changes in survey processes are required if the potential



   of longitudinal surveys is to be realized



 



   The concern over measurement errors in longitudinal surveys is



not new.  Errors in estimates of gross change have long been



recognized as having biases which reduce their usefulness.  Efforts



have been made to address these problems from both the design and



the analytic perspective.



 



   A simple hypothetical example may help to understand the



problem.  Figure 1 shows values of a characteristic (e.g.,



 






 



participation in a program, unemployment, health coverage) for a



sample of units.  The two extreme columns show the true values at



times 1 and 2.  Measurement error results in the values shown in



the adjacent columns being actually observed.  The observed values



then give rise to the observed change or transition values shown in



the center column.



 



    First, notice that measurement error has not greatly distorted



the cross-sectional estimates for either time 1 or time 2 (the



values in error are shown in bold).  Therefore the estimate of



level and the estimate of the net change between times 1 and 2,



which could be measured with either a longitudinal or cross-



sectional survey, are not greatly affected by the measurement



error.



 



    On the other hand, the impact of measurement error on the



gross change is dramatic.  Ten units are observed to have changed,



while the true number that changed is only 4.  One of the most



important and distinguishing features of many longitudinal surveys



is the ability to produce estimates of gross change, but



measurement error can seriously distort these estimates.



Measurement error can have a profound impact on estimates of



transitions, spells, durations, and flows.
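
   The same pattern can be sketched with a small made-up data set. The unit-level values below are hypothetical and do not reproduce Figure 1; they are chosen only so that the counts match those quoted in the text (four true changes, ten observed changes) while the levels at each time are left undisturbed.

```python
def flip(values, positions):
    """Return a copy of a 0/1 list with the given positions misreported."""
    out = list(values)
    for i in positions:
        out[i] = 1 - out[i]
    return out


def gross_change(t1, t2):
    """Number of units whose status differs between the two times."""
    return sum(a != b for a, b in zip(t1, t2))


# Hypothetical true status for 20 units (1 = has the characteristic).
true_t1 = [1] * 6 + [0] * 14
true_t2 = [1, 1, 1, 1, 0, 0, 1, 1] + [0] * 12

# Hypothetical measurement errors: offsetting misreports at each time.
obs_t1 = flip(true_t1, [0, 2, 8, 10])
obs_t2 = flip(true_t2, [1, 9])

print("level, time 1 (true, observed):", sum(true_t1), sum(obs_t1))
print("level, time 2 (true, observed):", sum(true_t2), sum(obs_t2))
print("gross change (true, observed): ",
      gross_change(true_t1, true_t2), gross_change(obs_t1, obs_t2))
```

   Because the errors offset within each time period, the levels and the net change are unaffected, yet every misreported unit appears as a spurious transition.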



 



 



[Graphic not reproduced.]



 



 



 



 



    It is instructive to examine the reasons why measurement error



causes so many more problems in longitudinal than cross-sectional



surveys.  The truth-by-survey table for a cross-sectional survey is



a useful way of working with measurement error for qualitative



variables.  (See Tables 1.a and 1.b.)



 



    The net bias is the difference of two margins from the table



(a+b) - (a+c), or simply b-c.  The goal is to have zero or at least



a small net bias.  The conditions for zero net bias (i.e., when



b=c) are given in the Appendix of the Bureau of Census (1985).



Using their notation, let



 



    Pr(observed value = No | true value = Yes) = q



 



and



 



    Pr(observed value = Yes | true value = No) = f.



 



Then the net bias equals zero if Pq = (1-P)f, where P is the true



proportion of the population with the characteristic.



 






 



   If the two error rates are approximately equal and P>0 and



q>0, then the net bias will get smaller as P approaches .50.  If



P=.02, then the ratio of q:f must be 49:1 for the net bias to equal



zero.  This merely points out the inter-relationship between the



net bias and the size of the estimate.  For estimates of rare



characteristics, measurement error is likely to be more



problematic.  Of course, the distribution of the two error rates is



also of great importance.
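
   The arithmetic behind these statements can be checked with a short sketch. The function names and the error rate of 0.05 are illustrative assumptions, and the sign convention (observed proportion minus true proportion) is assumed here because the tables are not reproduced; the zero-bias condition Pq = (1-P)f is the one given above.

```python
def net_bias(P, q, f):
    """Observed minus true proportion (sign convention assumed here).

    P: true proportion with the characteristic
    q: Pr(observed = No  | true = Yes)
    f: Pr(observed = Yes | true = No)
    """
    return (1.0 - P) * f - P * q


def zero_bias_ratio(P):
    """Ratio q:f that makes the net bias zero, from P*q = (1-P)*f."""
    return (1.0 - P) / P


# Equal error rates (hypothetical value 0.05): bias shrinks as P nears 0.50.
for P in (0.02, 0.10, 0.30, 0.50):
    print(f"P = {P:.2f}  net bias = {net_bias(P, 0.05, 0.05):+.4f}")

# For a rare characteristic, q must be far larger than f to balance.
print(f"P = 0.02  required q:f = {zero_bias_ratio(0.02):.0f}:1")
```

   With equal error rates the net bias reduces to f(1-2P), which is why it vanishes at P=.50 and is largest, relative to the estimate, for rare characteristics.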



 



   If we extend the example to a second observation time, we



encounter the same problem but the impact is larger.  First, note



that the net bias for the two observation periods is equal if the



probabilities of error are the same between times 1 and 2 and the



proportion with the characteristic does not change.



 



 



[Graphic not reproduced.]



 



 



 



 



 



   The point of this simple exercise is to show that in a



longitudinal setting the net bias for gross change involves the sum



of four differences, while for estimates of level there is only one



difference.  The problem is naturally greater in trying to measure



gross change, which is often one of the main objectives of a



longitudinal survey.
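
   A rough numerical sketch of this point follows. It assumes, as a simplification that is not stated in the text, that classification errors are independent across the two observation times and depend only on the true status at that time. The true transition proportions and the error rates are hypothetical, with q and f chosen so that Pq = (1-P)f and the level is unbiased.

```python
from itertools import product


def misclassification(q, f):
    """m[true][observed] for a yes/no (1/0) characteristic."""
    return {1: {1: 1.0 - q, 0: q}, 0: {1: f, 0: 1.0 - f}}


def observed_joint(true_joint, q, f):
    """Observed (time 1, time 2) distribution under independent errors."""
    m = misclassification(q, f)
    return {
        (a, b): sum(p * m[i][a] * m[j][b] for (i, j), p in true_joint.items())
        for a, b in product((0, 1), repeat=2)
    }


def level_t1(joint):
    """Proportion with the characteristic at time 1."""
    return joint[(1, 1)] + joint[(1, 0)]


def gross_change_rate(joint):
    """Proportion of units whose status differs between the two times."""
    return joint[(1, 0)] + joint[(0, 1)]


# Hypothetical truth: P = 0.10 at both times, 4 percent truly change status.
true_joint = {(1, 1): 0.08, (1, 0): 0.02, (0, 1): 0.02, (0, 0): 0.88}
q, f = 0.09, 0.01  # satisfies P*q = (1-P)*f, so the level has zero net bias

obs = observed_joint(true_joint, q, f)
print(f"level at time 1: true {level_t1(true_joint):.4f}, observed {level_t1(obs):.4f}")
print(f"gross change:    true {gross_change_rate(true_joint):.4f}, "
      f"observed {gross_change_rate(obs):.4f}")
```

   Under these assumptions each observed transition cell receives misclassified units from every other true cell, so error rates that leave the level unbiased still inflate the estimate of gross change.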



 



   As I noted earlier, the problems of response errors in



longitudinal surveys have been addressed from both a design and



analysis perspective.  The papers by Bye and Schechter (1986), Chua



and Fuller (1987), and Poterba and Summers (1984) are some



excellent examples of the analytic approach.  Marquis and Moore



(1989) offer additional insight using data from records, and



highlight the need for designing the surveys and instruments



better.



 



   I suspect that if Dr. Deming were to become involved in this



issue he might say that longitudinal surveys offer new challenges



and we must change the way we do business.  In longitudinal surveys



we can no longer accept the errors and expect others to buy our



products.  We must concentrate on the survey process, identify the



major sources of variability, and take steps to eliminate them from



the system.  If we fail to take these types of actions, then it is



 



 






 



likely that it will be harder and harder to support longitudinal



surveys in the future.



 



 



 



 



 



 



 



 






 



                          



[Graphic not reproduced.]






 



                           



[Graphic not reproduced.]






 



References



 



Bye, B.V. and Schechter, E.S. (1986), "A Latent Markov Model



Approach to the Estimation of Response Errors in Multiwave Panel



Data," Journal of the American Statistical Association, 81,



375-380.



 



Chua, T.C. and Fuller, W.A. (1987), "A Model for Multinomial



Response Error Applied to Labor Flows," Journal of the American



Statistical Association, 82, 46-51.



 



Marquis, K.H. and Moore, J.C. (1989), "Some Response Errors in the



SIPP -- With Thoughts About Their Effects and Remedies," Proceedings



of the Section on Survey Research Methods of the American



Statistical Association.



 



Poterba, J.M. and Summers, L.H. (1984), "Adjusting the Gross



Changes Data:  Implications for Labor Market Dynamics," Proceedings



of the Conference on Gross Flows in Labor Force Statistics.



 



U.S. Bureau of the Census (1985), "Evaluating Censuses of



Population and Housing," Statistical Training Document, ISP-TR-5.



 



 



 



 



 



 



 



 






 



                          DISCUSSION



 



                      Marilyn E. Manser



               U. S. Bureau of Labor Statistics



 



 



  The papers by Patricia Ruggles and Robert Pearson, on which I



was invited to comment, both provide helpful insights into the



usefulness of longitudinal data.  I find myself in agreement with



them, for the most part.  What I primarily will do is reinforce



points which I think are particularly important and discuss other



points on which my perspective may be a little different.



 



   Let me begin with a fundamental question suggested by this



session's papers:  what is the definition of a longitudinal survey



vs. a cross-section survey?  Although I do not have a clear answer



to this I want to raise it for thought.  Pearson defines a



longitudinal survey as one "in which repeated observations are made



of the same individual subjects."  In his paper with Robert Boruch



(1988), the Current Population Survey (CPS) is included in the



description of longitudinal surveys.  In contrast, the OMB



Statistical Policy Working Paper 13, "Federal Longitudinal



Surveys," excluded rotating panel surveys such as the CPS, the



Consumer Expenditure Survey, and the National Crime Survey because



there was no explicit plan for longitudinal analysis incorporated.



Ruggles never explicitly defines what she means by a longitudinal



survey, but is clearly using CPS as an example of a cross-sectional



survey.  Alternative definitions have of course been considered



elsewhere.  One possible design-based definition could include a



requirement that in order to be called longitudinal a survey must



follow movers -- on this basis, CPS would not be called



longitudinal even if a specific plan were developed to make use of



its longitudinal aspects. (CPS permits, for example, a variety of



longitudinal studies of labor market situations, although at



present there are other problems with the quality of longitudinal



estimates based on CPS besides the fact that movers are not



followed.)



 



   It is important to note also that a purely design-based



criterion would be less than fully satisfactory -- problems could



prevent the design from being implemented.  For instance, budget



cuts could prevent any follow-up after the first round.  Less



drastically, it is important to ask, if following movers is viewed



as important, what proportion of movers are actually found?  It



would be useful to have this information produced regularly on



longitudinal surveys.



 



   



 






 



   To my knowledge, few major ongoing program efforts depend on



truly one-shot surveys, which seem to be what Pearson is calling a



cross-sectional survey.  Most statistical surveys are used to



produce at least aggregate estimates of change, even if that was



not the primary purpose in mind when they were designed.  For



example, for many analyses of the economic situation one is really



interested in whether the unemployment rate is high or low compared



to other periods.  Uses of data to construct aggregate measures of



change and arguments for improving aggregate cross-sectional



estimates can both justify a statistical design including a



rotating panel, even when no explicit longitudinal analyses are



planned.  But in addition there may be cost implications to one-



shot surveys, making them less cost effective than what I will call



"mixed surveys", rotating panel surveys which fail stringent



definitions of a longitudinal survey but are not truly one-shot.



In any case, for a relatively small additional effort to improve



longitudinal aspects of mixed surveys such as CPS, it may be



possible to improve analytic possibilities enormously.



 



   Both the Pearson and the Ruggles papers focus on household



surveys.  But establishment surveys such as the Census's Annual



Survey of Manufactures and BLS's 790 survey typically go back to



the same sample units repeatedly.  Such surveys offer tremendous



opportunities for increasing understanding of economic phenomena if



the problems in making use of their panel aspects can be overcome.



 



I. Advantages and Disadvantages of Longitudinal Surveys



 



   A major focus of Pearson's paper is on weighing the advantages



and disadvantages of longitudinal surveys.  This is a valid and



useful discussion, but note that the disadvantages all center on



measurement problems.  No one has successfully argued, to my



knowledge, that non-experimental cross-section surveys are more



useful for the analytic purposes for which non-experimental



longitudinal surveys are designed.  Further, given that it is too



burdensome to obtain the needed information with retrospective



questions, which in any case would entail severe recall problems,



there is really no alternative to longitudinal surveys for many



types of analyses.  Longitudinal surveys are extremely important and



should be given greater use than they have received in the past but



much more research is needed on measurement problems.



 



   One section of Pearson's paper argues for coupling



longitudinal and experimental designs.  Surveys conducted to



collect and analyze data on social experiments have typically been



longitudinal.  The point made here is a recommendation to conduct



policy-related experiments with individual respondents to a



general-purpose longitudinal survey.  Clearly this could be a cost-



effective way to collect data if the individuals on whom the



experiment were being conducted were no longer to be counted as



part of the original survey.  But otherwise I would be extremely



 






 



troubled by doing this.  The whole purpose of policy-related



experiments is often to influence outcomes, and even if that is not



the intent outcomes are still likely to be influenced.  If this



occurs then responses of the sample members to a wide range of



survey questions are no longer representative of the population as



a whole.  In contrast, conducting survey methodological experiments



which can be assumed not to have a measurable impact on outcomes



can be useful or necessary in some instances.



 



   Pearson also makes the related point that joining additional



questions to an ongoing survey can be valuable.  I am in



wholehearted agreement with this.  One thing that the Department of



Labor's (DOL's) National Longitudinal Survey (NLS) program has done



that has, in my view, been very beneficial to a variety of



government agencies as well as to the outside research community is



to accept funding from other agencies to collect information of



interest to them which also enhances usefulness of the data for



analyzing labor market behavior.  For example, the National



Institute of Child Health and Human Development has added blocks



of questions on child care use to the NLS Youth survey.  Because of



the importance to us and to others of joining data collection needs



from other agencies to our survey, we have developed a general



policy to preserve the integrity of the basic data.



 



 



II. Longitudinal Analysis



 



    As implied by her title, Patricia Ruggles' paper focuses on



analysis of longitudinal data, primarily econometric analysis using



the micro data.  It has been in the micro area that the major use



of longitudinal survey data has occurred.  For instance, as



documented by Frank Stafford (1986), much of what we know about



labor economics has come from longitudinal surveys, primarily NLS



and the Panel Study of Income Dynamics.  Research using these data



continues, on topics such as the impact of private sector training



on future earnings, low wage jobs and their impacts, and the labor



supply behavior of women during pregnancy and shortly after birth



of the child.  Similarly, as more experience with the Survey of



Income and Program Participation (SIPP) accumulates, we are likely



to see many useful micro studies that will impact the way the



research community thinks about issues such as spells of dependence



on various programs.



 



    But where longitudinal household surveys have not made a large



contribution yet, it seems to me, is in short-term analysis of



current data: for instance, a series of reports on a topic such as



how have the transition rates out of poverty changed since last



year.  However, I understand that the Census Bureau has recently



released two P-70 reports using SIPP to analyze transitions.  I



believe that development of a series of current analytic reports in



addition to long-term econometric research studies from NLS is an



extremely important goal for this program, but the program has



 






 



never in the past included this dimension.  In general, I think



that carrying out both long-term econometric studies and shorter-



term, more current, tabular analyses is important and that they are



complementary.  However, as all three papers in this session note,



further research on weighting problems is greatly needed.



 



  Much of the Ruggles paper considers the complexity of use of



longitudinal data sets, for which they are often criticized.  She



recognizes, and this is important, that these complexities



necessarily come along with the richness of the data sets that are



responsible for the "exciting research opportunities" that they



provide.



 



   While it is true that existing longitudinal data sets are hard



to use, this is the case because of the vast amount of information



they contain, particularly after several rounds of interviews have



taken place.  (For instance, she notes that SIPP provides a choice



of accounting periods for income measures and this choice can make



a difference to the analysis.  This is in contrast to CPS where



income is available for only one accounting period.  With SIPP,



richness of choice creates complexity.  Unless there is widespread



agreement about what measure to use, summary income measures



provided on its files by a statistical agency would presumably not



suit some of its users.)



 



   Another related point in Ruggles' paper is the recommendation



that agencies append a myriad of transition flags to person



records.  While this is feasible, again, in general, unless there



is a large set of users with a particular need or an ongoing agency



use of the data for a particular purpose it will probably be the



case that a particular user will not find all aspects of a publicly



available file ideally suited to his or her particular use.  Limited



resources can often be used, however, to respond to needs affecting



a number of users.  For instance, many users of NLS data use the



event history information on jobs -- very rich data that were



initially very difficult to use.  As a result, a Workhistory data



tape was developed which contains weekly arrays of labor force



status, usual hours worked per week, and dual job information.



This Workhistory data tape makes analyses using this information



considerably easier.



 



   As she notes in her introduction, Ruggles' focus is "almost



exclusively on the application of longitudinal analysis to



questions concerning patterns of family income, expenditures,



and/or demographic change."  Because of this focus, she devotes



considerable attention to efforts to construct a longitudinal



family definition.  This is a major problem for analyses of topics



such as transitions on and off means-tested government programs.



This problem, too, exists because of the richness of this type of



longitudinal data.  It is well-known that examining income-levels



by family type using CPS is plagued by the fact that the family



structure information relates to a different period than the income



 






 



measure.  That a longitudinal data source entails problems for



analysis does not necessarily mean that analysis of a similar topic



would be preferable using other, easier-to-use data.



 



  Note also that this problem of family status definition is not



a central problem in analysis of many types of longitudinal issues



-- it is the focus here that makes it one.  For example, studies of



labor supply behavior, work experience, earnings growth, and so on



focus on the individual.  NLS follows individuals primarily because



of its focus on labor force related information for people in



groups of particular interest to DOL.  Similarly, the Department of



Education focuses on the individual in its longitudinal studies



which focus primarily on educational experiences and outcomes for



youth.



 



   Ruggles' point that just reweighting the way it is typically



done does not necessarily solve a problem due to nonrandom



attrition is an important one.  This suggests one reason, among



others, why a microeconometric analysis may be preferable to



looking only at tables:  as she points out, it is possible to



include people who are in the sample only for some of the periods



in a micro study.  But in general, use of tabular and econometric



analyses can be complementary.



 



   In her section on analyzing data on duration, Ruggles provides



a useful discussion of some of the pitfalls in this type of study.



Restricting an analysis to a particular age/sex cohort is not



problematic if the interest is in a particular group.  It is using



a variable that represents a choice to select a sample for analysis



that causes all the types of problems she discusses in this



section.



 



   In her conclusion, she makes two recommendations for



statistical agencies.  One is to lessen the emphasis on cross-



sectional files.  Her point that it is important not to use data



with cross-sectional imputations for a longitudinal analysis is an



important one.  But surveys may have multiple purposes so that



avoiding cross-sectional imputations entirely, especially given



needs for issuing timely data, would not be a possibility in many



cases.  The second recommendation is for more longitudinal analysis



by statistical agencies.  I agree that this is important, even



though longitudinal studies and tables may be difficult to explain



in many cases as she notes.



 



   In conclusion, let me note that as part of the major ongoing



joint BLS/Census CPS Redesign effort, attention to longitudinal



issues is planned.  Tables of gross monthly flows between labor



force states are presently produced regularly but are not



officially published because the estimates are not of sufficient



quality.  Efforts are planned to improve longitudinal aspects of



the survey and to research adjustment techniques to improve the



gross flows tables.  In addition, if funding permits, plans are to



 






 



conduct a separate CPS-like longitudinal survey which would follow



movers and keep people in the sample longer.  This survey, in



addition to supporting improved analysis of short-run changes in



labor force behavior, would permit research on a multiplicity of



survey-related topics.



 



 



References



 



Boruch, R. F. and R. W. Pearson, "Assessing the Quality of



Longitudinal Surveys," Evaluation Review, Vol. 12, February 1988,



pp. 3-58.



 



Stafford, F., "Forestalling the Demise of Empirical Economics:  The



Role of Microdata in Labor Economics Research,"  in O. C.



Ashenfelter and R. Layard, eds., Handbook of Labor Economics, Vol.



1. Amsterdam:  North-Holland, 1986, pp. 387-423.



 



 



 



 



 



 



 



 






 



             TOWARDS AN AGENDA FOR THE FUTURE



 



 



 



 



 



 



 



 






 






 



               TOWARDS AN AGENDA FOR THE FUTURE



 



                     Stephen E. Fienberg



                  Carnegie Mellon University



 



   My remarks this afternoon will focus on a few key themes that



emerged in various sessions over the past two days.  I will attempt



to use these themes to point towards elements of an agenda for the



future of the federal statistical system, not just the future of



the Federal Committee on Statistical Methodology that oversees the



OMB Statistical Policy Working Paper Series around which the



seminar has been centered.



 



On Quality



 



   George Hanuschak in the session on survey quality profiles



recalled the words of one of the present-day quality gurus, to the



effect that we should build quality into the system, not just



inspect for the lack of it after the fact.  A variant on this is



the theme that we need to build quality and evaluation into our



data collection processes.  The traditional notion of coming back



several months later to check on the answers provided by a survey



respondent seems at odds with the notion of ongoing change and



improvement.  For example, consider two components of the 1990



Census, the group quarters censuses of college and university



campuses and the special homeless component -- S-night --



program.  In neither case can one expect to return a month or so



after the enumeration to check on information recorded.  Thus a



careful census quality program would have some built-in evaluation



mechanism for these components.



 



   At Carnegie Mellon University we have a new Statistical Center



for Quality Improvement which we operate jointly with the



statisticians at the University of Pittsburgh, and my colleagues



associated with this center are fond of referring to the three



generations of statistical approaches to quality.  The first of



these is the basic univariate control chart generation of



technology associated with the names of Shewhart, Deming, and others



and based on ideas that were found in the literature in the 1920s



and 1930s.  The second generation was linked to the introduction of



careful experimentation specifically designed for the industrial



setting, e.g., response surface methodology and EVOP, and



introduced in the 1950s and 1960s.  The recent interest in Taguchi



methods is rooted in large part in basic fractional factorial



design ideas.  We are just beginning to see the emergence of the



third generation of quality techniques which focus on statistical



methods for the analysis of complex multivariate data using high



speed computation and computer graphics.



 



 






 



   Based on what I know about quality efforts in the federal



statistical agencies, and what I heard described at this seminar,



I would describe the current state-of-the-art as being focussed on



the first generation of quality ideas, univariate in approach,



lacking careful and systematic experimentation, and devoid of



techniques rooted in the modern world of computing.  Yet there are



ample opportunities for moving quickly into the second generation



by utilizing ideas on the embedding of experiments in surveys



(e.g., see Tanur and Fienberg, 1988, 1989).  The simplest of the



embedded designs (the split ballot experiment) is often recommended



for use (as it was in the session here on questionnaire design) but



rarely analyzed properly.  Indeed, as we look to the widespread



exploration of ideas and concepts coming out of the cognitive



laboratories, the federal agencies must take seriously the second



generation ideas of embedded experiments.



 



On the Need for Integration



 



   Some of the recent advances in methods for data collection and



analysis appear as add-ons, off to the side of the main enterprise.



In the spirit of the Total Quality Management movement we have



heard in several sessions about the need for integration of the



components of survey design, and in a larger sense for the



integration of thinking across agencies.



 



   I am reminded of an academic story.  As a dean at Carnegie



Mellon, I sit on the university promotion and tenure committee and



get the opportunity to review cases from diverse disciplines.  A



few years ago, we were reviewing the case of a physicist for tenure



and his file contained a number of letters describing his



experimental work as brilliant, innovative, or outstanding.  As we



looked over his curriculum vitae, we noted that he had no



individually-authored papers but only appeared as one of a cast of



thousands on each paper.  Finally, one committee member asked the



presenter of this case, what was so distinctive about the



candidate's work in high energy physics that merited the laudatory



comments.  The response was: "He focuses the beam."



 



   Now many of you have roles in federal data collection that are



akin to that of the physicist's beam-focussing.  These are



important and often crucial roles, but their value needs to be



understood in the broader integrative setting, both by you in your



work and by those who are looking towards quality improvement more



broadly.



 



 



On the Statistical Policy Working Paper Series



 



   While many of the sessions at this seminar were based directly



on papers from the OMB Statistical Policy Working Paper Series,



others have been on collateral advances in methods and data quality



 






 



assessment.  Bob Groves began the seminar by noting three



perspectives on the goals that the series should have.  These were



to serve as:



 



    (a) reports on the "state-of-the-art" of federal practice,



 



    (b) vehicles for agency cross-fertilization,



 



    (c) prods to new developments.



 



   Many of the nineteen papers issued to date have succeeded



admirably in categories (a) and (b), and they have changed how work



is done across agencies.  Others have had only limited impact.  But



I think that we could agree with Groves that few of the papers were



prods to major new methodological developments.  Perhaps the



Federal Committee on Statistical Methodology that oversees the OMB



Statistical Policy Working Paper Series needs to be more daring in



its choice of topics in the future.  New topics need not be rooted



in ongoing work in specific agencies nor do they need to be ones on



which the committee agrees.  For how else can we achieve a major



shift or revolution in methods and quality?



 



    At the same time I should note the need for attention to and



support for the committee's activities on the part of senior



administrators in the statistical agencies.  If staff do this work



only in their spare time we can expect to see few major



methodological advances.



 



 



Shifts of Paradigm for Federal Statistics



 



    Fritz Scheuren has been talking both at this seminar and in



recent years about the need for a paradigm shift in how we do



federal statistics.  I believe that he is correct in this claim



although I do not think that many people understand what he and the



philosophers of science mean by paradigm shifts.  I commend those



of you who have not read Thomas Kuhn on scientific revolutions to



do so as his ideas often get mangled in the translation.



 



    Kuhn talks about the day-to-day orderly change and incremental



knowledge approach to science which gets radically altered and



reorganized by the introduction of a new set of ideas and a new



paradigm such as that associated with the work of a Newton or an



Einstein.  Now when a paradigm shift occurs, the past tools and



perspectives are not all discarded.  Rather they are looked at in



a different way and accorded a different place in the hierarchy of



importance.  What we also see is the introduction of dramatically



different measurement methods, with markedly changed error



profiles.



 



    Up through the present day the federal statistical system has



been based in large part on tools developed many years ago, more



often than not in the 1930s and 1940s.  This is especially true in



survey design and census taking.  With the technological revolution



 



 






 



of the 1970s and 1980s, one might have expected to see a paradigm



shift in statistics in the agencies, but the computer and its



effects have been forced into the old paradigm instead of being the



trigger to a reorganization of our thinking.



 



   The last decade has been a difficult one for statistical



agencies, but perhaps the problems that the agencies have



encountered during this period should spur us to rethink what we do



and how.  We should be asking if tools like CAPI, CATI, distributed



computing networks, major new analytical statistical methods, and



the cognitive-statistical laboratory may be the vehicles to major



changes.



 



 



Impediments to Major Change



 



   Perhaps the biggest impediment to change is the bureaucracy in



which most of you work.  A piece of this is the attitude:  "We've



always done it that way."  This is related to the theme I would



label as "The Agency is the Data."  The purpose of collecting



statistical data is not an end unto itself, but rather a means to



a social or policy goal.  The aim of the federal statistical agency



then should be to serve these broader goals well, rather than to



collect data insulated from outside input and protected from



outside scrutiny.  We need to move towards making our data



relevant; to measure what is of importance, albeit poorly, instead



of measuring what current methods are designed to be good at, even



if it is of marginal interest.



 



   I'd like to tell a parable about the National Goodness



Survey which was mandated late one night in conference by Congress



as an amendment to a foreign aid bill.  The federal methodology



coordinating committee was asked to propose a design for this new



survey at one of its meetings, and each of its members was asked to



come back to the next meeting with a proposal for the design:



 



(a) The representative from the Bureau of the Census returned



   with a household survey design that resembled the Current



   Population Survey, and she noted that surely goodness



   resided in household locations, just as unemployment



   does.



 



(b) The representative from the Energy Information



   Administration noted that goodness was likely to flow



   from reservoirs in the ground and thus proposed a design



   modelled on their survey of natural gas reserves.



 



(c) The Bureau of Labor Statistics representative suggested



   that we couldn't ignore the component of goodness that



   was due to business establishments, and proposed a



   separate survey based on their new establishment list.



 






 



  But she also offered the auspices of the BLS cognitive



   laboratory for testing ideas on goodness consumption.



 



(d) The Bureau of Justice Statistics representative noted



   that his agency didn't actually conduct its own surveys



   and referred the committee to the representative of the



   Census Bureau for how this should be done.



 



(e) The representative from the National Center for Health



   Statistics suggested that goodness was a manifestation of



   physical well-being and urged that the new survey be a



   supplement to the National Health Interview Survey.



 



(f) Finally, the National Center for Education Statistics



   proposed that we ask the state superintendents for public



   schools to report on the fostering of goodness in the



   educational process, and that we develop a new



   standardized test that could be administered annually to



   measure the acquisition of goodness skills.



 



(I leave it as an exercise for the reader to describe how the



representatives from BEA, DoD, IRS, and NASS responded.)



 



   Now part of the problem with my parable lies with the approach



taken by each of the agency statisticians who, instead of asking



what the concept of goodness is all about and how could one measure



it, looked to analogues close at hand and let the standard methods



he or she was familiar with frame all of the answers to the crucial



unasked questions.  Perhaps a survey is the wrong tool for the task



of measuring goodness.  The other problem arises from the fact that



no agency has a monopoly on statistical methods or the ability to



design new surveys, not the Bureau of the Census, not BLS, not even



the small band of statisticians in OMB who must approve the design.



New projects the federal statistical system is likely to face in



the next decade will require innovative thinking and true



interagency collaboration.



 



   The example given by Judy Lessler of measuring the quality "of



Flowing Waters" is illustrative of the point I am trying to make.



The Research Triangle Institute (RTI) statisticians put this



problem of measuring the quality of the nation's flowing waters



back into the traditional survey domain of a frame with a



population of units (river reaches) to be sampled.  The approach



was ingenious and some might even call it innovative.  But in so



describing the problem of measuring the quality of the nation's



waters Lessler missed the opportunity to note a point that I am



sure the RTI statisticians discussed, namely that many radically



different frameworks are possible for looking at this issue, and



only some of these fit neatly into a traditional sampling approach.



 



 



 






 



  Thus one of the messages I bring you today is that we all must



learn to question the appropriateness of traditional statistical



frameworks and institutional dogma.  This is especially true as we



move into some of the more fascinating new domains of federal



statistics, e.g., related to the environment, as well as in



considering different ways of collecting data, for censuses, and



especially for longitudinal surveys.



 



  What we do know about longitudinal surveys is that they should



not be a simple pasting together of waves of cross-sectional



surveys.  What we do not know is how to design such surveys except



by faulty analogy to traditional cross-sectional methods.



Traditional concepts of frames and survey coverage suddenly become



elusive, shifting over time.  For longitudinal surveys we need to



rethink what data we collect, when, and how.  And we need to have



a more flexible set of analytical tools that allow the data to be



viewed from multiple perspectives.  Technology may well help here



with problems associated with sample attrition and the followup of



movers.



 



Some Advice



 



  I'd like to end with a bit of advice and encouragement about



what you can do to improve the quality and appropriateness of the



statistical work in your own settings.  Your challenge is to keep



yourself from being isolated, to prevent yourself from accepting as



infallible the data collection and analysis methods you currently



use in your job, and to look beyond the walls of your organization.



 



(a) Ask "why" more often than you have in the past.



 



(b) Dare to have new ideas or suggest the exploration of



  someone else's new ideas.  Innovative ideas have a long



  gestation period and only a small fraction of them



  actually work in practice.



 



(c) Insist on careful evaluation and documentation of what



  you are doing.



 



(d) Don't be afraid to say that you don't know or you don't



  understand.  Such statements are often not a sign of



  ignorance but rather indicators of wisdom.



 



(e) Hang in there.  Your jobs are difficult and most of you



  are doing them well.  The nation depends on your efforts.



 



References



 



 



 



 






 



Fienberg, S. E. and Tanur, J. M. (1988).  From the inside out and



the outside in: combining experimental and sampling structures.



Canadian Journal of Statistics, 16, 135-151.



 



Fienberg, S. E. and Tanur, J. M. (1989).  Combining cognitive and



statistical approaches to survey design.  Science, 243, 1017-1022.



 



Kuhn, T. (1970).  The Structure of Scientific Revolutions (Second



Edition, Enlarged).  University of Chicago Press, Chicago.



 



 



 



 



 



 



 



 






 



               TOWARDS AN AGENDA FOR THE FUTURE



 



                      Margaret E. Martin



 



 



  I have been given a "Where do we go from here?" assignment to



help in focussing the experience of the Federal Committee on



Statistical Methodology (FCSM) on future directions.



 



  So who is "we" and where is "here"?  I have chosen to consider



"we" as something broader than the FCSM itself -- perhaps the



coordinating role of the Statistical Policy Office, perhaps that



amorphous entity, the federal statistical system in general.



 



  Where is "here"?  It seems to me "here" is an amazingly



distant and productive way from the starting point when the FCSM



was founded--19 "state of the art" reports ago.  The productivity



of the FCSM's part-time, interagency subcommittees has been



outstanding.  Much credit belongs to Maria Gonzalez.



 



  Some notion of expectations is essential in order to assess



past progress and future progress.  What can such committees



accomplish?  We do not usually look to such groups to produce major



breakthroughs in statistical theory, nor to engage in detailed



technological applications or experiments.  Rather, it seems to me,



an interagency committee might be expected to perform one or more



of the following functions:



 



1) exchange knowledge, techniques or experience among



  committee members to enhance the quality of the member



  agencies' own operations;



 



2) provide "state of the art" reports to encourage best



  practice among a broader group;



 



3) recommend areas for improvement and needed directions for



  research; and



 



4) obtain consensus on such issues as -- defining problems



  and the priorities among them, developing or changing



  classifications or other concepts, and setting



  statistical standards.



 



  I am uncertain how much the various subcommittees have served



to fulfill the first objective -- that of exchanging knowledge



among the subcommittee members -- especially upon hearing informal



comments that much subcommittee work is report drafting and



criticizing undertaken by individuals on evenings and weekends,



rather than exchanges at committee meetings.  In such



circumstances, the interplay among participants that sometimes



leads to unexpected and happy outcomes is not encouraged.  Perhaps



this is a point that needs more consideration and possible



 






 



development in the future.  Suggestions that arose in the opening



session yesterday for more followup on subcommittee reports might



lead to more continuing and profitable interactions among



subcommittee members.



 



   The FCSM has fulfilled admirably the second objective I



listed, that of providing state of the art reports to encourage



best practice among a broader group of agencies both within and



outside the Federal Government.  Robert Groves reported yesterday,



for example, that he has used some of the reports in training



future survey statisticians.  The record of the FCSM in meeting



this objective is outstanding.



 



   Many of the reports meet the third objective of recommending



areas for improvement and needed directions for further research --



although the record here is more spotty.



 



   The fourth objective, that of obtaining consensus on broad



definitional, conceptual and classification issues, has not been



well met.  Indeed, the FCSM apparently does not deem such issues to



be within its purview.  Very well, but such issues are of immediate



concern to OMB's Statistical Policy Office and at this time



pressures are increasing to re-examine basic concepts and



classifications in both economic and social areas and to establish



more extensive statistical standards for the federal data



collecting agencies.



 



   For example, many of our economic statistics depend for



classification purposes on the concept of the establishment; many



of our social statistics are collected about and from families --



yet both of these concepts are becoming more difficult to apply and



possibly less relevant.  Changes in either would have major impacts



on the uses of the resulting data; they might also have major



methodological repercussions.



 



   Although the FCSM has developed an admirable program of



sponsoring, reviewing and publishing reports on specific topics, it



has not been so forthcoming about its own operations.  I am



especially curious about how it selects the areas for subcommittee



operation.  The most important problems?  By what criteria?  The



problems most likely of immediate solution?  Or some interface



between these criteria?  Here I would only note that the Committee



has not yet tackled the most difficult problem of all those facing



federal statistical agencies, that of setting statistical



priorities.  It is possible that chances of a successful



subcommittee outcome are too remote to warrant effort on this



issue.  It is now fifteen years since a panel of the Committee on



National Statistics issued a preliminary report* and recommended



additional research on costs, and, especially, the benefits of



statistical activities.  To my knowledge, there has been little if



any follow-through by federal agencies.  The time may be ripe for



another look at this issue.



 






 



  Fritz Scheuren spoke yesterday about possible paradigm shifts



in the taking of the Census.  I have my own pet paradigm shift to



recommend.  Back in the 1940's and 1950's when I was being educated



in statistical methodology by Morris Hansen and others in the



federal statistical agencies as part of the process of their



obtaining Bureau of the Budget approval for forms, I learned the



paradigm that the sponsor of the form (the subject matter



specialist, the scientist, the policymaker) specified the subjects



to be covered and the accuracy desired, the statistician provided



the statistical design and methodology and estimated the costs of



alternatives.  The description is over-simplified but not unfair,



I think.  Yet how far from actual practice.  The economic or social



theorists seldom specify an operationally feasible concept.  It is



the applied survey economists, demographers and other specialists,



together with the statisticians, interacting, who develop concepts,



designs and methodologies in a succession of approximations.



 



  As a case in point, take the definition of employment.  In



classical economics, employment is not defined, nor even mentioned.



An undifferentiated mass known as "labor" was identified as one of



the three factors of production, and when current information on



the demand for labor was wanted in the late nineteenth century, it



was determined to collect data on employment from employers.  The



result was that employment was defined to be something obtainable



from employer records, a concept approximating filled jobs.  It



excluded the self-employed, but one person could be counted on



several payrolls by holding a number of part-time or part-period



jobs.  The source and method of data collection thus determined the



effective definition.  Later, when a new survey attempted to



measure the unemployed, it proved necessary to go to persons in



households for the information and to identify the employed to



differentiate them from those not at work and seeking work.  A



quite different count of employment resulted, again reflecting the



basic survey methodology.  Some of the differences between the two



series in level and change can be explained by known differences in



the concepts; some remain intractable.
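
  The way the source of the data fixes the effective definition can be
shown with a small sketch.  The records, names, and both counting
functions below are invented for illustration only and do not
represent the procedures of any agency or survey.  The payroll count
approximates filled jobs as seen in employer records; the household
count approximates employed persons as seen in a household interview.

```python
# Hypothetical records: (person, employer, is_self_employed).
# All names and values are invented for illustration.
JOBS = [
    ("ann", "firm_a", False),
    ("ann", "firm_b", False),  # a second, part-time job
    ("bob", "firm_a", False),
    ("cal", None, True),       # self-employed, on no payroll
    ("dee", None, True),       # self-employed, on no payroll
]


def payroll_count(jobs):
    """Employer-record concept: one count per filled payroll job."""
    return sum(
        1
        for _person, employer, self_employed in jobs
        if employer is not None and not self_employed
    )


def household_count(jobs):
    """Household concept: one count per person with any work."""
    return len({person for person, _employer, _self_employed in jobs})


if __name__ == "__main__":
    print("payroll jobs:    ", payroll_count(JOBS))
    print("employed persons:", household_count(JOBS))
```

  In this toy case the multiple jobholder is counted twice on payrolls
and the self-employed are missed entirely, while the household count
includes each person once.  Real differences between the two series
involve many more factors than these two.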



 



 I think it is high time to shift the paradigm more towards one



centered on survey methodology broadly defined.  This would argue



for either expanding the scope of the FCSM or establishing



additional coordinating committees under OMB auspices to work on



developing consensus on critical conceptual and classification



issues.



 



Reference



 



Setting Statistical Priorities, Report of the Panel on Methodology



for Statistical Priorities, Richard Savage, Chair, Committee on



National Statistics, National Academy of Sciences, Washington, DC,



1976.



 



 






 



                TOWARDS AN AGENDA FOR THE FUTURE



 



                        Hermann Habermann



                Office of Management and Budget



 



 



    The keynote address at this seminar given by Bob Groves



presented the goals for the working papers prepared by the Federal



Committee on Statistical Methodology.  The goals are documentation



of Federal statistical practices, cross-fertilization among



agencies, and the prodding of new developments.  The address suggested the



need to establish a reward structure for the Federal staff that



work on the FCSM projects.



 



    The Federal statistical system needs to periodically examine



itself to determine if we are meeting our goals.  Some of the areas



of work that the statistical system must now investigate are listed



below.



 



o    Cognitive laboratories



 



o    The National Academy of Sciences' Committee on National



    Statistics is studying:



 



         Trade data



         Disclosure-avoidance techniques



 



o    The Bureau of Economic Analysis is moving away from the



    present system of National Accounts towards the United



    Nations National Accounts System.



 



o    Census 2000 is 10 years away so we need a fresh look at



    the methods used to collect census data.



 



o    Private data bases are now burgeoning.  The Federal



    government no longer has a monopoly on data bases.



 



    There are many changes in public attitudes.  Some of the



questions that we need to consider in the context of the Seminar on



Quality of Federal Data follow.



 



 1. What is the purpose of the decennial census?



 



 2. What is the relationship between the "10 year ceremony"



    and intercensal data collection?



 



 3. Where are we going on disclosure-avoidance techniques?



 



 4. Where are we going with Federal-State statistical



    programs?  How can we evaluate the multiple models used in



    these programs?



 






 



5. What is the best strategy to take care of the increasing



 difficulties that agencies have with recruitment and



 training of technical personnel?



 



 



 






 



 



                     Reports Available in the



                        Statistical Policy



                       Working Paper Series



 



 



1.   Report on Statistics for Allocation of Funds (Available



     through NTIS Document Sales, PB86-211521/AS)



2.   Report on Statistical Disclosure and Disclosure-Avoidance



     Techniques (NTIS Document Sales, PB86-211539/AS)



3.   An Error Profile:  Employment as Measured by the Current



     Population Survey (NTIS Document Sales PB86-214269/AS)



4.   Glossary of Nonsampling Error Terms:  An Illustration of a



     Semantic Problem in Statistics (NTIS Document Sales, PB86-



     211547/AS)



5.   Report on Exact and Statistical Matching Techniques (NTIS



     Document Sales, PB86-215829/AS)



6.   Report on Statistical Uses of Administrative Records (NTIS



     Document Sales, PB86-214285/AS)



7.   An Interagency Review of Time-Series Revision Policies (NTIS



     Document Sales, PB86-232451/AS)



8.   Statistical Interagency Agreements (NTIS Document Sales,



     PB86-230570/AS)



9.   Contracting for Surveys (NTIS Document Sales, PB83-233148)



10.  Approaches to Developing Questionnaires (NTIS Document 



     Sales, PB84-105055/AS)



11.  A Review of Industry Coding Systems (NTIS Document Sales,



     PB84-135276)



12.  The Role of Telephone Data Collection in Federal Statistics



     (NTIS Document Sales, PB85-105971)



13.  Federal Longitudinal Surveys (NTIS Document Sales, PB86-



     139730)



14.  Workshop on Statistical Uses of Microcomputers in Federal



     Agencies (NTIS Document Sales, PB87-166393)



15.  Quality in Establishment Surveys (NTIS Document Sales, PB88-



     232921)



16.  A Comparative Study of Reporting Units in Selected Employer



     Data Systems (NTIS Document Sales, PB-90-205238)



17.  Survey Coverage (NTIS Document Sales, PB90-205246)



18.  Data Editing in Federal Statistical Agencies (NTIS Document



     Sales, PB90-205253)



19.  Computer Assisted Survey Information Collection (NTIS



     Document Sales, PB90-205261)



20.  Seminar on the Quality of Federal Data (NTIS Document Sales,



     PB91-142414)



 



Copies of these working papers may be ordered from NTIS Document



Sales, 5285 Port Royal Road, Springfield, VA  22161 (703) 487-4650



 






 



 



"1"  A copy of the complete paper which details the sample



selection and matching procedures used in ERUMS is available from



John Pinkos, Bureau of Labor Statistics, GAO Building Room 2913,



441 G Street, NW, Washington, DC 20212, Telephone (202)523-1636.



 



"2" Acknowledgement:  This chapter was partially supported by a



grant from the National Science Foundation's program of Measurement



Methods and Data Improvement, Grant # SES-8511609.  The chapter



benefitted from the helpful comments and suggestions of Robert F.



Boruch, Calvin C. Jones, and Nancy A. Mathiowetz.  A more extended



version of this chapter appears in Krishnan Namboodiri and Ronald



G. Corwin, editors, Research in the Sociology of Education and



Socialization.  Volume 8.  Greenwich, Connecticut:  JAI Press,



1989, pages 177-199, and is reprinted here in part with the



permission of the publisher.



 



"3" This survey of approximately 2,000 respondents was a face-to-



face stratified probability sample of the adult noninstitutionalized



population of the United States, which included a special



supplemental sample of minorities for that year.



 



"4" This survey of approximately 10,000 respondents is a cohort



of youth (age 14-21 during the first year of the survey in 1979)



which included oversamples of females and minority youth and a



special military sample.



 



"5" A longer version of this paper that discusses several other topics



in longitudinal analysis such as designing a longitudinal file,



dealing with attrition, imputation and weighting issues, and the



choice of an accounting period is available from the Census Bureau



as SIPP Working Paper No. 9007.



 



"6" See Duncan (ed.) (1984) and McNeil et al. (1988).



 



"7" Applications illustrating the use of this technique to analyze



income change can be found in Ruggles and Williams (1986) and



Williams and Ruggles (1987).



 



"8" See Bianchi and McArthur (1989).



 



"9" See for example Guatman and Steiruneier (1986).



 



"10" Additionally, if rates of divorce are changing rapidly over time,



  the use of pooled data on transitions from a long-term sample such



  as the PSID may give misleading estimates of transition



  probabilities.  See for example Tuma and Hannan (1984) for more



  discussion of this point.



 



"12" Mary Jo Bane and David Ellwood's classic paper on poverty spells



  makes this point very well, and provides a good example of spell



  analysis as applied to the PSID.  (See Bane and Ellwood (1986)).



  For a similar example using SIPP data, see Ruggles and Williams



  (1989).  Other useful applications include the work by Pamela Farley



  Short and her colleagues on spells of Medicaid participation and



  Rebecca Blank's imaginative use of longitudinal data from the



  Seattle and Denver Income Maintenance Experiments to examine spells



  of welfare program participation.  See Short et al.  (1988) and Blank



  (1986).



 



"13" See Duncan et al. (1984).



 



     



"14" This discussion is aimed at the analyst trying to decide whether



  this approach is appropriate for the particular application he or



  she has in mind.  Anyone attempting to implement such an analysis



  should of course review some of the more technical literature on



  this topic.  Tuma and Hannan (1984) provide a good basic overview



  of these methods.  In addition, the treatment in Allison (1982) may



  be helpful to analysts who are completely unfamiliar with event



  history analysis techniques.



 



"15" The author is Assistant Commissioner, Office of Economic



  Research, Bureau of Labor Statistics.  The views expressed herein



  are those of the author and do not necessarily reflect those of the



  Bureau of Labor Statistics.



 



 



 



 



 



 



 


 

