Federal Committee on Statistical Methodology
Office of Management and Budget

 

  Statistical Policy Working Paper 6 - Report on Statistical Uses of Administrative Records



 

 

 

 

 

Statistical Policy

Working Paper

 

Report on

Statistical Uses Of

Administrative Records

 

Prepared by

Subcommittee on Statistical Uses of Administrative Records



Federal Committee on Statistical Methodology

 

 

U.S. DEPARTMENT OF COMMERCE

Philip M. Klutznick, Secretary

Luther H. Hodges, Jr., Deputy Secretary

Courtenay M. Slater, Chief Economist

 



Office of Federal Statistical Policy and Standards

Joseph W. Duncan, Director

 

Issued:   December 1980

 

 

 

 

 



Statistical Policy Working Papers are a series of technical
documents prepared under the auspices of the Office of Federal
Statistical Policy and Standards.  These documents are the product
of working groups or task forces, as noted in the Preface to each
report.

These Statistical Policy Working Papers are published for the
purpose of encouraging further discussion of the technical issues
and stimulating policy actions that flow from the technical
findings and recommendations.  Readers of Statistical Policy
Working Papers are encouraged to communicate directly with the
Office of Federal Statistical Policy and Standards with additional
views, suggestions, or technical concerns.

 

 

Joseph W. Duncan, Director
Office of Federal Statistical Policy and Standards

 

           For sale by the Superintendent of Documents,

                  U.S. Government Printing Office

                      Washington, D.C. 20402



 

 

 

 

 

 

 

 

            Office of Federal Statistical Policy and Standards



 

                    Joseph W. Duncan, Director

 

     Katherine K. Wallman, Deputy Director, Social Statistics

      Gaylord E. Worden, Deputy Director, Economic Statistics

                  Maria E. Gonzalez, Chairperson,

               Committee on Statistical Methodology

 

                              Preface

 

     This working paper was prepared by the members of the
Subcommittee on Statistical Uses of Administrative Records,
Committee on Statistical Methodology.  The Subcommittee was chaired
by Daniel H. Garnick, Bureau of Economic Analysis, Department of
Commerce.  The members of the Subcommittee are the authors of this
report; their names are listed below.

 

     The first portion of this report reviews major administrative
record files pertaining to individuals and to businesses.  Major
statistical uses of administrative records are outlined, including:
(1) direct use of the records to obtain statistics and to supplement
existing data by expanding coverage or content; and (2) technical
uses of the data for constructing sampling frames, quality control,
improving procedures, and data evaluation.  New developments in data
from business establishment reporting and a number of potential uses
of administrative records for data linkage are described.  Technical
problems in the statistical use of administrative records, including
coverage, comparability, error, and timing of data, are discussed.
The final section of the report covers various legal issues in
accessing administrative records for statistical purposes.



     While Federal agencies already make considerable statistical use
of administrative records, this report is intended to inform
managerial and technical staffs of both the vast potential and the
difficulties entailed in augmenting current uses of administrative
records for statistical purposes.  The Office of Statistical Policy
and Standards hopes to organize, with the help of Subcommittee
members, seminars with Federal employees to disseminate the findings
of this report.  The implementation of the recommendations in this
report will be explored by the Office of Statistical Policy and
Standards.



 

 

 

 

 



            Members of the Subcommittee on Statistical
                  Uses of Administrative Records
                            (June 1980)

Daniel H. Garnick* (Chair)
Bureau of Economic Analysis (Commerce)

Lois Alexander
Social Security Administration (HHS)

Paul A. Armknecht
Bureau of Labor Statistics (Labor)

David V. Bateman
Bureau of the Census (Commerce)

Lawrence A. Blum
Bureau of the Census (Commerce)

Warren L. Buckler
Social Security Administration (HHS)

David W. Cartwright
Bureau of Economic Analysis (Commerce)

John DiPaolo
Internal Revenue Service (Treasury)

Maria E. Gonzalez* (ex officio)
Office of Federal Statistical Policy and Standards (Commerce)

John A. Gorman
Bureau of Economic Analysis (Commerce)

David A. Hirshberg
Small Business Administration

Beth A. Kilss
Social Security Administration (HHS)

J. Knott
Bureau of the Census (Commerce)

Bruce Levine
Bureau of Economic Analysis (Commerce)

Nash J. Monsour
Bureau of the Census (Commerce)

Allan Olson
Economic Development Administration (Commerce)

Elizabeth H. Queen
Bureau of Economic Analysis (Commerce)

Vernon Renshaw
Bureau of Economic Analysis (Commerce)

Fritz J. Scheuren*
Social Security Administration (HHS)

Daniel F. Skelly
Internal Revenue Service (Treasury)

Hyman Steinberg
U.S. Postal Service


Additional Contributors to the Report on Statistical Uses of
Administrative Records

Jeanne E. Griffith
Office of Statistical Policy and Standards (Commerce)

Daniel Kasprzyk
Assistant Secretary for Planning and Evaluation (HHS)

Susan Miskura
Bureau of the Census (Commerce)


* Member, Committee on Statistical Methodology



 






 



 



 



 



 



                          Acknowledgments

     The body of this report is the collective effort of the
Subcommittee on Statistical Uses of Administrative Records.
Although the Subcommittee members reviewed and commented on all
parts of this report, specific individuals were responsible for
preparing the various sections.  In the case of Chapter VI, the
Subcommittee benefited from the expertise and contributions of
several additional persons in preparing the case studies.  The
authors of the chapters appear below:

Chapter   Authors

I         Daniel Garnick, Maria Gonzalez, Vernon Renshaw, Lois
          Alexander, David Hirschberg, Fritz Scheuren
II        Vernon Renshaw, David Hirschberg, Daniel Garnick
III       Joseph Knott, Lawrence Blum, Warren Buckler, Vernon
          Renshaw, Fritz Scheuren
IV        Vernon Renshaw, David Cartwright, Nash Monsour, Lawrence
          Blum, John Gorman, Daniel Skelly, John DiPaolo, Warren
          Buckler, Elizabeth Queen
V         Lawrence Blum, Paul Armknecht, Warren Buckler, David
          Cartwright, Vernon Renshaw
VI        Fritz Scheuren, Beth Kilss, Jeanne Griffith, Daniel
          Kasprzyk, David Bateman, Sue Miskura, Maria Gonzalez
VII       David Cartwright, Vernon Renshaw, Bruce Levine, Warren
          Buckler, Fritz Scheuren
VIII      Lois Alexander

     Maria Gonzalez worked with the Subcommittee throughout its
two-year study.  Members of the Federal Committee on Statistical
Methodology and the Office of Statistical Policy and Standards
provided additional assistance and encouragement.  Critical reviews
of earlier drafts by Thomas Jabine, Barbara Bailar, and Tore
Dalenius were particularly helpful in the development of this
report.

     Discussion by Richard Ruggles of papers by Daniel Garnick and
Joseph Knott, David Cartwright and Paul Armknecht, David Hirschberg
and Vernon Renshaw, and Lois Alexander at the Statistical Uses of
Administrative Records session of the 1979 American Statistical
Association meetings helped sharpen the focus of this report.

     Others who contributed to the work of the Subcommittee
include: Yoshio Akiyama, Leroy Bailey, Robert Berney, J. Robert
Brown, Morris M. Kleiner, Lillian Madow, Harriet Orcutt, and Max
Shor.



 






 



 



 



 



 



                Members of the Federal Committee on
                      Statistical Methodology
                            (June 1980)

Maria Elena Gonzalez (Chair)
Office of Federal Statistical Policy and Standards (Commerce)

Barbara A. Bailar
Bureau of the Census

Norman D. Beller
Economics, Statistics, and Cooperatives Service (Agriculture)

Barbara A. Boyes
Bureau of Statistics

Edwin J. Coleman
Bureau of Economic Analysis (Commerce)

John E. Cremeans
Bureau of Economic Analysis (Commerce)

Marie D. Eldridge
National Center for Education Statistics (Education)

Daniel H. Garnick
Bureau of Economic Analysis (Commerce)

Thomas B. Jabine
Energy Information Administration (Energy)

Charles D. Jones
Bureau of the Census (Commerce)

William E. Kibler
Economics, Statistics, and Cooperatives Service (Agriculture)

Alfred D. McKeon
Bureau of Labor Statistics (Labor)

Raymond C. Sansing
Internal Revenue Service (Treasury)

Fritz J. Scheuren
Social Security Administration (HHS)

Lincoln E. Moses
Energy Information Administration (Energy)

Monroe G. Sirken
National Center for Health Statistics (HHS)

Wray Smith
Office of the Assistant Secretary for Planning and Evaluation (HHS)

Thomas G. Staples
Social Security Administration (HHS)



 






 



 



 



 



 



                         Table of Contents

Preface
Acknowledgments
List of Figures
List of Tables
Abbreviations
Chapter I.     Findings and Recommendations
     A.   Statistical Standards
     B.   Access
     C.   Other Government-Wide Program Coordination and Support
Chapter II.    Introduction and Summary
     A.   Introduction
     B.   Summary
          1.   Chapter III
          2.   Chapter IV
          3.   Chapter V
          4.   Chapter VI
          5.   Chapter VII
          6.   Chapter VIII
Chapter III.   Major Administrative Record Files
     A.   Scope of Study and Survey Conducted
          1.   Scope of Study
          2.   Survey Conducted
     B.   Survey Results
          1.   Files Pertaining Mainly to Individuals
               a.   Universe
               b.   Geographic Information
               c.   Demographic Information
               d.   Reporting Unit
          2.   Files Pertaining Mainly to Businesses
               a.   Universe
               b.   Geographic Information
               c.   Economic Data
               d.   Reporting Unit
     C.   Continuous Work History Files
     D.   The Evolution of Statistical Uses of Administrative Records
     E.   Appendix III.1 The Survey Questionnaire
     F.   Appendix III.2 The CWHS Data System
          1.   Data Sources
          2.   Processing Procedures - Administrative Records
          3.   Processing Procedures - Statistical Records
          4.   Sample Design
          5.   Data Files
               a.   One percent Sample Annual Employee-Employer
                    (Ee-Er) File
               b.   One percent Sample Annual Self-Employed (SE)
                    File



 






 



 



 



 



 






               c.   One percent Sample Longitudinal Employee-
                    Employer Data (LEED) File
               d.   One percent 1937 to Date CWHS File
               e.   One-Tenth of One percent 1937 to Date CWHS
                    File
Chapter IV.    Major Statistical Uses of Administrative Records
     A.   Defining Administrative Records and Using Them Statistically
     B.   Internal Revenue Service
     C.   Social Security Administration
     D.   Bureau of Economic Analysis
     E.   Census Bureau
          1.   Economic Censuses
          2.   Census of Agriculture
          3.   Survey of Minority-Owned Businesses (SMOBE)
          4.   Current Economic Indicators
          5.   The Standard Statistical Establishment List
     F.   The Small Business Administration
     G.   Appendix IV.1 Data from IRS and SSA
          1.   Data from IRS
          2.   Data from SSA
Chapter V.     Developments in Data from Business Establishment
               Reporting
     A.   Standard Statistical Establishment List
          1.   File Construction
          2.   Multiestablishment Firms
          3.   Single Establishment Firms
          4.   File Maintenance
          5.   Confidentiality
     B.   W-2 and W-3 Records
     C.   Unemployment Insurance System
          1.   Master List of Employers
          2.   Employers' Quarterly Tax Report
          3.   Individual Wage Records
          4.   Improving Data Quality
Chapter VI.    Potential Uses of Administrative Records for Data
               Linkages: Selected Case Studies
     A.   Introduction
     B.   Case Study 1:  Linked Administrative Statistical Sample
                         (LASS) Project
          1.   Background and Initial Project Goals
               a.   LASS Data Elements
               b.   LASS Goals
          2.   Pilot Activities and Feasibility Issues
               a.   Resolving Privacy Concerns
               b.   Examining SSA-NCHS Death Reporting Differences
               c.   Adding Data From Death Certificates to the
                    CWHS
               d.   Usability of IRS Occupation Information
               e.   Upgrading CWHS Industry and Place of Work Data
               f.   Evaluating W-2 Residence Data
          3.   Operational Implementation Issues
          4.   References
     C.   Case Study 2:  The Use of Administrative Records in the
                         Survey of Income and Program
                         Participation
          1.   Objectives and Description



 






 



 



 



 



 






               a.   Site Research
               b.   1978 Panel
               c.   1979 Panel
          2.   Major Difficulties
          3.   Uses of Administrative Files
          4.   Quality of Results
          5.   Bibliography
     D.   Case Study 3:  Use of IRS/SSA/HCFA Administrative Files
                         for 1980 Census Coverage Evaluation
          1.   Introduction
          2.   Objectives of the Program to Estimate the Census
               Undercount
          3.   Matching Techniques
               a.   Matching of Survey Housing Unit and Person
                    Records to Census Records
               b.   Matching of CPS and Census Enumerated Housing
                    Unit and Person Records to Administrative File
                    Records
          4.   Administrative Matching
          5.   Research Conducted for Match Study
               a.   1978 CPS/IRS Match Study
               b.   IRS Census Match Study (Involving Richmond,
                    Virginia and Southwest Colorado Dress
                    Rehearsal Censuses)
          6.   Estimation
          7.   Anticipated Cost and Timing of Administrative Match
               Study
          8.   References
     E.   Case Study 4:  Record Linkage in the Nonhousehold
          1.   Introduction
          2.   Results from the Travis County, Texas and Camden, New
               Jersey Pretest
          3.   Plans for the 1980 Census Nonhousehold Sources
          4.   Summary and Future Considerations
          5.   Sources of Further Information
          6.   References
          7.   Appendix-Matching Instructions
     F.   Concluding Comments
Chapter VII.   Technical Problems in the Statistical Use of
               Administrative Records
     A.   Coverage
     B.   Comparability
     C.   Reporting and Processing Errors
          1.   Reporting Problems
          2.   Processing Problems
          3.   Extent of Errors
          4.   Related Problems with Other Data
          5.   Errors in Other Information
     D.   Problems with Timing of the Data
     E.   Conclusion
Chapter VIII.  Legal Issues in the Statistical Use of
               Administrative Records
     A.   Legal and Administrative System
          1.   Factors Precipitating the Shift Toward Greater
               Statistical Use of Administrative Records



 






 



 



 



 



 






          2.   Concept of Functional Separation
          3.   A Language Framework for Legal Issues
          4.   Options:  Legislative Approaches to Functional
               Separation
     B.   Dynamics of Functional Separation
          1.   Dimensions and Characteristics of the Legal
               Framework
               a.   Disclosure Within the Agency, a Broader View
               b.   Disclosure to Agency Contractors
               c.   Disclosure Among Federal Agencies
               d.   Use By Non-Statisticians of Statistical Files
                    Compiled From Administrative Source Records
          2.   A Closer Look At Some Federal Statutes Affecting
               Statistical Use of Administrative Records and
               Protection of Statistical Records from
               Nonstatistical Use
     C.   Summary and Directions for the Future
     D.   Notes and References
References



 






 



 



 



 



 



                          List of Figures

Figure

III.1     Major Administrative Files Surveyed by the Subcommittee
          on Statistical Uses of Administrative Records
V.1       Forms W-2 and W-3
V.2       Statistical Uses of Unemployment Insurance Administrative
          Records from Establishments
VI.4.1    Nonhousehold Sources Worksheet to Search Census Records
          for Selected Person: 1976 Census of Travis County, Texas
VI.4.2    Nonhousehold Sources Census Record Search and Telephone
          Follow-up Verification Record: 1976 Census of Camden, New
          Jersey
VI.4.3    Nonhousehold Sources Record:  20th Decennial Census-1980



 






 



 



 



 



 



                          List of Tables

Table

III.1     Major Administrative Record Systems Pertaining to
          Individuals
III.2     Major Administrative Record Systems Pertaining to
          Businesses
IV.1      National Income and Product Account Components Based on
          Administrative Records
IV.2      Input-Output Account Industry Estimates Based on
          Administrative Records
IV.3      Balance of Payments Account Components Based on
          Administrative Records
IV.4      National Income and Product Account Components Based on
          Current Surveys Using Administrative Record Based
          Sampling Frames
VI.2.1    Distribution of Site Research Sample Households by Sample
          Frame and Questionnaire Type
VI.2.2    Distribution of Site Research Adult Respondents by Sample
          Frame and Questionnaire Type
VI.2.3    A Sampling of AFDC Matching Results in the Site Research
          Survey
VI.2.4    SSI Match Results for the 1978 Panel
VI.3.1    Forming a Dual-System Estimate for One of the 61
          Divisions
VI.4.1    Camden Match Results
VI.4.2    Cross Tabulation of Age Reported on Drivers Licenses and
          Census Questionnaire (Camden, New Jersey)
VII.1     Comparison of Employment Estimates: CWHS, Census, UI, and
          CBP



 






 



 



 



 



 



                           Abbreviations

AFDC      Aid to Families with Dependent Children
BEA       Bureau of Economic Analysis
BEOG      Basic Educational Opportunity Grant
BLS       Bureau of Labor Statistics
BMF       Business Master File (of IRS)
CAB       Civil Aeronautics Board
CBP       County Business Patterns
CofC      Comptroller of the Currency
CES       Current Employment Statistics
CETA      Comprehensive Employment and Training Act
CPS       Current Population Survey
CWBH      Continuous Wage and Benefit History
CWHS      Continuous Work History Sample
ED        Enumeration District
EEOC      Equal Employment Opportunity Commission
EI(N)     Employer Identification (Number)
ERP       Establishment Reporting Plan
FAA       Federal Aviation Administration
FCC       Federal Communications Commission
FDIC      Federal Deposit Insurance Corporation
FICA      Federal Insurance Contributions Act
FOIA      Freedom of Information Act
FPC       Federal Power Commission
FRB       Federal Reserve Board
FTC       Federal Trade Commission
GAO       General Accounting Office
GBF       Geographic Base File
HCFA      Health Care Financing Administration
HHS       Department of Health and Human Services
HEW       (Department of) Health, Education, and Welfare
ICC       Interstate Commerce Commission
IMF       Individual Master File (of IRS)
I-O       Input-Output
IRS       Internal Revenue Service
ISDP      Income Survey Development Program
LASS      Linked Administrative Statistical Sample
LTS       Labor Turnover Statistics
NCEUS     National Commission on Employment and Unemployment
          Statistics
NCHS      National Center for Health Statistics
NCI       National Cancer Institute
NIPA      National Income and Product Accounts
OASDI     Old Age, Survivors, and Disability Insurance
OES       Occupational Employment Statistics
OFSPS     Office of Federal Statistical Policy and Standards
OMB       Office of Management and Budget



 






 



 



 



 



 



OPM       Office of Personnel Management
ORS       Office of Research and Statistics (of SSA)
OSHS      Occupational Safety and Health Statistics
PES       Post Census Enumeration Survey
PPSC      Privacy Protection Study Commission
REA       Rural Electrification Administration
RFP       Request for Proposal
SBA       Small Business Administration
SER       Summary Earnings Record
SESA      State Employment Security Agency
SIC       Standard Industrial Classification
SIPP      Survey of Income and Program Participation
SMD       Statistical Methods Division (of Census)
SMOBE     Survey of Minority Business Enterprises
SMSA      Standard Metropolitan Statistical Area
SOI       Statistics of Income
SSA       Social Security Administration
SSEL      Standard Statistical Establishment List
SSI       Supplemental Security Income
SSN       Social Security Number
SSR       Supplemental Security Record
SUAR      Statistical Uses of Administrative Records
TCMP      Taxpayer Compliance Measurement Program
UI        Unemployment Insurance
USDA      United States Department of Agriculture



 






 



 



 



 



 



                                                          CHAPTER I



                   Findings and Recommendations



 



     Statistical use of administrative records grew rapidly during
the 1970's, in large part as a response to legislative requirements
for timely data to use in distributing Federal funds to State and
local governments.  The principal reason for this increasing
reliance on administrative records for statistical data is that
such records can be used to obtain small area data at minimal cost
and without increasing respondent burden.  Cost is likely to be an
increasingly important factor in the statistical use of
administrative records in the 1980's.  Although statistical use of
administrative records is growing, many questions remain unanswered
concerning the quality of statistics derived from administrative
records.  From a statistical point of view, the standards of
quality and consistency in administrative data collection and
processing programs are frequently inadequate.  Difficulties in
accessing administrative records, moreover, often inhibit the
efficient joint use of particular administrative record sets with
other administrative and statistical records in meeting statistical
needs.  Improved statistics from administrative records will
require modifications to data collection and processing procedures,
modification of laws and administrative procedures relating to
access to records, and increased resources for evaluating and
upgrading the quality of administrative records for statistical
use.  While the costs of improving administrative records for
statistical applications can be significant, they will often be
substantially less than the costs of alternatives requiring
expanded censuses and surveys.  In many instances, both
administrative and statistical programs could benefit from the
reduced respondent burdens and data processing costs obtainable by
applying more efficient statistical tools in the collection and use
of administrative records.



     To solve problems impeding efficient statistical use of
administrative records, coordinated treatment of a variety of
interagency issues is needed to serve as a counterweight to the
decentralized operations of Federal information collection
programs.  In addressing these issues, the Subcommittee on
Statistical Uses of Administrative Records has divided its
recommendations into three sections concerned with:

     A.   Identifying and formulating solutions for common problems
          related to statistical standards for administrative
          information programs.

     B.   Identifying and meeting various problems related to
          access to administrative record systems.

     C.   Identifying collection programs and research activities
          requiring government-wide coordination and support.

     Individual recommendations are in some cases accompanied by
examples of Subcommittee findings that illustrate the need for the
recommendation.



 



 



                     A. Statistical Standards



 



     There is a need for greater standardization in the procedures



for collecting and presenting data based on administrative records



in order to provide a basis for reducing duplicate collection



efforts and improving the quality and consistency of the



information that is collected.



 



Recommendation 1.   Common identifiers should be used whenever



possible in collecting information pertaining to the same



individuals or organizations.



     The capability for linking information from a variety of



sources is central in making efficient statistical use of



administrative records.  This capability depends on both



appropriate access to administrative records (see Section B) and



consistency among administrative and statistical agencies in



procedures for identifying respondents or reporting units.  The



subcommittee noted, for example, that household surveys could be



used more effectively in conjunction with administrative records if



social security numbers and related identifying information were



collected in selected surveys.  This would permit linking detailed



socioeconomic information from surveys with longitudinal records



from administrative sources concerned, for example, with employment



or medical histories.  Such linkages are performed in various areas



of social research including specialized fields such as



epidemiology.  In business data collection programs, employer



identification numbers should be supplemented with a common set of



identifiers for the individual establishments of large businesses. 



Selected administrative record data for multi-establishment



businesses could then be linked more readily to economic census and



survey data for purposes of improving geographical and industrial



analysis of economic activity.
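As an illustration only, the kind of linkage described above can be sketched in a few lines of Python. The sketch assumes that each survey record and each administrative record carries a social security number in a common format, and that identifiers are unique within the administrative file; the record layouts (`SurveyRecord`, `EarningsRecord`) and their fields are hypothetical. Survey records without an administrative counterpart are kept aside rather than dropped, since their number bears directly on the usefulness of the merged file.

```python
from dataclasses import dataclass


@dataclass
class SurveyRecord:
    ssn: str               # hypothetical: identifier collected on the survey
    household_income: int  # detailed socioeconomic item from the survey


@dataclass
class EarningsRecord:
    ssn: str               # identifier on the administrative file
    years_employed: int    # longitudinal item from the administrative history


def link_on_identifier(survey, admin):
    """Exact-match linkage of survey records to administrative records.

    Assumes identifiers are unique within the administrative file.
    Returns (linked pairs, survey records with no administrative match).
    """
    admin_by_id = {rec.ssn: rec for rec in admin}
    linked, unlinked = [], []
    for rec in survey:
        match = admin_by_id.get(rec.ssn)
        if match is None:
            unlinked.append(rec)
        else:
            linked.append((rec, match))
    return linked, unlinked
```

The same pattern applies to business records once a common establishment identifier exists alongside the employer identification number.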



Recommendation 2.   The quality of administrative records to be



used for statistical purposes should be evaluated systematically to



determine the appropriateness of the records for the proposed use.



     The quality of administrative record files, including such



factors as the type and quality of identification on the file and



the completeness, definitional suitability, and quality of



individual or organizational characteristics on the file, will



determine the appropriateness of the use of the files for



particular statistical applications.  For example, in matching



applications the completeness of the coverage of the administrative



record files and the accuracy of identifiers will determine whether



a high match rate will be achieved.  Similarly, in such



applications as the distribution of Federal funds to State and



local governments, the completeness and accuracy of administrative



records will determine the extent to which estimates derived from



these records may serve as complements as well as substitutes for



census and survey data.
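A minimal sketch of how the two factors named above show up in a match-rate calculation, using hypothetical identifier lists: survey cases with a missing identifier are counted separately from cases whose identifier is simply not found, because the first points to identifier quality and the second to either incomplete coverage of the administrative file or a misreported number.

```python
from collections import Counter


def match_diagnostics(survey_ids, admin_ids):
    """Match rate of survey identifiers against an administrative file.

    Returns (match rate, counts by outcome). Blank or missing identifiers
    are counted as 'missing_id'; identifiers absent from the file are
    'not_found' (a coverage gap or a misreported number).
    """
    admin_set = set(admin_ids)
    outcome = Counter()
    for ident in survey_ids:
        if not ident:
            outcome["missing_id"] += 1
        elif ident in admin_set:
            outcome["matched"] += 1
        else:
            outcome["not_found"] += 1
    total = len(survey_ids)
    rate = outcome["matched"] / total if total else 0.0
    return rate, dict(outcome)
```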



Recommendation 3.   Consistent procedures should be used in



administrative and statistical data collection efforts for defining



reporting units, identifying and coding reporting unit



characteristics, and developing standards



for data tabulation.



     When common reporting units are not appropriate, there should



still be efforts to ensure that the more detailed reporting unit



breakdowns of one program can be readily combined into more



aggregative units used in other programs.  The subcommittee noted,



for example, a lack of congruity in the definition of companies



filing corporate income tax returns and companies reporting for



statistical purposes to the Census Bureau.  The subcommittee also



found a particularly serious problem of inconsistency between



"establishment" reporting plans associated with administrative



programs and the definitions of establishments of multiunit



companies used in the Census Bureau's Standard Statistical



Establishment List (SSEL).  The Social Security payroll tax program, for



example, involves a voluntary establishment reporting plan with



company self-identification of reporting units on a basis differing



from SSEL definitions.  The need for consistent reporting



requirements that eliminate duplicate and other unnecessary



reporting is highlighted by the fact that the compliance of large



companies with the SSA establishment reporting plan and other



voluntary statistical programs has been deteriorating in recent



years.
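Where reporting units differ, the requirement that detailed units be readily combined into broader ones amounts to maintaining a crosswalk from each detailed unit to its parent unit. The sketch below assumes such a crosswalk exists (the unit identifiers and payroll figures are hypothetical) and reports unmapped units explicitly, since silently omitting them would understate the aggregates.

```python
from collections import defaultdict


def roll_up(unit_values, crosswalk):
    """Combine detailed reporting units into broader units.

    unit_values: {detailed_unit_id: value}, e.g. establishment payroll
    crosswalk:   {detailed_unit_id: aggregate_unit_id}, e.g. company
    Returns (totals by aggregate unit, detailed units with no mapping).
    """
    totals = defaultdict(float)
    unmapped = []
    for unit, value in unit_values.items():
        parent = crosswalk.get(unit)
        if parent is None:
            unmapped.append(unit)   # report, rather than silently drop
        else:
            totals[parent] += value
    return dict(totals), unmapped
```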



     Problems of inadequate procedures for coding reporting unit



characteristics have been emphasized by the subcommittee in such



areas as geographic coding and the industrial coding of business



establishments.  Reliable and detailed geographic coding in



administrative record systems, in particular, has become



increasingly important as administrative records have received



wider application in preparing statistics for use in distributing



Federal funds to State and local governments.  For many purposes



geographic coding is required at the municipal level, but substate



coding in administrative record systems tends to be restricted to



county identifiers.  The lack of current economic information by



municipality has hindered effective planning and economic policy



making at the Federal as well as State and local level.  For



business reporting systems, the SSEL coding system can provide a



basis for obtaining consistency in both geographic and industrial



coding.



     The need for consistent standards for data tabulation has



recently been highlighted by efforts to assemble a data base for



analyzing small business policy issues.  These efforts have been



hampered by inconsistencies among various administrative and



statistical programs in the ways in which data are identified and



tabulated by size of business.
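One way to picture a common tabulation standard is a single size-class function shared by every program. The employment-size boundaries below are hypothetical placeholders, not an adopted Federal standard; the point is only that identical boundaries produce tabulations that can be compared and combined across programs.

```python
import bisect
from collections import Counter

# Hypothetical employment-size boundaries; an agreed standard would
# replace these. bisect_right counts how many boundaries are <= the
# employment figure, and that count indexes the label list.
SIZE_BOUNDS = [1, 5, 10, 20, 50, 100, 250, 500, 1000]
SIZE_LABELS = ["0", "1-4", "5-9", "10-19", "20-49", "50-99",
               "100-249", "250-499", "500-999", "1000+"]


def size_class(employment):
    """Map an employment count to its size-class label."""
    return SIZE_LABELS[bisect.bisect_right(SIZE_BOUNDS, employment)]


def tabulate_by_size(firms):
    """firms: iterable of (firm_id, employment). Counts firms per class,
    listing every class (including empty ones) in a fixed order."""
    counts = Counter(size_class(emp) for _, emp in firms)
    return {label: counts.get(label, 0) for label in SIZE_LABELS}
```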



 



                             B. Access



 



     A central issue related to meeting the differing requirements



of data for administrative vs. statistical applications efficiently



involves the problem of obtaining an appropriate balance between



the need to access individual records and the right to privacy as



well as consideration of confidentiality of responding persons and



businesses.  Resolution of this issue requires that distinctions be



made both in terms of the uses to be made of records and the types



of reporting units and information involved.



Recommendation 4.   Natural persons should be distinguished from



organizations and other entities when developing standards and



practices of record confidentiality.



     The need for confidentiality is not the same for businesses



and other organizations as for natural persons.  Often, the need



for access to selected information pertaining to businesses



requires interagency transfer of information about organizations. 



The subcommittee has found, for example, instances in which Federal



agencies purchase privately produced lists of businesses containing



generally available information, such as name and address of the



businesses, because access to more complete and reliable lists such



as the Census Bureau's SSEL has been excessively restricted.  The



subcommittee is not persuaded that these restrictions are



reasonable or necessary.



Recommendation 5.   Legislation and administrative procedures



should be modified to make comprehensive Federal lists of



businesses and organizations, such as the



Census Bureau's Standard Statistical Establishment List and SSA's



employer listing, more readily available for statistical uses.



     Legislation has been drafted to make the SSEL available to



Federal agencies for statistical purposes.  Passage of the proposed



legislation could aid in reducing the duplication and costs, and



the attendant differences in definition and coverage resulting when



independently developed lists are maintained.  SSA's listing of



employers is compiled from the applications for employer numbers



required of employers of workers covered by Social Security, now



virtually the entire workforce.  Availability of this list as a



statistical sampling frame has been closed by application of the



Tax Reform Act of 1976.



Recommendation 6.   For natural persons, the principles of



"functional separation" developed by the Privacy Protection Study



Commission, the White House Privacy Initiative, and the President's



Statistical Reorganization Project should be applied in



distinguishing records to be used for administrative (and



enforcement) purposes from records to be used for statistical



purposes.



     Functional separation will establish two discrete categories



of information according to the statistical or administrative and



enforcement functions to which the information is assigned.  The



separate category of statistical information can be freely used



and transferred with individual identifiers intact for statistical



purposes.  Between the two categories, information that can be



uniquely associated with subject individuals flows only one way,



into the statistical category.  The flow from the statistical



category into other uses must be in a form or under conditions that



prevent unique association.  When administrative records are the



initial information source, the resultant copies or extracts which



have been incorporated into statistical files may not be



subsequently used in individually identifiable form for



administrative or enforcement purposes.
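The one-way flow described above can be sketched as an interface that accepts identifiable records but releases only aggregates. This is a hypothetical illustration, not a description of any agency system: the minimum cell size used to suppress small groups is an assumed disclosure rule, and in practice the protection would rest on law and administrative controls, since the leading underscore here is only a Python naming convention.

```python
class StatisticalStore:
    """Holds identifiable records for statistical use only (a sketch).

    Records flow in with identifiers intact. The only way out is
    tabulate(), which releases group totals and suppresses groups with
    fewer than min_cell records (an assumed disclosure rule).
    """

    def __init__(self, min_cell=3):
        self._records = []       # never returned individually
        self.min_cell = min_cell

    def load(self, record):
        """Accept a copy of an administrative record (one-way flow in)."""
        self._records.append(dict(record))

    def tabulate(self, group_key, value_key):
        """Return {group: total}, with None for suppressed small groups."""
        groups = {}
        for rec in self._records:
            groups.setdefault(rec[group_key], []).append(rec[value_key])
        return {
            group: (sum(values) if len(values) >= self.min_cell else None)
            for group, values in groups.items()
        }
```

The design choice is that no method returns an individual record, so anything leaving the statistical category is already in aggregate form.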



Recommendation 7.   Particular legal and administrative barriers to



access to administrative records for statistical use should be



identified and eliminated for records pertaining to both natural



persons and organizations.



     The subcommittee, for example, has found limitations on access



to IRS data imposed under Section 6103 of the Tax Reform Act of



1976 to be excessively restrictive to statistical uses of the data. 



In this connection it can be noted that the Internal Revenue



Service has denied other Federal agencies access to Taxpayer



Compliance Measurement Program data files for 1976 and subsequent



years.  In addition, the Tax Reform Act has prevented the Social



Security Administration from supplying the Bureau of Economic



Analysis with post-1975 Continuous Work History Sample Files



needed to continue a long-standing cooperative program to use and



improve this important statistical data base.



 



 



     C. Other Government-Wide Program Coordination and Support



 



 



     In order to maximize the usefulness of administrative record



systems, it will be necessary to identify on a government-wide



basis those data collection programs, as well as research



initiatives, which need interagency support.  Further, the needs of



data users should be considered in designing statistical series



based on administrative records.



Recommendation 8.   Procedures for planning and setting budget



priorities should be developed to ensure that agency and program-



specific budget allocations are responsive to those interagency



data needs that are met most effectively through the specific



programs under review.



     Many administrative programs are not explicitly budgeted for



supplying those general-purpose statistical needs which could be



met efficiently through statistical use of administrative records. 



The subcommittee has found, for example, that geographic and



industrial data quality in the Social Security Administration's



Continuous Work History Sample (CWHS) has been declining because the data



have few applications for internal SSA programs and therefore



receive low priority in the agency budgeting process.  Geographic



and industrial data from the CWHS, however, are very important for



outside data users.  And they will become even more important if



administrative records are called on to play a central role in



providing intercensal estimates.  In planning alternatives to a



mid-decade census, there should be careful cost-benefit analysis of



different approaches involving various combinations of survey and



administrative record data sources.



Recommendation 9.   As recommended by the President's Statistical



Reorganization Project, efficient statistical tools should be



applied in information collection programs extending well beyond



the confines of the principal statistical agencies.



     Statistics can contribute techniques for improving design of



forms, both to improve quality of response on administrative forms,



and to improve the multi-purpose utility of the information



provided.  Development and extension of such statistical techniques



as scientific sampling, record matching, and synthetic estimation



can be used effectively to economize on the amount of information



that needs to be collected, thereby reducing paperwork burdens and



budgetary costs associated with administrative as well as



statistical data collection programs.
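Of the techniques named above, synthetic estimation is the easiest to show briefly. In its simplest form, a rate measured for each population group in a larger area (for example, by a national survey) is applied to the size of that group in a small area (for example, from administrative counts), and the products are summed. The sketch assumes that each group's rate is the same in the small area as in the larger one; the group labels and figures are hypothetical.

```python
def synthetic_estimate(larger_area_rates, small_area_counts):
    """Simple synthetic estimate for a small area.

    larger_area_rates: {group: rate measured for a larger area}
    small_area_counts: {group: number of persons in the small area}
    Assumes each group's rate is the same in the small area as in the
    larger area; the estimate is the sum of rate * count over groups.
    """
    missing = set(small_area_counts) - set(larger_area_rates)
    if missing:
        raise KeyError(f"no larger-area rate for groups: {sorted(missing)}")
    return sum(larger_area_rates[g] * n for g, n in small_area_counts.items())
```

For instance, rates of 0.10, 0.05, and 0.04 applied to local groups of 2,000, 6,000, and 2,000 persons give 200 + 300 + 80 = 580.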



     Many administrative record data collection programs have lagged



well behind the "state of the art" in the application of



statistical tools, and modernization of programs is badly needed.



Recommendation 10.  To obtain statistical data, increased use



should be made of matches between sample surveys and administrative



files.  Samples based on linkages among administrative record



systems also should be encouraged for statistical purposes.



     The subcommittee has investigated the statistical uses of



linking of administrative record files with sample survey data, as



well as with samples from other administrative records.  The



subcommittee endorses the use of matching to obtain statistical



data based on the combination of administrative records and sample



surveys.  The analytic potential of obtaining expanded, more



detailed data bases through successful matching is sufficiently



great that complicated procedures are often worth the effort. 



However, for each specific program proposing to use linkages to



obtain statistical data, it is necessary to examine the costs and



benefits to the program to determine whether the match should be



performed.
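The complicated procedures mentioned above often take the form of several matching passes. The sketch below is a simplified, hypothetical deterministic procedure, not the method used in any of the case studies: it first links on social security number and then, for the remaining cases, on surname and birth year, accepting a second-pass link only when that key identifies a single administrative record.

```python
def two_pass_match(survey, admin):
    """Link survey records to administrative records in two passes.

    survey, admin: lists of dicts with keys 'ssn', 'surname', 'birth_year'
    (hypothetical fields). Pass 1 links on SSN; pass 2 links remaining
    cases on (surname, birth_year), but only when that key identifies a
    single administrative record. Returns {survey_index: admin_index}.
    """
    by_ssn, by_name = {}, {}
    for j, rec in enumerate(admin):
        if rec.get("ssn"):
            by_ssn.setdefault(rec["ssn"], j)
        key = (rec["surname"].upper(), rec["birth_year"])
        by_name.setdefault(key, []).append(j)

    links, used = {}, set()
    # Pass 1: exact match on the common identifier.
    for i, rec in enumerate(survey):
        j = by_ssn.get(rec.get("ssn"))
        if j is not None and j not in used:
            links[i] = j
            used.add(j)
    # Pass 2: unique name/birth-year key for records still unlinked.
    for i, rec in enumerate(survey):
        if i in links:
            continue
        key = (rec["surname"].upper(), rec["birth_year"])
        candidates = by_name.get(key, [])
        if len(candidates) == 1 and candidates[0] not in used:
            links[i] = candidates[0]
            used.add(candidates[0])
    return links
```

Ambiguous second-pass keys are left unlinked here; whether the extra effort of resolving them is worthwhile is exactly the cost-benefit question raised above.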



     The case studies in Chapter VI illustrate potential uses of



administrative records for important statistical programs; each



case study has specific goals, applications, and advantages.  The



combined use of administrative record files and sample survey data



for linkage programs may be effective for a variety of reasons,



including that: (1) respondent burden may be reduced while



estimates of subpopulation characteristics are improved and data



accuracy is assessed (see SIPP case study), (2) data which are



difficult for a survey respondent to provide may be obtained from



administrative record files (see LASS case study), (3) improved



counts of population from the 1980 Census may be obtained in a



cost-effective manner (see Nonhousehold Sources Program case



study), and (4) estimates of coverage of population for States and



selected subgroups of the population based on the 1980 Census may be



obtained (see case study on IRS/SSA/HCFA matched with CPS and



Census).
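For the accuracy assessment mentioned in item (1), a basic comparison of survey-reported and administratively recorded amounts for matched persons can be sketched as follows. Treating the administrative value as the benchmark is an assumption of this sketch; administrative records carry errors of their own.

```python
def response_error_summary(pairs):
    """Compare survey reports with administrative amounts for matched cases.

    pairs: list of (survey_reported, admin_recorded) amounts.
    Returns (mean of survey minus administrative amount,
             share of cases in which the survey amount is lower).
    """
    if not pairs:
        raise ValueError("no matched pairs to compare")
    diffs = [survey - admin for survey, admin in pairs]
    mean_diff = sum(diffs) / len(diffs)
    share_lower = sum(1 for d in diffs if d < 0) / len(diffs)
    return mean_diff, share_lower
```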



Recommendation 11.  The provision of services to users should be



recognized as a statistical program function to optimize the



availability of statistical information in Federal, State, and



local government and in the private sector, and to give the Federal



system the benefit of feedback from users in planning statistical



programs based on administrative records.



     A major obstacle to encouraging statistical use of



administrative records is the lack of knowledge (both inside and



outside the Federal Government) about the information in these



records and their coverage and quality.  The American Statistics



Index provides a comprehensive list of published statistics from



administrative and survey sources, but information on the quality



and availability of unpublished data, particularly from



administrative records, is seriously deficient.  Centralized



information is needed to make existing data more readily accessible



to potential users and to help in identifying unnecessary



duplication in data collection programs.  Promising recent



initiatives in this area include a Small Business Administration



program to document all Federal reporting requirements placed on



businesses and a National Center for Health Statistics program to



establish a clearinghouse for data relating to environmental health



hazards.  In addition, the proposed Paperwork Reduction Act of 1980



(H.R. 6410) provides for establishing a Federal Information Locator



System, as recommended by the Commission on Federal Paperwork.



                                                         CHAPTER II



                     Introduction and Summary



 



 



A. Introduction



 



     The Federal Statistical System is under pressure to respond



simultaneously to a growing demand for statistical data and a



growing demand for reductions in the "paper blizzard" generated by



Government requests for information from individuals and



businesses.  These demands will necessarily conflict unless the



efficiency of current programs can be improved.  Responsiveness to



both demands will require reduced duplication among Government



information collection programs combined with more intensive



utilization of existing administrative information sources in



meeting statistical data needs.  The latter requirement will



involve bringing together information collected in numerous



different Government administrative programs in ways that make



possible their combined use for statistical analysis.  As stated by



Edgar Dunn (1965, p. 5) in a review of the Ruggles' Committee



proposal for a national data center:



     The central problem of data use is one of associating



     numerical records.  No number conveys any information by itself.



     It acquires meaning and significance only when compared with other



     numbers.  The greatest deficiency of the existing Federal



     Statistical System is its failure to provide access to data in a



     way that permits the association of the elements of data sets in



     order to identify and measure the interrelationship among



     interdependent activities.



     As Dunn further notes (1965, Summary, p. 2), problems of access



and record association are particularly serious in the case of



statistical use of administrative records because: "Many of the



most useful records are produced as a by-product of administrative



or regulatory procedures by agencies that do not recognize a



general-purpose statistical service function as an important part



of their mission."



     The association or merger of administrative records from a



variety of sources is important for statistical applications



because: (1) populations of statistical interest do not always



correspond closely to populations covered in individual



administrative record systems; and (2) individual administrative



record files often identify relatively few of those characteristics



and attributes of the members of a population that social



scientists and policy analysts consider to be important in meeting



their statistical needs.  Merging individual administrative record



sets with other administrative and statistical data sources can



help to alleviate the deficiencies of many individual



administrative sources; but record merging is often difficult--particularly



when the records are collected and maintained by separate agencies. 



Provisions for protecting the confidentiality of records pertaining



to identifiable individuals or businesses often preclude



interagency transfer of such records for statistical applications. 



And even when access to the records needed for merging can be



arranged, differences in the ways different agencies identify



individual reporting units, and/or inconsistencies in the ways



agencies collect, process, and maintain information about reporting



units, can preclude successful data matching and merging operations



(see Chapter VI).



     Although difficult problems remain to be solved, statistical



uses of administrative records have been increasing and will



continue to increase because of high data collection costs and



heavy respondent burdens associated with censuses and surveys. 



Many important statistical needs cannot be adequately met by a



system involving censuses, carried out every 5 or 10 years,



combined with intercensal surveys which provide national data.  And



the extra costs of moving to more frequent censuses and/or larger



sample surveys which might provide small area data are high both in



terms of direct government expenditure and response burden.  The



projected high cost to the government was an important factor in



the recent decision to disallow further planning funds for the 1985



mid-decade census.



     The most striking illustrations of the need to make improved



statistical use of administrative records arise in cases involving



the use of socioeconomic data to distribute Federal funds to State



and local areas.  For example, in reviewing alternatives for



meeting the legislative mandate to produce current local-area



unemployment estimates for use in allocating funds under the



Comprehensive Employment and Training Act (CETA), the National Commission



on Employment and Unemployment Statistics (NCEUS) (1979, p. 253) has



estimated that it would cost about $2.3 billion annually to expand



the Current Population Survey to provide monthly unemployment



estimates for the over 4,000 geographic areas potentially eligible



for CETA funding.  As important as the high money costs involved in



obtaining frequent small-area data by survey techniques is the



substantial increase in response burdens associated with greatly



expanded data collection efforts.



     Another alternative considered by the NCEUS was



improving the handbook method (the so-called 70-step method) based on



unemployment insurance records.



     Not only is there pressure for statisticians to increase their



use of administrative records in developing general-purpose



statistics, but statisticians also have a strong interest in



supporting efforts to reduce the duplication and improve the



efficiency of administrative as well as statistical information



collection efforts.  Direct reporting for statistical purposes



accounts for a very small proportion of the overall Federal



reporting burden; major reductions in overall paperwork burdens



must be achieved through improvements in nonstatistical areas.  At



the same time, however, statistical programs could be more



adversely affected than other programs because statistical programs



tend to be more often viewed as optional than administrative record



systems and, therefore, more dependent on the voluntary cooperation



of the public in obtaining responses to information requests.



As the following statement from the President's Statistical



Reorganization Project's "Issues and Options" paper (1978, p. 7-1)



indicates, there is a growing recognition of the importance of



applying statistical tools to more general problems of information



collection in order to reduce reporting burdens:



     The tools used by statistical agencies (sampling, quality



     control, intensive analysis of existing data, etc.) are near



     the roots of reporting requirements, and the use of



     appropriate tools reduces reporting burden.  It is in this



     sense that, from the point of view of response burden, the use



     of appropriate statistical techniques is of major importance



     and should extend well beyond any formal definition of the



     Federal Statistical System.



The statistical system, however, cannot hope to dominate Government



information collection activities; there must be a genuine effort



to cooperate with administrators in nonstatistical programs in



order to achieve mutual goals of efficient information collection. 



Statisticians must attempt to understand the needs and constraints



facing program administrators, and statistical budgets should bear a



fair share of the costs of collecting and processing administrative



records in ways that permit efficient use for statistical purposes.



Much must be learned and many difficult problems confronted if



progress is to be made in the statistical use of administrative



records and in improving the overall efficiency of Government



information collection and use.  With the hope of contributing to



progress in this area, this report attempts to: (1) identify major



administrative data files with significant potential for general-



purpose statistical applications; (2) indicate various kinds of



statistical uses of administrative records which are being made or



considered; (3) identify major technical and institutional or legal



problems which are impeding effective statistical use of



administrative records; and (4) suggest possible approaches to



improving information collection and statistical use of



administrative records.



     The Subcommittee on Statistical Uses of Administrative Records



has not attempted to provide comprehensive documentation of



administrative record systems and their uses.  The report instead



reflects largely the areas of interest and expertise of



Subcommittee members.  Important areas such as energy and



environmental statistics are not covered at all, and very little



attention is given to records generated by the complex array of



Government regulatory agencies.  There is, however, relatively



intensive coverage of administrative data from programs of the



Internal Revenue Service and Social Security Administration, and



from related administrative programs that collect important social



and economic information from individuals and businesses.



 



 



                            B. Summary



 



     Chapter III of the report presents the results of a survey



conducted by the Subcommittee to obtain documentation of major



administrative record data files maintained by selected Federal



agencies.  Chapter IV presents a description of statistical



applications of administrative records in selected agencies.  The



following three chapters (V-VII) illustrate, largely by means of



case studies, specific approaches to statistical use of



administrative records and problems encountered in such approaches. 



Chapter VIII reviews legal considerations, particularly those



related to restricted access to records, that influence the



statistical use of administrative records.



 



1.   Chapter III-Major Administrative Files



 



     This chapter summarizes the characteristics of major



computerized administrative record files that are maintained or



mandated by the Federal Government and contain statistically useful



information pertaining to (1) individuals or (2) businesses.  The



information contained in the administrative files for individuals



is compared to the information on individuals collected in



decennial censuses; and the information contained in the



administrative files for businesses is compared to the information



contained on the Census Bureau's Standard Statistical Establishment



List (which is itself assembled from a combination of



administrative and survey data sources).  The chapter



also contains a description of the Social Security



Administration's Continuous Work History Sample which is a set of



statistical files of individual worker records assembled using



several SSA business and individual administrative record files.



     Compared with the decennial census, most administrative record



files for individuals contain relatively little information on



population characteristics and/or cover only a limited segment of



the population.  In addition, the census usually provides more



reliable and detailed geographic information than administrative



files; and at best, administrative records can provide only rough



approximations to such census reporting units as the family and



household.  On the other hand, many administrative files provide



data at much more frequent intervals than the decennial census, and



the presence of social security numbers on most administrative



files opens the possibility of linking files over time



(longitudinally) or merging information from more than one



administrative file in order to increase the coverage of



individuals and/or the number of characteristics identified for



particular individuals.  The absence of SSN's in census records



generally makes it difficult to integrate information from censuses



with information from administrative records.
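The longitudinal linkage described above can be sketched as follows, assuming hypothetical annual extracts in which each record carries a social security number and an earnings amount. Records for the same person in the same year (for example, from two employers) are summed, and years with no record are simply absent from that person's history.

```python
from collections import defaultdict


def build_histories(annual_files):
    """Link annual administrative extracts into individual histories.

    annual_files: {year: [(ssn, earnings), ...]} (hypothetical layout).
    Returns {ssn: {year: total earnings}}. Several records for one person
    in one year (e.g. two employers) are summed; years with no record are
    simply absent.
    """
    histories = defaultdict(lambda: defaultdict(int))
    for year, records in annual_files.items():
        for ssn, earnings in records:
            histories[ssn][year] += earnings
    return {ssn: dict(by_year) for ssn, by_year in histories.items()}
```

Whether a missing year reflects no covered employment or a reporting gap cannot be told from the linked file alone.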



     Administrative record coverage of businesses is more complete than



is true for individuals.  In fact, administrative lists of



businesses provide the basis for conducting statistical censuses



and surveys.  For the most part, however, administrative records do



not maintain separate information for the different establishments



of a single legal business entity, even though the business may



operate in several different geographic areas and/or industrial



categories.  The Census Bureau does collect information for



individual establishments; and the SSEL, therefore, contains a



larger list of reporting units than most administrative files. 



While most administrative business files do not contain the



establishment detail necessary for developing reliable geographic



and industrial data, the SSA and Unemployment Insurance payroll tax



programs do involve reports breaking out county-level "establishment"



detail.  Unfortunately, however, the reporting units in these



programs are not consistent with the establishment concept used in



the SSEL, and there is currently no satisfactory basis for



coordinating the reporting of similar information (or resolving



data discrepancies) among the three systems.



     CWHS data files provide information on the demographic



characteristics (sex, age, and race) of workers along with



longitudinal information on their employment and earnings patterns. 



The CWHS program illustrates the potential statistical advantages



of administrative records for longitudinal analysis and for linking



together information about individuals and businesses.



 



2.   Chapter IV-Major Statistical Uses of Administrative Records



 



     This chapter illustrates statistical uses of administrative



records with reference to the programs of selected Federal



agencies, particularly programs of the Social Security



Administration, the Internal Revenue Service, the Bureau of



Economic Analysis, the Census Bureau, and the Small Business



Administration.  The SSA and IRS programs involve the development



of general-purpose statistics by statistical divisions of agencies



that collect large amounts of information from individuals and



businesses in the course of their administrative responsibilities. 



The programs illustrate the large quantity and variety of adminis-



trative data collected as well as the limitations of incomplete



population coverage and lack of information on important population



characteristics that plague statistical use of administrative



records.



     The BEA programs illustrate the use of a wide variety of



administrative data (obtained from many agencies) for estimating



data series within the context of a systematic economic accounting



framework.  Administrative data are used in conjunction with census



and survey data (also generally obtained from other agencies); and



there are substantial variations among the administrative data



series in the extent to which they involve concepts and measurement



procedures that "fit" well with the concepts involved in the design



of the accounting framework and with concepts underlying the census



and survey data used.



     Census Bureau programs illustrate a wide variety of



applications of administrative records for both individuals and



businesses.  For example, records obtained from administrative



agencies are used in developing intercensal population and related



estimates, as a substitute for censuses in the collection of



economic data from many small businesses, in the development and



maintenance of sampling frames for surveys, and in the evaluation



of the completeness and reliability of information collected in



censuses and surveys.  Again there are substantial variations in



the extent to which administrative record concepts match desired



statistical concepts.  A few census programs, primarily in the area



of economic statistics, are discussed in more detail than other



programs covered in Chapter IV.  These more detailed examples



illustrate the substantial cost savings as well as limitations



associated with the statistical use of administrative records.



     The SBA involvement in the statistical use of administrative



records stems largely from a recently initiated project to develop



a small business data base in conjunction with the 1980 White House



Conference on Small Business.  In part because of concerns over



reporting burdens, small businesses have been exempted from or



 






covered on a very small sample basis, in most economic censuses and



surveys.  Therefore, a small business data base must rely heavily



on administrative records.  SBA efforts to develop such a data base



illustrate many of the problems that are often encountered in



gaining access to administrative records and adapting them for



statistical analysis.



 



3.   Chapter V: Developments in Data from Business Establishment



     Reporting



 



This chapter contains case studies of three important and related



statistical programs that are currently evolving based in large



part on developments in administrative record systems: (1) the



Census Bureau's SSEL program; (2) SSA's program for adapting its



CWHS data program to a new system of annual employer reports of



worker wages on forms W-2 and W-3; and (3) the Bureau of Labor



Statistics' program for developing work force statistics in



connection with the UI payroll tax program.  These programs produce



both complementary and overlapping statistical products in the area



of work force statistics; and they illustrate not only the



importance and potential of administrative records for developing



work force data, but they also illustrate some important problems



in the area of establishment reporting by multiestablishment



businesses and in the area of coordinating similar data collection



efforts in different agencies.  The Census Bureau program employs



the most satisfactory concept of establishment from a statistical



point of view, but the Census work force data assembled in



connection with the SSEL cannot match the frequency and timeliness



of BLS data based on the UI system, nor can the SSEL-based data



provide the information on demographic characteristics of workers



available from the SSA system.  And the different establishment



reporting plans of the three data systems combined with



difficulties of interagency transfers of records (for example, the



current restrictions on access to the SSEL) have severely limited



the scope for coordinating data collection and development efforts



in the three programs.



 



4.   Chapter VI: Potential Uses of Administrative Records for Data



     Linkages: Selected Case Studies



 



     This chapter involves four case studies that illustrate the



potential and the problems associated with record linkages as a



means of improving and extending the use of administrative records



in developing primary data and in evaluating census and survey



data: (1) the "Linked Administrative Statistical Sample Project,"



(2) the "Use of Administrative Records in the Survey of Income and



Program Participation," (3) the "Use of IRS/SSA/HCFA Administrative



Files for 1980 Census Coverage Evaluation," and (4) "Record Linkage



in the Nonhousehold Sources Program." In contrast to Chapter V,



where the difficulties of coordinating and linking business



establishment records among programs were highlighted, Chapter VI is



concerned with linkages involving records for individuals.



The LASS project involves efforts to link records from a variety of



administrative record sources in order to develop a general-purpose



statistical sample file that will be suited for mortality research. 



The sampling procedures will conform closely to those involved in



the CWHS in order to facilitate longitudinal data analysis, but



CWHS records will be supplemented with records from IRS and the



National Center for Health Statistics.  The project illustrates the



substantial potential for combining complementary data through



interagency linkage of administrative record files.  But the



project also illustrates significant technical problems and



problems of access restriction that need to be resolved in linking



data files prepared in different agencies.



     The SIPP case study illustrates the importance of



administrative records in efforts to alleviate substantial survey



biases in coverage and income reporting for low-income groups



(participating in various income maintenance programs) and



the importance of administrative records as a source of income data to



evaluate the reliability with which selected types of income are



reported in surveys.



     The third and fourth case studies are both associated with



efforts to evaluate and improve the 1980 Census of Population and



Housing.  The IRS/SSA/HCFA files will be used primarily in efforts



to evaluate the extent of Census undercoverage, while the



Nonhousehold Sources Program will be concerned with improving



population coverage in selected areas of anticipated high



undercount.  The latter program involves, in addition to the use of



Federal agency records, the use of such State and local



administrative records as drivers' license records.  Both projects



demonstrate the potential of administrative records to identify



individuals who are missed in censuses and surveys.  The projects



also illustrate, however, the difficulties and high costs of



linking administrative records to census records (which contain no



social security number) and the difficulty of determining the



extent to which particular groups are not covered in either census



or administrative record sources.



 



5.   Chapter VII: Technical Problems in the Statistical Use of



     Administrative Records



 



     This chapter illustrates technical problems encountered in



making statistical use of administrative records that arise or are



exacerbated because of limited statistical control in



administrative record systems over such factors as population



coverage, definitions and comparability of information concepts



among programs, and reporting and



 






processing procedures.  The CWHS data program is used as the



principal source of illustrations, in part because the CWHS program



involves the use of files containing information about businesses



as well as individuals, and perhaps more importantly because it



illustrates well the problems that can arise when important



statistical aspects of the reporting and processing of records are



largely outside the control of statisticians responsible for making



statistical use of the records.  In particular, there is evidence of



significant and increasing numbers of geographic coding errors in



the CWHS that have resulted from the low priority attached by SSA



administrators to the statistical problem of obtaining reliable



geographic reports and ensuring accurate coding and processing of



geographic information in employer payroll reports to SSA.



 



6.   Chapter VIII: Legal Issues in the Statistical Use of



     Administrative Records



 



     This chapter illustrates legal and related institutional



barriers which inhibit the interagency access to records that is



needed for improving the efficiency and effectiveness of



statistical use of administrative records.  Emphasis is placed on



problems which arise because of a failure of existing



confidentiality laws to make an adequate functional distinction



between statistical and administrative processes which use records



about individuals.



     The basis for interagency transfer of administrative records



is often found in a logic that imposes regular procedures or



conditions for expanding the scope of administrative actions or



decisions which can be based on the particular content of records



about an individual.  Such a logic is generally irrelevant with



respect to legitimate statistical processes which, in contrast to



administrative uses, merely produce relationships and summaries of



data, and do not involve any direct Government action against (or



in favor of) the individual as a consequence of information in



records pertaining to that individual.



     Clearly not all statistical performance is functionally



divorced from administrative processes: program integrity and



quality assurance are functions which may explicitly (and quite



properly) rely on applied statistical techniques to identify



individual cases for administrative action.  Such functions are



within the reasonable expectations of program participants, and do



not rely, moreover, on collection of information from volunteers,



with assurances of confidential treatment.  In contrast, there are



particular statistical activities or collections of data whose



existence and rationale for compiling and making interagency



transfer of data is limited by the degree to which statisticians



can fulfill a legal or ethical duty to protect the confidentiality



of individual information.



     Statistical uses in this latter category need to be separated



out as discrete functional uses, and be governed by different rules



and standards from those which govern administrative and compliance



uses.  Proposals for "functional separation" of statistical from



administrative uses argue for separating these statistical records



about identifiable individuals from the decision/action stream, and



permitting the statistical results to be available to adminis-



trators only in summary or other unidentifiable form.  Functional



separation would allow summaries, of course, to be used



administratively in ways which may result indirectly in consequences



affecting all members of the group in uniform ways. However,



functional separation would not permit the direct use of individual



records as the basis for individual actions.  Alternative



legislative proposals for implementing the concept of functional



separation are reviewed in the chapter.



 






                                                        CHAPTER III



 



                  Major Administrative Data Files



 



 



     This chapter describes the general properties of most of the



major Federal administrative record files containing statistically



useful information pertaining to individuals or businesses.  The



discussion is based largely on a survey of selected Federal



agencies conducted by the SUAR Subcommittee.  An attempt is made to



lay the groundwork and indeed begin the discussion, continued in



Chapter IV, of the statistical uses of administrative record



systems.



     Organizationally, the chapter is divided into four sections



and two appendices.  The first section indicates the scope of the



administrative record files covered and describes the survey



instrument used to obtain file documentation.  In the second



section there is a brief summary of the survey results.  In the



third section there is a brief description of the Social Security



Administration's Continuous Work History Sample files.  The CWHS



files illustrate the process of extracting and merging information



from basic administrative files to obtain files useful for



statistical analysis.  In the final section there is a discussion



of selected factors associated with the historical evolution of the



statistical use of administrative files covered in the chapter. 



The survey questionnaire is reproduced in the first appendix, and a



more detailed description of the CWHS program and data files is



contained in the second appendix.



 



              A. Scope of Study and Survey Conducted



 



1. Scope of Study



 



     In compiling a list of "administrative" record files that



would be of greatest statistical interest, three criteria were



employed:



 



     1.   Does the file have extensive coverage of a population



          (either individuals or businesses)?



     2.   Is the population covered by the administrative record



          set of statistical interest?



     3.   Is the file maintained by computer?



The systems chosen for examination under these criteria are shown



in Figure III.1. Information relating to individuals was sought



from ten Federal agencies; some twenty-four administrative record



files were involved in all.



 



 



Figure III.1   Major Administrative Record Files Surveyed by the
               Subcommittee on the Statistical Uses of
               Administrative Records
______________________________________________________________
Agency                          Administrative Record File
______________________________________________________________
                 Part I: Information on Individuals

Bureau of the Census            1970 Census of Population
                                1980 Census of Population
Office of Personnel Management  Central Personnel Data File
                                Civil Service Annuity Roll
Department of Defense           Active Military Personnel Data File
                                (Army, Navy, Air Force, and Marines)
                                Military Retirement Compensation File
                                (Army, Navy, Air Force, and Marines)
Department of Transportation    National Driver Register
Internal Revenue Service        Individual Master File
Department of Education         Basic Education Opportunity Grant
Railroad Retirement Board       Research Master Beneficiary File
                                Service and Compensation (SCORE)
                                Railroad Retirement, Survivor, and
                                  Pensioner Benefit Payment File
Social Security Administration  Summary Earnings Record
                                Master Beneficiary Record
                                Numerical Identification File (SS-5)
U.S. Coast Guard                Personnel Management Information System
                                Retired Officers Support System
                                Retired Pay and Personnel System
Veterans Administration         Compensation and Pension Master Record
                                Insurance (In-Force) Master Record File
                                Education Master Record File
                                Vocational Rehabilitation and Education
                                  Statistical File
                                Insurance Awards Master Record File
                                Education Master File
______________________________________________________________
                 Part II: Information on Businesses

Bureau of the Census            Standard Statistical Establishment List
Bureau of Labor Statistics      Unemployment Insurance Address File
Department of Agriculture       Producer Name and Address Master File
                                Economics, Statistics, and Cooperative
                                  Service List Sampling Frame
Department of Health and        Master Facility Inventory
  Human Services
Internal Revenue Service        Business Master File
                                Exempt Organization Master File
Social Security Administration  Master Employer Name Directory
                                Multi-Unit Code File
                                Single-Unit Code File



 






     For businesses, the scope of the inquiry was restricted to



nine major Federal systems in six agencies.



     It should be noted that although the Subcommittee does not



classify the decennial censuses of population as administrative



data files, since their main purpose is statistical, they are



nonetheless included to provide a basis for comparison with the



other files on individuals.  The Census Bureau's Standard



Statistical Establishment List was also treated as "in scope" for



comparison purposes, this time with business administrative record



files.



 



 



2.   Survey Conducted



 



     In late 1978, the Subcommittee conducted a survey of



the administrative files listed in Figure III.1.  This survey was



entitled "Statistical Use Survey of Records Pertaining to



Individuals, Individual Firms, and Employers Maintained and/or



Mandated by the Federal Government."



     A questionnaire was mailed to each agency maintaining one of



the selected files.  The principal purpose of the questionnaire was



to document the data elements on each file that might be of



statistical interest.  It was not the intent of the survey to be



comprehensive, but simply to provide a starting point for



structuring inquiries about the files.



This survey collected data on both individual and business files by



providing optional sections to be completed depending on the type of



file being considered.



     The survey consisted of only fifteen questions, but a number



of the questions contained several parts.  Respondents were asked



to report the availability of documentation concerning the file,



the information carried on the file, and the history of the file



development and maintenance.  For the most part, each agency made a



serious effort to provide detailed responses to the questions.



 



                         B. Survey Results



 



     This section briefly summarizes the survey results.  First,



the files pertaining to individuals are considered, then those



pertaining to businesses.  Detailed tabulations from the survey are



included in Tables III.1 and III.2.



 



1.   Files Pertaining Mainly to Individuals



 



     Not unexpectedly, there are extensive differences



among the administrative record files on individuals.  Some of



those which deserve special mention are the differences in coverage



(or "universes") among the files, the degree of coded geographic



information, the demographic items included, and the reporting units



used:



 



     a. Universe



 



     In terms of coverage of individuals in the U.S. population,



the decennial Census files are the most complete, followed by



Social Security's Summary Earnings Files and the IRS Individual



Master File.  No other files have the same breadth of coverage as



these.  However, several other files do provide comprehensive



coverage of important segments of the population.  Examples are the



Health Insurance Master File for the "65+" population, the



 






Central Personnel Data File for Federal Government workers, and the



Military Personnel Data Files for present and former Armed Forces



members.



 



     b. Geographic information



 



     Administrative files tend to have limited coded geographic



information.  Some contain a State code, but this is usually



derived from the mailing address.  The only exceptions appear to be



SSA's Master Beneficiary Record file, and the related HCFA Health



Insurance Master File, which contain a county code obtained by



clerically coding the mailing address.  By way of contrast, the



Census geographic data are collected on a residence basis and are



available to the block level.



     This lack of detailed "residence geography" is a major problem



in using administrative records to prepare small area statistics. 



By using the mailing address, subcounty geography may be assigned



with a Geographic Base File developed for use in the 1970 or 1980



census.  However, this presents a number of problems.  First, the



mailing addresses are not always the usual place of residence. 



Second, GBF's do not exist for areas located outside the built-up



portion of SMSA's.  Third, people living outside the city limits



tend to report themselves as living in the city if they have a city



post office address.  Fourth, post office delivery or zip code



areas do not conform with political boundaries.  Also, the cost of



assigning geography with a GBF system is high.
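     The address-range lookup on which GBF coding rests can be sketched briefly. The example below is a minimal illustration under stated assumptions: the record layout (street name, low and high house numbers for one side of a street segment, tract and block codes), the function name assign_geography, and the sample segments are all hypothetical and do not reproduce the actual GBF record format. It shows only the matching step; it cannot detect the problems listed above, since it codes whatever mailing address it is given.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical record: one side of a street segment and its census codes."""
    street: str   # normalized street name
    low: int      # lowest house number on this side
    high: int     # highest house number on this side
    tract: str    # census tract code
    block: str    # census block code


def assign_geography(street: str, house_number: int,
                     segments: List[Segment]) -> Optional[Tuple[str, str]]:
    """Return (tract, block) for a mailing address, or None if it cannot be coded."""
    key = street.strip().upper()
    for seg in segments:
        # Odd and even numbers normally lie on opposite sides of the street,
        # so the house number must share the parity of the range.
        if (seg.street == key
                and seg.low <= house_number <= seg.high
                and house_number % 2 == seg.low % 2):
            return seg.tract, seg.block
    # No matching range: e.g. an address outside the area the file covers.
    return None


# Hypothetical file fragment: the two sides of one block of Main St.
SEGMENTS = [
    Segment("MAIN ST", 101, 199, "0012", "305"),  # odd-numbered side
    Segment("MAIN ST", 100, 198, "0012", "306"),  # even-numbered side
]

print(assign_geography("Main St", 145, SEGMENTS))    # ('0012', '305')
print(assign_geography("Main St", 146, SEGMENTS))    # ('0012', '306')
print(assign_geography("Rural Rte 2", 7, SEGMENTS))  # None
```

A None result corresponds to the coverage gap noted above for areas without a GBF; a successful match still says nothing about whether the mailing address is the usual place of residence.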



     Another approach is to add a residence geographic code to the



administrative file.  This was done for the 1972 and 1975



Individual Master Files so that IRS data could be used in preparing



population and per capita total money income estimates for use in



distributing General Revenue Sharing funds.  The cost of this



straightforward approach makes it unlikely that it will be widely



implemented on other files.



 



     c.   Demographic information



 



     By comparison with the Census data, all administrative



files contain very limited demographic information.  The Numerical



Identification (SS-5) file does contain sex, date of birth, and



race, which have been transferred to the Summary Earnings Record and



the Master Beneficiary Record.  The personnel files also have some



race information.  However, other than this, there is very little



demographic data present.



 



     d. Reporting unit



 



     The Census data are the only data organized into households



and families.  Tax returns and Social Security claims, however,



can for some purposes be treated as approximations to family units. 



For the most part, however, the units are just individuals with no



potential for structuring them into households.



     One final point.  The survey showed that all the



administrative files for individuals are organized by social



 






security number.  This is distinct from the decennial census files



which do not have the SSN recorded.  By and large, the SSN is the



major administrative identifier. Obviously, then, it is this



variable which would have to be employed for linkages among the



files, whether for statistical or operational purposes.



 



2.   Files pertaining Mainly to Businesses



 



     The employer identification number is a major identifier on



most of the administrative record files, including even the Census Bureau's



Standard Statistical Establishment List. Some other similarities



and differences in the files are:



 



     a. Universe



 



     The file with the largest coverage is the Master Employer Name



Directory with about 27 million records.  However, this file is not



current and contains inactive businesses.  The SSEL is the most



comprehensive current list of businesses with the exception of the



very small businesses.  For these businesses, the IRS Business Master



File is more complete.  The Department of Agriculture's



Producer Name and Address Master File and its Economics,



Statistics, and Cooperative Service List Sampling Frame have



extensive coverage of the farming sector.



 



     b. Geographic information



 



     As with the individual record systems, there is no subcounty



geographic data present on any of the business files with the



exception of the SSEL.  For businesses, location may have different



meanings.  Most of the geography reported on these files is in



terms of company headquarters and may not refer to the individual



establishment.  Consequently, reporting a major geographically



dispersed company at its headquarters location can introduce a



significant error into the data.
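     A small tabulation makes the headquarters problem concrete. The sketch below assumes a hypothetical company with establishments in three States plus a one-establishment firm; the company names, States, employment counts, and the function employment_by_state are invented for illustration. Tabulating by headquarters location, as most of these files effectively do, places all of the company's employment in one State, whereas an establishment-based list such as the SSEL distributes it.

```python
from collections import Counter

# Hypothetical records: (company, headquarters State, establishment State, employees)
ESTABLISHMENTS = [
    ("Acme Mfg", "NY", "NY", 400),
    ("Acme Mfg", "NY", "OH", 900),
    ("Acme Mfg", "NY", "TX", 700),
    ("Local Shop", "OH", "OH", 50),
]


def employment_by_state(records, use_headquarters: bool) -> Counter:
    """Sum employees by State, keyed on headquarters or establishment location."""
    totals = Counter()
    for _company, hq_state, est_state, employees in records:
        state = hq_state if use_headquarters else est_state
        totals[state] += employees
    return totals


hq = employment_by_state(ESTABLISHMENTS, use_headquarters=True)
est = employment_by_state(ESTABLISHMENTS, use_headquarters=False)
print(dict(hq))   # {'NY': 2000, 'OH': 50}
print(dict(est))  # {'NY': 400, 'OH': 950, 'TX': 700}
```

The national total is the same in both tabulations; only the geographic distribution is distorted.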



 



     c. Economic data



 



     Number of employees, total payroll, and gross sales seem to be



the most common economic items present on the files.



 



     d. Reporting Unit



 



     The reporting unit of these files is mainly the Employer



Identification Number with the exception of the SSEL.  This creates



a problem in any statistical use of these files because some EIN's



represent only part of a company, while a single EIN may cover many



establishments.



 



 



              C. Continuous Work History Sample Files



 



     The survey results in the previous section indicate



clearly that individual administrative record files usually do not



contain the comprehensive population coverage and detailed



identification of population characteristics desired for most



statistical analysis.  The results also indicate, however, that it



is often technically possible to overcome some of the limitations



of single administrative files by linking several files and merging



the information contained in these files.  With files pertaining to



individuals the SSN provides the principal basis for linkage and



with business files the EIN is usually the basis for linkage.  Both



the problems and the potential benefits of file linkage are



increased significantly when interagency linkages are considered



(see, for example, the discussion of the Linked Administrative



Statistical Sample in Chapter VI); but highly valuable statistical



files can be developed through intra-agency linkages of



administrative files in such large agencies as IRS and SSA.  The



Continuous Work History Sample program of SSA illustrates well the



problems and potential of such intra-agency file linkages.



     The CWHS program involves the construction of several



statistical sample files from information contained in the SSA



administrative files documented in Tables III.1 and III.2.  The 1



percent 1937-to-date CWHS file, for example, involves primarily the



extraction and merger of information from the Summary Earnings



Record and Master Beneficiary Record files documented in Table III.1.



Annual and longitudinal employee-employer CWHS files are



constructed largely by merging detailed earnings items which are



input to the Summary Earnings Record File with industrial and



geographic information obtained from the SSA employer files



documented in Table III.2.
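     The intra-agency merge described above amounts to a join of two files on the SSN. The sketch below is a minimal illustration, not a description of SSA processing: the file layouts, field names, the function link_by_ssn, and the sample records (which use the never-issued area number 000 so that no real SSN is implied) are hypothetical. It attaches demographic items to each person-year earnings record and returns unmatched SSNs separately, so that linkage failures can be counted rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical earnings-type file: SSN -> list of (year, reported earnings in dollars).
EARNINGS = {
    "000-00-0001": [(1975, 8200), (1976, 8900)],
    "000-00-0002": [(1976, 12100)],
    "000-00-0003": [(1975, 4300)],
}

# Hypothetical demographic file keyed on the same SSN.
DEMOGRAPHICS = {
    "000-00-0001": {"sex": "F", "birth_year": 1941, "race": "W"},
    "000-00-0002": {"sex": "M", "birth_year": 1950, "race": "B"},
}


def link_by_ssn(earnings: Dict[str, list],
                demographics: Dict[str, dict]) -> Tuple[List[dict], List[str]]:
    """Join person-year earnings records to demographic items via the SSN."""
    linked: List[dict] = []
    unmatched: List[str] = []
    for ssn in sorted(earnings):
        person = demographics.get(ssn)
        if person is None:
            # Keep a record of the failure instead of dropping it silently.
            unmatched.append(ssn)
            continue
        for year, amount in earnings[ssn]:
            # One output row per person-year: a simple longitudinal layout.
            linked.append({"ssn": ssn, "year": year, "earnings": amount, **person})
    return linked, unmatched


linked, unmatched = link_by_ssn(EARNINGS, DEMOGRAPHICS)
for row in linked:
    print(row)
print("unmatched:", unmatched)
```

Because each output row carries both the earnings item and the demographic items, the result can be tabulated by sex, age, or race for any year, which neither input file supports on its own.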



     CWHS files do not contain occupational information for



workers, nor do they contain the detailed socioeconomic



characteristics available in census sample files.  CWHS files do,



however, contain information on worker sex, age, and race; and they



can provide much greater longitudinal detail relating to the



earnings history of workers than is available from any survey



source.  The CWHS program, moreover, has a considerable advantage



over household surveys in obtaining employer information because of



the possibility of direct links between employer and employee



administrative files.  The advantage of direct links between



employer and employee information, however, is offset somewhat by



quality problems associated with the geographic and industrial



coding in SSA employer files (see Chapter VII).



     Because the CWHS program illustrates well both the potential



and the problems associated with the statistical use of



administrative records, examples of CWHS applications and



deficiencies are presented throughout the report.  Some of the more



detailed references to the CWHS program are included in: (1) the



discussion in



 




Chapter V of the new joint IRS-SSA system of annual employer



reporting (on Form W-2) of individual worker wages; (2) the



discussion in Chapter VI of the development of the new Linked



Administrative Statistical Sample program; and (3) the discussion



in Chapter VII of technical problems encountered in the statistical



use of administrative records.  To permit the reader to better



follow the references to the CWHS made throughout the report, a



detailed description of the CWHS program and CWHS files is



presented in the second appendix to this chapter.



 



 



   D. The Evolution of Statistical Use of Administrative Records



 



     Chapter IV contains a detailed discussion of statistical uses



of administrative records from the perspective of selected Federal



agencies that make extensive use of administrative records in their



statistical and research programs.  Chapters V and VI then follow



with detailed case studies of selected projects and programs



involving intensive statistical use of administrative records.  To



provide additional background for the chapters on uses, this



section reviews some of the circumstances surrounding the



evolution of statistical uses of administrative record files



covered in Tables III.1 and III.2.



     The use of administrative records as a source of statistical



information is not a new idea, but the last decade's extensive



computerization of these files has fostered an increasing interest



in the topic.  In fact, there seems to have been a progression in



the employment of administrative records for statistical purposes. 



Initially, with the establishment of an administrative records



system, an agency prepared summaries of the data for guiding its



operations and for policy decisions.  This may be done with the



full data set or a sample.  Its purpose is primarily



administrative, not statistical.  Perhaps IRS is the best example.



What started out as a mainly administrative effort has evolved into



the current Statistics of Income program (see Chapter IV).  While



administrative considerations are still important, the Statistics



of Income sample is used extensively by researchers to study issues



of general statistical and economic interest.



     Administrative records systems were used very early in
evaluation projects, such as the evaluation of the 1950 Census
income results using IRS and SSA data (NBER, 1958).  After each
decennial population census since then, there have been attempts to
understand and quantify error in the results by matching a small
sample of census records to various administrative record sets,
such as IRS data (Schneider and Knott, 1973), Medicare data (U.S.
Bureau of the Census, 1973c), birth records (U.S. Bureau of the
Census, 1963 and 1973a), death records (Kitagawa and Hauser, 1973),
and employment records (U.S. Bureau of the Census, 1965).  These
evaluation efforts are characterized by the relatively small number
of cases involved, a limit that reflects both the objectives of the
projects and cost considerations.  Most evaluation projects
involving these Federal files are aimed at national results only
and do not attempt to measure differences at the State or even
regional level.  (This is changing, however: for the 1980 Census
Evaluation, the matching will attempt to produce estimates at the
State level; see Chapter VI.)



     With the extensive computerization of administrative files in
the 1960's, the possibilities for expanded statistical uses became
obvious.  For example, IRS completed the computerization of the
Individual Master File with the 1967 file.  Over this same period,
the cost of computer data processing fell greatly and understanding
of how to process and control large data files increased, making
the use of these administrative files feasible for statistical
purposes.



     These developments and potential uses of administrative
records were understood and debated (Hansen, 1974).  While that
debate cannot be reviewed here, the outcome has been that no
centralization of administrative records has taken place in the
Federal government, but statistical uses of administrative records
have continued.  Some transfer of administrative records between
agencies has been permitted, but each transfer has been justified
and approved on a case-by-case basis (Kilss and Scheuren, 1979).
Some people feel that this case-by-case approach has retarded the
use of administrative records in developing useful statistical
data, but this has never been fully documented.



     In one sense, survey- and census-based data may be blamed for
the slow development of administrative records-based data.  Until
recently (and perhaps still), survey- and census-based data have
had a real edge over administrative records in several areas.  For
example, if small area data are needed, the Census of Population
and Housing provides small area data defined completely and in the
"correct" geography (i.e., by residence).  Administrative
records-based data may be able to approximate the needed data, but
not at the same level of accuracy.  It is a question of trading off
accuracy for currency.  If the need is for national, regional, or
even State data, surveys may be a more efficient way to obtain the
needed data than the development of an administrative
records-based system.



     However, with the need for small area data on a regular basis,
the currency and small area advantages of administrative records
may now outweigh the disadvantages of definitional problems and
lower accuracy.  For example, with the passage of the State and
Local Fiscal Assistance Act of 1972, the Bureau of the Census was
asked to provide population and per capita total money income data
for 38,500 governmental units.  The Bureau accomplished this by
using extracts from the entire 1969 and 1972 IRS Individual Master
Files.  This required IRS to collect and clerically code the
residence address of all taxpayers on the 1972 IMF.  The cost of
the first set of estimates, including the IRS coding, was in excess
of $5 million.  This was the first administrative records-based
project of this magnitude, and it demonstrated both the expense and
the benefit of administrative records.  It should also be noted
that this successful application used administrative records to
measure change since the 1970 census (Fay and Herriot, 1979).  In
this way, the definitional problems were minimized.
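The idea of using administrative records to measure change, rather than level, can be sketched as a simple ratio update. This is a simplified illustration only. It is not the Bureau's actual estimation procedure, which this section does not describe, and the function name and the choice of IRS income measure are assumptions.

```python
def update_per_capita_income(census_pci_base: float,
                             irs_income_base: float,
                             irs_income_current: float) -> float:
    """Carry a census per capita income level forward by the relative
    change in an IRS-based income measure (hypothetical sketch).

    The census supplies the level; the IRS files supply only the ratio
    between the current year and the base year.
    """
    if irs_income_base <= 0:
        raise ValueError("base-year IRS income must be positive")
    growth = irs_income_current / irs_income_base
    return census_pci_base * growth


# Hypothetical figures for one governmental unit.
estimate = update_per_capita_income(3000.0, 2500.0, 3000.0)
print(round(estimate))
```

As an example, take a hypothetical 1969 census per capita income of $3,000 and an IRS income measure that rises from 2,500 to 3,000. The updated estimate is then about $3,600. Suppose IRS income differs from census money income by a constant proportion in both years. That proportion cancels in the ratio, which is one way to see why measuring change minimizes definitional problems.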



     With the expanded interest in administrative records, the
experimentation and research needed to understand the particular
idiosyncrasies of these files is now taking place.  This work will,
hopefully, come to fruition in the 1980's with useful data in
several areas.  For example, migration rates by race can be
computed by linking race from the SSA Summary Earnings File to the
IRS data.  This has been done on a sample basis, and State
estimates have been prepared (Word, 1978).  It is expected that
this work will continue.
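Below is a minimal sketch of the linkage described above. It assumes each source has been reduced to a simple mapping keyed by social security number. The record layouts, race codes and function name are hypothetical. In practice, linkage, coverage and confidentiality issues would dominate.

```python
from collections import defaultdict


def migration_rates_by_race(irs_year1: dict[str, str],
                            irs_year2: dict[str, str],
                            ssa_race: dict[str, str]) -> dict[str, float]:
    """Share of persons who changed State of residence between two tax
    years, by race (hypothetical sketch).

    irs_year1, irs_year2: SSN -> State of residence from each year's returns
    ssa_race:             SSN -> race code from the SSA Summary Earnings File
    Only persons found in all three sources are counted.
    """
    movers: dict[str, int] = defaultdict(int)
    persons: dict[str, int] = defaultdict(int)
    for ssn, state1 in irs_year1.items():
        state2 = irs_year2.get(ssn)
        race = ssa_race.get(ssn)
        if state2 is None or race is None:
            continue  # cannot be linked across all three files
        persons[race] += 1
        if state1 != state2:
            movers[race] += 1
    return {race: movers[race] / n for race, n in persons.items()}


year1 = {"001": "MD", "002": "VA", "003": "MD"}
year2 = {"001": "MD", "002": "MD", "003": "MD"}
race = {"001": "A", "002": "A", "003": "B"}
print(migration_rates_by_race(year1, year2, race))  # {'A': 0.5, 'B': 0.0}
```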



     By using tax returns (or W-2's) to establish a current
residence, the Form 941 to link an employer to an employee, and the
Master Employer Name Directory (mainly SS-4) to define an
employer's location, current journey-to-work estimates are
possible.  The Bureau of the Census and the Bureau of Economic
Analysis have done some work in this area, so far without great
success.  The problems of multi-establishment employers,
low-quality geographic coding of employers, etc., are major
obstacles when trying to estimate the change in a particular
journey-to-work flow.  (Chapter VII contains a more detailed
discussion of the problems encountered in the BEA journey-to-work
study.)
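The three-way linkage in the preceding paragraph can be sketched as a pair of lookups. The data structures below are hypothetical simplifications: plain dictionaries keyed by SSN and EIN, with areas held as strings. The comment on multi-establishment employers reflects the obstacle the text names.

```python
from collections import Counter


def journey_to_work_flows(residence: dict[str, str],
                          employer_of: dict[str, str],
                          employer_location: dict[str, str]) -> Counter:
    """Count workers by (residence area, work area) pair.

    residence:         SSN -> area of residence (from returns or W-2's)
    employer_of:       SSN -> EIN (employer-employee link, as from Form 941)
    employer_location: EIN -> area of the employer (from the employer directory)

    Each EIN carries only one location here, so all workers of a
    multi-establishment employer are assigned to that single place,
    which is the kind of error the text warns about.
    """
    flows: Counter = Counter()
    for ssn, home in residence.items():
        ein = employer_of.get(ssn)
        if ein is None:
            continue  # no employer link for this person
        work = employer_location.get(ein)
        if work is None:
            continue  # employer has no usable geographic code
        flows[(home, work)] += 1
    return flows
```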



     Currently, the Census Bureau uses IRS adjusted gross income and
wage and salary data to update the 1970 census per capita income
estimates.  By using the age, race, and sex data from the Social
Security Administration, the IRS information could be adjusted for
differential reporting by age, race, and sex.  Updating income size
distribution estimates with IRS data has long been considered
desirable.  The inability to group IRS returns directly into
families or households makes such updating difficult, but synthetic
estimation procedures involving IRS data are being used in the
development of family personal income size distribution estimates
at BEA (see Chapter IV).



     The need for targeted surveys and greater sampling efficiency
for small populations will continue to make administrative records
important as sampling frames.  For the business files, use of the
business lists as sampling frames, either to complete or to
stratify a universe for sampling, may be their single most
important function.



     In summary, the statistical use of administrative records will
continue to grow, but not easily.  The use of administrative
records data in preparing statistics must be preceded by a period
of analysis and experimentation in order to understand the
particular problems inherent in each administrative record system.



 



                        E. Appendix III.1



 



                       The Survey Questionnaire



 



Statistical Use Survey of Records Pertaining to
Individuals, Individual Firms, or Employers Maintained
and/or Mandated by the Federal Government



 



Survey for: Subcommittee on Statistical Uses of Administrative Records



            Federal Committee on Statistical Methodology



            Office of Federal Statistical Policy and Standards



 



Please complete the following questions as applicable. Since this survey
covers individuals, households, and business organizations (firms and
employers), not all of the questions may pertain to the data file you are
answering the questions about.  If you have any questions concerning the
survey or a particular question, or need additional copies of the
survey form, please contact Ms. Maria Gonzales at (202) 673-7953.



 



              (Please mark the appropriate category or categories



                      or supply the requested information)



 



1.  What is the name of the file?



    A) General name by which the file is usually called___________________________



    B) Technical or official name if different from 



       the general name_______________________________________________________



2.  What type of documentation exists for the file? 



    __ Internal Documentation



       __ Not available to anyone outside the agency.



       __ Available on request.



 






 





  _ Outside Documentation



    _ None currently prepared.



    _ Available on request.



    _ Not now available, but could be prepared upon request.



3.  What type of documentation is available outside the agency?



  _ Record Layout



  _ File description--technical description



  _ General file description without specific field description



  _ No documentation available outside agency



4.  What type of information is present on the file? The purpose of this 



    question is to obtain a list of the kind of information present on the



    file which might have statistical uses.  You may respond to the 



    appropriate questions below or provide a separate listing of the infor-



    mation on the file.  Is the reporting or filing unit an individual, 



    household, business, or some other unit?



    _ Individual (Answer 4A)



    _ Household, Family, or Other Group of Individuals (Answer 4B)



    _ Business or Employer (Answer 4C)



    _ Other reporting unit (Answer 4D)



 



4A. What kind of information on individuals is present on the file?



 



                                        Please Circle Yes



                                       or No as Appropriate



1) Person's name                           Yes      No



2) Mailing address                         Yes      No



3) Residence address                       Yes      No



4) Has the address been assigned           Yes      No



   a geographic code? If yes, what



   levels of geography are present?



     State                                 Yes      No



     County                                Yes      No



     Place                                 Yes      No



     Other, please specify__________



5) Race--If yes, what are the cate-        Yes      No     



   gories?



6) Spanish or other ethnic origin
   designation--If yes, what are the



   categories? ____________________        Yes      No



7) Date of birth or age                    Yes      No



8) Sex                                     Yes      No



9) Marital Status--If yes, what are 



   the categories?__________________       Yes      No



10) Income--If yes, what are the           Yes      No



    types of income present?________



11) Person's family or household in-  



    come--If yes, please specify type.



12) Social Security or Railroad Retire-



    ment Number                            Yes      No



13) Is the person's employer identified?   Yes      No



    If yes, is the employer's Employer
    Identification Number present?



14) Is the person's occupation identi-     Yes      No



    fied? 



15) Is the person's occupation identi-     Yes      No



    fied?



16) Level of education or technical        Yes      No



    skill



17) Place of birth or foreign country      Yes      No



    of birth



18) Information on person's health or      Yes      No



    disability--If yes, please specify



    __________________________________



19) Other relevant statistical informa-    Yes      No



    tion --If yes, please specify_____



 



4B. What kind of information on a household, family, or other group



    of individuals is present on the file?



 



                                        Please Circle Yes



                                       or No as Appropriate



1) Person's name                           Yes      No



2) Mailing address                         Yes      No



3) Residence address                       Yes      No



4) Has the address been assigned           Yes      No



   a geographic code? If yes, what



   levels of geography are present?



     State                                 Yes      No



     County                                Yes      No



     Place                                 Yes      No



     Other, please specify__________



5) Household or family size                Yes      No



6) Each household or family member         Yes      No



   identified



7) Household or family income              Yes      No



 



The following questions apply to the household or family head or
primary applicant.



 



8) Date of birth or age                    Yes      No



9) Sex                                     Yes      No



10) Race--If yes, what are the             Yes      No
    categories? ______________________



11) Spanish or other ethnic origin des-    Yes      No



    ignation--If yes, what are the



    categories? ___________________



12) Social Security or Railroad Retire-    Yes      No



    ment Number



 



4C.  What kind of information on business organizations or employers



is present on this file?



                                                Employer         Other please



                 Company or      Establish-     Identification  specify in the



                 Enterprise         ment        Number (EIN)   Remark section



                 ___________________________________________________________________



The file is 



organized by        



(please check           _              _            _                 _
the correct one):



 



1) Name             Yes     No     Yes     No    Yes     No        Yes     No



2) Address          Yes     No     Yes     No    Yes     No        Yes     No



3) Location code    Yes     No     Yes     No    Yes     No        Yes     No



for establishment 



or other report-



ing unit






 



 



 



 



 



4C.  What kind of information on business organizations or employers



     is present on this file? (Continued)



 



                                                            Employer         Other please



                            Company or      Establish-     Identification  specify in the



                             Enterprise         ment        Number (EIN)   Remark section



                 ___________________________________________________________________



 



4) Number of employees--       Yes     No     Yes     No    Yes     No        Yes     No



 If yes, as of what 



  date?_________________



5) Total payroll               Yes     No     Yes     No    Yes     No        Yes     No



    Annually                   Yes     No     Yes     No    Yes     No        Yes     No



    Quarterly                  Yes     No     Yes     No    Yes     No        Yes     No



6) Primary industry-- if yes   Yes     No     Yes     No    Yes     No        Yes     No



   what industry coding
   system is used? For
   example, 4-digit SIC,
   2-digit SIC, etc.



   ______________________



   ______________________



   ______________________



7) Secondary industry          Yes     No     Yes     No    Yes     No        Yes     No



8) Gross sales or receipts     Yes     No     Yes     No    Yes     No        Yes     No



9) Product description         Yes     No     Yes     No    Yes     No        Yes     No



10) Amount and description of  Yes     No     Yes     No    Yes     No        Yes     No



    capital base, total invest-



    ment in plant and equip-



    ment



11) What other items of statistical interest are available? Please list 



    in Remarks section below.



 



4D. What kind of information is available for the "other reporting unit?"



    Please specify the kind of information present on the file for the "other



    reporting unit" in the space provided below.



5.  From what applications or forms are the data derived? If
    possible, include the OMB (or other) form number.



6.  Briefly describe the process by which this information is obtained



    from the individual or business (firm, employer) and processed



    to the data file being described.



7.  What is the purpose of the file?  If the purpose is to meet specific 



    legislative requirements, please include a citation for applicable



    Federal law, agency regulation, or agency requirement.



8. a) Is the file a computerized version of a "paper system?"



               Yes         No



   b) What year was the file first created?________________________



   c) Has the file been expanded or has the data on the file



      changed significantly over its history?    Yes        No



        If yes, please explain how.



9. How many individuals or businesses are represented on the file?



   (An approximate number only.) __________________________________



10. What are the restrictions on the use of the file?



    a) Legal Restrictions--



    b) Administrative Restrictions--



    c) Other Restrictions--



11. If either the SSN or EIN is present on the data file, what is its
    purpose?



12. Is the file currently being used for statistical purposes?



        Yes     No



    For example: Is the file used as a sampling frame for any surveys?



    Are tabulations prepared from the file that are used for statistical



    purposes?



    Please briefly describe any statistical uses of the data file.



13. How often are data collected and updated for this file?



        Collected                            Updated



        _ One time only                  _ As needed



        _ Annually                       _ Annually



        _ Quarterly                      _ Quarterly



        _ Other, please specify          _ Other, please specify



 



14.  Please provide the name, address, and telephone number of a person



     who could answer questions concerning the data file (this person



     need not be the same person who answers this survey). 



        Name:  ___________________________________



        Address:  ________________________________



                  ________________________________



        City and State:___________________________



                       ___________________________



        Zip Code: ________________________________



        Telephone Number:  _______________________



15. Name and telephone number of person who completed this survey 



    if different from above.



        Name:  ___________________________________



        Telephone Number:  _______________________



       



 



 






 



 



 



 



 



                         F. Appendix III.2



 



                       The CWHS Data System



 



     The Continuous Work History Sample is a system of general
multipurpose statistical data files designed primarily for
socioeconomic research.  The system consists of samples of records
of individuals with employment covered by social security.
Earnings, employment, and benefit data for the individual, along
with personal and employer characteristics, are maintained to
varying degrees in the five basic data files and two special files
produced in the CWHS system.



     This appendix describes: (1) the data sources for the CWHS
system; (2) the procedures used to construct the administrative
data files underlying the system; (3) the procedures used to create
statistical files from the records in the administrative files; (4)
the sample design used for the system; and (5) the principal data
elements in each of the five basic CWHS files.  The discussion
refers to data and procedures predating the start of annual wage
reporting in 1979 (for calendar year 1978).  A discussion of the
new annual reporting system is presented in Chapter V, and Chapter
VII contains considerable discussion of the limitations of CWHS
data.



 



1. Data Sources



 



     Data for the CWHS are obtained from records derived from
reporting and informational forms and applications used in
administering the retirement, survivors, and disability programs of
the Social Security Administration.  The date of birth, sex, and
race of the person are obtained from the Application for a Social
Security Number (Form SS-5).  Geographic and industry information
is obtained from the employer's Application for an Identification
Number (Form SS-4) and from other related forms that are used
periodically to update this information (Forms OAA-100, OAA-103,
and SSA-5019).  Initially, employers are assigned geographic and
industry classifications based on the location and
nature-of-business information supplied on the Form SS-4.
Information that is not satisfactorily reported on the SS-4 is
obtained through the supplemental forms OAA-100 and OAA-103.



     Employers who operate more than one place of business and have
a total of 50 employees, with at least six in a separate location,
are asked to use the Establishment Reporting Plan.  Under this plan
the employer gives SSA a list showing the location, industrial
activity, and approximate number of employees of each
establishment.  On subsequent wage reports the employer groups
employees by establishment, identifying each group with a
preassigned establishment number.  This arrangement allows SSA to
classify the employees properly according to geography and
industry.
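Below is a hypothetical sketch of the Establishment Reporting Plan rule. It also shows how a preassigned establishment number lets wage items pick up geography and industry codes. Two points are assumptions: reading "a total of 50 employees" as 50 or more, and treating the first location as the principal one. The codes in the example are made up.

```python
def should_use_establishment_plan(employees_by_location: list[int]) -> bool:
    """Hypothetical reading of the Establishment Reporting Plan criterion.

    employees_by_location[0] is taken to be the principal place of
    business; any further entries are separate locations.  "A total of
    50 employees" is read as 50 or more.
    """
    if len(employees_by_location) < 2:
        return False  # only one place of business
    total = sum(employees_by_location)
    largest_separate = max(employees_by_location[1:])
    return total >= 50 and largest_separate >= 6


def classify_wage_items(items, establishments):
    """Attach geographic and industry codes to wage items.

    items:          (ssn, establishment_number, wages) tuples from a wage report
    establishments: establishment_number -> (geo_code, industry_code),
                    as listed by the employer when joining the plan
    """
    return [(ssn, *establishments[est], wages) for ssn, est, wages in items]
```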



     Data on earnings and employment are derived from various
reporting forms submitted by employers and self-employed persons.
Prior to 1978 and the advent of annual wage reporting, taxable
wages of employees were reported quarterly by regular employers on
Form 941, by household employers on Form 942, and by State and
local government employers on Form OAR-S3.  Farm employers report
annually on Form 943, and self-employed persons use Schedule SE of
Form 1040 to report annually.  (Refer to Chapter V for a discussion
of the new annual reporting system.)



     Claims and benefits information is obtained from applications
and forms that are completed in the process of filing for and
determining entitlement to benefits.



 



2. Processing Procedures--Administrative Records



 



     The demographic information (date of birth, sex, and race)
furnished by the applicant on the Form SS-5 is extracted after the
social security number has been issued.  This information is
maintained on magnetic tape in a master file called the Summary
Earnings Record (see Table III.1).  This is the record in which the
individual's lifetime earnings and quarters of coverage are
recorded for use in determining entitlement to benefits and
calculating benefit amounts at the time a claim for benefits is
made.



     The information supplied by the employer on the Form SS-4
relating to the location and nature of the business is manually
coded with geographic and industry codes.  This information is
keypunched and maintained on magnetic tape in a master file of
employers called the Employer Identification file (see Table
III.2).  Additionally, the information supplied on Form SSA-5019 by
multi-unit employers using the Establishment Reporting Plan,
pertaining to the location and nature of business of each separate
reporting unit, is also manually coded with geographic and industry
codes and maintained in the EI file.



     The earnings data reported by employers are received and
processed at SSA in a variety of ways.  Hand-filled paper forms
that meet certain criteria are optically scanned to produce a
machine-readable record, while others are keypunched.  Some
employers, usually those with a large number of employees, report
directly on magnetic tape.  The reports of self-employed persons
are received directly from the Internal Revenue Service on magnetic
tape.  After all of the earnings data are in machine-readable form
with appropriate identifying information, the tapes enter a
computer balancing operation in which each page of each report is
checked to see that the wage items balance to the page totals
provided by the employer.  Out-of-balance items are investigated
and corrective action is taken.  Balanced items are passed on to an
operation where individual items are sorted in social security
number sequence and then matched to the Summary Earnings Record on
the number and the first six letters of the surname.  Earnings
amounts are added to the summary records where complete matches
occur.  Unmatched records are rejected for further investigation
and processing.
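The balancing and matching steps can be sketched as follows. This is an illustrative sketch, not SSA's actual software. Amounts are held in integer cents so that page totals compare exactly. Stripping non-letters from the surname before taking six letters is an assumed normalization.

```python
from dataclasses import dataclass


@dataclass
class WageItem:
    ssn: str
    surname: str
    wages: int  # cents, so page totals compare exactly


def page_balances(items: list[WageItem], page_total: int) -> bool:
    """A page balances when its wage items sum to the employer's page total."""
    return sum(item.wages for item in items) == page_total


def name_key(surname: str) -> str:
    """First six letters of the surname, uppercased (assumed normalization)."""
    letters = [c for c in surname.upper() if c.isalpha()]
    return "".join(letters[:6])


def post_earnings(items: list[WageItem], summary_records: dict) -> list[WageItem]:
    """Post balanced items to summary records; return the unmatched items.

    summary_records maps SSN -> {"surname": str, "earnings": int}.
    An item is posted only when both the number and the name key agree.
    """
    rejected = []
    for item in sorted(items, key=lambda it: it.ssn):  # SSN sequence
        record = summary_records.get(item.ssn)
        if record is not None and name_key(record["surname"]) == name_key(item.surname):
            record["earnings"] += item.wages
        else:
            rejected.append(item)  # held for further investigation
    return rejected
```

With this key, "Johnston" (JOHNST) and "JOHNSON" (JOHNSO) do not match. An item reported under one spelling against a record under the other is therefore rejected for investigation rather than posted.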



     Prior to annual reporting, this processing occurred at regular
intervals four times during the year.  It generally takes about 9
months after the end of the reference period to receive, process,
and update the summary earnings records with virtually all of the
items for that period.



     Claims for social security benefits are filed in local social
security district offices.  The district offices send requests for
earnings records and benefit computations to SSA headquarters.
After the earnings record is located, benefit computations are made
and documentation of the claim is prepared and forwarded to the
requesting office, where the claim is developed and forwarded to a
program service center for benefit authorization.  Upon
authorization of benefits, the program service center sends a
notification of award to headquarters, where a new beneficiary
record is established in the Master Beneficiary Record file (see
Table III.1).  Changes to records in the beneficiary file are made
through reports by the district office or program service center.
The Master Beneficiary Record file is used in the preparation of
monthly social security benefit check records, which are forwarded
to the Treasury Department for payment.



 



3. Processing Procedures--Statistical Records



 



     Once a year, after the Summary Earnings Record has been updated
with virtually all of the prior year's earnings, a 1 percent sample
(based on specified digits of the social security number) is
extracted.  This file becomes the foundation for producing the
1 percent 1937-to-date CWHS.  It is used along with the prior
year's CWHS, a 1 percent sample extracted from the Master
Beneficiary Record file, and miscellaneous correction files to
generate the required data elements for the current year's
1 percent CWHS.



     At the same time that earnings data for the current processing
period are posted to the Summary Earnings Record, the earnings
items for the 1 percent sample are written off separately to
magnetic tape.  The items are accumulated until all four quarters
of the year have been processed.  They are then summarized into one
record for each employee-employer-establishment combination, with
quarterly earnings amounts maintained separately.  The resulting
records are matched to the Employer Identification file, and
geographic and industry codes are inserted.  The records are then
resummarized to the employee-employer level.  Cases with employment
in more than one establishment of the same employer are assigned to
the unit with the most activity in terms of quarters of employment.
A match is then made to a special extract from the 1 percent sample
1937-to-date CWHS containing date of birth, sex, and race codes.
These personal characteristics are inserted into the record to form
the final 1 percent Sample Annual Employee-Employer file.
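The two summarization passes can be sketched as below. The text does not settle two points, so both are assumptions here: a quarter of employment is treated as a quarter with positive earnings, and ties are broken by total earnings. The tuple layout of the input items is hypothetical.

```python
from collections import defaultdict


def summarize(items):
    """Pass 1: one record per employee-employer-establishment.

    items: (ssn, ein, establishment, quarter 1-4, amount) tuples.
    The four quarterly amounts are kept separately.
    """
    by_est = defaultdict(lambda: [0, 0, 0, 0])
    for ssn, ein, est, quarter, amount in items:
        by_est[(ssn, ein, est)][quarter - 1] += amount
    return dict(by_est)


def resummarize(by_est):
    """Pass 2: one record per employee-employer.

    The record goes to the establishment with the most quarters of
    employment (quarters with positive earnings, an assumption); ties
    go to the establishment with higher total earnings (also assumed).
    Quarterly earnings are summed across the employer's establishments.
    """
    groups = defaultdict(list)
    for (ssn, ein, est), quarters in by_est.items():
        groups[(ssn, ein)].append((est, quarters))
    result = {}
    for key, units in groups.items():
        best_est, _ = max(
            units,
            key=lambda u: (sum(1 for q in u[1] if q > 0), sum(u[1])),
        )
        combined = [sum(u[1][i] for u in units) for i in range(4)]
        result[key] = (best_est, combined)
    return result
```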



     Another file of the earnings items posted to the Summary
Earnings Record is written off separately for a different type of
processing.  This is a 0.1 percent sample and is a subset of the
1 percent sample.  These records are accumulated over the same time
period as the 1 percent sample records and are processed along with
the prior year's 0.1 percent basic file and a special 0.1 percent
write-off of certain data items from the current year's 1 percent
CWHS file to create the current year's 0.1 percent 1937-to-date
CWHS.
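Selection depends only on the digits of the number. The same individuals therefore fall in the sample year after year, and a 0.1 percent sample defined by one additional digit is automatically nested in the 1 percent sample. The endings below are placeholders. The text says only "specified digits," and the actual CWHS selection digits are not given here.

```python
# Hypothetical selection endings; the real CWHS digits are not stated in the text.
ONE_PERCENT_ENDING = "47"     # 2 fixed digits -> 1 in 100 numbers
TENTH_PERCENT_ENDING = "247"  # 3 fixed digits -> 1 in 1,000 numbers
# The 0.1 percent ending extends the 1 percent ending, so the samples nest.
assert TENTH_PERCENT_ENDING.endswith(ONE_PERCENT_ENDING)


def _digits(ssn: str) -> str:
    """Strip punctuation and check that nine digits remain."""
    digits = "".join(c for c in ssn if c.isdigit())
    if len(digits) != 9:
        raise ValueError(f"expected nine digits, got {ssn!r}")
    return digits


def in_one_percent(ssn: str) -> bool:
    return _digits(ssn).endswith(ONE_PERCENT_ENDING)


def in_tenth_percent(ssn: str) -> bool:
    return _digits(ssn).endswith(TENTH_PERCENT_ENDING)
```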



     Information for self-employed persons, coming from the
Schedule SE of the Form 1040, is submitted to SSA by IRS directly
on magnetic tape.  After initial processing of these records to
credit and post earnings properly to the Summary Earnings Record,
the 1 percent sample records in this file are written off for
statistical processing.  In subsequent computer operations, IRS
industry codes in the original record are converted to SSA
industry codes, and addresses are converted to geographic codes
through a special coding file that uses ZIP code and place names.
Correspondence requesting the required data is generated for cases
with missing or incomplete information.  The final file resulting
from these operations is the 1 percent Sample Annual Self-Employed
file.
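Below is a sketch of the coding step for self-employment records. It uses a crosswalk from IRS to SSA industry codes and a ZIP code lookup with a place-name fallback. It also returns a list of items that would trigger correspondence. All table contents and field names are hypothetical, and the ZIP-first order of lookup is an assumption.

```python
def code_self_employed(record: dict,
                       industry_crosswalk: dict[str, str],
                       zip_to_geo: dict[str, str],
                       place_to_geo: dict[tuple[str, str], str]) -> tuple[dict, list[str]]:
    """Return a coded copy of the record and the items needing correspondence.

    record fields (hypothetical): "irs_industry", "zip", "place", "state".
    """
    coded = dict(record)
    needs_correspondence = []

    # Convert the IRS industry code to an SSA industry code.
    coded["ssa_industry"] = industry_crosswalk.get(record.get("irs_industry", ""))
    if coded["ssa_industry"] is None:
        needs_correspondence.append("industry")

    # Try the 5-digit ZIP code first, then fall back to place name and State.
    geo = zip_to_geo.get(record.get("zip", "")[:5])
    if geo is None:
        key = (record.get("place", "").upper(), record.get("state", "").upper())
        geo = place_to_geo.get(key)
    coded["geo"] = geo
    if geo is None:
        needs_correspondence.append("geography")

    return coded, needs_correspondence
```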



     In addition to the regular statistical processing described
above, in recent years special processing has been done to generate
two additional files: the First Quarter
Employee-Employer-Establishment files for the 1 percent sample and a
special 10 percent Sample First Quarter
Employee-Employer-Establishment file.  Processing for these files
is similar to processing for the Annual Employee-Employer files,
except that it is done after all first quarter receipts have been
received and posted to the summary earnings record.  Record
contents are virtually the same as the annual files, except that
only first quarter data are included.  The 1 percent first quarter
files have been prepared for the years 1970-76, while the 10
percent first quarter files have been produced for the years 1971,
1973, and 1975.



 



4. Sample Design



 



     The population from which the CWHS is selected consists of the
one billion possible nine-digit social security numbers.  These
numbers have the following digital arrangement:



 



 Area in which
    number
   assigned          Group number       Serial number
(three digits)       (two digits)       (four digits)
     XXX                XX                   XXXX
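The digit arrangement above can be split mechanically into its three fields. The text goes on to note that a State's area number together with a group number defines a stratum, so the sketch also returns that pair. The names are illustrative. The field widths multiply out to 1,000 × 100 × 10,000 = 10^9, the one billion possible numbers.

```python
from typing import NamedTuple


class SSNParts(NamedTuple):
    area: str    # three digits: area in which the number was assigned
    group: str   # two digits
    serial: str  # four digits


def parse_ssn(ssn: str) -> SSNParts:
    """Split a nine-digit number (punctuation ignored) into its three fields."""
    digits = "".join(c for c in ssn if c.isdigit())
    if len(digits) != 9:
        raise ValueError(f"expected nine digits, got {ssn!r}")
    return SSNParts(digits[:3], digits[3:5], digits[5:])


def stratum(ssn: str) -> tuple[str, str]:
    """Area and group number together, the combination treated as a stratum."""
    parts = parse_ssn(ssn)
    return (parts.area, parts.group)
```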



 



In the issuance of social security numbers, each State is assigned
one or more area numbers, with the exception of a special block of
numbers assigned prior to August 1963 to persons covered under the
Railroad Retirement Act.  Each State number, in combination with a
given group number, defines a stratum.  The population assigned
social security numbers is thus stratified geographically (by place
of application for social security number) and chronologically (by
the process of assigning these numbers).  Each number is an el