Statistical Policy Working Paper 6 - Report on Statistical Uses of Administrative Records
Statistical Policy Working Paper 6

Report on Statistical Uses of Administrative Records

Prepared by
Subcommittee on Statistical Uses of Administrative Records
Federal Committee on Statistical Methodology

U.S. DEPARTMENT OF COMMERCE
Philip M. Klutznick, Secretary
Luther H. Hodges, Jr., Deputy Secretary
Courtenay M. Slater, Chief Economist

Office of Federal Statistical Policy and Standards
Joseph W. Duncan, Director

Issued: December 1980

Statistical Policy Working Papers are a series of technical documents prepared under the auspices of the Office of Federal Statistical Policy and Standards. These documents are the product of working groups or task forces, as noted in the Preface to each report.

These Statistical Policy Working Papers are published for the purpose of encouraging further discussion of the technical issues and to stimulate policy actions which flow from the technical findings and recommendations. Readers of Statistical Policy Working Papers are encouraged to communicate directly with the Office of Federal Statistical Policy and Standards with additional views, suggestions, or technical concerns.

Joseph W. Duncan
Director
Office of Federal Statistical Policy and Standards

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402

Office of Federal Statistical Policy and Standards
Joseph W. Duncan, Director
Katherine K. Wallman, Deputy Director, Social Statistics
Gaylord E. Worden, Deputy Director, Economic Statistics
Maria E. Gonzalez, Chairperson, Committee on Statistical Methodology

Preface

This working paper was prepared by the members of the Subcommittee on Statistical Uses of Administrative Records, Committee on Statistical Methodology. The Subcommittee was chaired by Daniel H. Garnick, Bureau of Economic Analysis, Department of Commerce. The members of the subcommittee are the authors of this report; their names are listed below.
The first portion of this report provides a review of major administrative record files pertaining to individuals and to businesses. Major statistical uses of administrative records are outlined, including: (1) direct use of the records to obtain statistics and to supplement existing data via expanding coverage or content; and (2) technical uses of the data for constructing sampling frames, quality control, improving on procedures, and data evaluation. New developments in data from business establishment reporting and a number of potential uses of administrative records for data linkage are described. Technical problems in the statistical use of administrative records, including coverage, comparability, error, and timing of data, are discussed. The final section of the report covers various legal issues in accessing administrative records for statistical purposes.

While much statistical use of administrative records is currently made in Federal agencies, this report is intended to inform managerial and technical staffs of the vast potential as well as the difficulties entailed in augmenting current uses of administrative records for statistical purposes. The Office of Statistical Policy and Standards hopes to organize, with the help of Subcommittee members, seminars with Federal employees to disseminate the findings of this report. The implementation of the recommendations in this report will be explored by the Office of Statistical Policy and Standards.

Members of the Subcommittee on Statistical Uses of Administrative Records (June 1980)

Daniel H. Garnick* (Chair), Bureau of Economic Analysis (Commerce)
Lois Alexander, Social Security Administration (HHS)
Paul A. Armknecht, Bureau of Labor Statistics (Labor)
David V. Bateman, Bureau of the Census (Commerce)
Lawrence A. Blum, Bureau of the Census (Commerce)
Warren L. Buckler, Social Security Administration (HHS)
David W. Cartwright, Bureau of Economic Analysis (Commerce)
John DiPaolo, Internal Revenue Service (Treasury)
Maria E. Gonzalez* (ex officio), Office of Federal Statistical Policy & Standards (Commerce)
John A. Gorman, Bureau of Economic Analysis (Commerce)
David A. Hirschberg, Small Business Administration
Beth A. Kilss, Social Security Administration (HHS)
J. Knott, Bureau of the Census (Commerce)
Bruce Levine, Bureau of Economic Analysis (Commerce)
Nash J. Monsour, Bureau of the Census (Commerce)
Allan Olson, Economic Development Administration (Commerce)
Elizabeth H. Queen, Bureau of Economic Analysis (Commerce)
Vernon Renshaw, Bureau of Economic Analysis (Commerce)
Fritz J. Scheuren*, Social Security Administration (HHS)
Daniel F. Skelly, Internal Revenue Service (Treasury)
Hyman Steinberg, U.S. Postal Service

Additional Contributors to the Report on Statistical Uses of Administrative Records

Jeanne E. Griffith, Office of Statistical Policy and Standards (Commerce)
Daniel Kasprzyk, Assistant Secretary for Planning and Evaluation (HHS)
Susan Miskura, Bureau of the Census (Commerce)

* Member, Committee on Statistical Methodology

Acknowledgments

The body of this report is the collective effort of the Subcommittee on Statistical Uses of Administrative Records. Although the subcommittee members reviewed and commented on all parts of this report, specific individuals were responsible for preparing the various sections. In the case of Chapter VI, the subcommittee benefitted from the expertise and contribution of several additional persons in preparing the case studies.
The authors of the chapters appear below:

Chapter  Authors
I        Daniel Garnick, Maria Gonzalez, Vernon Renshaw, Lois Alexander, David Hirschberg, Fritz Scheuren
II       Vernon Renshaw, David Hirschberg, Daniel Garnick
III      Joseph Knott, Lawrence Blum, Warren Buckler, Vernon Renshaw, Fritz Scheuren
IV       Vernon Renshaw, David Cartwright, Nash Monsour, Lawrence Blum, John Gorman, Daniel Skelly, John DiPaolo, Warren Buckler, Elizabeth Queen
V        Lawrence Blum, Paul Armknecht, Warren Buckler, David Cartwright, Vernon Renshaw
VI       Fritz Scheuren, Beth Kilss, Jeanne Griffith, Daniel Kasprzyk, David Bateman, Sue Miskura, Maria Gonzalez
VII      David Cartwright, Vernon Renshaw, Bruce Levine, Warren Buckler, Fritz Scheuren
VIII     Lois Alexander

Maria Gonzalez worked with the subcommittee throughout its two-year study. Members of the Federal Committee on Statistical Methodology and the Office of Statistical Policy and Standards provided additional assistance and encouragement. Critical reviews of earlier draft versions by Thomas Jabine, Barbara Bailar, and Tore Dalenius were particularly helpful in the development of this report. Discussion by Richard Ruggles on papers by Daniel Garnick and Joseph Knott, David Cartwright and Paul Armknecht, David Hirschberg and Vernon Renshaw, and Lois Alexander at the Statistical Uses of Administrative Records Session of the 1979 American Statistical Association meetings aided in sharpening the focus of this report.

Others who contributed to the work of the Subcommittee include: Yoshio Akiyama, Leroy Bailey, Robert Berney, J. Robert Brown, Morris M. Kleiner, Lillian Madow, Harriet Orcutt, and Max Shor.

Members of the Federal Committee on Statistical Methodology (June 1980)

Maria Elena Gonzalez (Chair), Office of Federal Statistical Policy and Standards (Commerce)
Barbara A. Bailar, Bureau of the Census
Norman D. Beller, Economics, Statistics, and Cooperatives Service (Agriculture)
Barbara A. Boyes, Bureau of Statistics
Edwin J. Coleman, Bureau of Economic Analysis (Commerce)
John E. Cremeans, Bureau of Economic Analysis (Commerce)
Marie D. Eldridge, National Center for Education Statistics (Education)
Daniel H. Garnick, Bureau of Economic Analysis (Commerce)
Thomas B. Jabine, Energy Information Administration (Energy)
Charles D. Jones, Bureau of the Census (Commerce)
William E. Kibler, Economics, Statistics, and Cooperatives Service (Agriculture)
Alfred D. McKeon, Bureau of Labor Statistics (Labor)
Raymond C. Sansing, Internal Revenue Service (Treasury)
Fritz J. Scheuren, Social Security Administration (HHS)
Lincoln E. Moses, Energy Information Administration (Energy)
Monroe G. Sirken, National Center for Health Statistics (HHS)
Wray Smith, Office of the Assistant Secretary for Planning and Evaluation (HHS)
Thomas G. Staples, Social Security Administration (HHS)

Table of Contents

Preface
Acknowledgments
List of Figures
List of Tables
Abbreviations

Chapter I. Findings and Recommendations
  A. Statistical Standards
  B. Access
  C. Other Government-Wide Program Coordination and Support

Chapter II. Introduction and Summary
  A. Introduction
  B. Summary
    1. Chapter III
    2. Chapter IV
    3. Chapter V
    4. Chapter VI
    5. Chapter VII
    6. Chapter VIII

Chapter III. Major Administrative Record Files
  A. Scope of Study and Survey Conducted
    1. Scope of Study
    2. Survey Conducted
  B. Survey Results
    1. Files Pertaining Mainly to Individuals
      a. Universe
      b. Geographic Information
      c. Demographic Information
      d. Reporting Unit
    2. Files Pertaining Mainly to Businesses
      a. Universe
      b. Geographic Information
      c. Economic Data
      d. Reporting Unit
  C. Continuous Work History Files
  D. The Evolution of Statistical Uses of Administrative Records
  E. Appendix III.1 The Survey Questionnaire
  F. Appendix III.2 The CWHS Data System
    1. Data Sources
    2. Processing Procedures - Administrative Records
    3. Processing Procedures - Statistical Records
    4. Sample Design
    5. Data Files
      a. One percent Sample Annual Employee-Employer (Ee-Er) File
      b. One percent Sample Annual Self-Employed (SE) File
      c. One percent Sample Longitudinal Employee-Employer Data (LEED) File
      d. One percent 1937 to Date CWHS File
      e. One-Tenth of One percent 1937 to Date CWHS File

Chapter IV. Major Statistical Uses of Administrative Records
  A. Defining Administrative Records and Using Them Statistically
  B. Internal Revenue Service
  C. Social Security Administration
  D. Bureau of Economic Analysis
  E. Census Bureau
    1. Economic Censuses
    2. Census of Agriculture
    3. Survey of Minority-Owned Businesses (SMOBE)
    4. Current Economic Indicators
    5. The Standard Statistical Establishment List
  F. The Small Business Administration
  G. Appendix IV.1 Data from IRS and SSA
    1. Data from IRS
    2. Data from SSA

Chapter V. Developments in Data from Business Establishment Reporting
  A. Standard Statistical Establishment List
    1. File Construction
    2. Multiestablishment Firms
    3. Single Establishment Firms
    4. File Maintenance
    5. Confidentiality
  B. W-2 and W-3 Records
  C. Unemployment Insurance System
    1. Master List of Employers
    2. Employers' Quarterly Tax Report
    3. Individual Wage Records
    4. Improving Data Quality

Chapter VI. Potential Uses of Administrative Records for Data Linkages: Selected Case Studies
  A. Introduction
  B. Case Study 1: Linked Administrative Statistical Sample (LASS) Project
    1. Background and Initial Project Goals
      a. LASS Data Elements
      b. LASS Goals
    2. Pilot Activities and Feasibility Issues
      a. Resolving Privacy Concerns
      b. Examining SSA-NCHS Death Reporting Differences
      c. Adding Data From Death Certificates to the CWHS
      d. Usability of IRS Occupation Information
      e. Upgrading CWHS Industry and Place of Work Data
      f. Evaluating W-2 Residence Data
    3. Operational Implementation Issues
    4. References
  C. Case Study 2: The Use of Administrative Records in the Survey of Income and Program Participation
    1. Objectives and Description
      a. Site Research
      b. 1978 Panel
      c. 1979 Panel
    2. Major Difficulties
    3. Uses of Administrative Files
    4. Quality of Results
    5. Bibliography
  D. Case Study 3: Use of IRS/SSA/HCFA Administrative Files for 1980 Census Coverage Evaluation
    1. Introduction
    2. Objectives of the Program to Estimate the Census Undercount
    3. Matching Techniques
      a. Matching of Survey Housing Unit and Person Records to Census Records
      b. Matching of CPS and Census Enumerated Housing Unit and Person Records to Administrative File Records
    4. Administrative Matching
    5. Research Conducted for Match Study
      a. 1978 CPS/IRS Match Study
      b. IRS Census Match Study (Involving Richmond, Virginia and Southwest Colorado Dress Rehearsal Censuses)
    6. Estimation
    7. Anticipated Cost and Timing of Administrative Match Study
    8. References
  E. Case Study 4: Record Linkage in the Nonhousehold Sources Program
    1. Introduction
    2. Results from the Travis County, Texas and Camden, New Jersey Pretest
    3. Plans for the 1980 Census Nonhousehold Sources
    4. Summary and Future Considerations
    5. Sources of Further Information
    6. References
    7. Appendix - Matching Instructions
  F. Concluding Comments

Chapter VII. Technical Problems in the Statistical Use of Administrative Records
  A. Coverage
  B. Comparability
  C. Reporting and Processing Errors
    1. Reporting Problems
    2. Processing Problems
    3. Extent of Errors
    4. Related Problems with Other Data
    5. Errors in Other Information
  D. Problems with Timing of the Data
  E. Conclusion

Chapter VIII. Legal Issues in the Statistical Use of Administrative Records
  A. Legal and Administrative System
    1. Factors Precipitating the Shift Toward Greater Statistical Use of Administrative Records
    2. Concept of Functional Separation
    3. A Language Framework for Legal Issues
    4. Options: Legislative Approaches to Functional Separation
  B. Dynamics of Functional Separation
    1. Dimensions and Characteristics of the Legal Framework
      a. Disclosure Within the Agency, a Broader View
      b. Disclosure to Agency Contractors
      c. Disclosure Among Federal Agencies
      d. Use By Non-Statisticians of Statistical Files Compiled From Administrative Source Records
    2. A Closer Look At Some Federal Statutes Affecting Statistical Use of Administrative Records and Protection of Statistical Records from Nonstatistical Use
  C. Summary and Directions for the Future
  D. Notes and References

References

List of Figures

Figure
III.1   Major Administrative Files Surveyed by the Subcommittee on the Statistical Uses of Administrative Records
V.1     Forms W-2 and W-3
V.2     Statistical Uses of Unemployment Insurance Administrative Records from Establishments
VI.4.1  Nonhousehold Sources Worksheet to Search Census Records for Selected Person: 1976 Census of Travis County, Texas
VI.4.2  Nonhousehold Sources Census Record Search and Telephone Follow-up Verification Record: 1976 Census of Camden, New Jersey
VI.4.3  Nonhousehold Sources Record: 20th Decennial Census - 1980

List of Tables

Table
III.1   Major Administrative Record Systems Pertaining to Individuals
III.2   Major Administrative Record Systems Pertaining to Businesses
IV.1    National Income and Product Account Components Based on Administrative Records
IV.2    Input-Output Account Industry Estimates Based on Administrative Records
IV.3    Balance of Payment Account Components Based on Administrative Records
IV.4    National Income and Product Account Components Based on Current Surveys Using Administrative Record Based Sampling Frames
VI.2.1  Distribution of Site Research Sample Households by Sample Frame and Questionnaire Type
VI.2.2  Distribution of Site Research Adult Respondents by Sample Frame and Questionnaire Type
VI.2.3  A Sampling of AFDC Matching Results in the Site Research Survey
VI.2.4  SSI Match Results for the 1978 Panel
VI.3.1  Forming a Dual-System Estimate for One of the 61 Divisions
VI.4.1  Camden Match Results
VI.4.2  Cross Tabulation of Age Reported on Drivers Licenses and Census Questionnaire (Camden, New Jersey)
VII.1   Comparison of Employment Estimates: CWHS, Census, UI, and CBP

Abbreviations

AFDC    Aid to Families with Dependent Children
BEA     Bureau of Economic Analysis
BEOG    Basic Education Opportunity Grant
BLS     Bureau of Labor Statistics
BMF     Business Master File (of IRS)
CAB     Civil Aeronautics Board
CBP     County Business Patterns
CofC    Comptroller of the Currency
CES     Current Employment Statistics
CETA    Comprehensive Employment and Training Act
CPS     Current Population Survey
CWBH    Current Wage and Benefit History
CWHS    Continuous Work History Sample
ED      Enumeration District
EEOC    Equal Employment Opportunity Commission
EI(N)   Employer Identification (Number)
ERP     Establishment Reporting Plan
FAA     Federal Aviation Administration
FCC     Federal Communications Commission
FDIC    Federal Deposit Insurance Corporation
FICA    Federal Insurance Contributions Act
FOIA    Freedom of Information Act
FPC     Federal Power Commission
FRB     Federal Reserve Board
FTC     Federal Trade Commission
GAO     General Accounting Office
GBF     Geographic Base File
HCFA    Health Care Financing Administration
HHS     Department of Health and Human Services
HEW     (Department of) Health, Education, and Welfare
ICC     Interstate Commerce Commission
IMF     Individual Master File (of IRS)
I-O     Input-Output
IRS     Internal Revenue Service
ISDP    Income Survey Development Program
LASS    Linked Administrative Statistical Sample
LTS     Labor Turnover Statistics
NCEUS   National Commission on Employment and Unemployment Statistics
NCHS    National Center for Health Statistics
NCI     National Cancer Institute
NIPA    National Income and Product Accounts
OASDI   Old Age, Survivors, and Disability Insurance
OES     Occupation Employment Statistics
OFSPS   Office of Federal Statistical Policy and Standards
OMB     Office of Management and Budget
OPM     Office of Personnel Management
ORS     Office of Research and Statistics (of SSA)
OSHS    Occupation Safety and Health Statistics
PES     Post Census Enumeration Survey
PPSC    Privacy Protection Study Commission
REA     Rural Electrification Administration
RFP     Request for Proposal
SBA     Small Business Administration
SER     Summary Earnings Record
SESA    State Employment Security Agency
SIC     Standard Industrial Classification
SIPP    Survey of Income and Program Participation
SMD     Statistical Methods Division (of Census)
SMOBE   Survey of Minority Business Enterprises
SMSA    Standard Metropolitan Statistical Area
SOI     Statistics of Income
SSA     Social Security Administration
SSEL    Standard Statistical Establishment List
SSI     Supplemental Security Income
SSN     Social Security Number
SSR     Supplemental Security Record
SUAR    Statistical Uses of Administrative Records
TCMP    Taxpayer Compliance Measurement Program
UI      Unemployment Insurance
USDA    United States Department of Agriculture

CHAPTER I

Findings and Recommendations

Statistical use of administrative records grew rapidly during the 1970's, in large part as a response to legislative requirements for timely data to use in the distribution of Federal funds to State and local governments. The principal reason for increasing reliance on administrative records for statistical data is the availability of administrative records which can be used to obtain small area data at minimal cost and without increasing respondent burden. And cost is likely to be an increasingly important factor in the statistical use of administrative records in the 1980's.

Although statistical use of administrative records is growing, many unanswered questions remain concerning the quality of statistics derived from administrative records. From a statistical point of view, the standards of quality and consistency in administrative data collection and processing programs are frequently inadequate.
Difficulties in accessing administrative records, moreover, often inhibit the efficient joint use of particular administrative record sets with other administrative and statistical records in meeting statistical needs.

Improved statistics from administrative records will require modification in data collection and processing procedures, modification of laws and administrative procedures relating to access to records, and increased resources for evaluating and upgrading the quality of administrative records for statistical use. While the costs of improving administrative records for statistical applications can be significant, they will often be substantially less than alternatives requiring expanded censuses and surveys. And in many instances both administrative and statistical programs could benefit from reduced respondent burdens and data processing costs obtainable by applying more efficient statistical tools in the collection and use of administrative records.

To solve problems impeding efficient statistical use of administrative records, coordinated treatment of a variety of interagency issues is needed to serve as a counterweight to the decentralized operations of Federal information collection programs. In addressing these issues, the Subcommittee on Statistical Uses of Administrative Records has divided its recommendations into three sections concerned with:

A. Identifying and formulating solutions for common problems related to statistical standards for administrative information programs.
B. Identifying and meeting various problems related to access to administrative record systems.
C. Identifying collection programs and research activities requiring government-wide coordination and support.

Individual recommendations are in some cases accompanied by examples of subcommittee findings which illustrate the need for the recommendation.

A. Statistical Standards

There is a need for greater standardization in the procedures for collecting and presenting data based on administrative records in order to provide a basis for reducing duplicate collection efforts and improving the quality and consistency of the information that is collected.

Recommendation 1. Common identifiers should be used whenever possible in collecting information pertaining to the same individuals or organizations.

The capability for linking information from a variety of sources is central in making efficient statistical use of administrative records. This capability depends on both appropriate access to administrative records (see Section B) and consistency among administrative and statistical agencies in procedures for identifying respondents or reporting units.

The subcommittee noted, for example, that household surveys could be used more effectively in conjunction with administrative records if social security numbers and related identifying information were collected in selected surveys. This would permit linking detailed socioeconomic information from surveys with longitudinal records from administrative sources concerned, for example, with employment or medical histories. Such linkages are performed in various areas of social research including specialized fields such as epidemiology.

In business data collection programs, employer identification numbers should be supplemented with a common set of identifiers for the individual establishments of large businesses. Selected administrative record data for multi-establishment businesses could then be linked more readily to economic census and survey data for purposes of improving geographical and industrial analysis of economic activity.

Recommendation 2. The quality of administrative records to be used for statistical purposes should be evaluated systematically to determine the appropriateness of the records for the proposed use.
The quality of administrative record files, including such factors as the type and quality of identification on the file and the completeness, definitional suitability, and quality of individual or organizational characteristics on the file, will determine the appropriateness of the use of the files for particular statistical applications. For example, in matching applications the completeness of the coverage of the administrative record files and the accuracy of identifiers will determine whether a high match rate will be achieved. Similarly, in such applications as the distribution of Federal funds to State and local governments, completeness and accuracy of administrative records will determine the extent to which estimates derived from these records may serve as complements as well as substitutes for census and survey data.

Recommendation 3. Consistent procedures should be used in administrative and statistical data collection efforts for defining reporting units, identifying and coding reporting unit characteristics, and developing standards for data tabulation.

When common reporting units are not appropriate there should still be efforts to ensure that the more detailed reporting unit breakdowns of one program can be readily combined into more aggregative units used in other programs. The subcommittee noted, for example, a lack of congruity in the definition of companies filing corporate income tax returns and companies reporting for statistical purposes to the Census Bureau. The subcommittee also found a particularly serious problem of inconsistency between "establishment" reporting plans associated with administrative programs and the definitions of establishments of multiunit companies used in the Census Bureau's Standard Statistical Establishment List. The Social Security payroll tax program, for example, involves a voluntary establishment reporting plan with company self-identification of reporting units on a basis differing from SSEL definitions.
The need for consistent reporting requirements that eliminate duplicate and other unnecessary reporting is highlighted by the fact that the compliance of large companies with the SSA establishment reporting plan and other voluntary statistical programs has been deteriorating in recent years.

Problems of inadequate procedures for coding reporting unit characteristics have been emphasized by the subcommittee in such areas as geographic coding and the industrial coding of business establishments. Reliable and detailed geographic coding in administrative record systems, in particular, has become increasingly important as administrative records have received wider application in preparing statistics for use in distributing Federal funds to State and local governments. For many purposes geographic coding is required at the municipal level, but substate coding in administrative record systems tends to be restricted to county identifiers. The lack of current economic information by municipality has hindered effective planning and economic policy making at the Federal as well as State and local level. For business reporting systems, the SSEL coding system can provide a basis for obtaining consistency in both geographic and industrial coding.

The need for consistent standards for data tabulation has recently been highlighted by efforts to assemble a data base for analyzing small business policy issues. These efforts have been hampered by inconsistencies among various administrative and statistical programs in the ways in which data are identified and tabulated by size of business.

B. Access

A central issue in efficiently meeting the differing data requirements of administrative vs. statistical applications is the problem of striking an appropriate balance between the need to access individual records and the right to privacy, together with consideration of the confidentiality of responding persons and businesses.
Resolution of this issue requires that distinctions be made both in terms of the uses to be made of records and the types of reporting units and information involved.

Recommendation 4. Natural persons should be distinguished from organizations and other entities when developing standards and practices of record confidentiality.

The need for confidentiality is not the same for businesses and other organizations as for natural persons. Often, the need for access to selected information pertaining to businesses requires interagency transfer of information about organizations. The subcommittee has found, for example, instances in which Federal agencies purchase privately produced lists of businesses containing generally available information, such as name and address of the businesses, because access to more complete and reliable lists such as the Census Bureau's SSEL has been excessively restricted. The subcommittee is not persuaded that these restrictions are reasonable or necessary.

Recommendation 5. Legislation and administrative procedures should be modified to make comprehensive Federal lists of businesses and organizations, such as the Census Bureau's Standard Statistical Establishment List and SSA's employer listing, more readily available for statistical uses.

Legislation has been drafted to make the SSEL available to Federal agencies for statistical purposes. Passage of the proposed legislation could aid in reducing the duplication and costs, and the attendant differences in definition and coverage, that result when independently developed lists are maintained. SSA's listing of employers is compiled from the applications for employer numbers required of employers of workers covered by Social Security, now virtually the entire workforce. Availability of this list as a statistical sampling frame has been closed by application of the Tax Reform Act of 1976.

Recommendation 6. For natural persons, the principles of "functional separation" developed by the Privacy Protection Study Commission, the White House Privacy Initiative, and the President's Statistical Reorganization Project should be applied in distinguishing records to be used for administrative (and enforcement) purposes from records to be used for statistical purposes.

Functional separation will establish two discrete categories of information according to the statistical or administrative and enforcement functions to which the information is assigned. The separate category of statistical information can be freely used and transferred with individual identifiers intact for statistical purposes. Between the two categories, information that can be uniquely associated with subject individuals flows only one way, into the statistical category. The flow from the statistical category into other uses must be in a form or under conditions that prevent unique association. When administrative records are the initial information source, the resultant copies or extracts which have been incorporated into statistical files may not be subsequently used in individually identifiable form for administrative or enforcement purposes.

Recommendation 7. Particular legal and administrative barriers to access to administrative records for statistical use should be identified and eliminated for records pertaining to both natural persons and organizations.

The subcommittee, for example, has found limitations on access to IRS data imposed under Section 6103 of the Tax Reform Act of 1976 to be excessively restrictive to statistical uses of the data. In this connection it can be noted that the Internal Revenue Service has denied other Federal agencies access to Taxpayer Compliance Measurement Program data files for 1976 and subsequent years.
In addition, the Tax Reform Act has prevented the Social Security Administration from supplying the Bureau of Economic Analysis with post- 1975 Continuous Work History Sample Files needed to continue a long-standing cooperative program to use and improve this important statistical data base. C. Other Government-Wide Program Coordination and Support In order to maximize the usefulness of administrative record systems, it will be necessary to identify on a government-wide basis those data collection programs, as well as research initiatives, which need interagency support. Further the needs of data users should be considered in designing statistical series based on administrative records. Recommendation 8. Procedures for planning and setting budget priorities should be developed to ensure that agency and program- specific budget allocations are responsive to those interagency data needs that are met most effectively through the specific programs under review. Many administrative programs are not explicitly budgeted for supplying those general-purpose statistical needs which could be met efficiently through statistical use of administrative records. The subcommittee has found, for example, that geographic and industrial data quality in the Social Security Administration's Continuous Work History Sample has been declining because the data have few applications for internal SSA programs and therefore receive low priority in the agency budgeting process. Geographic and industrial data from the CWHS, however, are very important for outside data users. And they will become even more important if administrative records are called on to play a central role in providing intercensal estimates. In planning alternatives to a mid-decade census there should be careful cost-benefit analysis of different approaches involving various combinations of survey and administrative record data sources. Recommendation 9. 
As recommended by the President's Statistical Reorganization Project, efficient statistical tools should be applied in information collection programs extending well beyond the confines of the principal statistical agencies. Statistics can contribute techniques for improving design of forms, both to improve quality of response on administrative forms, and to improve the multi-purpose utility of the information provided. Development and extension of such statistical techniques as scientific sampling, record matching, and synthetic estimation can be used effectively to economize on the amount of information that needs to be collected, thereby reducing paperwork burdens and budgetary costs associated with administrative as well as statistical data collection programs. Many administrative record data collection programs have lagged well behind the "state of the art" in the application of statistical tools, and modernization of programs is badly needed.

Recommendation 10. To obtain statistical data, increased use should be made of matches between sample surveys and administrative files. Samples based on linkages among administrative record systems also should be encouraged for statistical purposes. The subcommittee has investigated the statistical uses of linking of administrative record files with sample survey data, as well as with samples from other administrative records. The subcommittee endorses the use of matching to obtain statistical data based on the combination of administrative records and sample surveys. The analytic potential of obtaining expanded, more detailed data bases through successful matching is sufficiently great that complicated procedures are often worth the effort. However, for each specific program proposing to use linkages to obtain statistical data, it is necessary to examine the costs and benefits to the program to determine whether the match should be performed.
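As an illustration only, the kind of match between a sample survey and an administrative file described in Recommendation 10 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a procedure used by any agency: the record layouts, the field names (`person_id`, `reported_income`, `recorded_income`), and the sample values are all hypothetical, and the match is a simple exact match on an identifier assumed to be shared by both files and unique within the administrative file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyRecord:
    person_id: str        # hypothetical identifier shared by both files
    reported_income: int  # income as reported by the survey respondent


@dataclass(frozen=True)
class AdminRecord:
    person_id: str
    recorded_income: int  # income as carried on the administrative file


def match_records(survey, admin):
    """Exact-match survey records to administrative records on person_id.

    Assumes person_id is unique within the administrative file.
    Returns (matched_pairs, unmatched_survey_records).
    """
    admin_by_id = {rec.person_id: rec for rec in admin}
    matched, unmatched = [], []
    for rec in survey:
        partner = admin_by_id.get(rec.person_id)
        if partner is None:
            unmatched.append(rec)
        else:
            matched.append((rec, partner))
    return matched, unmatched


# Made-up illustrative data, not drawn from any real file.
survey = [
    SurveyRecord("A1", 9000),
    SurveyRecord("A2", 12000),
    SurveyRecord("A3", 7000),
]
admin = [
    AdminRecord("A1", 9500),
    AdminRecord("A2", 12000),
    AdminRecord("B9", 30000),
]

matched, unmatched = match_records(survey, admin)

# Share of survey records for which an administrative partner was found.
match_rate = len(matched) / len(survey)

# Difference between administrative and survey income for matched cases,
# the sort of comparison used to assess reporting accuracy.
discrepancies = [
    (s.person_id, a.recorded_income - s.reported_income) for s, a in matched
]
```

In practice the files being linked often lack a common, error-free identifier, which is one reason the cost-benefit examination called for above matters; the exact-match rule here stands in for whatever matching rule a particular program would adopt.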
The case studies in Chapter VI illustrate potential uses of administrative records for important statistical programs; each case study has specific goals, applications, and advantages. The combined use of administrative record files and sample survey data for linkage programs may be effective for a variety of reasons, including that: (1) respondent burden may be reduced while estimates of subpopulation characteristics are improved and data accuracy is assessed (see SIPP case study), (2) data which are difficult for a survey respondent to provide may be obtained from administrative record files (see LASS case study), (3) improved counts of population from the 1980 Census may be obtained in a cost-effective manner (see Nonhousehold Sources Program case study), and (4) estimates of coverage of population for States and selected subgroups of the population based on the 1980 Census may be obtained (see case study on IRS/SSA/HCFA matched with CPS and Census).

Recommendation 11. The provision of services to users should be recognized as a statistical program function to optimize the availability of statistical information in Federal, State and local government and in the private sector, and to give the Federal system the benefit of feedback from users in planning statistical programs based on administrative records. A major obstacle to encouraging statistical use of administrative records is the lack of knowledge (both inside and outside the Federal Government) about the information in these records and their coverage and quality. The American Statistics Index provides a comprehensive list of published statistics from administrative and survey sources, but information on the quality and availability of unpublished data, particularly from administrative records, is seriously deficient. Centralized information is needed to make existing data more readily accessible to potential users and to help in identifying unnecessary duplication in data collection programs.
Promising recent initiatives in this area include a Small Business Administration program to document all Federal reporting requirements placed on businesses and a National Center for Health Statistics program to establish a clearinghouse for data relating to environmental health hazards. In addition, the proposed Paperwork Reduction Act of 1980 (H.R. 6410) provides for establishing a Federal Information Locator System, as recommended by the Commission on Federal Paperwork.

CHAPTER II

Introduction and Summary

A. Introduction

The Federal Statistical System is under pressure to respond simultaneously to a growing demand for statistical data and a growing demand for reductions in the "paper blizzard" generated by Government requests for information from individuals and businesses. These demands will necessarily conflict unless the efficiency of current programs can be improved. Responsiveness to both demands will require reduced duplication among Government information collection programs combined with more intensive utilization of existing administrative information sources in meeting statistical data needs. The latter requirement will involve bringing together information collected in numerous different Government administrative programs in ways that make possible their combined use for statistical analysis. As stated by Edgar Dunn (1965, p. 5) in a review of the Ruggles' Committee proposal for a national data center:

The central problem of data use is one of associating numerical records. No number conveys any information by itself. It acquires meaning and significance only when compared with other numbers. The greatest deficiency of the existing Federal Statistical System is its failure to provide access to data in a way that permits the association of the elements of data sets in order to identify and measure the interrelationship among interdependent activities.

As Dunn further notes (1965, Summary, p.
2), problems of access and record association are particularly serious in the case of statistical use of administrative records because: "Many of the most useful records are produced as a by-product of administrative or regulatory procedures by agencies that do not recognize a general-purpose statistical service function as an important part of their mission."

The association or merger of administrative records from a variety of sources is important for statistical applications because: (1) populations of statistical interest do not always correspond closely to populations covered in individual administrative record systems; and (2) individual administrative record files often identify relatively few of those characteristics and attributes of the members of a population that social scientists and policy analysts consider to be important in meeting their statistical needs. Merging individual administrative record sets with other administrative and statistical data sources can help to alleviate the deficiencies of many individual administrative sources; but record merging is often difficult--particularly when the records are collected and maintained by separate agencies. Provisions for protecting the confidentiality of records pertaining to identifiable individuals or businesses often preclude interagency transfer of such records for statistical applications. And even when access to the records needed for merging can be arranged, differences in the ways different agencies identify individual reporting units, and/or inconsistencies in the ways agencies collect, process, and maintain information about reporting units, can preclude successful data matching and merging operations (see Chapter VI).

Although difficult problems remain to be solved, statistical uses of administrative records have been increasing and will continue to increase because of high data collection costs and heavy respondent burdens associated with censuses and surveys.
Many important statistical needs cannot be adequately met by a system involving censuses, carried out every 5 or 10 years, combined with intercensal surveys which provide national data. And the extra costs of moving to more frequent censuses and/or larger sample surveys which might provide small area data are high both in terms of direct government expenditure and response burden. The projected high cost to the government was an important factor in the recent decision to disallow further planning funds for the 1985 mid-decade census.

The most striking illustrations of the need to make improved statistical use of administrative records arise in cases involving the use of socioeconomic data to distribute Federal funds to State and local areas. For example, in reviewing alternatives for meeting the legislative mandate to produce current local-area unemployment estimates for use in allocating funds under the Comprehensive Employment and Training Act, the National Commission on Employment and Unemployment Statistics (1979, p. 253) has estimated that it would cost about $2.3 billion annually to expand the Current Population Survey to provide monthly unemployment estimates for the over 4,000 geographic areas potentially eligible for CETA funding. As important as the high money costs involved in obtaining frequent small-area data by survey techniques is the substantial increase in response burdens associated with greatly expanded data collection efforts. For example, another alternative considered by the NCEUS was improving the handbook method (called the 70-step method) based on unemployment insurance records.

Not only is there pressure for statisticians to increase their use of administrative records in developing general-purpose statistics, but statisticians also have a strong interest in supporting efforts to reduce the duplication and improve the efficiency of administrative as well as statistical information collection efforts.
Direct reporting for statistical purposes accounts for a very small proportion of the overall Federal reporting burden; major reductions in overall paperwork burdens must be achieved through improvements in nonstatistical areas. At the same time, however, statistical programs could be more adversely affected than other programs because statistical programs tend to be more often viewed as optional than administrative record systems and, therefore, more dependent on the voluntary cooperation of the public in obtaining responses to information requests. As the following statement from the President's Statistical Reorganization Project's "Issues and Options" paper (1978, p. 7-1) indicates, there is a growing recognition of the importance of applying statistical tools to more general problems of information collection in order to reduce reporting burdens:

The tools used by statistical agencies (sampling, quality control, intensive analysis of existing data, etc.) are near the roots of reporting requirements, and the use of appropriate tools reduces reporting burden. It is in this sense that, from the point of view of response burden, the use of appropriate statistical techniques is of major importance and should extend well beyond any formal definition of the Federal Statistical System.

The statistical system, however, cannot hope to dominate Government information collection activities; there must be a genuine effort to cooperate with administrators in nonstatistical programs in order to achieve mutual goals of efficient information collection. Statisticians must attempt to understand the needs and constraints facing program administrators, and statistical budgets should bear a fair share of the costs of collecting and processing administrative records in ways that permit efficient use for statistical purposes.
Much must be learned and many difficult problems confronted if progress is to be made in the statistical use of administrative records and in improving the overall efficiency of Government information collection and use. With the hope of contributing to progress in this area, this report attempts to: (1) identify major administrative data files with significant potential for general-purpose statistical applications; (2) indicate various kinds of statistical uses of administrative records which are being made or considered; (3) identify major technical and institutional or legal problems which are impeding effective statistical use of administrative records; and (4) suggest possible approaches to improving information collection and statistical use of administrative records.

The Subcommittee on Statistical Uses of Administrative Records has not attempted to provide comprehensive documentation of administrative record systems and their uses. The report instead reflects largely the areas of interest and expertise of Subcommittee members. Important areas such as energy and environmental statistics are not covered at all, and very little attention is given to records generated by the complex array of Government regulatory agencies. There is, however, relatively intensive coverage of administrative data from programs of the Internal Revenue Service and Social Security Administration, and from related administrative programs that collect important social and economic information from individuals and businesses.

B. Summary

Chapter III of the report presents the results of a survey conducted by the Subcommittee to obtain documentation of major administrative record data files maintained by selected Federal agencies. Chapter IV presents a description of statistical applications of administrative records in selected agencies.
The following three chapters (V-VII) illustrate, largely by means of case studies, specific approaches to statistical use of administrative records and problems encountered in such approaches. Chapter VIII reviews legal considerations, particularly those related to restricted access to records, that influence the statistical use of administrative records.

1. Chapter III--Major Administrative Files

This chapter summarizes the characteristics of major computerized administrative record files that are maintained or mandated by the Federal Government and contain statistically useful information pertaining to (1) individuals or (2) businesses. The information contained in the administrative files for individuals is compared to the information on individuals collected in decennial censuses; and the information contained in the administrative files for businesses is compared to the information contained on the Census Bureau's Standard Statistical Establishment List (which is itself assembled from a combination of administrative and survey data sources). The chapter also contains a description of the Social Security Administration's Continuous Work History Sample, which is a set of statistical files of individual worker records assembled using several SSA business and individual administrative record files.

Compared with the decennial census, most administrative record files for individuals contain relatively little information on population characteristics and/or cover only a limited segment of the population. In addition, the census usually provides more reliable and detailed geographic information than administrative files; and at best, administrative records can provide only rough approximations to such census reporting units as the family and household.
On the other hand, many administrative files provide data at much more frequent intervals than the decennial census, and the presence of social security numbers on most administrative files opens the possibility of linking files over time (longitudinally) or merging information from more than one administrative file in order to increase the coverage of individuals and/or the number of characteristics identified for particular individuals. The absence of SSN's in census records generally makes it difficult to integrate information from censuses with information from administrative records.

Administrative record coverage of businesses is more complete than is true for individuals. In fact, administrative lists of businesses provide the basis for conducting statistical censuses and surveys. For the most part, however, administrative records do not maintain separate information for the different establishments of a single legal business entity, even though the business may operate in several different geographic areas and/or industrial categories. The Census Bureau does collect information for individual establishments; and the SSEL, therefore, contains a larger list of reporting units than most administrative files. While most administrative business files do not contain the establishment detail necessary for developing reliable geographic and industrial data, the SSA and Unemployment Insurance payroll tax programs do involve reports breaking out county level "establishment" detail. Unfortunately, however, the reporting units in these programs are not consistent with the establishment concept used in the SSEL, and there is currently no satisfactory basis for coordinating the reporting of similar information (or resolving data discrepancies) among the three systems.

CWHS data files provide information on the demographic characteristics (sex, age, and race) of workers along with longitudinal information on their employment and earnings patterns.
The CWHS program illustrates the potential statistical advantages of administrative records for longitudinal analysis and for linking together information about individuals and businesses.

2. Chapter IV--Major Statistical Uses of Administrative Records

This chapter illustrates statistical uses of administrative records with reference to the programs of selected Federal agencies, particularly programs of the Social Security Administration, the Internal Revenue Service, the Bureau of Economic Analysis, the Census Bureau, and the Small Business Administration.

The SSA and IRS programs involve the development of general-purpose statistics by statistical divisions of agencies that collect large amounts of information from individuals and businesses in the course of their administrative responsibilities. The programs illustrate the large quantity and variety of administrative data collected as well as the limitations of incomplete population coverage and lack of information on important population characteristics that plague statistical use of administrative records.

The BEA programs illustrate the use of a wide variety of administrative data (obtained from many agencies) for estimating data series within the context of a systematic economic accounting framework. Administrative data are used in conjunction with census and survey data (also generally obtained from other agencies); and there are substantial variations among the administrative data series in the extent to which they involve concepts and measurement procedures that "fit" well with the concepts involved in the design of the accounting framework and with concepts underlying the census and survey data used.

Census Bureau programs illustrate a wide variety of applications of administrative records for both individuals and businesses.
For example, records obtained from administrative agencies are used in developing intercensal population and related estimates, as a substitute for censuses in the collection of economic data from many small businesses, in the development and maintenance of sampling frames for surveys, and in the evaluation of the completeness and reliability of information collected in censuses and surveys. Again there are substantial variations in the extent to which administrative record concepts match desired statistical concepts. A few census programs, primarily in the area of economic statistics, are discussed in more detail than other programs covered in Chapter IV. These more detailed examples illustrate the substantial cost savings as well as limitations associated with the statistical use of administrative records.

The SBA involvement in the statistical use of administrative records stems largely from a recently initiated project to develop a small business data base in conjunction with the 1980 White House Conference on Small Business. In part because of concerns over reporting burdens, small businesses have been exempted from, or covered on a very small sample basis in, most economic censuses and surveys. Therefore, a small business data base must rely heavily on administrative records. SBA efforts to develop such a data base illustrate many of the problems that are often encountered in gaining access to administrative records and adapting them for statistical analysis.

3. Chapter V--Developments in Data from Business Establishment Reporting

This chapter contains case studies of three important and related statistical programs that are currently evolving based in large part on developments in administrative record systems: (1) the Census Bureau's SSEL program; (2) SSA's program for adapting its CWHS data program to a new system of annual employer reports of worker wages on forms W-2 and W-3; and (3) the Bureau of Labor Statistics' program for developing work force statistics in connection with the UI payroll tax program. These programs produce both complementary and overlapping statistical products in the area of work force statistics. They illustrate not only the importance and potential of administrative records for developing work force data, but also some important problems in the area of establishment reporting by multiestablishment businesses and in the area of coordinating similar data collection efforts in different agencies.

The Census Bureau program employs the most satisfactory concept of establishment from a statistical point of view, but the Census work force data assembled in connection with the SSEL cannot match the frequency and timeliness of BLS data based on the UI system, nor can the SSEL-based data provide the information on demographic characteristics of workers available from the SSA system. And the different establishment reporting plans of the three data systems combined with difficulties of interagency transfers of records (for example, the current restrictions on access to the SSEL) have severely limited the scope for coordinating data collection and development efforts in the three programs.

4. Chapter VI--Potential Uses of Administrative Records for Data Linkages: Selected Case Studies

This chapter involves four case studies that illustrate the potential and the problems associated with record linkages as a means of improving and extending the use of
administrative records in developing primary data and in evaluating census and survey data: (1) the "Linked Administrative Statistical Sample Project," (2) the "Use of Administrative Records in the Survey of Income and Program Participation," (3) the "Use of IRS/SSA/HCFA Administrative Files for 1980 Census Coverage Evaluation," and (4) "Record Linkage in the Nonhousehold Sources Program." In contrast to Chapter V, where the difficulties of coordinating and linking business establishment records among programs were highlighted, Chapter VI is concerned with linkages involving records for individuals.

The LASS project involves efforts to link records from a variety of administrative record sources in order to develop a general-purpose statistical sample file that will be suited for mortality research. The sampling procedures will conform closely to those involved in the CWHS in order to facilitate longitudinal data analysis, but CWHS records will be supplemented with records from IRS and the National Center for Health Statistics. The project illustrates the substantial potential for combining complementary data through interagency linkage of administrative record files. But the project also illustrates significant technical problems and problems of access restriction that need to be resolved in linking data files prepared in different agencies.

The SIPP case study illustrates the importance of administrative records in efforts to alleviate substantial survey biases in coverage and income reporting for low-income groups (participating in various income maintenance programs), and their importance as a source of income data to evaluate the reliability with which selected types of income are reported in surveys.

The third and fourth case studies are both associated with efforts to evaluate and improve the 1980 Census of Population and Housing.
The IRS/SSA/HCFA files will be used primarily in efforts to evaluate the extent of Census undercoverage, while the Nonhousehold Sources Program will be concerned with improving population coverage in selected areas of anticipated high undercount. The latter program involves, in addition to the use of Federal agency records, the use of such State and local administrative records as drivers' license records. Both projects demonstrate the potential of administrative records to identify individuals who are missed in censuses and surveys. The projects also illustrate, however, the difficulties and high costs of linking administrative records to census records (which contain no social security number) and the difficulty of determining the extent to which particular groups are not covered in either census or administrative record sources.

5. Chapter VII--Technical Problems in the Statistical Use of Administrative Records

This chapter illustrates technical problems encountered in making statistical use of administrative records that arise or are exacerbated because of limited statistical control in administrative record systems over such factors as population coverage, definitions and comparability of information concepts among programs, and reporting and processing procedures. The CWHS data program is used as the principal source of illustrations, in part because the CWHS program involves the use of files containing information about businesses as well as individuals, and perhaps more importantly because it illustrates well the problems that can arise when important statistical aspects of the reporting and processing of records are largely outside the control of statisticians responsible for making statistical use of the records.
In particular there is evidence of significant and increasing numbers of geographic coding errors in the CWHS that have resulted from the low priority attached by SSA administrators to the statistical problem of obtaining reliable geographic reports and ensuring accurate coding and processing of geographic information in employer payroll reports to SSA.

6. Chapter VIII--Legal Issues in the Statistical Use of Administrative Records

This chapter illustrates legal and related institutional barriers which inhibit the interagency access to records that is needed for improving the efficiency and effectiveness of statistical use of administrative records. Emphasis is placed on problems which arise because of a failure of existing confidentiality laws to make an adequate functional distinction between statistical and administrative processes which use records about individuals. The basis for interagency transfer of administrative records is often found in a logic that imposes regular procedures or conditions for expanding the scope of administrative actions or decisions which can be based on the particular content of records about an individual. Such a logic is generally irrelevant with respect to legitimate statistical processes which, in contrast to administrative uses, merely produce relationships and summaries of data, and do not involve any direct Government action against (or in favor of) the individual as a consequence of information in records pertaining to that individual.

Clearly not all statistical performance is functionally divorced from administrative processes: program integrity and quality assurance are functions which may explicitly--and quite properly--rely on applied statistical techniques to identify individual cases for administrative action. Such functions are within the reasonable expectations of program participants, and do not rely, moreover, on collection of information from volunteers, with assurances of confidential treatment.
In contrast, there are particular statistical activities or collections of data whose existence and rationale for compiling and making interagency transfer of data is limited by the degree to which statisticians can fulfill a legal or ethical duty to protect the confidentiality of individual information. Statistical uses in this latter category need to be separated out as discrete functional uses, and be governed by different rules and standards from those which govern administrative and compliance uses.

Proposals for "functional separation" of statistical from administrative uses argue for separating these statistical records about identifiable individuals from the decision/action stream, and permitting the statistical results to be available to administrators only in summary or other unidentifiable form. Functional separation would allow summaries, of course, to be used administratively in ways which may result indirectly in consequences affecting all members of the group in uniform ways. However, functional separation would not permit the direct use of individual records as the basis for individual actions. Alternative legislative proposals for implementing the concept of functional separation are reviewed in the chapter.

CHAPTER III

Major Administrative Data Files

This chapter describes the general properties of most of the major Federal administrative record files containing statistically useful information pertaining to individuals or businesses. The discussion is based largely on a survey of selected Federal agencies conducted by the SUAR Subcommittee. An attempt is made to lay the groundwork and indeed begin the discussion, continued in Chapter IV, of the statistical uses of administrative record systems.

Organizationally, the chapter is divided into four sections and two appendices. The first section indicates the scope of the administrative record files covered and describes the survey instrument used to obtain file documentation.
In the second section there is a brief summary of the survey results. In the third section there is a brief description of the Social Security Administration's Continuous Work History Sample files. The CWHS files illustrate the process of extracting and merging information from basic administrative files to obtain files useful for statistical analysis. In the final section there is a discussion of selected factors associated with the historical evolution of the statistical use of administrative files covered in the chapter. The survey questionnaire is reproduced in the first appendix, and a more detailed description of the CWHS program and data files is contained in the second appendix.

A. Scope of Study and Survey Conducted

1. Scope of Study

In compiling a list of "administrative" record files that would be of greatest statistical interest, three criteria were employed:

1. Does the file have extensive coverage of a population (either individuals or businesses)?
2. Is the population covered by the administrative record set of statistical interest?
3. Is the file maintained by computer?

The systems chosen for examination under these criteria are shown in Figure III.1. Information relating to individuals was sought from ten Federal agencies; some twenty-four administrative record files were involved in all.

Figure III.1 Major Administrative Record Files Surveyed by the Subcommittee on the Statistical Uses of Administrative Records
______________________________________________________________________________________________
Agency                                     Administrative Record File
______________________________________________________________________________________________
Part I--Information on Individuals

Bureau of the Census                       1970 Census of Population
                                           1980 Census of Population
Office of Personnel Management             Central Personnel Data File
                                           Civil Service Annuity Roll
Department of Defense                      Active Military Personnel Data File (Army, Navy, Air Force, and Marines)
                                           Military Retirement Compensation File (Army, Navy, Air Force, and Marines)
Department of Transportation               National Driver Register
Internal Revenue Service                   Individual Master File
Department of Education                    Basic Education Opportunity Grant
Railroad Retirement Board                  Research Master Beneficiary File
                                           Service and Compensation (SCORE)
                                           Railroad Retirement, Survivor and Pensioner Benefit Payment File
Social Security Administration             Summary Earnings Record
                                           Master Beneficiary Record
                                           Numerical Identification File (SS-3)
U.S. Coast Guard                           Personnel Management Information System
                                           Retired Officers Support System
                                           Retired Pay and Personnel System
Veterans Administration                    Compensation and Pension Master Record
                                           Insurance (In-Force) Master Record File
                                           Education Master Record File
                                           Vocational Rehabilitation and Education Statistical File
                                           Insurance Awards Master Record File
                                           Education Master File
______________________________________________________________________________________________
Part II--Information on Businesses

Bureau of the Census                       Standard Statistical Establishment List
Bureau of Labor Statistics                 Unemployment Insurance Address File
Department of Agriculture                  Producer Name and Address Master File
                                           Economics, Statistics, and Cooperative Service List Sampling Frame
Department of Health and Human Services    Master Facility Inventory
Internal Revenue Service                   Business Master File
                                           Exempt Organization Master File
Social Security Administration             Master Employer Name Directory
                                           Multi-Unit Code File
                                           Single-Unit Code File
______________________________________________________________________________________________

For businesses, the scope of the inquiry was restricted to nine major Federal systems in six agencies. It should be noted that although the Subcommittee does not classify the decennial censuses of population as administrative data files, since their main purpose is statistical, they are nonetheless included to provide a basis for comparison with the other files on individuals. The Census Bureau's Standard Statistical Establishment List was also treated as "in scope" for comparison purposes, this time with business administrative record files.
2. Survey Conducted

In late 1978, the Subcommittee conducted a survey of the administrative files listed in Figure III.1. This survey was entitled "Statistical Use Survey of Records Pertaining to Individuals, Individual Firms, and Employers Maintained and/or Mandated by the Federal Government." A questionnaire was mailed to each agency maintaining one of the selected files. The principal purpose of the questionnaire was to document the data elements on each file that might be of statistical interest. It was not the intent of the survey to be comprehensive, but simply to provide a starting point for structuring inquiries about the files. This survey collected data on both individual and business files by providing optional sections to be completed depending on the type of file being considered.

The survey consisted of only fifteen questions, but a number of the questions contained several parts. Respondents were asked to report the availability of documentation concerning the file, the information carried on the file, and the history of the file development and maintenance. For the most part, each agency made a serious effort to provide detailed responses to the questions.

B. Survey Results

This section briefly summarizes the survey results. First, the files pertaining to individuals are considered, then those pertaining to businesses. Detailed tabulations from the survey are included in Tables III.1 and III.2.

1. Files Pertaining Mainly to Individuals

Not unexpectedly, there are extensive differences among the administrative record files on individuals. Some of those which deserve special mention are the differences in coverage (or "universes") among the files, the degree of coded geographic information, the demographic items included, and the reporting units used.

a. Universe

In terms of coverage of individuals in the U.S. population,
the decennial Census files are the most complete, followed by Social Security's Summary Earnings Files and the IRS Individual Master File. No other files have the same breadth of coverage as these. However, several other files do provide comprehensive coverage of important segments of the population. Examples are the Health Insurance Master File, for the "65+" population; the Central Personnel Data File, for Federal government workers; and the Military Personnel Data Files, for present and former Armed Forces members.

b. Geographic information

Administrative files tend to have limited coded geographic information. Some contain a State code, but this was usually derived from the mailing address. The only exceptions appear to be SSA's Master Beneficiary Record file, and the related HCFA Health Insurance Master File, which contain a county code obtained by clerically coding the mailing address. By way of contrast, the Census geographic data are collected on a residence basis and are available to the block level.

This lack of detailed "residence geography" is a major problem in using administrative records to prepare small area statistics. By using the mailing address, subcounty geography may be assigned with a Geographic Base File developed for use in the 1970 or 1980 census. However, this presents a number of problems. First, the mailing addresses are not always the usual place of residence. Second, GBF's do not exist for areas located outside the built-up portion of SMSA's. Third, people living outside the city limits tend to report themselves as living in the city if they have a city post office address. Fourth, post office delivery or ZIP code areas do not conform with political boundaries. Also, the cost of assigning geography with a GBF system is high.

Another approach is to add a residence geographic code to the administrative file.
This was done for the 1972 and 1975 Individual Master Files so that IRS data could be used in preparing population and per capita total money income estimates for use in distributing General Revenue Sharing funds. The cost of this straightforward approach makes it unlikely that it will be widely implemented on other files.

c. Demographic information

By comparison with the Census data, all administrative files contain very limited demographic information. The Numerical Identification (SS-5) file does contain sex, date of birth, and race, which have been transferred to the Summary Earnings Record and the Master Beneficiary Record. The personnel files also have some race information. However, other than this, there is very little demographic data present.

d. Reporting unit

The Census data are the only data organized into households and families. Tax returns and Social Security claims, however, can for some purposes be treated as approximations to family units. For the most part, however, the units are just individuals with no potential for structuring them into households.

One final point. The survey showed that all the administrative files for individuals are organized by social security number. This is distinct from the decennial census files, which do not have the SSN recorded. By and large, the SSN is the major administrative identifier. Obviously, then, it is this variable which would have to be employed for linkages among the files, whether for statistical or operational purposes.

2. Files Pertaining Mainly to Businesses

The employer identification number is a major identifier on most of the administrative record files, including even the Census' Standard Statistical Establishment List. Some other similarities and differences in the files are:

a. Universe

The file with the largest coverage is the Master Employer Name Directory with about 27 million records. However, this file is not current and contains inactive businesses.
The SSEL is the most comprehensive current list of businesses with the exception of the very small businesses. For these businesses, the IRS Business Master File is more complete. The Department of Agriculture's Producer Name and Address Master File, and their Economics, Statistics, and Cooperative Service List Sampling Frame, have extensive coverage of the farming sector.

b. Geographic information

As with the individual record systems, there is no subcounty geographic data present on any of the business files with the exception of the SSEL. For businesses, location may have different meanings. Most of the geography reported on these files is in terms of company headquarters and may not refer to the individual establishment. Consequently, reporting a major geographically dispersed company at its headquarters location can introduce a significant error into the data.

c. Economic data

Number of employees, total payroll, and gross sales seem to be the most common economic items present on the files.

d. Reporting unit

The reporting unit of these files is mainly the Employer Identification Number, with the exception of the SSEL. This creates a problem in any statistical use of these files because some EIN's represent only part of a company, while a single EIN may cover many establishments.

C. Continuous Work History Sample Files

The survey results in the previous section indicate clearly that individual administrative record files usually do not contain the comprehensive population coverage and detailed identification of population characteristics desired for most statistical analysis. The results also indicate, however, that it is often technically possible to overcome some of the limitations of single administrative files by linking several files and merging the information contained in these files. With files pertaining to individuals, the SSN provides the principal basis for linkage, and with business files the EIN is usually the basis for linkage.
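The linkage idea described above can be illustrated with a minimal sketch. It assumes that each file is a list of records sharing a common identifier field; the function name `link_records`, the field names, and the sample values are hypothetical and do not describe the layout of any actual agency file.

```python
from collections import defaultdict


def link_records(primary, secondary, key):
    """Merge records from two files that share an identifier such as SSN or EIN.

    Returns (linked, unmatched): merged records for identifiers found in both
    files, and primary records with no counterpart in the secondary file.
    """
    # Index the secondary file by the identifier for constant-time lookup.
    index = defaultdict(list)
    for rec in secondary:
        index[rec[key]].append(rec)

    linked, unmatched = [], []
    for rec in primary:
        matches = index.get(rec[key], [])
        if not matches:
            unmatched.append(rec)
        for other in matches:
            merged = dict(other)
            merged.update(rec)  # primary-file fields win on any conflict
            linked.append(merged)
    return linked, unmatched


# Hypothetical records: an earnings file and a demographic file keyed by SSN.
earnings = [
    {"ssn": "000-00-0001", "wages": 9200},
    {"ssn": "000-00-0002", "wages": 15400},
    {"ssn": "000-00-0003", "wages": 7100},
]
demographics = [
    {"ssn": "000-00-0001", "sex": "F", "birth_year": 1931},
    {"ssn": "000-00-0002", "sex": "M", "birth_year": 1948},
]

linked, unmatched = link_records(earnings, demographics, key="ssn")
```

The same function would serve for business files by passing the EIN field as `key`. Records left in `unmatched` correspond to the coverage gaps that any linkage project has to investigate.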
Both the problems and the potential benefits of file linkage are increased significantly when interagency linkages are considered (see, for example, the discussion of the Linked Administrative Statistical Sample in Chapter VI); but highly valuable statistical files can be developed through intra-agency linkages of administrative files in such large agencies as IRS and SSA. The Continuous Work History Sample program of SSA illustrates well the problems and potential of such intra-agency file linkages.

The CWHS program involves the construction of several statistical sample files from information contained in the SSA administrative files documented in Tables III.1 and III.2. The 1 percent 1937-to-date CWHS file, for example, involves primarily the extraction and merger of information from the Summary Earnings Record and Master Beneficiary Record files documented in Table III.1. Annual and longitudinal employee-employer CWHS files are constructed largely by merging detailed earnings items which are input to the Summary Earnings Record File with industrial and geographic information obtained from the SSA employer files documented in Table III.2.

CWHS files do not contain occupational information for workers, nor do they contain the detailed socioeconomic characteristics available in census sample files. CWHS files do, however, contain information on worker sex, age, and race; and they can provide much greater longitudinal detail relating to the earnings history of workers than is available from any survey source. The CWHS program, moreover, has a considerable advantage over household surveys in obtaining employer information because of the possibility of direct links between employer and employee administrative files. The advantage of direct links between employer and employee information, however, is offset somewhat by quality problems associated with the geographic and industrial coding in SSA employer files (see Chapter VII).
Because the CWHS program illustrates well both the potential and the problems associated with the statistical use of administrative records, examples of CWHS applications and deficiencies are presented throughout the report. Some of the more detailed references to the CWHS program are included in: (1) the discussion in Chapter V of the new joint IRS-SSA system of annual employer reporting (on Form W-2) of individual worker wages; (2) the discussion in Chapter VI of the development of the new Linked Administrative Statistical Sample program; and (3) the discussion in Chapter VII of technical problems encountered in the statistical use of administrative records. To permit the reader to better follow the references to the CWHS made throughout the report, a detailed description of the CWHS program and CWHS files is presented in the second appendix to this chapter.

D. The Evolution of Statistical Use of Administrative Records

Chapter IV contains a detailed discussion of statistical uses of administrative records from the perspective of selected Federal agencies that make extensive use of administrative records in their statistical and research programs. Chapters V and VI then follow with detailed case studies of selected projects and programs involving intensive statistical use of administrative records. To provide additional background for the chapters on uses, this section reviews some of the circumstances surrounding the evolution of statistical uses of administrative record files covered in Tables III.1 and III.2.

The use of administrative records as a source of statistical information is not a new idea, but the last decade's extensive computerization of these files has fostered an increasing interest in the topic. In fact, there seems to have been a progression in the employment of administrative records for statistical purposes.
Initially, with the establishment of an administrative records system, an agency prepares summaries of the data to guide its operations and policy decisions. This may be done with the full data set or a sample. Its purpose is primarily administrative, not statistical. Perhaps IRS is the best example. What started out as a mainly administrative effort has evolved into the current Statistics of Income program (see Chapter IV). While administrative considerations are still important, the Statistics of Income sample is used extensively by researchers to study issues of general statistical and economic interest.

Administrative records systems were used very early in evaluation projects such as the evaluation of the 1950 Census income results using IRS and SSA data (NBER, 1958). After each decennial population census since then, there have been attempts to understand and quantify any error in the results by matching a small sample of census records to various administrative record sets such as IRS data (Schneider and Knott, 1973), Medicare data (U.S. Bureau of the Census, 1973c), birth records (U.S. Bureau of the Census, 1963 and 1973a), death records (Kitagawa and Hauser, 1973), and employment records (U.S. Bureau of the Census, 1965).

These evaluation efforts may be characterized by the relatively small number of cases involved. This limit on size is the result of the objective of the project as well as cost considerations. Most evaluation projects involving these Federal files are aimed at national results only and do not attempt to measure differences at the State or even regional level. (This is changing, however; for the 1980 Census Evaluation, the matching will attempt to produce estimates at the State level. See Chapter VI.)

With the extensive computerization of administrative files in the 1960's, the possibilities for expanded statistical uses became obvious. For example, IRS completed the computerization of the Individual Master File with the 1967 file.
Also, over this same period, there was a great reduction in the cost of computer data processing and an increase in understanding how to process and control large data files, thus making the use of these administrative files feasible for statistical purposes.

These developments and potential uses of administrative records were understood and debated (Hansen, 1974). While that debate cannot be reviewed here, the outcome has been that no centralization of administrative records has taken place in the Federal government, but statistical uses of administrative records have continued. Some transfer of administrative records between agencies has been permitted, but each transfer has been justified and approved on a case-by-case basis (Kilss and Scheuren, 1979). Some people feel that this case-by-case approach has retarded the use of administrative records in developing useful statistical data, but this has never been fully documented.

In one sense, survey- and census-based data may be blamed for the slow development of administrative records-based data. Up until recently (and perhaps still), survey- and census-based data have had a real edge on administrative records in several areas. For example, if small area data are needed, the Census of Population and Housing provides small area data defined completely and in the "correct" geography (i.e., by residence). Administrative records-based data may be able to approximate the needed data, but not at the same level of accuracy. It is a question of trading off accuracy for currency. If the need is for national, regional, or even State data, surveys may be a more efficient way to obtain needed data than the development of an administrative records-based system.

However, with the need for small area data on a regular basis, the currency and small area advantages of administrative records may now outweigh the disadvantages of definitional problems and less accuracy.
For example, with the passage of the State and Local Fiscal Assistance Act of 1972, the Bureau of the Census was asked to provide population and per capita total money income data for 38,500 governmental units. The Bureau accomplished this by using extracts from the entire 1969 and 1972 IRS Individual Master Files. This required IRS to collect and clerically code the residence address of all taxpayers on the 1972 IMF. The cost of the first set of estimates, including the IRS coding, was in excess of $5 million. This was the first administrative records-based project of this magnitude, and it demonstrated both the expense and the benefit of administrative records. It should also be noted that this successful application used administrative records to measure change since the 1970 census (Fay and Herriot, 1979). In this way, the definitional problems were minimized.

With the expanded interest in administrative records, there is now taking place the needed experimentation and research to understand the particular idiosyncrasies of these files. This will, hopefully, come to fruition in the 1980's with useful data in several areas. For example, migration rates by race can be computed by linking race from the SSA Summary Earnings File to the IRS data. This has been done on a sample basis and State estimates prepared (Word, 1978). It is expected that this work will continue.

By using tax returns (or W-2's) to establish a current residence, the Form 941 to link an employer to an employee, and the Master Employer Name Directory (mainly SS-4) to define an employer's location, current journey-to-work estimates are possible. The Bureau of the Census and the Bureau of Economic Analysis have done some work in this area, so far, however, without great success. The problems of multi-establishment employers, low quality geographic coding of employers, etc., are major obstacles when trying to estimate the change in a particular journey-to-work flow.
(Chapter VII contains a more detailed discussion of the problems encountered in the BEA journey-to-work study.)

Currently, the Census Bureau uses IRS adjusted gross income and wages and salary data to update the 1970 census per capita income estimates. By using the age, race, and sex data from the Social Security Administration, the IRS information could be adjusted for differential reporting by age, race, and sex.

Updating income size distribution estimates with IRS data has long been considered desirable. The inability to group IRS returns directly into families or households makes such updating difficult, but synthetic estimation procedures involving IRS data are being used in the development of family personal income size distribution estimates at BEA (see Chapter IV).

The need for targeted surveys and more sampling efficiency for small populations will continue to make administrative records important as a sampling frame. In the business files, the use of the business lists as sampling frames may be their single most important function, either to complete or to stratify a universe for sampling.

In summary, the statistical use of administrative records will continue to grow, but not easily. The use of administrative records data in preparing statistics must be preceded by a period of analysis and experimentation in order to understand the particular problems inherent in each administrative record system.

E. Appendix III.1 The Survey Questionnaire

Statistical Use Survey of Records Pertaining to Individuals, Individual Firms, or Employers Maintained and/or Mandated by the Federal Government

Survey for: Subcommittee on Statistical Uses of Administrative Records, Federal Committee on Statistical Methodology, Office of Federal Statistical Policy and Standards

Please complete the following questions as applicable.
Since this survey covers individuals, households, and business organizations (firms and employers), not all of the questions may pertain to the data file you are answering the questions about. If you have any questions concerning the survey or concerning a particular question, or need additional copies of the survey form, please contact Ms. Maria Gonzalez on (202) 673-7953.

(Please mark the appropriate category or categories or supply the requested information)

1. What is the name of the file?
   A) General name by which the file is usually called___________________________
   B) Technical or official name if different from the general name_______________________________________________________

2. What type of documentation exists for the file?
   _ Internal Documentation
       _ Not available to anyone outside the agency.
       _ Available on request.
   _ Outside Documentation
       _ None currently prepared.
       _ Available on request.
       _ Not now available, but could be prepared upon request.

3. What type of documentation is available outside the agency?
   _ Record Layout
   _ File description--technical description
   _ General file description without specific field description
   _ No documentation available outside agency

4. What type of information is present on the file?
   The purpose of this question is to obtain a list of the kind of information present on the file which might have statistical uses. You may respond to the appropriate questions below or provide a separate listing of the information on the file.
   Is the reporting or filing unit an individual, household, business, or some other unit?
   _ Individual (Answer 4A)
   _ Household, Family, or Other Group of Individuals (Answer 4B)
   _ Business or Employer (Answer 4C)
   _ Other reporting unit (Answer 4D)

4A. What kind of information on individuals is present on the file?
    Please Circle Yes or No as Appropriate

     1) Person's name                                          Yes  No
     2) Mailing address                                        Yes  No
     3) Residence address                                      Yes  No
     4) Has the address been assigned a geographic code?       Yes  No
        If yes, what levels of geography are present?
          State                                                Yes  No
          County                                               Yes  No
          Place                                                Yes  No
          Other, please specify__________
     5) Race--If yes, what are the categories?                 Yes  No
     6) Spanish or other ethnic origin designation--If yes,
        what are the categories? ____________________          Yes  No
     7) Date of birth or age                                   Yes  No
     8) Sex                                                    Yes  No
     9) Marital Status--If yes, what are the
        categories?__________________                          Yes  No
    10) Income--If yes, what are the types of income
        present?________                                       Yes  No
    11) Person's family or household income--If yes,
        please specify type.
    12) Social Security or Railroad Retirement Number          Yes  No
    13) Is the person's employer identified?                   Yes  No
        If yes, is the employer's Employer Identification
        Number present
    14) Is the person's occupation identified?                 Yes  No
    15) Is the person's occupation identified?                 Yes  No
    16) Level of education or technical skill                  Yes  No
    17) Place of birth or foreign country of birth             Yes  No
    18) Information on person's health or disability--If
        yes, please specify ______________________________     Yes  No
    19) Other relevant statistical information--If yes,
        please specify_____                                    Yes  No

4B. What kind of information on a household, family, or other group of individuals is present on the file?

    Please Circle Yes or No as Appropriate

     1) Person's name                                          Yes  No
     2) Mailing address                                        Yes  No
     3) Residence address                                      Yes  No
     4) Has the address been assigned a geographic code?       Yes  No
        If yes, what levels of geography are present?
          State                                                Yes  No
          County                                               Yes  No
          Place                                                Yes  No
          Other, please specify__________
     5) Household or family size                               Yes  No
     6) Each household or family member identified             Yes  No
     7) Household or family income                             Yes  No

    The following questions apply to the household or family head or primary applicant.
     8) Date of birth or age                                   Yes  No
     9) Sex                                                    Yes  No
    10) Race--If yes, what are the categories?
        ______________________                                 Yes  No
    11) Spanish or other ethnic origin designation--If yes,
        what are the categories? ___________________           Yes  No
    12) Social Security or Railroad Retirement Number          Yes  No

4C. What kind of information on business organizations or employers is present on this file?

                                                      Employer        Other (please
                          Company or    Establish-    Identification  specify in the
                          Enterprise    ment          Number (EIN)    Remark section)
    ___________________________________________________________________
    The file is organized
    by (please check the
    correct):                 _             _             _               _

     1) Name                Yes No        Yes No        Yes No          Yes No
     2) Address             Yes No        Yes No        Yes No          Yes No
     3) Location code for   Yes No        Yes No        Yes No          Yes No
        establishment or
        other reporting
        unit
     4) Number of           Yes No        Yes No        Yes No          Yes No
        employees--If yes,
        as of what date?
        _________________
     5) Total payroll       Yes No        Yes No        Yes No          Yes No
          Annually          Yes No        Yes No        Yes No          Yes No
          Quarterly         Yes No        Yes No        Yes No          Yes No
     6) Primary industry--  Yes No        Yes No        Yes No          Yes No
        If yes, what
        industry coding
        system is used?
        For example, 4
        digit SIC, 2 digit
        SIC, etc.
        ______________________
        ______________________
        ______________________
     7) Secondary industry  Yes No        Yes No        Yes No          Yes No
     8) Gross sales or      Yes No        Yes No        Yes No          Yes No
        receipts
     9) Product             Yes No        Yes No        Yes No          Yes No
        description
    10) Amount and          Yes No        Yes No        Yes No          Yes No
        description of
        capital base,
        total investment
        in plant and
        equipment
    11) What other items of statistical interest are available? Please list in Remarks section below.

4D. What kind of information is available for the "other reporting unit?"
    Please specify the kind of information present on the file for the "other reporting unit" in the space provided below.

5. What are the applications or forms from which the data are derived? If possible, include the OMB (or other) form number.

6. Briefly describe the process by which this information is obtained from the individual or business (firm, employer) and processed to the data file being described.

7. What is the purpose of the file? If the purpose is to meet specific legislative requirements, please include a citation for applicable Federal law, agency regulation, or agency requirement.

8. a) Is the file a computerized version of a "paper system?"  Yes  No
   b) What year was the file first created?________________________
   c) Has the file been expanded or has the data on the file changed significantly over its history?  Yes  No
      If yes, please explain how.

9. How many individuals or businesses are represented on the file? (An approximate number only.) __________________________________

10. What are the restrictions on the use of the file?
    a) Legal Restrictions--
    b) Administrative Restrictions--
    c) Other Restrictions--

11. If either the SSN or EIN is present on the data file, what is its purpose?

12. Is the file currently being used for statistical purposes?  Yes  No
    For example: Is the file used as a sampling frame for any surveys? Are tabulations prepared from the file that are used for statistical purposes? Please briefly describe any statistical uses of the data file.

13. How often are data collected and updated for this file?

    Collected                   Updated
    _ One time only             _ As needed
    _ Annually                  _ Annually
    _ Quarterly                 _ Quarterly
    _ Other, please specify     _ Other, please specify

14. Please provide the name, address, and telephone number of a person who could answer questions concerning the data file (this person need not be the same person who answers this survey).
    Name: ___________________________________
    Address: ________________________________
             ________________________________
    City and State:___________________________
                   ___________________________
    Zip Code: ________________________________
    Telephone Number: _______________________

15. Name and telephone number of person who completed this survey if different from above.

    Name: ___________________________________
    Telephone Number: _______________________

F. Appendix III.2 The CWHS Data System

The Continuous Work History Sample is a system of general multipurpose statistical data files designed primarily for socioeconomic research. The system consists of samples of records of individuals with employment covered by social security. Earnings, employment, and benefit data for the individual, along with personal characteristics and employer characteristics, are maintained to varying degrees among five basic data files and two special files that are produced in the CWHS system.

This appendix describes: (1) the data sources for the CWHS system; (2) the procedures used to construct the administrative data files underlying the system; (3) the procedures used to create statistical files from the records in the administrative files; (4) the sample design used for the system; and (5) the principal data elements in each of the five basic CWHS files. The discussion refers to data and procedures predating the start of annual wage reporting in 1979 (for calendar year 1978). A discussion of the new annual reporting system is presented in Chapter V, and Chapter VII contains considerable discussion of the limitations of CWHS data.

1. Data Sources

Data for the CWHS are obtained from records derived from reporting and informational forms and applications used in administering the retirement, survivors, and disability programs of the Social Security Administration. The date of birth, sex, and race of the person are obtained from the Application for a Social Security Number (Form SS-5).
Geographic and industry information is obtained from the employer's Application for an Identification Number (Form SS-4) and other related forms that are used periodically to update this information (Forms OAA-100, OAA-103, and SSA-5019). Initially, employers are assigned geographical and industry classifications based on the location and nature of business information supplied on the Form SS-4. Information that is not satisfactorily reported on the SS-4 is obtained through the supplemental forms OAA-100 and OAA-103.

Employers who operate more than one place of business and have a total of 50 employees with at least six in a separate location are asked to use the Establishment Reporting Plan. Under this plan the employer gives SSA a list showing the location, industrial activity, and approximate number of employees of each establishment. On subsequent wage reports the employer groups his employees by establishment, identifying each group with a preassigned establishment number. The arrangement allows SSA to properly classify the employees according to geography and industry.

Data on earnings and employment are derived from various reporting forms submitted by employers and self-employed persons. Prior to the advent of annual wage reporting for 1978, taxable wages of employees were reported quarterly by regular employers on Form 941, household employers on Form 942, and State and local government employers on Form OAR-S3. Farm employers report annually on Form 943, and self-employed persons use Schedule SE of Form 1040 to report annually. (Refer to Chapter V for a discussion of the new annual reporting system.)

Claims and benefits information is obtained from applications and forms that are completed in the process of filing for and determining entitlement to benefits.

2.
Processing Procedures--Administrative Records The demographic information (date of birth, sex and race) furnished by the applicant on the Form SS-5 is extracted after the social security number has been issued. This information is maintained on magnetic tape in a master file called the Summary Earnings Record (see Table III.1). This is the record in which the lifetime earnings and quarters of coverage of the individual is recorded for use in determining entitlement to benefits and calculating benefit amounts at the time a claim for benefits is made. The information supplied by the employer on the Form SS-4, relating to the location and nature of his business, is manually coded with geographic and industry codes. This information is key punched and maintained on magnetic tape in a master file of employers called the Employer Identification file (see Table III.2). Additionally, the information supplied on Form SSA-5019 by multi-unit employers using the Establishment Reporting Plan per- taining to the location and nature of business of each separate reporting unit, is also manually coded with geographic and industry codes and maintained in the EI file. The earnings data that are reported by employers are received and processed at SSA in a variety of ways. Hand filled paper forms that meet certain criteria are optically scanned to produce a machine-readable record, while others are keypunched. Some employers, usually having a large number of employees, report directly on magnetic tape. The reports of self-employed persons are received directly from the Internal Revenue Service on magnetic tape. After all of the earnings data is in machine-readable form with appropriate identifying information, the tapes enter a computer balancing operation in which each page of each report is checked to see that the wage items balance to the page totals provided by the employer. Out 23 of balance items are investigated and corrective action taken. 
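The balancing operation described above can be illustrated with a minimal sketch. The record layout, field names, and amounts below are hypothetical and are not taken from SSA's actual processing system; the sketch only shows the idea of comparing the sum of the wage items on each page with the page total supplied by the employer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportPage:
    """One page of an employer wage report (hypothetical layout)."""
    employer_id: str
    page_number: int
    wage_items_cents: List[int]   # individual wage items on the page, in cents
    reported_total_cents: int     # page total supplied by the employer


def split_by_balance(pages):
    """Separate pages whose wage items sum to the reported page total
    from pages that do not."""
    balanced, out_of_balance = [], []
    for page in pages:
        if sum(page.wage_items_cents) == page.reported_total_cents:
            balanced.append(page)
        else:
            out_of_balance.append(page)
    return balanced, out_of_balance


pages = [
    ReportPage("E001", 1, [120_000, 95_050, 210_000], 425_050),
    ReportPage("E001", 2, [80_000, 80_000], 170_000),  # items sum to 160_000
]
balanced, out_of_balance = split_by_balance(pages)
print([p.page_number for p in balanced])        # pages passed on for matching
print([p.page_number for p in out_of_balance])  # pages held for investigation
```

Amounts are kept as integer cents so that the equality test is exact; a floating-point representation could flag pages as out of balance because of rounding alone.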
Balanced items are passed on to an operation where individual items are sorted in social security number sequence and then matched to the Summary Earnings Record on social security number and the first six letters of the surname. Earnings amounts are added to the summary records where complete matches occur. Unmatched records are rejected for further investigation and processing. Prior to annual reporting, this processing occurred at regular intervals four times during the year. It generally takes about 9 months after the end of the reference period to receive, process and update the summary earnings records with virtually all of the items for that period.

Claims for social security benefits are filed in local social security district offices. Requests for earnings records and benefit computations are made by the district offices to SSA headquarters. After the earnings record is located, benefit computations are made and documentation of the claim is prepared and forwarded to the requesting office, where the claim is developed and forwarded to program service centers for benefit authorization. Upon authorization of benefits, the program service center sends a notification of award to headquarters, where a new beneficiary record is established in the Master Beneficiary Record file (see Table III.1). Changes to records in the beneficiary file are made through reports by the district office or program service center. The Master Beneficiary Record file is used in the preparation of monthly social security benefit check records, which are forwarded to the Treasury Department for payment.

3. Processing Procedures--Statistical Records

Once a year, after the Summary Earnings Record has been updated with virtually all of the prior year's earnings, a 1 percent sample (based on specified digits of the social security number) is extracted. This file becomes the foundation for producing the 1 percent 1937-to-date CWHS.
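Selecting a sample on specified digits of the social security number can be sketched as follows. The text here does not say which digits are used, so the two-digit ending "37" and the function name are purely hypothetical; the sketch only shows why fixing two digits of the number yields a 1 percent sample.

```python
def in_one_percent_sample(ssn: str, sample_endings=frozenset({"37"})) -> bool:
    """Return True if the last two digits of the number match a sample ending.

    The ending "37" is a hypothetical choice made for illustration only.
    """
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"expected a nine-digit number, got {ssn!r}")
    return digits[-2:] in sample_endings


# Count the selected numbers among all 10,000 four-digit endings
# of one (arbitrary) area and group combination.
selected = sum(
    in_one_percent_sample(f"12345{serial:04d}") for serial in range(10_000)
)
print(selected, selected / 10_000)
```

Because membership depends only on the number itself, the same individuals fall into the sample every year, which is what allows records to be followed over time.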
This extract is used along with the prior year's CWHS, a 1 percent sample extracted from the Master Beneficiary Record file, and miscellaneous correction files to generate the required data elements for the current year's 1 percent CWHS.

At the same time that earnings data for the current processing period are posted to the Summary Earnings Record, the 1 percent sample of earnings item records is written off separately on magnetic tape. The items are accumulated until all four quarters of the year have been processed. They are then summarized into one record for each employee-employer-establishment combination, with quarterly earnings amounts maintained separately. The resulting records are matched to the Employer Identification file and geographic and industry codes are inserted. They are then resummarized to an employee-employer level. Cases having employment with more than one establishment of the same employer are assigned to the unit having the most activity in terms of quarters of employment. A match is then made to a special extract from the 1 percent sample 1937-to-date CWHS containing date of birth, sex and race codes. These personal characteristics are inserted into the record to form the final 1 percent Sample Annual Employee-Employer file.

Another file of the earnings items that are posted to the Summary Earnings Record, previously referred to, is written off separately for another type of processing. This is a 0.1 percent sample and is a subset of the 1 percent sample. These records are accumulated over the same time period as the 1 percent sample records and are processed along with the prior year's 0.1 percent basic file and a special 0.1 percent write-off of certain data items from the current year's 1 percent CWHS file to create the current year's 0.1 percent 1937-to-date CWHS.

Information for self-employed persons, coming from Schedule SE of Form 1040, is submitted to SSA by IRS directly on magnetic tape.
After initial processing of these records in order to properly credit and post earnings to the Summary Earnings Record, the 1 percent sample records in this file are written off for statistical processing. In subsequent computer operations, IRS industry codes that are in the original record are converted to SSA industry codes, and addresses are converted to geographic codes through a special coding file that utilizes ZIP code and place names. Correspondence is generated for cases with missing or incomplete information, asking for the required data. The final file resulting from these operations is the 1 percent Sample Annual Self Employed file.

In addition to the regular statistical processing described above, in recent years special processing has been done to generate two additional files: the First Quarter Employee-Employer-Establishment files for the 1 percent sample and a special 10 percent Sample First Quarter Employee-Employer-Establishment file. Processing for these files is similar to processing for the Annual Employee-Employer files except that it is done after all first quarter receipts have been received and posted to the Summary Earnings Record. Record contents are virtually the same as the annual files except that only first quarter data are included. The 1 percent first quarter files have been prepared for the years 1970-76, while the 10 percent first quarter files have been produced for the years 1971, 1973, and 1975.

4. Sample Design

The population from which the CWHS is selected consists of the one billion possible nine-digit social security numbers. These numbers have the following digital arrangement:

Area in which number assigned    Group number    Serial number
(three digits)                   (two digits)    (four digits)
XXX                              XX              XXXX

In the issuance of social security numbers, each State is assigned one or more area numbers with the exception of a special block of numbers assigned prior to August 1963 to persons covered under the Railroad Retirement Act.
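The digital arrangement described above can be expressed as a short sketch that splits a nine-digit number into its three parts. The function and type names are hypothetical and are introduced only for illustration; the sample number is arbitrary.

```python
from typing import NamedTuple


class SSNParts(NamedTuple):
    area: str    # three digits: area in which the number was assigned
    group: str   # two digits: group number
    serial: str  # four digits: serial number


def split_ssn(ssn: str) -> SSNParts:
    """Split a nine-digit number (with or without separators) into its parts."""
    digits = ssn.replace("-", "").replace(" ", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"expected nine digits, got {ssn!r}")
    return SSNParts(digits[:3], digits[3:5], digits[5:])


parts = split_ssn("123-45-6789")
print(parts.area, parts.group, parts.serial)

# Size of the population of possible numbers: 10^3 areas x 10^2 groups
# x 10^4 serials, which is the one billion cited in the text.
population_size = 10**3 * 10**2 * 10**4
print(population_size)
```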
Each State number, in combination with a given group number, defines a stratum. The population assigned social security numbers is thus stratified geographically (by place of application for social security number) and chronologically (by the process of assigning these numbers). Each number is an el