Federal Committee on Statistical Methodology
Office of Management and Budget

 

Statistical Policy Working Paper 11 - A Review of Industry Coding Systems



 

 

 

                  MEMBERS OF THE FEDERAL COMMITTEE ON

                        STATISTICAL METHODOLOGY

 



                            (February 1984)



 



 



Maria Elena Gonzalez (Chair)            Charles D. Jones



Office of Information and               Bureau of the Census



  Regulatory Affairs (OMB)               (Commerce)



 



Barbara A.  Bailar                      Daniel Kasprzyk



Bureau of the Census                    Bureau of the Census



(Commerce)                               (Commerce)



 



Norman D.  Beller                       William E. Kibler



National Center for Education           Statistical Reporting Service



 Statistics (Education)                  (Agriculture)



 



Yvonne M.  Bishop                       David A. Pierce



Energy Information                      Federal Reserve Board



 Administration (Energy)



 



Edwin J.  Coleman                       Thomas Plewes



Bureau of Economic Analysis             Bureau of Labor Statistics



(Commerce)                               (Labor)



 



John E.  Cremeans                       Fritz Scheuren



Bureau of Industrial Economics          Internal Revenue Service



(Commerce)                              (Treasury)



 



Zahava D.  Doering                      Monroe G. Sirken



Defense Manpower Data Center            National Center for Health



(Defense)                                 Statistics (Health and



                                          Human Services)



 



Maria D.  Eldridge                      Thomas G. Staples



National Center for Education           Social Security Administration



  Statistics (Education)                (Health and Human Services)



 



Daniel E.  Garnick                      Robert D. Tortora



Bureau of Economic Analysis             Statistical Reporting Service



(Commerce)                               (Agriculture)



 



 



 



 



 



             OFFICE OF INFORMATION AND REGULATORY AFFAIRS



 



                   Christopher DeMuth, Administrator



 



             Thomas D.  Hopkins, Deputy Administrator for



                  Regulatory and Statistical Analysis



 



          Dorothy M.  Tella, Chief, Statistical Policy Office



 



                    Maria E.  Gonzalez, Chairperson



             Federal Committee on Statistical Methodology



 



                                PREFACE



 



The Working Group on Industry Coding was initiated by the



Administrative Records Subcommittee of the Federal Committee on



Statistical Methodology to review the various existing  industry



coding systems and study their relationships, comparability and



accuracy.  The report presents information on the principles and



procedures used to classify and code business establishments by



industry within the framework of the Standard Industrial



Classification (SIC) system.



 



This report  is intended primarily for Federal agencies that are



responsible for industry coding.  However, users  of data classified



by industry should also find it valuable to know more about the coding



procedures and practices that affect the quality of the data.



 



The findings and recommendations of this report emphasize the need



for increased interagency cooperation to improve the quality and



comparability of industry codes and reduce the cost and respondent



burden of multi-agency coding efforts.  A permanent interagency



committee is recommended as the mechanism for coordinating



improvements in industry coding systems.



 



Implementation of the recommendations in this report will be explored



by the Statistical Policy Office.  The report does not necessarily



reflect the views of the Office of Management and Budget.



 



The Working Group was chaired by Carl A. Konschnik, Bureau of the



Census, Department of Commerce; the Administrative Records



Subcommittee is chaired by Fritz Scheuren, Internal Revenue Service.



 



 



 






 



 



          MEMBERS OF THE ADMINISTRATIVE RECORDS SUBCOMMITTEE



                            (December 1983)



 



Fritz Scheuren (Chairman)          Daniel Kasprzyk



Internal Revenue Service           Bureau of the Census



(Treasury)                         (Commerce)



 



Faye Aziz                          Beth Kilss



Social Security Administration     Internal Revenue Service



(Health and Human Services)        (Treasury)



 



Warren Buckler                     Carl A. Konschnik



Social Security Administration     Bureau of the Census



(Health and Human Services)        (Commerce)



 



Paul Burke                         Bruce Levine



Department of Housing and               Bureau of Economic Analysis



Urban Development                  (Commerce)



 



David W.  Cartwright               Brian MacDonald



Bureau of Economic Analysis        Bureau of Labor Statistics



(Commerce)                         (Labor)



 



Charles Cowan                      James Millette



Bureau of the Census               Bureau of Labor Statistics



(Commerce)                         (Labor)



 



Maria E.  Gonzalez                 Douglas Sater



Office of Management and           Bureau of the Census



Budget                             (Commerce)



 



David A.  Hirschberg               Michael Searson



Small Business Administration      Bureau of Labor Statistics (Labor)



 



Susan Hostetter                    Linda Bouchard Taylor



Bureau of Labor Statistics         Internal Revenue Service



(Labor)                            (Treasury)



 



Thomas B.  Jabine                  Alan Zempel



Consultant to Committee on         Internal Revenue Service



National Statistics                (Treasury)



(National Academy of Sciences)



 






 



 



 



 



                            MEMBERS OF THE



                     INDUSTRY CODING WORKING GROUP



 



                    Carl A. Konschnik, Chairperson



                         Bureau of the Census



                              (Commerce)



 



                             Linda M. Dill



                    Social Security Administration



                      (Health and Human Services)



 



                            Susan Hostetter



                      Bureau of Labor Statistics



                                (Labor)



 



                           Thomas B. Jabine



                      Consultant to Committee on



                          National Statistics



                    (National Academy of Sciences)



 



                              Beth Kilss



                       Internal Revenue Service



                              (Treasury)



 



                             Bruce Levine



                      Bureau of Economic Analysis



                              (Commerce)



 



                            James Millette



                      Bureau of Labor Statistics



                                (Labor)



 



                         Linda Bouchard Taylor



                       Internal Revenue Service



                              (Treasury)



 



                              Alan Zempel



                       Internal Revenue Service



                              (Treasury)



 



 






 



 



 



                            ACKNOWLEDGMENTS



 



The idea for this study grew out of the collective interest of the



members of the Administrative Records Subcommittee in looking at



industry coding issues.



 



Data for the 16 major industry coding systems reviewed were first



collected from the agencies on a questionnaire prepared by the Working



Group.  The questionnaire responses and associated documentation were



then used to prepare "system descriptions" following a standard format



developed by Thomas B. Jabine.  Copies of system descriptions, which are



in a supplement to this report entitled Description of Selected



Industry Coding Systems, may be obtained from the Statistics of Income



Division, Internal Revenue Service, D:R:S, 1111 Constitution Avenue,



N.W., Washington, D.C. 20224.



 



In addition to the members of the Working Group, the following persons



contributed to the completion of the questionnaires and system



descriptions:



 



     Bureau of the Census:    Alfred R. Brand, Patricia A. Clark,



          Stanley M. Hyman, C. Harvey Monk, Walter E. Neece, Frank M.



          Hartman



 



     Bureau of Economic Analysis:  George R.  Kruer



 



     Internal Revenue Service:     Bertie Brame, John Maiden, Patrick



          Piet, Paul J.  Rose, Nathan Shaifer, Raymond Wolfe



 



     Social Security Administration: Cheryl Williams



 



The working paper was reviewed by all members of the Working Group. 



The chapters were initially drafted by:



 



       I. Susan Hostetter, James Millette



 



      II. Carl A.  Konschnik, Bruce Levine



 



     III. Linda M.  Dill, Carl A.  Konschnik



 



      IV. Thomas B.  Jabine



 



The entire Working Group provided comments on the initial drafts.  The



final wording was reviewed by the Working Group.  Maria E.  Gonzalez



met with the Working Group throughout its term.  Fritz Scheuren and



members of the Administrative Records Subcommittee provided additional



assistance and encouragement, as did members of the Federal Committee



on Statistical Methodology.



 






 



 



In the preparation of this working paper, substantial use was made of



the following sources:



 



1.   Farrell, M.G., Jabine, T.B., and Konschnik, C.A.



     1982 A review of industry coding systems.  Proceedings of the



          Section on Survey Research Methods, American Statistical



          Association.



 



2.   Jabine, T.B.



     1984 The Comparability and Accuracy of Industry Codes in



          Different Data Systems (in draft).  Committee on National



          Statistics.  Commission on Behavioral and Social Sciences



          and Education.  Washington, D.C.: National Academy of



          Sciences.



 



The second item is scheduled for publication in 1984.  Several



excerpts from it were used directly or with minor changes in Chapters



III, IV and VI of this working paper.



 






 



 



 



 



 



                  A REVIEW OF INDUSTRY CODING SYSTEMS



 



                           Table of Contents



 



 



                                                                  Page



 



Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i



 



Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . .iv



I.   Findings and Recommendations. . . . . . . . . . . . . . . . . . 1



 



     A.   Introduction . . . . . . . . . . . . . . . . . . . . . . . 1



     B.   Code Sharing . . . . . . . . . . . . . . . . . . . . . . . 1



     C.   Standardization of Industry Coding Principles  . . . . . . 2



     D.   Standardization of Coding Procedures . . . . . . . . . . . 5



     E.   Documentation. . . . . . . . . . . . . . . . . . . . . . . 7



     F.   Matching Studies and Other Research. . . . . . . . . . . . 8



     G.   Interagency Cooperation. . . . . . . . . . . . . . . . . .10



 



 



II.  Description of the Industry Coding Working



Group Project. . . . . . . . . . . . . . . . . . . . . . . . . . . .11



 



     A.  Introduction. . . . . . . . . . . . . . . . . . . . . . . .11



     B.  Scope of the Review . . . . . . . . . . . . . . . . . . . .14



     C.  Major Uses of Industry Coding Information . . . . . . . . .15



     D.  Composition and Objectives of the Industry Coding



          Working Group. . . . . . . . . . . . . . . . . . . . . . .17



     E.   Development of the Basic Documentation for the



          Federal Industry Coding Systems. . . . . . . . . . . . . .18



 



III. Industry Coding Systems and Their Relationships . . . . . . . .21



 



     A.  Introduction. . . . . . . . . . . . . . . . . . . . . . . .21



     B.  Coverage. . . . . . . . . . . . . . . . . . . . . . . . . .21



     C.  Frequency and Timing of Initial Coding and Updating . . . .28



     D.  Classification System Used. . . . . . . . . . . . . . . . .28



     E.  Classification Principles . . . . . . . . . . . . . . . . .30



     F.  Information Used as Input to Coding . . . . . . . . . . . .35



 



     G.   Coding Procedures. . . . . . . . . . . . . . . . . . . . .42



     H.   Description of Systems Relationships . . . . . . . . . . .48



 



 



IV.  Quantitative Information on Comparability and Accuracy. . . . .53



 



     A.   Introduction . . . . . . . . . . . . . . . . . . . . . . .53



     B.   Inter-system Macro-comparisons . . . . . . . . . . . . . .53



     C.   Inter-system Micro-comparisons: General. . . . . . . . . .55



     D.   Interagency Comparisons Between Systems. . . . . . . . . .56



     E.   Intra-agency Comparisons Between Systems . . . . . . . . .63



     F.   Data on Industry Coding Error in Individual Systems. . . .72



 



     V.   References . . . . . . . . . . . . . . . . . . . . . . . .89



     VI.  Selected Source Documents and Instructions . . . . . . . .95



          



 






 



 



                            List of Tables



Table                                                             Page



 



1    Selected Characteristics of Industry Coding 



     Systems Reviewed. . . . . . . . . . . . . . . . . . . . . . . .23



 



2    Coverage of Industry Coding Systems Reviewed 



     by SIC Division . . . . . . . . . . . . . . . . . . . . . . . .27



 



3    Interagency Transfers of Industry Codes . . . . . . . . . . . .51



 



4    Results of Independent Coding of Establishments by Census and



     SSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .57



 



5    Summary of Errors as a Result of Reconciling BES and Census



     Records on Delaware Retail Payroll in 1963. . . . . . . . . . .59



 



6    Results of Comparison between Final Industry Codes and SSA-based



     Mailing List Codes: 1963 Economic Censuses. . . . . . . . . . .62



 



7    An Analysis of 1947-1949 Code Changes for 500 Single-Unit



     Establishments in Manufacturing . . . . . . . . . . . . . . . .64



 



8    Indexes of Shift for In Scope and Out of Scope of Retail Trade by



     Kind of Business. . . . . . . . . . . . . . . . . . . . . . . .66



 



9    Major Kind-of-Business Cross-Classification of Group 1 Retail



     Trade Establishment Sales in Census and in Current Survey: United



     States, 1958. . . . . . . . . . . . . . . . . . . . . . . . . .67



 



10   Differences between IRS Master File and Statistics of Income



     (SOI) Industry Classification by SIC Division and Type of



     Organization. . . . . . . . . . . . . . . . . . . . . . . . . .69



 



11   Percent of IRS Master File Codes Agreeing with SOI Codes, by Type



     of Organization and Level of Detail . . . . . . . . . . . . . .71



 



12   Agreement of IRS Master File Codes with SOI Codes at Major



     Industry Level for Corporations: Tax Years 1972 and 1973. . . .72



 



13   IRS Statistics of Income Program.  Number of Incompletely



     Classified Returns by Industry 



     Division and Type of Organization . . . . . . . . . . . . . . .75



 



14   Reasons for Industry Code Differences between Initial and Recheck



     Surveys: Retail Trade Surveys, 1948 . . . . . . . . . . . . . .77



 



 






 



 



 



 






 



15   Indexes of Inconsistency for Selected Major



     Industries: 1960 and 1970 . . . . . . . . . . . . . . . . . . .79



 



16   Comparison of SIC Codes Based on Census Questionnaires



     with those Based on Administrative Records:



     1977 Economic Censuses. . . . . . . . . . . . . . . . . . . . .81



 



17   Evaluation of Published Statistics for Nonemployers



     in Contract Construction: 1977 Census . . . . . . . . . . . . .82



 



18   Single-Unit Establishments in the SSEL with



     Current Year Payroll by SIC Division and



     Source: 1981. . . . . . . . . . . . . . . . . . . . . . . . . .85



 



 






 



 



 



                               CHAPTER I



 



                     FINDINGS AND RECOMMENDATIONS



 



A.   Introduction



 



     This section presents the findings and recommendations of the



Industry Coding Working Group.  The recommendations are based on two



goals:



 



     1.   To improve the quality and comparability of industry codes



          for all of the data systems reviewed by the Working Group;



          and



 



     2.   To reduce the overall cost and respondent burden associated



          with initial industry coding and updating of codes for these



          systems.



 



     Meeting these objectives requires increased interagency



cooperation in the areas of standardization and code sharing (the



transfer of industry codes for individual establishments or other



economic units from one data system to another).  With respect to



these two areas, the Working Group found that:



 



     Significant improvements in quality and comparability of industry



     coding can be achieved by increased standardization of coding



     principles and procedures; however, a substantial increase in



     code sharing between agencies is needed to achieve the best



     results.



 



B.   Code Sharing



 



     Chapter III of this report describes the differences found by



the Industry Coding Working Group in coding procedures, source



documents, procedures for updating codes, and other features of the



systems reviewed.  These differences, which result in part from cost



and respondent burden limitations, cause differences in the industry



codes assigned to individual units.  This applies both to statistical



data systems and to systems developed primarily for administrative



purposes.  Chapter IV presents quantitative evidence, from several



studies, of differences resulting from system variations.



 



     At present there are few transfers of industry codes between



agencies.  The primary transfers are from the Social Security



Administration (SSA) and the Internal Revenue Service (IRS) to the



Census Bureau for use in the latter's economic statistics programs. 



(See Table 3 on page 51 for details.) The Working Group recommends



that:



 






 



     Agencies whose systems have been reviewed should expand industry



     code sharing to improve the quality of codes and to reduce code



     differences between systems.



 



     Increased code sharing between agencies should lead to more



comparable and accurate industry codes in major Federal and



Federal/State cooperative data systems.  Initially, there would be a



significant cost to develop a system to match units in different



agency files and to deal with those cases in which the industry codes



or the units fail to match.  However, once these processing systems



were established, considerable savings could be realized by cutting



back on independent data collection activities for assigning and



updating industry codes.  Currently various agencies collect similar



information from the same respondents for use in determining industry



codes.  Thus the beneficial impact of code sharing between agencies



on both respondent burden and cost should be extensive.
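
     The matching step described above can be sketched as follows.  This
is only an illustration under stated assumptions, not a procedure taken
from any of the systems reviewed: it assumes that each agency file can be
reduced to a mapping from a shared unit identifier to a 4-digit industry
code, and the function name, identifiers and codes are all hypothetical.
In practice, matching units across agency files is considerably harder
than an exact lookup on a common key.

```python
def compare_codes(file_a, file_b):
    """Match units on a shared identifier and sort them into three groups.

    file_a, file_b: dicts mapping a (hypothetical) unit identifier to the
    4-digit industry code held in each agency's file.
    """
    result = {"agree": [], "disagree": [], "unmatched": []}
    for unit_id in sorted(set(file_a) | set(file_b)):
        if unit_id not in file_a or unit_id not in file_b:
            # The unit appears in only one file: the units fail to match.
            result["unmatched"].append(unit_id)
        elif file_a[unit_id] == file_b[unit_id]:
            result["agree"].append(unit_id)
        else:
            # The unit matches but the industry codes differ.
            result["disagree"].append(unit_id)
    return result


# Illustrative data only; identifiers and codes are made up for the example.
agency_a = {"U001": "2011", "U002": "5411", "U003": "7372"}
agency_b = {"U001": "2011", "U002": "5499", "U004": "1521"}
print(compare_codes(agency_a, agency_b))
```

     The "disagree" and "unmatched" groups correspond to the cases that
would need separate handling before codes could be shared.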



 



     To implement the recommendation for code sharing fully will



require changes in the confidentiality laws currently governing the



Federal statistical community.  Except for a few specific cases,



agencies may not, under current law, disclose individually



identifiable microdata outside their own agency.



 



C.   Standardization of Industry Coding Principles



 



     The Working Group found that the agency coding systems



reviewed all based their classification systems on the current version



of the SIC Manual, but that each of the systems departs from it in



some respects.  The nature of these departures from the SIC Manual is



described in Chapter III of this report.



 



     It is not clear that all systems would be in a position to follow



the principles of the SIC exactly in every respect.  Administrative



requirements and resource limitations may sometimes preclude this. 



Nevertheless, the Working Group believes that greater adherence to



these principles is feasible in most cases, and recommends that:



 



     All Federal and State agencies cooperating in Federal statistical



     programs that classify economic units (establishments or



     reporting units) by industrial activity should, to the greatest



     extent possible, follow the classification principles contained



     in the 1972 Standard Industrial Classification (SIC) Manual as



     amended by the 1977 Supplement.



 



     Agencies using the SIC Manual as the basis for assigning industry



codes to establishments or reporting units should adhere to the



following recommendations on specific classification principles.  The



specific recommendations do not necessarily apply for classifying



enterprises or similar units.





 






 



     1.   The basic business unit should be the establishment as



          defined in the SIC Manual.



 



     The establishment is normally an economic unit at a single



physical location and engaged in one, or predominantly one, type of



economic activity.  Special rules apply where two or more distinct and



separate activities are carried on at a common physical location.



 



     The SIC Manual is intended for assigning codes to establishments.



However, some agencies assign codes to similar but somewhat



differently defined units-- reporting units.  As a long-range goal,



these agencies should attempt to redefine their reporting units so  



that they are consistent with the establishment definition.



 



     2.   To the extent possible, all units should be classified by 4-



          digit SIC industry, using all of the industries included in



          the current SIC Manual.



 



     Most of the systems reviewed come close to following the SIC



structure in the Manual, but use groupings of SIC industries in a few



instances.  Some aggregation occurs to avoid disclosure of individual



establishment data.  Some occurs because experience in some agencies



shows that for certain industries adequate reporting records are not



available on an industry-wide basis.  Since different agencies



aggregate for different reasons, varying groupings of industries



result.  Comparability of data by industry would be improved if



participating agencies used all of the 4-digit SIC codes or could



agree on and use a standard set of codes for grouped industries.



 



     This recommendation is not intended to preclude the use of



additional classifiers for the same units.  However, classifiers such



as those used for administrative or tax purposes should be clearly



distinguished from codes based on the SIC.  The assignment of SIC's



should not be altered or controlled in any way by the assignment of



such additional codes.  Some agencies, primarily the Census Bureau,



assign industry codes in greater detail than provided by 4-digit SIC



codes.  This practice is acceptable as long as the detailed



classifications are defined within 4-digit industries.
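
     The nesting condition can be illustrated with a short sketch.  It
assumes, purely for illustration, that a more detailed code is formed by
appending digits to a 4-digit SIC code; this report does not specify how
the detailed codes are constructed, and the function names and example
values are hypothetical.

```python
def sic_prefix(code: str, digits: int) -> str:
    """Truncate a code to its first `digits` digits.

    In the SIC structure, 2 digits identify a major group, 3 digits an
    industry group, and 4 digits an industry.
    """
    if not code.isdigit() or len(code) < digits:
        raise ValueError(f"cannot truncate {code!r} to {digits} digits")
    return code[:digits]


def nests_within(detailed_code: str, sic4: str) -> bool:
    """True if a more detailed code falls inside the given 4-digit industry.

    Assumes (hypothetically) that detailed codes extend the 4-digit code.
    """
    return (
        len(sic4) == 4
        and len(detailed_code) > 4
        and detailed_code.startswith(sic4)
    )


# Illustrative values only.
print(sic_prefix("20111", 4))         # 4-digit industry of a detailed code
print(nests_within("20111", "2011"))  # detailed code inside the industry
print(nests_within("20131", "2011"))  # detailed code outside the industry
```

     Under this assumption, any detailed code can be collapsed back to
its 4-digit industry, so data coded in greater detail remain comparable
with data from systems that stop at the 4-digit level.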



 



     3.   When an establishment or reporting unit has multiple



          activities, the SIC code should be determined according to



          the principles outlined in the SIC Manual.



 



     This recommendation implies, among other things, that the



treatment of multiple activities be based on the variables recommended



in the SIC Manual to measure the relative importance of each activity



and that 4-digit SIC codes be assigned to each



 






 



activity of the establishment.  Also it is necessary to assign a



percent of total value for each activity for which a 4-digit SIC was



determined and then group activities with the same 4-digit SIC's and



sum the percent values.  The establishment's classification would then



be the 4-digit SIC with the greatest percent of total activity.
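
     The steps described above (assign a 4-digit SIC and a percent of
total value to each activity, group activities with the same SIC, sum the
percents, and take the largest) can be sketched as follows.  The function
name, the tie-breaking rule and the example data are assumptions made for
illustration; the choice of variable used to measure each activity's share
is governed by the SIC Manual and is not modeled here.

```python
from collections import defaultdict


def primary_sic(activities):
    """Return (sic_code, percent) for the 4-digit SIC with the largest share.

    activities: list of (sic_code, percent_of_total) pairs, one per activity.
    """
    if not activities:
        raise ValueError("at least one activity is required")
    totals = defaultdict(float)
    for sic_code, percent in activities:
        totals[sic_code] += percent  # group activities sharing a 4-digit SIC
    # Ties are broken by the lowest code purely for determinism; this
    # tie-breaking rule is an assumption, not one taken from the SIC Manual.
    best = min(totals, key=lambda code: (-totals[code], code))
    return best, totals[best]


# Hypothetical establishment with four activities (illustrative values).
example = [("2011", 30.0), ("2013", 25.0), ("5147", 35.0), ("2011", 10.0)]
print(primary_sic(example))
```

     In this example the largest single activity is coded 5147, but the
two activities coded 2011 together account for a greater share, so the
establishment is classified in 2011.  This is why the grouping step must
precede the comparison.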



 



     4.   Information that identifies Central Administrative Offices



          (CAO's) and auxiliary units must be collected and reviewed



          to ensure accurate determination of 4-digit industry codes. 



          All systems should incorporate this information.



 



     As stated in the SIC Manual, a CAO is an establishment primarily



engaged in management and general administrative functions performed



centrally for other establishments of the same company.  An auxiliary



unit is an establishment primarily engaged in performing supporting



services for other establishments of the same company rather than for



the general public or for other business firms.  Both CAO's and



auxiliary units should be classified according to the primary 4-digit



industry activity of the operating establishments they serve.



 



     Additional classification codes describing the type of function



performed also should be standardized.  The Working Group recommends



that agencies responsible for industry coding adopt a uniform set of



auxiliary codes for the classification of CAO or auxiliary activities



for use in their systems.  The codes would delineate activities such



as central administration; research and development; warehousing; data



processing; and repair shops.



 



     5.   Agencies should work together to arrive at consistent



          solutions to two problems generally encountered in



          classifying government operations-- determining ownership



          and distinguishing between operating and administrative



          operations.



 



     Many activities are quasi-government and the distinctions between



government and private industry are often unclear.  Most agencies have



guidelines for determining ownership that follow the SIC Manual



concept of "owned and operated".  However, very little coordination



and sharing of the interpretation of the rules have occurred. 



Developing a system for sharing and comparing concepts would foster



consistency among agencies.



 



     The Public Administration division of the SIC Manual includes



"...the legislative, judicial, administrative and regulatory



activities of Federal, State, local and international



governments." However, the government owned and operated



establishments outside of public administration properly should



be classified according to the activities in which they are



 



 






 



engaged.  Coordination and cooperation among agencies should enhance



systematic identification and reporting according to these standards.



 



D.   Standardization of Coding Procedures



 



     This section presents recommendations to improve and standardize



coding procedures used by the systems to implement industry coding



principles.  Coding procedures considered most important are those



that relate to the use of source documents, quality assurance, training



for coders, and resistance principles.



 



     Chapter III of this report describes source documents used by



each of the systems reviewed.  These source documents vary both in the



level of detail requested and the format and wording of the items



included.  This variability has clearly contributed to differences



between the systems.  Chapter VI contains examples of source



documents.



 



     Although it was beyond the scope of this Working Group to develop



specific questionnaires or standards for questionnaires, the Working



Group recommends that:



 



     1.   Agencies that do industry coding should work together to



          increase the uniformity of product, activity and related



          questions used in their source documents.



 



     The Working Group believes that accurate 4-digit industry coding



requires questions specifically tailored to the SIC division level and to



some intermediate groupings of 4-digit industries.  Since some



agencies may not have the need or resources to use forms designed for



specific industry groups, the Working Group suggests the development



of two kinds of model source documents: a set for specific industry



groups and an abridged general purpose version.  Separate versions for



initial coding and updating are also suggested.



 



     The development of standardized source documents should be based



on thorough research.  The Working Group's recommendations for



research on source documents are given in section F of this chapter.



 



     This report provides some information on Quality Assurance in



Chapter III.  However, most of the agencies reviewed had limited



information on specific quality assurance measures used for their



systems.  The systems reviewed show considerable variation in the



scope and intensity of procedures for maintaining and improving the



accuracy of industry codes.  The Working Group recommends that:



 



     2.   Each agency should review the procedures it uses to assure



          the quality of industry coding and should try to upgrade



          them where needed.



 






 



     Because technology (both in industries upon which codes are based



and in the processing and procedures used by agencies when assigning



codes) is changing rapidly, the Working Group suggests that one or



more interagency workshops be organized to discuss new developments in



industry coding and to promote the exchange of information on coding



procedures.  Workshops should cover computerized coding (coding based



on verbal descriptions or on quantitative product and service data),



computer-assisted coding from activity descriptions, and computer



consistency checks.  Methods of reducing agency cost and respondent



burden also should be examined.



 



     The Working Group found that agencies doing industry coding did



not have formal training programs for coders in some of their systems. 



SSA provides extensive formal training for new coders in their single-



unit employer identification (EI) file system.  This is followed up by



on-the-job training and close quality review.  The Census Bureau



provides training for large groups of coding technicians during the



economic censuses, and the Bureau of Labor Statistics (BLS) provides



an ongoing training program for all State coding technicians. 



However, for some systems more on-the-job training and less of a



formal program is used.  The Working Group recommends that:



 



     3.   Agencies should provide periodic training



          based on recommended coding course principles 



          and procedures for their SIC coders.



 



     Such courses should include solutions, preferably those agreed



upon by an interagency group, to coding problems arising from the



development of new industries and from changes in existing industries.



 



     Resistance principles generally take prior industry codes and



related data into account in determining a current code.  The purpose



of using them is to avoid erratic shifts back and forth from one



industry to another and, in sample-based systems, to help control



sampling variability.  Lack of uniformity in the use of resistance



principles has been one of many causes of industry classification



differences between systems.
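
     As an illustration only, one simple form of resistance rule can be
sketched as follows.  The sketch assumes that a unit's code is changed
only after the same new code has been indicated in a set number of
consecutive reports.  The names UnitHistory and apply_resistance, the
default of two periods, and the record layout are hypothetical; they
are not taken from any of the agency systems reviewed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnitHistory:
    """Industry code history for one reporting unit (hypothetical layout)."""
    current_code: str
    # Codes indicated by successive reports, oldest first.
    indicated: List[str] = field(default_factory=list)


def apply_resistance(unit: UnitHistory, new_indicated: str,
                     periods_required: int = 2) -> str:
    """Record one more report and return the code to carry forward.

    Assumed rule: the current code is kept unless the same, different
    code has been indicated in `periods_required` consecutive reports.
    `periods_required` is expected to be at least 1.
    """
    unit.indicated.append(new_indicated)
    recent = unit.indicated[-periods_required:]
    if (len(recent) == periods_required
            and all(code == new_indicated for code in recent)
            and new_indicated != unit.current_code):
        unit.current_code = new_indicated
    return unit.current_code
```

     Under these assumptions a single report pointing to a different
industry leaves the code unchanged, which is the damping effect that
resistance principles are intended to provide.  Actual agency rules may
also weigh related data such as employment or receipts.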



 



     The Working Group found that resistance principles, while



frequently employed in the systems reviewed, were poorly documented



and inconsistent among agencies.  Therefore, the Working Group



recommends that:



 



     4.   Agencies that apply resistance principles in updating



          industry classifications should collaborate to develop



          uniform guidelines for application of these principles.  The



          rules used for resistance coding should be documented and



          made readily available.



 






 



E.   Documentation



 



     A major accomplishment of the Working Group has been the



collection of detailed documentation on the characteristics of



industry coding systems and source documents used for SIC coding. 



System descriptions developed by members of the Working Group with the



help of other agency personnel include information about: the basic



coding unit, the industry classification principles followed, the



source document used, the coding procedures, the volume and timing of



coding, the quality measures associated with the coding, the general



characteristics of the file in which the codes reside, the timing and



methods for updating codes, planned changes to the coding system, and



the uses and users of the industry codes.  (A collection of these



systems descriptions is available as a supplement to this report



(Internal Revenue Service, 1984).)



 



     This information serves as an essential tool for understanding



the content of each system and the data produced from it.  Therefore,



the Working Group recommends that:



 



          1.   Complete documentation for coding systems included in



               this study should be updated at least every five years. 



               Additionally, major changes occurring in any agency



               system should be documented and the information updated



               promptly.



 



          2.   All coding principles used by an agency



               that are either in addition to or contrary to



               those currently in the SIC Manual should be clearly



               described in agency publications that provide data by



               industry.



 



          3.   Coding rules embedded in programs for computerized



               coding systems should be fully documented in a form



               that makes them accessible to data users.



 



          4.   Results of quality control checks and



               evaluation studies of manual and computerized coding



               operations should be systematically documented and made



               available to users.



 



     The Working Group believes that agencies should adhere to certain



standards for internal documentation.  For example, cumulative files



that contain industry codes should show the date of the most recent



review and update for each unit and, where relevant, the source.  In



some cases it may be desirable to show more than one source code to



avoid unnecessary restrictions on access.  An agency may have data of



its own and from other agencies, with differing restrictions on



access.  All data



 






 



sources should be identified to avoid unnecessary restrictions on



release of codes to other agencies for statistical purposes.
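
     A file entry carrying this internal documentation might be laid
out as in the following sketch.  The field names, the source labels,
and the release test are assumptions made for illustration.  In
particular, treating a code as releasable when at least one of its
recorded sources is unrestricted is only one possible reading of the
guidance above, not a statement of any agency's rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Set, Tuple


@dataclass(frozen=True)
class IndustryCodeEntry:
    """One unit's entry in a cumulative file (hypothetical layout)."""
    unit_id: str
    industry_code: str
    last_reviewed: date          # date of most recent review and update
    sources: Tuple[str, ...]     # every source that supports the code


def days_since_review(entry: IndustryCodeEntry, as_of: date) -> int:
    """Number of days between the last review and `as_of`."""
    return (as_of - entry.last_reviewed).days


def releasable(entry: IndustryCodeEntry, restricted: Set[str]) -> bool:
    """Assumed rule: releasable if any recorded source is unrestricted."""
    return any(source not in restricted for source in entry.sources)
```

     Recording every source, rather than a single one, is what lets
the second function distinguish a code supported by an agency's own
data from one that rests solely on restricted data received from
another agency.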



 



F.   Matching Studies and Other Research



 



     Chapter IV documents several matching studies.  Generally, the



findings of such studies have led to improved methodology within the



matched systems, greater awareness of the need for interagency



cooperation, and a better understanding of the impact of differences



in economic data used for policy determinations.  In addition,



matching studies provide information on the feasibility of code



sharing and supporting evidence for the importance of code sharing. 



Most major matching studies were conducted more than 10 years ago. 



The Working Group recommends that:



 



     1.   Interagency microdata matching studies be conducted as a way



          of investigating the feasibility of code sharing and of



          quantifying differences between the systems.



 



     Matching studies should compare industry codes, along with



selected data items such as employment, geographic location, and



payroll, for units which match between agency files.  The Working



Group suggests that the studies first establish a sound matching



process in areas with a high degree of agreement and comparability. 



Using matching processes identified as successful, a study should then



focus on areas where classification is known to be especially



difficult, such as wholesale and retail trade.  Once differences are



quantified, the agency-specific procedures that cause the differences



should be identified and improved.
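
     The code comparison at the center of such a study can be sketched
as follows.  The sketch assumes that each agency file has already been
reduced to a mapping from a common unit identifier (for example, the
EI number) to a four-digit code, and that units match when the
identifiers are equal.  The function name compare_codes and the choice
of two-digit and four-digit agreement rates are illustrative
assumptions; a full study would also compare employment, geographic
location, and payroll.

```python
from typing import Dict


def compare_codes(file_a: Dict[str, str],
                  file_b: Dict[str, str]) -> Dict[str, float]:
    """Agreement rates for units present in both files.

    Keys are unit identifiers; values are four-digit industry codes
    held as strings so that leading zeros are preserved.
    """
    matched = sorted(set(file_a) & set(file_b))
    n = len(matched)
    if n == 0:
        return {"matched_units": 0, "agree_2digit": 0.0, "agree_4digit": 0.0}
    # Full agreement: identical four-digit codes.
    same4 = sum(file_a[k] == file_b[k] for k in matched)
    # Partial agreement: same first two digits (major group).
    same2 = sum(file_a[k][:2] == file_b[k][:2] for k in matched)
    return {
        "matched_units": n,
        "agree_2digit": same2 / n,
        "agree_4digit": same4 / n,
    }
```

     Tabulating these rates separately for each industry division
would show where disagreement is concentrated and so where the
agency-specific procedures deserve closer study.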



 



     A current interagency group, the Employer Reporting Unit Match



Study (ERUMS) Working Group, has done initial planning for a micro-



record matching study to compare the statistical characteristics of



the Social Security, BLS, and IRS systems.  The ERUMS Working Group



will examine the effects of the variations between agencies in



defining the reporting unit.  Currently, expectations are that a



sample covering 400 employer identification (EI) numbers from one



state will be selected from Unemployment Insurance (UI) records.  ADP



and manual matching techniques will be used to match these units with



those in SSA and IRS for the same EI's.  A natural by-product of the



study will be a comparison of the industry codes for matched units. 



The ERUMS Working Group expects to gain useful information about the



kinds of problems that must be solved to match records from different



economic data systems.



 



     While documenting facets of the various industry coding systems,



the Working Group made no attempt to judge the relative merits of any



specific form, procedure, unit identification or updating method.  All



of the source documents and procedures used



 






 



by these cooperating agencies lend themselves to research studies



aimed at identifying benefits and limitations.  Chapters III and IV of



this paper discuss in some detail specific forms, procedures, levels



of industry coding, frequency of updating information used to obtain



codes, and other details of each system.  Based upon the review of



these source documents, the Working Group recommends that:



 



     2.   Research studies and tests be conducted with a view toward



          establishing the most effective source documents for SIC



          coding as standards.



 



     3.   Tests and research be conducted on current and new methods



          and procedures for industry coding.



 



     Tests and studies with varying sets of questions designed to



elicit the nature of business activity should be cooperative ventures



among agencies.  Results of tests should be used to establish the most



effective version as a standard.  Since not all agencies can collect



detailed information for use in industrial classification, the goal



should be to develop standard questionnaires with at least two levels



of detail.



 



     A research project testing the verification method of SIC



updating has been initiated by BLS (Hostetter, 1983).  This method



utilizes a form containing a description of the four-digit SIC



industry in which a particular employer was most recently classified. 



The form requests the employer to verify the industry description as



an accurate indicator of his primary economic activity.  If correct,



the employer simply checks the appropriate box, answers some other



questions on ownership, auxiliary status and multi-establishment



status and returns the form.  This reduces both respondent burden and



staff time, since forms checked as correct need not be reviewed to



assign an industry code.  If the industry description does not



correctly describe the economic activity, the employer then is asked



to provide a detailed product and activity statement so that the



correct classification can be determined.  Currently, BLS has



contracted with five State employment security agencies to conduct



independent but identical quality measurement surveys testing the



validity of the verification method of refiling.
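
     The handling of a returned verification form, as described above,
can be sketched as follows.  The record layout, the function name
route_response, and the follow-up outcome for forms returned without
an activity statement are assumptions added for illustration; they are
not part of the BLS procedure as reported here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VerificationResponse:
    """One returned verification form (hypothetical layout)."""
    unit_id: str
    current_code: str                  # four-digit SIC code shown on the form
    description_correct: bool          # employer checked the "correct" box
    activity_statement: Optional[str] = None


def route_response(resp: VerificationResponse) -> Tuple[str, Optional[str]]:
    """Decide what happens to a returned form.

    Returns an (action, payload) pair:
      ("keep", code)              description verified, no coder review
      ("manual_review", text)     coder assigns a code from the statement
      ("follow_up", None)         assumed fallback: no statement supplied
    """
    if resp.description_correct:
        return ("keep", resp.current_code)
    if resp.activity_statement:
        return ("manual_review", resp.activity_statement)
    return ("follow_up", None)
```

     Only forms routed to manual review require coder time, which is
the source of the savings in staff effort noted above.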



 



     The Census Bureau has introduced computer-assisted coding and is



currently researching and refining the process.  Although computer-



assisted coding and updating codes by verification both have potential



for enhancing SIC coding, the Working Group does not endorse wide use



of either method until testing and results substantiate their



effectiveness.



 



     Additional cooperation among agencies on methodological research



would allow progress toward standardization of all facets of industry



coding.  Even where standardization is not



 






 



possible, such research could produce detailed documentation of



differences in data stemming from specific methods or procedures. 



This should prove useful to users who combine or compare data from



different sources.



 



G.   Interagency Cooperation



 



     Increased interagency cooperation is essential for significant



progress toward the goals stated at the beginning of this section:



improvements in the quality, comparability and efficiency of industry



coding systems.



 



     The OMB Statistical Policy Office's Technical Committee on



Industrial Classification is devoting most of its attention to



planning for the SIC revision scheduled for 1987, with somewhat less



attention to the other important aspects of industry classification



and coding.  The Working Group recommends that:



 



     The activities relating to industrial classification and coding



     listed below should be undertaken either by the OMB Technical



     Committee on Industrial Classification or by another permanent



     interagency committee established for this purpose:



 



     1.   Regular meetings to discuss and resolve coding problems



          caused by the development of new industries and changes in



          the structure of existing industries.  Interim solutions,



          pending revision of the SIC, should be agreed on and adopted



          by all of the participating agencies.



 



     2.   Promotion, support and coordination of other relevant



          activities along the lines recommended elsewhere in this



          chapter.



 



     Some examples of how this continuing committee might operate



include: periodic updating of the industry coding system descriptions



prepared by the Industry Coding Working Group; conducting interagency



workshops for sharing information about new coding methods and



procedures and about materials and methods used to train coders;



promoting greater uniformity in source documents used for SIC coding;



coordinating and facilitating interagency matching studies; developing



standards for partial coding and for grouping 4-digit industries; and



developing standards for resistance coding.



 



     In addition to leadership from the Statistical Policy Office of



OMB and any interagency groups established for these purposes,



progress on these recommendations will require full cooperation from



agencies that produce and use data classified by industry, as well as



those that control administrative record sources from which industry



codes are developed.



 



 






                                   



                              CHAPTER II



 



       DESCRIPTION OF THE INDUSTRY CODING WORKING GROUP PROJECT



 



A.   Introduction



 



     Under the auspices of the Administrative Records Subcommittee



of the Federal Committee on Statistical Methodology, the Industry



Coding Working Group reviewed industry coding systems used by Federal



agencies to classify establishments and other economic units for



statistical purposes.  The objective of this interagency working



group was to review and document the existing industry coding systems



with a view toward ultimately improving the comparability and quality



of data classified by industry.  This report describes the activity of



the Working Group and presents some findings and recommendations.



 



     By industry coding systems here we mean the methods and



procedures for assigning industry codes, rather than the technical



aspects of constructing a classification framework and



numbering scheme within which economic units will be assigned



industry codes.  Moreover, the term "industry code" is used in a



generic sense; it refers to the codes actually used in each



system, which are not always equivalent to the four-digit industry



codes in the Standard Industrial Classification (Office of Management



and Budget, 1972).  The coding systems reviewed generally conform to



the SIC, but all are at variance with it to some degree.



 



     The Working Group's effort was responsive to two recommendations



made by a predecessor group, the Subcommittee on Statistical Uses of



Administrative Records, which also worked under the auspices of the



Federal Committee on Statistical Methodology.  In its final report



(Office of Federal Statistical Policy and Standards, 1980), that



Subcommittee recommended that:



 



     The quality of administrative records to be used for statistical



     purposes should be evaluated systematically to determine the



     appropriateness of the records for the proposed use.



 



     Consistent procedures should be used in administrative and



     statistical data collection efforts for defining reporting units,



     identifying and coding reporting unit characteristics, and



     developing standards for data tabulation.



 



     These recommendations apply with particular force to industry



classification and coding, where the information sources are many and



of varying quality.



 






 



     In order to get some idea of the magnitude of the industry code



assignment by the Federal government, consider the following. 



Annually, the Internal Revenue Service (IRS) assigns industry codes to



nearly 16 million business units as part of its revenue processing of



the tax returns.  Additionally, more than 200,000 units are coded for



the IRS Statistics of Income Program.  Similarly, the Social Security



Administration (SSA) assigns industry codes to over 900,000 new



business units each year, with most of these (an estimated 875,000)



coded in the Single-unit Employer Identification (EI) File coding



operation.



 



     As part of the Employment Security Program, the Bureau of Labor



Statistics (BLS) maintains an industry-coded file of about



4.8 million units.  Each year about 500,000 new units are coded,



and codes are reviewed annually and updated, where appropriate,



for about one-third of the existing units.



 



     At the Census Bureau, as part of the annual Company Organization



Survey, over 900,000 establishments of multi-unit firms have their



codes reviewed, and changed if appropriate, while about 75,000 new



multi-unit establishments are industry coded.  In addition to this,



about 50,000 new business births are coded each year.  For the



quinquennial economic censuses, the Bureau mails census forms covering



about half of the total universe of 6.7 million establishments in



scope to the censuses.  Responses to items included on the census



forms are used to assign current industry codes to these



establishments.  Also, as part of the censuses, another 200,000 or so



unclassified establishments are coded via a classification form



mailing.



 



     The figures just cited account for a substantial percentage of



the volume of industry coding done by, or under the auspices of, the



Federal government.  However, this is not the whole picture, as can be



seen from Table 1 on page 23, where coding volume figures (from



columns (9), (10), and (11)) are given along with other data.



 



     No attempt has been made in this work to quantify the substantial



costs associated with industry code assignment.  This would indeed be



difficult, since the industry coding is a necessary (and in many



instances a relatively small) component of the overall administrative



or statistical work which is being



done concurrently.



 



     Inconsistent industry classification of identical or overlapping



populations of economic units by different agencies has led to



problems of comparability for analysts and other users who try to



compare and combine data from different agency sources.  One example



of this is in the area of productivity measurement.  A recent report



on this subject (National Research Council, 1979) said that "A major



problem with the comparability of the basic data has been that



different agencies assign the same establishments to different



industry classifications, as a consequence, aggregated data at the



industry level are not in fact comparable



 








 



from agency to agency" (p. 178).  Similar problems occur in connection



with the preparation of the national income and product accounts, in



manpower studies, in the development of a data base for small



businesses, and in other uses of economic statistics.



 



     Several review groups have examined these problems (for example,



the Central Statistical Board, 1939; the Hoover Commission, 1949; the



President's Commission on Federal Statistics, 1971; the National



Research Council, 1979; and the General Accounting Office, 1979).



Without exception, they have recommended creation of a central listing



of establishments and other economic units, classified by industry,



which would be available to Federal and possibly State agencies for



statistical purposes.  The Census Bureau's Standard Statistical



Establishment List (SSEL) was in fact developed for this purpose, but



existing statutory restrictions on the release of Census Bureau



information have so far made it impossible for other agencies to use



the SSEL, except in a very limited sense.



 



     At the technical level, several studies of relationships between



reporting unit definitions and industry coding practices in different



agency systems were undertaken by interagency working groups, under



the general direction of the Office of Statistical Standards of the



Bureau of the Budget, in the early 1950's.  Several of these studies,



which were begun in an attempt to account for observed discrepancies



between manufacturing employment totals from the 1947 Census of



Manufactures and the BLS's Current Employment Statistics, involved



matching individual reports for selected companies and establishments. 



These studies identified numerous problems that often impaired uniform



reporting, many of which were solved by the working groups or referred



to the Office of Statistical Standards SIC Technical Committee for



action.  The work during this period showed that significant progress



toward comparability could result from carefully conducted studies of



the coding principles and procedures used by different agencies and



their application to particular units (Bureau of the Budget, 1961).



 



     Since that time, however, there does not seem to have been any



comprehensive and detailed technical review of the existing industry



coding systems: their coverage, the classification principles



followed, the coding procedures, and the uses of the industry codes



assigned and of aggregate data classified by these codes.



 



     The findings from the present review, the Working Group believes,



will suggest changes in individual systems that can lead to



significant improvements in quality and to greater comparability



between systems.  Also, these findings suggest advantages from new



code sharing arrangements where these are permitted by law.  Some



gains can be realized even if there are no new exchanges of codes



between agencies (for exchanges at present, see Table 3 on page 51). 



For example, the applicability



 



 








 



of shared software for computer-assisted coding could be evaluated.



Should future legislation permit the establishment and general use of



a central list for statistical purposes, the Working Group's findings,



suitably updated, should assist the implementation process.



 



B.   Scope of the Review



 



     The following 16 coding systems have been included in the



Working Group's review:



 



     1.   Bureau of Economic Analysis (BEA) System



 



          -- Direct Investment Statistics



 



     2.   Bureau of Labor Statistics (BLS) System



 



          -- Employment and Wages Program (ES-202 Report)



 



     3.   Bureau of the Census Systems



 



          --   Agriculture Census



          --   Business Births



          --   Company Organization Survey



          --   County Business Patterns



          --   Economic Censuses



 



     4.   Federal Trade Commission (FTC) System



 



          --   Quarterly Financial Report 1/



 



     5.   Internal Revenue Service (IRS) Statistics of Income (SOI)



          Systems



 



          --   Sole Proprietorships



          --   Partnerships



          --   Corporations



 



     6.   Internal Revenue Service (IRS) Administrative Systems



          (Revenue Processing)



 



          --   Sole Proprietorships



          --   Partnerships



          --   Corporations



 



 



1/ Responsibility for publishing the Quarterly Financial Report



was transferred to the Census Bureau in late 1982.  However,



throughout this paper all references to the FTC system or



 



Quarterly Financial Report apply to the time period before the



transfer.



 



 






 



     7.   Social Security Administration (SSA) Systems



 



          --   Single-unit Employer Identification (EI) File



          --   Multi-unit EI File



 



     The systems selected for review include some used only for



statistical purposes (e.g., all Census systems) and some that are used



for both statistical and non-statistical purposes (e.g., the IRS



revenue processing systems).  All of the systems assign codes to



establishments or other economic units; systems that assign industry



codes directly to individual workers were not included.  Most of the



systems reviewed have broad coverage in terms of Standard Industrial



Classification (SIC) divisions; however, there are some exceptions,



such as the Agriculture Census system.  All are of a more or less



permanent character, i.e., the universe or a sample of it is coded



periodically, or the coding is continuous in support of accretions or



changes to a cumulative file.  Most systems have a relatively large



volume of coding, and together they are believed to account for a



substantial proportion of the industry coding of establishments and



other business units that is done by the Federal government and by



State agencies under Federal-State cooperative programs.



 



     It was necessary to distinguish between an industry coding system



and the principal file in which the codes reside.  To illustrate this,



generally, industry codes assigned to establishments by the Census



Bureau are placed in the Standard Statistical Establishment List



(SSEL).  (Industry codes assigned to agriculture establishments during



the agriculture census processing are not placed in the SSEL, while



those assigned to agricultural services establishments are.) However,



the separate industry coding activities done at various times and



based upon different source documents are treated as separate industry



coding systems.



 



C.   Major Uses of Industry Coding Information



 



     The statistical uses of administrative records are well



documented in Statistical Policy Working Paper 6 (Office of Federal



Statistical Policy and Standards, 1980).  These uses range widely from



the basic publication of statistics describing economic or demographic



phenomena to being used as components in the formulation of complex



mathematical models.



 



     In general, industrial classification was developed for



classifying an establishment by the activity in which it is primarily



engaged.  The presence of industry codes can facilitate the



collection, tabulation, presentation and analysis of data as well as



promote uniformity and comparability of data series.



 



     The Federal Government uses industry codes as a means of



aggregating much of the administrative and statistical data it



collects for publication.  Some examples of the regular publication of



descriptive statistics by industry from primary data sources include:



 








 



 



     -    Quarterly Financial Report for Manufacturing, Mining and



          Trade Corporations by the Federal Trade Commission (FTC).



 



     -    Corporation Income Tax Returns, Sole Proprietorship



          Returns, and Partnership Returns by the Internal Revenue



          Service (IRS).



 



     -    Census Bureau publications such as County Business Patterns



          and the results of the economic censuses.



 



     -    Employment and Earnings and Employment and Wages by the



          Bureau of Labor Statistics (BLS).



 



     There are other data series published that have been synthesized



from several primary data sources.  The Bureau of Economic Analysis



(BEA), for the most part, does not collect information directly from



firms or individuals.  BEA's estimates of current economic activity



are based on data obtained from other agencies.  The Gross National



Product, which is presented with industry detail, combines data from



many sources including the Census Bureau, IRS, BLS, and FTC.  The



Input-Output Accounts of the U.S. are composed entirely of industry



information collected by others.  BEA's estimates of State and local



area personal income involve the use of several sets of data



aggregated by industry.  BEA is thus heavily dependent on the



comparability of data from its various sources.



 



     In addition, both published and unpublished sets of industry-



based data are useful for the collecting agency's internal programs.



For example, various units of the Department of Labor use BLS data for



purposes such as:



 



     -    Studies of financial aspects of the Unemployment Insurance



          program are conducted to set maximum weekly benefit levels.



 



     -    States use industry wage and employment data in preparing



          forecasts of program workloads that are used in developing



          annual budgets.



 



     -    Local area workforce and unemployment statistics are



          produced by industry, which enables classification of areas



          eligible for benefits under a number of Federal area



          assistance programs.



 



     -    Employment figures are useful in time-series analysis and in



          the study of seasonal employment, and are used extensively



          in industry/area comparisons.



 



 



1/ Responsibility for publishing the Quarterly Financial Report was



transferred to the Census Bureau in late 1982.



 








 



 



     -    The data serve as a base for labor market information



          programs at the county, labor market area, State and



          national levels.



 



     Industry codes from some administrative or statistical record



systems are helpful in the processing and tabulation of raw data in



other record systems.  The Social Security Administration (SSA)



assigns industry codes to new firms applying for an employer



identification number.  A major use of these codes is for identifying



industrial activity for workers included in the Continuous Work



History Sample (CWHS).  These codes are also released to the Census



Bureau for incorporation into their Standard Statistical Establishment



List.  Reciprocally, on some past occasions, the Census Bureau has



provided SSA with updates of industry codes for employers based on the



results of the economic censuses.



 



     Some data producers can use the industry codes from other systems



as a tool to edit aggregated tabulations.  BEA, for example, receives



industry codes from FTC and IRS for individual corporations which help



to explain changes in their estimates of components in the National



Income and Product Accounts.



 



     There are other uses that governmental units make of the industry



information that they can obtain from data producing



agencies. The IRS, for instance, releases its industry coded



Statistics of Income (SOI) files to the Office of Tax Analysis



and to the Joint Committee on Taxation for use in "tax models" to



evaluate the effects of existing or proposed tax policies.



 



     Nongovernment groups such as businesses and nonprofit



organizations use industry information from administrative and



statistical sources as well.  While confidentiality restrictions



prohibit the transfer of individual industry codes outside the



government (except to contractors of government agencies), aggregated



statistics based on industry can be quite useful.  Business firms can



conduct research to classify and study the industrial profiles of



their customers and suppliers.  Sales patterns can be analyzed, market



potentials can be estimated and commercial strategies can be



evaluated.



 



     The industry dimension of administrative and statistical data is



one of their most interesting and useful characteristics.  It enables



the government to improve and evaluate many of its programs.  It



enhances the research efforts of both public and private groups and it



is very helpful to individuals in gaining understanding of the



economic and demographic characteristics of the nation.



 



D.   Composition and Objectives of the Industry Coding Working Group



     The Working Group members (see list in preface) were in some



cases members of the parent subcommittee or were designated by



 






 



 



the subcommittee representative or their agency.  Working Group



members met for the first time in May of 1981 and have conducted



meetings, generally monthly, throughout 1982 and 1983.



 



     From the outset the Working Group felt that a fundamental task



was to review and document the major industry coding systems.  Once



this was accomplished, analysis and comparison followed, leading to



the proposals for improvements in the comparability and quality of the



industry codes which appear in Chapter I.  As a further application



of this work, a user or potential user of data classified by industry



can be provided with essential information concerning the usability



and relative quality of the data.



 



E.   Development of the Basic Documentation for the Federal Industry



     Coding Systems



 



     The Working Group constructed a questionnaire on industry



coding which requested basic information needed to compare and assess



the systems.  This questionnaire covered the following main areas:



 



     -    The basic coding unit (the unit to which an industry code is



          assigned), the source or source document from which the



          coding is done, and the industry classification system used;



 



     -    The volume, timing, coding procedures, resource material



          used, and quality measures associated with the coding;



 



     -    General characteristics of the principal file(s) in which



          the codes reside;



 



     -    Updating of the codes and recent or planned changes to the



          coding system;



 



     -    The uses and users of the industry codes.



 



     Within each of these areas specific questions were asked.  Also,



related documentation was requested, principally the forms or source



documents from which the coding is done, code lists and instructions



concerning classification system variations, and any available data



bearing on the quality of the coding.



 



     Members of the Working Group identified industry coding systems



within their own agencies which fit into the scope of the review.  At



the same time, they identified key persons who were most knowledgeable



about each coding system.  The survey questionnaires were then



delivered to these respondents by the Working Group members.



 



     Each completed questionnaire was reviewed by one or more members



of the Working Group and a meeting was arranged with the respondent



for clarification or further information.  As a result



 





                                  19



 



of the meeting, the questionnaire was revised, and frequently



additional documentation of the system was obtained.



 



     A summary system description was prepared from each questionnaire



and the associated materials.  These descriptions are designed to put



the collected information in a standardized, concise format for easy



reference, comparison, and analysis.  These summary descriptions form



the basis of this report.  Copies of system descriptions may be



obtained by contacting the Statistics of Income Division, Internal



Revenue Service.



 





                                  21



 



                              CHAPTER III



 



            INDUSTRY CODING SYSTEMS AND THEIR RELATIONSHIPS



 



A.   Introduction



 



     This chapter provides an analysis of the coding systems



reviewed.  This analysis should provide a stimulus to the agencies



maintaining the systems to make changes aimed at increasing



comparability with other systems and at improving the accuracy of



codes and reducing the cost of coding in their own systems.  In



addition, the information developed can make possible a technical



evaluation of possible new arrangements for interagency code sharing,



subject to legal restrictions on such exchanges.  Finally, the results



should help users of data from these systems to understand their



structure and limitations and the extent to which data from different



systems are comparable.



 



     An initial step is to identify the system characteristics or



dimensions to be compared.  The primary dimensions that have been



identified are coverage, frequency and timing of initial coding and



updating, classification system used, classification principles,



information used as input to coding, coding procedures, and



description of systems relationships.



 



     Each of these dimensions is discussed in the following sections.



 



B.  Coverage



 



     Systems coverage has 3 sub-dimensions which can be described by



the answers to 3 questions: What kinds of units are coded?



Which of these units are included in the target population? And,



finally, is coding for all units or for a sample?



 



1.    Kinds of Units Coded



 



     The kinds of units that are classified by industry vary widely. 



The Standard Industrial Classification (SIC) was developed for



classification of establishments by industry.  Its offshoot, the



Enterprise Standard Industrial Classification (ESIC), was developed



for classification by industry of enterprises or companies, many of



which consist of two or more establishments (Office of Management and



Budget, 1972, 1974, and Office of Federal Statistical Policy and



Standards, 1977b.)



 



     Concerning this first aspect of coverage, basic coding units or



simply units, i.e., the units of observation to which industry codes



are applied, are often determined by intended uses of the data files. 



For example, the Census Bureau's systems, which are established and



maintained solely for statistical purposes, use establishments as the



basic unit.  However, the Standard Statistical Establishment List



(SSEL), which is the



 





                                  22



 



basic file in which industry codes produced by the various Census



Bureau systems reside, is organized to permit the aggregation of



groups of establishments to form other units, such as Employer



Identification (EI) number units (all establishments operating under a



single EI number) and enterprises, and the assignment of industry



codes to these units.



 



     By contrast, the units used in the systems of other agencies



(e.g., employers, tax entities, consolidated corporations) are



determined largely by administrative requirements.  Table 1 on page 23



provides a comparison of the basic coding units used for each system



studied, as well as comparisons of SIC level of detail used, sample



or population coverage, an assessment of the level of input data



available for assignment of codes, updating cycles, and the average



annual volume of coding.



 



     In practice, business enterprises consisting of a single



establishment, as defined for purposes of the SIC, are classified in



essentially the same way in all of the systems reviewed by the Working



Group.  There are, to be sure, some elements of judgment in the SIC



definition, especially in those instances where "...distinct and



separate economic activities are performed at a single physical



location..." (Office of Management and Budget, 1972, p.10).  The SIC



Manual states that these activities shall be treated as separate



establishments if the employment in each is "significant" and reports



can be prepared separately for each activity on employment, payrolls,



sales or receipts and other establishment type data.  These criteria



clearly allow some latitude for judgment by the agency collecting the



data, and one could expect to find some cases where establishments



were defined differently by different agencies.



 



     Nevertheless, the major conceptual differences among systems with



regard to definitions of basic coding units are those affecting only



multi-establishment enterprises.  Here the systems reviewed use a



variety of units, including those with a legal, administrative, or



statistical basis, such as employers, taxpayers, corporations,



consolidated corporations, or "reporting units".



 



     The "reporting units" used by BLS and SSA deserve



special attention.  Although they have the same name and have



been established for similar purposes, their operational



definitions are not identical for multi-establishment employers.



Basically, the reporting unit in each case is a group of two or more



establishments under the same employer (EI number) in the same county



and four-digit industry.  It has been so established for the



convenience of employers who would find it difficult or burdensome to



file separate administrative returns to SSA and to State Employment



Security Agencies for each establishment.
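
     As a rough illustration of the grouping rule just described, the
following Python sketch collects establishments that share an employer
(EI) number, county, and four-digit industry into a single reporting
unit.  The field names (ei_number, county, sic4) and the sample records
are hypothetical; they are not taken from any BLS or SSA file layout.

```python
from collections import defaultdict


def group_reporting_units(establishments):
    """Group establishments by (EI number, county, four-digit industry)."""
    units = defaultdict(list)
    for est in establishments:
        key = (est["ei_number"], est["county"], est["sic4"])
        units[key].append(est["name"])
    return dict(units)


# Hypothetical records for one employer with three establishments.
establishments = [
    {"name": "Plant A", "ei_number": "12-3456789", "county": "Cook", "sic4": "2011"},
    {"name": "Plant B", "ei_number": "12-3456789", "county": "Cook", "sic4": "2011"},
    {"name": "Store C", "ei_number": "12-3456789", "county": "Lake", "sic4": "5411"},
]

units = group_reporting_units(establishments)
for key, names in units.items():
    print(key, names)
```

     In this sketch the two plants collapse into one reporting unit,
while the store, which differs in county and industry, stays on its
own as a single establishment.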



 



The BLS system is primarily an establishment based



system.   However, under certain circumstances a "reporting unit"



concept is substituted.  The "reporting unit" used by BLS



 





                                  25



 



includes two or more establishments under the same employer



identification (EI) or Unemployment Insurance (UI) account number in



the same county and industry.  These exceptions to establishment based



reporting are allowed in order to reduce employer quarterly



unemployment insurance tax reporting burden.  Exceptions to



county/industry level reporting are discouraged.



 



     SSA also uses a "reporting unit" concept under their



Establishment Reporting Plan (ERP) to facilitate the processing of



large multi-unit employer wage reports.  When an employer firm agrees



to participate in the plan, it is asked to identify each of the firm's



reporting units (which may be establishments or payroll groupings) by



geographic location (county) and industrial activity and assign a



four-digit reporting unit number to each on a Form SSA-5019.  On



subsequent annual wage reports the firm groups its employees by



reporting unit, identifying each with the preassigned unit number. 



This arrangement provides a basis for SSA to isolate earning



discrepancies and to assign geographic and industrial classification



to each unit so that wage reports can be used as a source of



statistical data.  However, it should be noted that due to the



voluntary nature of ERP, every effort is made to set up and maintain a



breakdown of reporting units that most closely conforms to the firm's



internal business structure in order to minimize the reporting burden



on the employer.  This may or may not result in the use of



establishments as the reporting unit.  In summary, operational,



procedural, and definitional differences make it difficult to compare



the net effect of the use of the "reporting unit" concepts in the BLS



and SSA systems.



 



     Finally, it is worthwhile to point out that for all systems the



nature of the units which are classified by industry in each system is



affected not only by the formal definitions but also by the specific



procedures used to implement these definitions.



 



2.    Units Included in the Target Population



 



     The second aspect of coverage is to identify which of the



specified units are included in the target population for the system. 



The 5 principal criteria are:



 



     a.   Geographic location.  All systems cover units located



in the United States and owned by United States citizens or legal



entities.  Treatment varies for units located in United States



territories and possessions, for units with non-United States



ownership physically located in the United States, and United States-



owned units located outside of the United States.



 



     b.   Legal form of organization.  Each of the IRS systems covers



only one form of organization: sole proprietorship, partnership or



corporation.  The FTC Quarterly Financial Report system covers only



corporations.  Most systems cover all forms of organization.  However,



coverage of government-operated units differs greatly, as described in



d.  below.



 





                                  26



 



     c.   Presence of employees.  Sole proprietorships or partnerships



with no employees are included in the IRS systems if they are required



to file tax returns.  These nonemployer establishments are



incorporated into the economic censuses from IRS records; they are not



independently contacted by the Census Bureau.  Also, establishments



without payroll are included in the Census of Agriculture.  All other



coding systems code only units with employees.



 



     d.   SIC divisions.  Some systems are restricted to specified SIC



divisions or parts of divisions.  For example, the Census of



Agriculture covers only part of Division A (Agriculture, Forestry, and



Fishing).  The FTC Quarterly Financial Report system covers only



corporations whose primary activity is in mining, manufacturing,



wholesale trade and retail trade.  The inclusion of government units



varies.  They are not covered at all by IRS systems, but are covered



in part by several other systems.  The BLS Employment and Wages system



covers government employees at all levels, except for members of the



armed forces.



 



     e.   Size.  Industry coding in the economic censuses is limited



to employer establishments which exceed payroll cutoffs that vary by



industry.  These cutoffs are set to exclude the smallest



establishments within an industry from getting a census form.  The



census data, including industry codes for these small establishments,



are taken from administrative records.  In the Census of Construction,



however, census forms are mailed to a probability sample of



establishments below the established cutoffs, and sample estimates for



this group are included in the census totals.



 



     Table 2 on page 27 shows the coverage of the systems reviewed



with respect to criteria b., c., and d.  For this purpose, the six IRS



systems were grouped to form two "mega-systems": the Revenue



Processing and the Statistics of Income systems.



 



3.   Coding for a Sample or a Population



 



     The third aspect of coverage is whether or not sampling is used. 



If it is, the particular sample design will affect the frequency with



which coding is required and the potential for sharing industry codes



with other systems.  Examples of sample based systems are the IRS



Statistics of Income systems, the FTC Quarterly Financial Report



system, and the Census Bureau's Business Births coding system.



 



     Of all systems reviewed, the IRS systems (condensed in Table 2



from six to two systems) are the most complete, covering all SIC



divisions except J, Public Administration, and all forms of



organization except "government establishments" in the other SIC



divisions.



 





                                  28



 



     The most complete coverage of Division J, Public Administration,



is by the BLS Employment and Wages System, since most public as well



as private employers are covered by the Unemployment Insurance system. 



It should be noted that the 1972 revision of the SIC changed the



principles for classification of "government establishments."



Previously, most of them had been classified under Division J,



Government; since 1972, each one is to be classified by its primary



economic activity, with only those not classified in other divisions



to be assigned to Division J, Public Administration.  One result of



this change is that the IRS systems, which do not include any



"government establishments" (since they are not taxed), can no longer



be expected to have full coverage in all of the other SIC divisions.



 



     For employers, i.e., businesses with one or more paid employees,



the BLS Employment and Wages and the SSA single-unit EI systems



between them should have virtually complete coverage of all SIC



divisions.  The BLS system excludes railroads and some "small"



agricultural employers (the cutoff varies by State); the SSA single-



unit system has only partial coverage of Federal, State and local



government employers and tax-exempt nonprofit organizations.



 



C.   Frequency and Timing of Initial Coding and Updating 



 



The extremes of this dimension can be represented by the IRS



revenue processing coding systems and the SSA single-unit EI System. 



In the IRS revenue processing systems, industry codes are assigned



annually to businesses reported on tax returns, without reference to



prior year codes.  In the SSA system, each covered employer is



assigned an industry code at the time of entry into the system, which



occurs when the employer applies for an EI number. This code is



generally retained in the system unless and until updated, primarily



by matching against economic censuses codes for the employers in the



file. These two approaches can be distinguished by the labels



"periodic, independent" for the approach represented by the IRS



systems and "cumulative" for the approach represented by the SSA



single-unit system.  As another example, BLS has a tight schedule for



new code assignments, along with a three year cycle for updating. 



Many systems lie somewhere in between the extremes.  Where industry



coding is done for a sample of units in the target population, the



approach used will depend on whether and how much the samples for



successive time periods overlap.
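
     The contrast between the two approaches can be sketched in a few
lines of Python.  This is only an illustration under simplifying
assumptions: the function names, the year-keyed dictionaries, and the
sample codes are hypothetical and do not describe how IRS or SSA
actually store their files.

```python
def periodic_independent(reports_by_year):
    """Code each year from that year's report alone, with no carry-over."""
    return {year: code for year, code in sorted(reports_by_year.items())}


def cumulative(entry_year, entry_code, updates_by_year, last_year):
    """Carry the entry code forward, replacing it only when an update arrives."""
    codes = {}
    current = entry_code
    for year in range(entry_year, last_year + 1):
        current = updates_by_year.get(year, current)
        codes[year] = current
    return codes


# A unit coded at entry in 1975 and updated once, in 1977.
history = cumulative(1975, "5411", {1977: "5499"}, 1978)
print(history)
```

     Under the cumulative approach the 1975 code persists until the
1977 update; under the periodic, independent approach a year without a
report simply has no code.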



 



D.   Classification System Used



 



     All of the systems studied use a classification scheme based



on the SIC.  Some systems which classify groups of establishments,



e.g., the IRS systems for corporations, use systems based



on the ESIC, which in turn ties into the SIC.



 



     For the systems reviewed by the Industry Coding Working



Group, the following assertion can be made:  While each



 



                                  29



 



classification system is based on the 1972 SIC 1/ or the 1974 ESIC



(which in turn is derived from the 1972 SIC), each system departs from



it in one or more respects.  These departures fall into three



categories:



 



          --   grouping of SIC categories 



          --   subdivision of four-digit SIC categories 



          --   addition of categories not covered by the SIC



 



     For the systems reviewed, grouping of SIC categories is more



common than subdivision.



 



     The SIC contains 1,005 four-digit and 421 three-digit codes.



The systems of IRS use a much smaller number of categories than



the others, currently in the neighborhood of 200 for each of its



6 systems.  The groupings vary by type of organization; there are



different groupings for sole proprietors, partnerships and



corporations.  For each organization type, the groups for the Revenue



Processing and Statistics of Income (SOI) systems are essentially the



same.  There are a few instances where IRS has subdivided SIC



industries.  For example, in the partnership systems, SIC Industry



7011, Hotels, Motels, and Tourist Courts has been divided into (1)



hotels, and (2) motels, motor hotels, and tourist courts.



 



     The BLS system uses most (971 of the 1,005) four-digit industry



codes.  In the 34 remaining industries, BLS experience is that



four-digit SIC level coding is often unreliable because of conditions that



prevail in these industries, such as frequent fluctuations in employer



products or services or generally inadequate employer records.



 



     The SSA system also uses most of the four-digit industry codes. 



In the SSA systems, the full four-digit SIC Code is the preferred



code, except for major groups 01 (agricultural production -- crops)



and 02 (agricultural production -- livestock), and Division J (public



administration), where only the two-digit detail is provided.  The



codes used for these groups are called "foldback" codes.  Thus, there



are 63 of the 1,005 SIC industry codes which are not used at all.  For



115 industries, "foldback codes" are used only if the employer does



not furnish enough information to code to the four-digit level;



followups for additional information are not attempted by SSA.  The



use of these foldback codes was especially heavy during a period in



the early 1970's when SSA was doing "dual coding" (assigning two codes



to each employer, one based on the 1967 SIC and one based on the 1972



SIC) in preparation for conversion of their systems to the 1972 SIC. 



In summary, it seems fair to say that full SIC detail is lacking in



SSA's systems for 178 of the 1,005 industries in the 1972 SIC.
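
     The foldback rule and the count of 178 industries can be checked
with a short Python sketch.  The function name, its arguments, and the
representation of a foldback code as the bare two-digit major group
are assumptions made for illustration; the actual format of SSA
foldback codes is not given here.

```python
TOTAL_SIC4 = 1005        # four-digit industries in the SIC
NEVER_USED = 63          # industries always reported at two-digit detail
SOMETIMES_FOLDED = 115   # industries folded back only when detail is lacking


def ssa_code(major_group, sic4, always_foldback_groups):
    """Return the four-digit code if usable, else the two-digit foldback.

    sic4 is None when the employer did not furnish enough information
    to code to the four-digit level.
    """
    if major_group in always_foldback_groups or sic4 is None:
        return major_group
    return sic4


lacking_full_detail = NEVER_USED + SOMETIMES_FOLDED
print(lacking_full_detail, "of", TOTAL_SIC4)
```

     The sum of the two counts reproduces the 178 industries cited
above.  Only major groups 01 and 02 are passed as always-folded groups
in the checks for this sketch; the Division J groups would be added in
the same way.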



 



 



1/ As revised by the 1977 Supplement (Office of Federal Statistical



Policy and Standards, 1977b).



 





                                  30



 



     The Census Bureau's industry classification system for the 1977



Economic Censuses is described in its 1977 Industry and Product



Classification Manual (Bureau of the Census, 1977b).  The latest



version of this IPC manual for the 1982 Economic Censuses has recently



been released.  Census establishment codes carry full SIC four-digit



industry detail except when information available for classification



is incomplete, or when publication of establishment data for a



particular industry would disclose individual company operations. 



Industries affected by the latter restriction for 1977 are:



 



          (1)  Mercury, 1092, grouped with 1099



          (2)  Typewriters, 3572, grouped with 3579



          (3)  Electronic tubes, 3671 to 3673, carried as 3671.



 



     In addition, for economic censuses purposes, the IPC Manual



provides for subdivision of selected industries in SIC major groups



41, 42, 47, 50-59 and 70-89, i.e., in the areas of transportation,



wholesale and retail trade, and services.  The "sub-industries" are



identified by adding two digits to the four digit SIC code.  For the



1977 Economic Censuses, 83 four-digit industries in these major groups



were subdivided to form 256 six-digit sub-industries.  Two different



patterns have been followed in subdividing four-digit industries.  In



most cases, there is only one level of disaggregation for an industry,



i.e., the six-digit codes differ only in the 5th digit, and the 6th



digit is 0.  In a few cases, however, there are two levels of



disaggregation, i.e., one or more of the five-digit codes will be



subdivided by using different digits in the 6th position.
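
     The structure of the six-digit codes can be made concrete with a
small Python sketch that splits a code into its four-digit SIC
industry and the two added digits.  The function name and the sample
codes are hypothetical; they are not taken from the IPC Manual.

```python
def split_sub_industry(code6):
    """Split a six-digit sub-industry code into its component parts."""
    if len(code6) != 6 or not code6.isdigit():
        raise ValueError("expected a six-digit code, got %r" % (code6,))
    return {
        "sic4": code6[:4],            # parent four-digit SIC industry
        "fifth_digit": code6[4],      # first level of disaggregation
        "sixth_digit": code6[5],      # second level; 0 when unused
        "uses_second_level": code6[5] != "0",
    }


print(split_sub_industry("581210"))
print(split_sub_industry("581211"))
```

     A sixth digit of 0 corresponds to the common single-level
pattern; a nonzero sixth digit marks a code from one of the few
two-level subdivisions.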



 



     All of the systems have conformed to SIC revisions; in addition,



many of them have introduced other changes from time to time, usually



in the direction of showing more detail.



 



E.   Classification Principles



 



     Given the general principle of adherence to the SIC, there



remain several conceptual issues to be dealt with in order to develop



the procedures to classify establishments or other units by industry



(Simmons, 1953).  These include:



 



1.    Classification of units with multiple activities.



 



     Under some conditions, such units may be split and classified



separately.  This option is more likely to be used when reports are



filed solely for statistical purposes.  When it is not used the first



decision needed is what measure of activity to use.  Options include



gross receipts, value of sales, value of production, value of



shipments, and employment or payroll associated with each activity



covered by a separate SIC code.  A second decision is how to use these



measures to determine the principal activity.  One option is to simply



choose the 4-digit (or 6-digit if using IPC) category with the highest



value of the measure chosen.  An alternative sometimes used is a



hierarchical



 



                                  31



 



procedure:  choose first the SIC division which has the highest value,



next the major (2-digit) industry within that division with the



highest value, and so on until the 4-digit or 6-digit level is



reached.
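
     The two options can give different answers for the same unit, as
the following Python sketch suggests.  It is an illustration only: the
function names, the record layout, and the sample values are
hypothetical, and the sketch stops at the four-digit level rather than
continuing to six-digit IPC codes.

```python
from collections import defaultdict


def direct_choice(activities):
    """Return the four-digit code with the highest value of the measure."""
    return max(activities, key=lambda a: a["value"])["sic4"]


def hierarchical_choice(activities):
    """Narrow by division, then by 2-, 3-, and 4-digit code, summing values."""
    levels = [
        lambda a: a["division"],
        lambda a: a["sic4"][:2],
        lambda a: a["sic4"][:3],
        lambda a: a["sic4"],
    ]
    candidates = list(activities)
    for level in levels:
        totals = defaultdict(float)
        for a in candidates:
            totals[level(a)] += a["value"]
        best = max(totals, key=totals.get)  # ties go to the first group seen
        candidates = [a for a in candidates if level(a) == best]
    return candidates[0]["sic4"]


# Hypothetical unit: two manufacturing activities and one retail activity.
activities = [
    {"division": "D", "sic4": "2011", "value": 35.0},
    {"division": "D", "sic4": "2013", "value": 25.0},
    {"division": "G", "sic4": "5411", "value": 40.0},
]

print(direct_choice(activities))
print(hierarchical_choice(activities))
```

     Here the direct rule selects the retail code, which has the
largest single value, while the hierarchical rule selects a
manufacturing code because the manufacturing division as a whole
outweighs retail trade.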



 



     For establishments the main question is what measure of the



relative importance of different activities should be used? The 1972



SIC Manual (Office of Management and Budget, 1972) is clear on this.



It states that "Ideally, the principal product or service should be



determined by its relative share of 'value added' at the



establishment" (p.  12).  Recognizing, however,



that data for value added for each product or service are difficult to



obtain, it recommends that the following data measures be used (SIC



Manual, p.  12):



 



Division                                     Data Measure

Agriculture, forestry, and fishing,          Value of production
  hunting, and trapping (except
  agricultural services)
Mining                                       Value of production
Construction                                 Value of production
Manufacturing                                Value of production
Transportation, communications,              Value of receipts or revenues
  electric, gas, and sanitary services
Wholesale trade                              Value of sales
Retail trade                                 Value of sales
Finance, insurance, and real estate          Value of receipts
Services (including agricultural             Value of receipts or revenues
  services)
Public administration                        Employment or payroll



 



 



     The recommendation is qualified in two ways.  First, it is stated



that these measures should be used "when available." Second, it is



stated that "in some instances, an industry classification based upon



the recommended output measure will not represent adequately the



relative economic importance of each of the varied activities carried



on at such establishments.  In such cases, employment or payroll



information should be used to determine the primary activity of the



establishments."
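
     The recommended measures and their two qualifications can be
summarized as a lookup with a fallback, sketched below in Python.  The
short division keys and measure names are hypothetical labels, and
falling back to employment or payroll when the recommended measure is
unavailable is an assumption of this sketch; the Manual's "when
available" qualification does not itself name a substitute.

```python
RECOMMENDED_MEASURE = {
    "agriculture": "production",
    "mining": "production",
    "construction": "production",
    "manufacturing": "production",
    "transportation": "receipts",
    "wholesale trade": "sales",
    "retail trade": "sales",
    "finance": "receipts",
    "services": "receipts",
    "public administration": "employment_or_payroll",
}

FALLBACK = "employment_or_payroll"


def choose_measure(division, available, output_adequate=True):
    """Pick the measure used to rank a unit's activities.

    available is the set of measures reported for the unit;
    output_adequate is False when the output measure would misstate
    the relative importance of the unit's activities.
    """
    recommended = RECOMMENDED_MEASURE[division]
    if recommended in available and output_adequate:
        return recommended
    if FALLBACK in available:
        return FALLBACK
    return None  # nothing usable was reported


print(choose_measure("manufacturing", {"production", "employment_or_payroll"}))
print(choose_measure("retail trade", {"employment_or_payroll"}))
```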



 



     Once relative (or absolute) values of the measures have been



obtained for each product or service by four-digit industry, the



establishment is coded to the industry with the largest share



 





                                  32



 



of the total, without regard to the shares of higher-level SIC



categories (industry groups, major industries, or divisions).



 



     To what extent are these recommendations followed in the systems



reviewed by the Industry Coding Working Group? Following is a summary



of the practices of the four major agencies.  It



 



will be seen that none of the agencies follows the SIC Manual in every



respect.



 



     BLS -- For all SIC divisions except Division J, public



administration, the source documents for industry coding ask for sales



or receipts.  The source document for government reporting units asks



for employment or payroll.



 



     Census -- According to the official description of industry



coding procedures for the SSEL (Bureau of the Census, 1979), the



recommended measures are used except in Division C, construction,



where value of receipts is used in place of value of production and



Division D, manufacturing, where value of shipments is used in place



of value of production.  It should be recognized, however, that the



specified measures are not available on a current basis for some units



in the SSEL, in particular, those that are out of scope of the



economic censuses or are not included in the mail portion of the



censuses.



 



     IRS -- Taxpayers are asked to provide codes and/or short



descriptions of their "principal activity," which is generally defined



in the instructions as the one accounting for the greatest proportion



of sales or receipts.  There are two exceptions to this general rule. 



First, the tax schedule (Schedule F) for farm sole proprietors



contains entries for income (receipts) for each of several distinct



crop and livestock items, so that a more objective basis is available



for coding to industries within this division.  Second, starting in



tax year 1977, the instructions for the partnership tax return (Form



1065) have stated that the principal activity should be the one



accounting for the largest proportion of assets.  Before then, the



standard instruction to base principal activity on sales or receipts



was used.



 



     SSA -- Currently employers applying for an EI number are asked to



describe their "nature of principal business activity" without any



specific reference to the treatment of multiple activities.  Multi-



unit employers who provide data for their separate establishments or



reporting units are asked to provide percentages corresponding to the



principal activities of each one, listed in order of importance, but



the instructions do not say on what measures these percentages should



be based.  The report form also asks for number of employees engaged



in each activity.  In the coding process based on these reports, a



manufacturing industry code is preferred over all others if the



associated percentage is 20 percent or more.



 







                                  33



 



     Except for the SSA special treatment of manufacturing just noted,



all agencies assign the industry code for the category with the



greatest share of activity, using data by four-digit SIC industry or



the most detailed level contained in the system.
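
     The SSA preference for manufacturing, set against the general
largest-share rule, can be sketched as follows in Python.  The
function name and sample percentages are hypothetical.  Two points are
assumptions of the sketch: manufacturing is identified by SIC major
groups 20 through 39, and when several manufacturing codes reach the
threshold the largest of them is taken.

```python
def ssa_principal_code(activities, threshold=20.0):
    """Pick a code from (sic4, percent) pairs, preferring manufacturing.

    A manufacturing code wins over all others if its percentage is at
    least the threshold; otherwise the largest share wins.
    """
    manufacturing = [
        (code, pct) for code, pct in activities
        if 20 <= int(code[:2]) <= 39 and pct >= threshold
    ]
    pool = manufacturing if manufacturing else activities
    return max(pool, key=lambda item: item[1])[0]


print(ssa_principal_code([("5411", 70.0), ("2051", 30.0)]))
print(ssa_principal_code([("5411", 85.0), ("2051", 15.0)]))
```

     In the first case the manufacturing activity is chosen although
it is the smaller share; in the second it falls below 20 percent and
the largest-share rule applies.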



 



     One solution that has been proposed for the multiple activity



problem is to assign more than one industry code to establishments



with more than one activity.  The Census Bureau has developed but not



yet implemented a proposal that the SSEL include secondary activity



codes for each four-digit SIC activity with sales/receipts of



$100,000 or more (Bureau of the Census, 1979).  The record for the



establishment would carry a sales/receipts size class code



corresponding to each activity code.
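
     The threshold rule in this proposal can be sketched in Python as
shown below.  The function name and the sample receipts are
hypothetical, and the size class codes are omitted because their
boundaries are not given here.

```python
def secondary_activity_codes(receipts_by_sic4, primary_code, threshold=100_000):
    """List non-primary four-digit codes whose receipts reach the threshold."""
    return sorted(
        code for code, receipts in receipts_by_sic4.items()
        if code != primary_code and receipts >= threshold
    )


# Hypothetical establishment with three activities.
receipts = {"5411": 900_000, "5921": 150_000, "5812": 40_000}
print(secondary_activity_codes(receipts, primary_code="5411"))
```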



 



2.   Time interval and reference period



 



     One year is the standard time interval for most systems.  The SSA



systems are an exception; the input document asks for a description of



the principal activity carried on, without any reference to a specific



time period.  Most systems use a calendar year, but in some systems



the reports are for tax years or fiscal years, which are not



equivalent to calendar years for all units coded.



 



     Another important consideration is the relationship between the



reference period for code determination and the period for which data



are collected and the code assigned.  This leads to the question of



updating, i.e., how often should industry codes be revised? There is



considerable variation both between and within systems as to the



frequency of updating industry codes, or refiling, as it is sometimes



called.



 



     When a system is used to produce aggregate data such as



employment, payroll, receipts, etc., classified by industry, the



reference period on which the industry code is based may not be the



same as the period covered by the data.  The major industry coding



systems reviewed do, in fact, differ considerably in this respect. 



Following is a broad outline of the differing practices followed by



each of the four major industry coding agencies.



 



     IRS -- Returns are industry coded annually, based either on self-



coding by taxpayers, or coding from an activity description on the tax



return.  Thus, for data by industry from the IRS systems, the



reference periods for the data and the industry classification always



coincide.



 



     BLS -- Each reporting unit is classified initially when the



employer enters the unemployment insurance system.  It is BLS policy



that codes should be reviewed and updated on a fixed time schedule, as



follows:



 



 




 



Type of Unit                                           Frequency

Units with 500 or more employees,                      Annually
  except government

All other units, except government                     Every 3 years

Government units                                       Every 5 years



 



     The timing of the 3-year cycles varies by SIC division, so that



review and updating is done for units in certain divisions each year. 



Information leading to code changes may come from other sources



between regular updates; the extent of such changes and how well they



track actual changes is not known.  The source documents used for



initial coding and updates request relevant information on activities



for the most recent calendar year.
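
     The BLS review schedule described above can be expressed as a simple rule.  The following is a minimal sketch; the function names and arguments are hypothetical, and the staggering of the 3-year cycles by SIC division is not modeled.

```python
def review_interval_years(employees, is_government):
    """Scheduled number of years between industry code reviews under BLS policy."""
    if is_government:
        return 5          # government units: every 5 years
    if employees >= 500:
        return 1          # units with 500 or more employees: annually
    return 3              # all other units: every 3 years


def next_review_year(last_review_year, employees, is_government):
    """Year in which the next scheduled review falls (ignores division staggering)."""
    return last_review_year + review_interval_years(employees, is_government)
```

     Code changes arising from other sources between regular updates fall outside this schedule and are not represented.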



 



     SSA -- Each employer is classified initially at the time an



application for an EI number is filed.  The application form asks for



information about the nature of the business at the time of the



filing; there is no defined reference period.  Shortly thereafter,



eligible multi-unit employers are asked to submit activity information



for each of their reporting units, the situation with respect to



reference period being the same as for the original application form. 



For single-unit employers, the last general update was based on a



comparison with codes assigned in the 1972 Economic Censuses.  For



multi-unit employers, changes are based either on reports filed



voluntarily by employers or on correspondence initiated by SSA when



the units for which current wage reports are submitted do not match



those in the file.  Resources for such correspondence are limited.



 



     Since both the single and multi-unit employer files carry date



codes indicating the most recent update of the employer's industry



classification, it would be possible to tabulate each file to obtain a



distribution of employers by years elapsed since last update.
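
     Such a tabulation could be sketched as follows.  The sketch assumes that each date code can be reduced to the calendar year of the most recent update; the actual layout of the SSA date codes is not described here, and the function name is hypothetical.

```python
from collections import Counter


def years_since_update_distribution(update_years, as_of_year):
    """Count employers by number of years elapsed since the last code update."""
    counts = Counter(as_of_year - year for year in update_years)
    return dict(sorted(counts.items()))


# Hypothetical file of four employers, tabulated as of 1983.
distribution = years_since_update_distribution([1972, 1972, 1980, 1983], 1983)
```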



 



     Census -- Reference periods vary by coding system.  For units



covered by mail (or interview) in economic censuses, the industry



classification has the same reference period as the data.  This is



also true in some but not all current surveys.  Perhaps the best



approach is to consider the SSEL, which provides



the frame for all censuses and surveys and for the annual County



Business Patterns program.1/  For the larger multi-unit



companies, industry codes for their establishments are updated



annually in the Company Organization Survey.  Smaller multi-unit



companies are updated once between five-year economic censuses.



At the other end of the spectrum, industry codes for single-unit



 



 



     1/ True for all units with employees.  IRS is the main source



of information for zero-employee units.



 



 






 



employers outside the industry scope of the economic censuses (such as



those included in Division H, finance, insurance, and real estate, and



some industries in other divisions) and for those small employers who



are in scope but not included in the mail portion of the census will



in most cases be the original codes assigned to them by SSA when they



applied for EI numbers.



 



     In summary, most agencies use a one-year reference period for the



activity data on which industry classification is based, the exception



being SSA which asks for current activities with no defined reference



period.  Updating practices vary widely, both within and between



agencies.  (See Table 1 on page 23, Column B.)



 



3.   Other considerations



 



     Some data users are troubled by the effects of sudden and/or



erratic changes in industry classification, especially when large



units are affected.  This has led to the application, in some systems,



of resistance principles.  After a preliminary code has been



determined using data from the current reference period, the



preliminary code is compared with codes from one or more previous



periods.  If the preliminary code differs from the prior one, it is



accepted only if certain threshold conditions are met.  Several of the



systems studied incorporate resistance principles.
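
     A resistance principle of this kind can be sketched as follows.  The threshold condition used here (the new activity must exceed the prior activity's share by a fixed margin) and the 10-point margin are assumptions made for illustration; the conditions actually applied differ among the systems studied and are not specified in this section.

```python
def apply_resistance(prior_code, shares, margin=0.10):
    """Return the code to carry forward for the current period.

    shares maps each industry code to its share of the unit's activity
    in the current reference period (hypothetical input layout).
    """
    # Preliminary code: the activity with the largest current share.
    preliminary = max(shares, key=shares.get)
    if prior_code is None or preliminary == prior_code:
        return preliminary
    # Accept the change only if it clears the (assumed) threshold.
    prior_share = shares.get(prior_code, 0.0)
    if shares[preliminary] - prior_share >= margin:
        return preliminary
    return prior_code
```

     Under these assumptions a unit whose two activities stand at 52 and 48 percent keeps its prior code, while a shift to 70 and 30 percent results in a code change.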



 



     There is also the problem of the classification of certain



ancillary or auxiliary activities, such as central administrative



offices, manufacturers' sales branches, laboratories, and warehouses. 



Classification of these units is usually based on the activities of



the establishments they serve, as specified by the SIC Manual.



 



F.   Information Used as Input to Coding



 



Various sources of information are used as input for classification of



units by industry within the agency systems covered in this study. 



The two principal categories are agency source documents, and



information other than agency source documents.  The latter



encompasses prior codes assigned within the same agency and codes from



other agencies.  The referencing of codes and other information



available from commercial sources and contact with the company by



phone, correspondence, or in person are also methods of obtaining



additional coding information.



 



1.   Agency Source Documents



 



     The principal resource for assigning industry codes to units



within each system is usually the source documents used by the agency. 



The reason for this is that the codes from other agencies or



commercial business listings may not be fully compatible with the data



classification requirements of the



 










 



receiving system because of differences such as the required level of



detail, coding principles, code inaccuracy and whether or not the



codes apply to the appropriate reference period.  Also, in many



situations code transfers are prohibited by law.



 



     A study of the source documents used for the different coding



systems shows a variation between agencies and in some cases within



agencies.  Lack of standards in this area could be one reason, but the



variation can, in most cases, be justified by the major differences



between each agency program's data requirements for the design of



their source documents, and whether industry coding is a primary or



supplemental consideration in this program.



 



     Some factors that an agency must consider in designing the form



are the type of information needed in order to obtain the desired



level of industry detail, the scope of instructions needed to secure



this information, and whether or not the form can be specialized to



cover specific industries.  It is also necessary to determine whether



the forms are to be self coded by the respondent, manually coded by



the agency's classifiers or coded by computer.  In addition, the



burden which completing the form places on the respondent must be



evaluated.



 



     A very important factor that should be noted is that



often the coding source documents are designed primarily for



other purposes.  For example, the Form SS-4, which is used as the



 



main coding source for SSA's single-unit EI coding system, is actually



an IRS form utilized by employers and others in applying for an EI



number.  Another case would be the IRS' Statistics of Income coding



systems, where tax schedules, such as the Form 1120, are used for



industry coding.  Coding information is often a minor part of such



forms.



 



     In contrast, some other agency source documents are specifically



designed for the collection of industrial data.  These forms may vary



from the general purpose type to report forms tailored to a specific



industry.  Examples of these latter types of source documents are the



various report forms used in the economic censuses.  These forms are



specialized to the industry which has been determined by codes



assigned from previous censuses or surveys, the Company Organization



Survey (COS) or Social Security Administration (SSA) records.  If a



code is not available and the kind of business cannot be determined



from the trade name or other reliable information, a more generalized



form is sent.



 



     In general, the principal difference among the source documents



is the nature and detail of coding information available on the



various forms used in each agency's system(s).  The type of



information requested on these forms for determining an industry code



ranges from brief descriptions of the principal business activity, or



pre-listed industry descriptions and codes for self-selection, to



percent distributions of gross sales or



 








 



receipts by products or services.  Specific examples of these varied



kinds of information are: (1) pre-listed taxpayer-selected codes such



as on IRS Form 1120; (2) pre-listed kind of business activity check



boxes (with or without industry codes) on report forms used to



classify establishments lacking industry codes prior to mailing



industry-specific forms in the economic censuses; (3) respondent-



furnished descriptions of principal products or activities based on



percent of total sales on BLS Forms 3023-A and 3023-B (of which there



are different versions for each industry division); (4) principal



business activity on IRS Form SS-4 used in SSA's single-unit EI coding



system; and (5) sales distribution by industry on BEA's Form BE-12



used in their Benchmark Surveys.  In the absence of an adequate



description of the unit's activities, some agency systems may use the



trade name as a coding source (e.g., Hilda's Beauty Shop, Bob's Cafe



or Johnson's Department Store).  This "name coding" is used in SSA's



coding of the Form SS-4.



 



     The following is a comparative analysis of the level of detail



available on source documents.  It provides a comparison by level of



source information detail based on the chart shown below and gives



examples for each category (See Chapter VI for actual source documents



and brief description of each).



 



                                        Level of source



     Category            Coding by:     information detail



 



     A                   Respondent     Not applicable



 



     B                   Agency         Low



 



     C                   Agency         Medium



 



     D                   Agency         High



 



     Category A (Self-coded) -- The only systems which use self-coding



(i.e., coding by respondents) almost exclusively are the IRS revenue



processing systems for partnerships and corporations.  Some forms used



in BEA's Direct Investment (DI) Statistics Program also request



respondents to enter up to eight 3-digit codes which represent DI



Industry Classifications under which they have sales.  However, final



code determinations are made and entered on the forms by BEA coders. 



Bureau of the Census forms, especially in the retail and wholesale



trade and service areas, also frequently utilize pre-listed,



respondent-selected descriptions and codes.  In most cases, responses



to these items are checked against other data furnished on the form in



order to determine what industry code to assign.



 



     The source documents for the above mentioned IRS systems are the



appropriate tax return forms for these two categories of taxpayers. 



The relevant data items and instructions from the partnership return



(IRS Form 1065) for tax year 1981 are shown as Exhibit 1, Chapter VI. 



The "Business Code Number" is to be



 










 



entered by the taxpayer in Item C on the first page, using the



instructions and code list on page 12 of the instructions.  The code



list provides a short description for each of the industries included



by IRS along with the appropriate codes.  Taxpayers are also asked to



give a brief description of their principal business activity and



principal product or service in Items A and B, respectively.  This



information is used very little in revenue processing, but to a



greater extent in the Statistics of Income industry coding.



 



     An observed feature of self-coding is the potential for a high



proportion of incorrect codes immediately following a revision of the



Standard Industrial Classification.  Some evidence on this score is



presented in Chapter IV.



 



     Category B (Agency coded, low detail) -- The example for this



category is also taken from IRS.  Exhibit 2 of Chapter VI shows the



relevant data items and instructions from the 1981 tax return schedule



used for nonfarm sole proprietorships (IRS Form 1040, Schedule C). 



The primary data items used for coding are Item A, a two-part item



calling for brief descriptions of the "main business activity" and its



"product" and Item B, the business name.  The instruction for Item A



is to "Report the business activity that accounted for the most



income...Give the general field as well as the product or service. 



For example, 'wholesale-groceries' or 'retail-hardware.'"



 



     For some returns, additional clues to the correct classification



may be found by examining other parts of the return, e.g., the kinds



of expenses (deductions) reported in Part II and the kinds of



property listed in Schedule C-2, Depreciation.  Note, however, that



taxpayers are not required to show a breakdown of receipts or sales by



source, so there is no way even to check that the main activity has



been properly identified, let alone to apply the more complex rules



that are used for some combinations of activities.



 



     It may be noted in passing that IRS Form 1040, Schedule F and



Form 4385, which are used for farm sole proprietorships, do require a



breakdown of sales or income from different kinds of crops and



livestock production.  This is probably sufficient to put these source



documents in Category D.



 



     Other source documents classified as providing a low level of



input detail were certain ones used by the Census Bureau as a



preliminary to more precise coding of later documents based on the



economic censuses or current surveys.



 



     Category C (Agency coded, medium detail) -- The main example for



this category is the Form SS-4 (Application for Employer



Identification number).  The complete Form SS-4 and the relevant



section of the instructions for it appear as Exhibit 3 of Chapter VI. 



This is an IRS form used by SSA to classify all employers for the



single-unit employer file.  (Codes for



 



 






 



establishments or reporting units of multi-unit employers are based on



a more detailed form which is sent to eligible employers following



receipt of the initial application.) The primary data item used for



industry classification is Item 14, Nature of Principal Business



Activity.  The instructions for this item give examples of the kinds



of descriptions desired for various SIC divisions.  Other items which



may assist in classification are:



 



          Items 1 and 4 -- Name and Trade name.



 



          Item 10   --   Type of organization.



 



          Item 16   --   Breakdown of employees by type.



 



          Item 17   --   For manufacturers, principal product and raw



                         material used.



 



          Item 18   --   To whom does the employer sell most of his or



                         her products or services.



 



     These items, especially 17 and 18, cover certain of the



key data requirements needed for classification that were not covered



in the Category B example.  The Form SS-4 is classified in the medium



rather than high detail category primarily because it does not provide



any breakdown of multiple activities.  Several earlier versions of the



SS-4 did include an item asking manufacturers to list their three



principal products and to give the percentage of total value of



products represented by each of these.



 



     Category D (Agency coded, high detail) -- Within this category,



the amount of detail and the general approaches used vary, so it will



be useful to give more than one example.



 



     The source documents which provide the most information for



industry coding are the mail questionnaires used in the quinquennial



economic censuses.  These questionnaires call for detailed information



and are tailored to different groups of SIC industries; hence they



include the specialized inquiries needed to assign industry codes



within those groups.  Special procedures are, of course, needed to



handle questionnaires returned by establishments which are



inappropriate to their activities.



 



     Exhibit 4 of Chapter VI shows one questionnaire for the



1982 Census of Retail Trade -- Tires, Batteries, Parts, Accessories,



(Form CB-5502).  This questionnaire was mailed to establishments



believed to be in Census Industry and Product Classification



categories 553110 (tire, battery and accessory dealers) and 553120



(other auto and home supply stores).  The "mailout" code, i.e., the



latest IPC code for that unit from the Standard Statistical



Establishment List (SSEL), will appear on the mailing label.  A "self-



designated" code will be determined on the basis of the respondent's



entry in Item 9, Kind of Business.  Normally, the final IPC code will



be computer-



 










 



assigned, based primarily on the merchandise lines data (Item 11), but



also taking into account other relevant items on the form, including



dollar volume of business (Item 5), class of customer (Item 7), method



of selling (Item 10) and a specific inquiry on sales and receipts from



retreading tires (Item 12a).  The mailout and self-designated codes



enter into the final code determination only when the data for the



items normally used are incomplete, ambiguous, or contradictory.
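
     The order of precedence just described can be sketched as follows.  This is an illustration under stated assumptions, not the Census Bureau's programmed rules: the mapping of merchandise lines to candidate IPC codes is taken as given, "ambiguous" is reduced to an exact tie for the largest sales figure, and the choice to consult the self-designated code before the mailout code is an assumption.  All names are hypothetical.

```python
def final_ipc_code(line_sales, self_designated=None, mailout=None):
    """Pick a final code from merchandise-line sales, with fallbacks.

    line_sales maps a candidate IPC code to the sales reported for the
    merchandise lines associated with it (hypothetical input layout).
    """
    positive = {code: amount for code, amount in line_sales.items() if amount > 0}
    if positive:
        top = max(positive.values())
        leaders = [code for code, amount in positive.items() if amount == top]
        if len(leaders) == 1:
            # Normal case: the merchandise lines data decide the code.
            return leaders[0]
    # Incomplete or ambiguous data: fall back on the other codes (assumed order).
    return self_designated or mailout
```

     For a form reporting most of its sales under lines associated with 553110, the sketch returns 553110 regardless of the mailout code; the mailout or self-designated code is used only when the merchandise lines data are missing or tied.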



 



     Other forms that provide a high level of information for industry



coding are BLS Forms 3023-A (Industry Classification Statement) and



3023-B (Industry Verification Form), which are designed for each



industry and used for updating all industry codes.  They are also used



to update area, type of ownership, and auxiliary codes of existing



units covered by the Unemployment Insurance Employment and Wages (ES-



202) Program on a three-year refiling cycle. Form BLS 3023-A is used



sometimes by the state agencies to clarify or obtain additional



information necessary to assign SIC codes to new employer accounts. 



For both forms, there are separate versions for each industrial



division (including an "all industry" version).  Each form also



provides for the inclusion of other establishments reported by a



multi-unit company.



 



     Exhibit 5 of Chapter VI shows BLS Form 3023-A7 (Rev. Dec. 1982),



which is one of the forms used to update industry codes for reporting



units currently classified in wholesale trade. Unlike other examples



discussed in this section, this form is designed primarily to get the



information needed for classification of the reporting unit.  The key



items on the form for this purpose are items B, D and E.  Item B



covers the identification of multiple products or activities of the



reporting unit, and the percent of total sales (value of receipts)



accounted for by each during the most recent calendar year.  Item D



identifies Central Administrative Offices (CAO's) and auxiliary units,



and item E asks for the principal class of customer, as an aid to



determining whether the unit is wholesale or retail.



 



     A final example in this category comes from the Federal Trade



Commission's (FTC) Quarterly Financial Report (QFR) Program.  (This



program was transferred to the Bureau of the Census in late 1982.)



Exhibit 6 of Chapter VI shows FTC Form 59-103 (rev. Oct. 1979), Nature



of Business Report.  The FTC uses two versions of this form, the one



shown, which is for the manufacturing division, and a second version



for the other SIC divisions included in the QFR Program (mining,



wholesale trade and retail trade).  The Nature of Business Report is



sent to all corporations which are about to enter the QFR sample for



initial determination of status, and, for updating purposes, to



certain corporations reentering or remaining in the sample.  Like the



BLS Form 3023, its primary purpose is to classify the reporting units



by industry.  In addition, several questions are asked to determine the



current corporate structure of the reporting unit.



 








 



     The key item on the form is item 3, in which the respondent is



asked to list products made, processed or assembled and/or sold, with



the percent share of gross receipts accounted for by each.  In



addition, information is requested on kinds of raw materials used and



processes used in production.  Unlike the BLS form, this form does not



provide any illustration of the level of detail desired in



distinguishing different product categories.



 



          2.    Information Other than Agency Source Documents



 



     As stated earlier, most agencies rely primarily on their own



source documents as input to their coding systems.  However, in



certain situations they may resort to other coding sources such as



additional contact with the company, prior codes assigned to the same



units within their own agency, codes supplied by other agencies, and



codes and other pertinent information extracted from commercial



sources.



 



     The prior codes assigned by an agency are used for various



purposes.  Listed below are some of the uses and examples of agency



systems to which these situations apply.



 



     --   Report form selection.  During the economic censuses the



          Census Bureau utilizes prior codes as a selection factor in



          determining the appropriate form to be mailed.



 



     --   Reference for manual editing.  Many of the agency coding



          systems reference prior codes during updating processes for



          purposes of reviewing code changes, determining accuracy of



          current codes and making final code determinations.  For



          example, prior codes for permanent sample units in FTC's



          Quarterly Financial Report (QFR) are available to the coders



          for determining code changes for large corporations.



 



     Codes supplied by other agencies are also used for various



purposes.  Some of these are listed below with examples.



 



     --   Report form selection. The Census Bureau uses industry codes



          from SSA records if no previous Census assigned codes are



          available to determine the appropriate report form to mail



          in the economic censuses.



 



     --   Coding of nonrespondents, and establishments not included in



          the mail part of the economic censuses. IRS Principal



          Industrial Activity (PIA) and SSA assigned codes are two of



          the various sources used by the Census Bureau for



          determining an industry code for these cases in the economic



          censuses.



 








 



     --   Coding of units with incomplete data.  The Census Bureau



          references SSA assigned codes when classifying cases with



          insufficient information in the business births coding



          system.



 



     --   Updating procedures.  The Social Security Administration



          attempts to update its code files every five years through a



          coordination with census records based on codes resulting



          from the economic censuses (especially following a major SIC



          revision).  The last such update was based on the 1972



          Economic Censuses.



 



     Other sources of coding information are commercial business



listings (e.g., Dun and Bradstreet, Moody's, Thomas Register).  Many



agencies use these as a source when there is insufficient information



to assign a complete industry code to a unit.  Some examples of the



different agency coding systems which utilize these references are:



(1) business births coding (Census), (2) single-unit EI file (SSA),



(3) Company Organization Survey (Census), (4) economic censuses



(Census), (5) Quarterly Financial Report (FTC), and (6) Statistics of



Income -- Corporations (IRS).



 



 



     The final coding source (and indeed the first and preferred



source for large establishments and firms) by which an agency may



obtain coding information for a unit when there is insufficient



information is through additional contact with the company by phone,



written correspondence, or in person.  This is done for most of the



systems and, as a case in point, for the Unemployment Insurance (UI)



Employment and Wage Program (Bureau of Labor Statistics).  Here the



State may send a BLS-3023 form (for new accounts), contact the



employer by phone or make a personal visit in order to obtain the



needed information.



 



     The wide variation among the coding sources used by the various



agencies affects the uniformity of codes assigned to the same units in



different systems.  Greater standardization of the coding systems in



this area would seem feasible at this time, but only for agencies



which have similar data requirements and have the resources needed to



code at the agreed level of detail.



 



G.   Coding Procedures



 



The procedures developed for use within the different coding



systems encompass a variety of activities.  These include:



 



     -    The methods by which the industry codes are assigned (i.e.,



          manual, computer-assisted, automated).



 



     -    Treatment of missing data.



 



     -    Data entry.



 










     -    Quality assurance procedures (i.e., manual quality control



          and computer consistency checks).



 



     The following provides descriptions of procedure types available



under each of these functions and examples of how they are used. It



shows that wide variations exist between the procedures for the



systems studied.  The fact that these differences will affect the



comparability of codes between agencies is self-evident.



 



     1.    Methods of assigning codes.



 



     There are three principal methods by which the initial



industry codes are assigned.  Of these, manual coding is the most



frequently used.  The other methods used are "automated coding" and



"computer-assisted coding," which is also a form of manual coding.  At



this time the Census Bureau is the only agency which makes use of



"computer assisted coding." Listed below are basic descriptions of the



procedures which apply to each of these methods:



 



     --   Manual Coding.  Under this method the classifier manually



          assigns an industry code directly to the source document (or



          other form used for data entry purposes) based on



          information supplied by the respondent and other available



          sources such as commercial references or prior codes.



 



     --   Computer-assisted Coding.  This system was developed by the



          Census Bureau to assist the  coder during manual operations



          by computerizing the basic coding routine.  This system is



          being used in several phases of the 1982 Economic Censuses



          processing.



 



          Under this method, the coder, who is working at an



          interactive computer terminal, is first required to select



          the major SIC division which relates to the activity



          description and/or trade name supplied on the source form. 



          Then the coder selects a "key word" based on the same



          information and enters it into the terminal.  If possible,



          the system matches the "key word" to one or more verbal



          descriptions of SIC industries.  These industry descriptions



          are then displayed, with their associated code, for the



          coder to select the description and code which is



          applicable.  If the coder is unable to assign a code at this



          point, the system will then direct the coder through several



          routines until a code is derived.  If this fails the case is



          referred to an analyst for review.



 



          In addition to its coding functions, this method was also



          developed to improve the training of



 



 






 



          coders, increase consistency, and provide a flexible



          mechanism for continuous updating of descriptions and codes



          in the system and IPC Manual.  It is also the first step



          towards a fully automated system of coding through the



          development of a comprehensive dictionary of industry



          descriptions.



 



     --   Computer/Automated Coding.  Currently no coding system



          studied by the Working Group is fully automated; however,



          two agencies (Census Bureau and IRS) are using largely



          automated coding procedures.  Within the Census Bureau



          systems (e.g., the mail portion of the economic censuses,



          Census of Agriculture for farms with sales of $2,500 or more



          and other periodic surveys such as the Annual Survey of



          Manufactures) which have implemented this method, this is



          done by using computerized data on receipts or sales by type



          of product or service to assign and place in the records for



          each unit an industry code, according to a programmed set of



          rules.  Starting with tax year 1981, IRS's SOI programs have



          used largely automated procedures for generating current



          year SOI codes.  Procedures vary by type of return and tax



          year.  For most returns, the automated coding process



          derives the current year SOI code either from the prior year



          SOI code or from the current year revenue processing code. 



          Manual coding is used only on an exception basis.
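
     The "key word" step of the computer-assisted method described above can be sketched as follows.  This is a minimal illustration and not the Census Bureau's system: the small dictionary of industry descriptions, the substring matching rule, and the function name are assumptions, and the further routines used when no code can be assigned are not modeled.

```python
# Tiny illustrative dictionary keyed by SIC division letter (assumed layout).
INDUSTRY_DESCRIPTIONS = {
    "G": [  # Retail trade
        ("5311", "department stores"),
        ("5531", "auto and home supply stores"),
        ("5812", "eating places"),
    ],
    "I": [  # Services
        ("7231", "beauty shops"),
    ],
}


def candidate_codes(division, key_word):
    """Return (code, description) pairs in the division whose text contains the key word."""
    key = key_word.strip().lower()
    if not key:
        return []
    return [
        (code, description)
        for code, description in INDUSTRY_DESCRIPTIONS.get(division, [])
        if key in description
    ]
```

     The coder would choose among the candidates displayed; an empty list corresponds to the situation in which the system must direct the coder through further routines or the case is referred to an analyst.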



 



     The following lists the agencie