Federal Committee on Statistical Methodology
Office of Management and Budget

 

  Statistical Policy Working Paper 19 - Computer Assisted Survey Information Collection



 

 



                  MEMBERS OF THE FEDERAL COMMITTEE ON



                        STATISTICAL METHODOLOGY




(April 1990)

Maria E. Gonzalez (Chair), Office of Management and Budget

Yvonne M. Bishop, Energy Information Administration
Warren L. Buckler, Social Security Administration
Charles E. Caudill, National Agricultural Statistics Service
John E. Cremeans, Office of Business Analysis
Zahava D. Doering, Smithsonian Institution
Joseph K. Garrett, Bureau of the Census
Robert M. Groves, Bureau of the Census
C. Terry Ireland, National Computer Security Center
Charles D. Jones, Bureau of the Census
Daniel Kasprzyk, Bureau of the Census
Daniel Melnick, National Science Foundation
Robert P. Parker, Bureau of Economic Analysis
David A. Pierce, Federal Reserve Board
Thomas J. Plewes, Bureau of Labor Statistics
Wesley L. Schaible, Bureau of Labor Statistics
Fritz J. Scheuren, Internal Revenue Service
Monroe G. Sirken, National Center for Health Statistics
Robert D. Tortora, Bureau of the Census


PREFACE

The Federal Committee on Statistical Methodology was organized by the Office of Management and Budget (OMB) in 1975 to investigate methodological issues in Federal statistics. Members of the committee, selected by OMB on the basis of their individual expertise and interest in statistical methods, serve in their personal capacity rather than as agency representatives. The committee conducts its work through subcommittees that are organized to study particular issues and that are open to any Federal employee who wishes to participate in the studies. Statistical Policy Working Papers are prepared by the subcommittee members and reflect only their individual and collective ideas.

The Subcommittee on Computer Assisted Survey Information Collection investigated the use of computers in collecting survey information.
This report covers the different ways in which small computers can be used to improve data collection. For example, the report describes computer assisted telephone interviewing (CATI), computer assisted personal interviewing (CAPI), data collection using touchtone telephones, and voice recognition. More than for most working papers, the relevance of the information in this report will age very quickly.

Various methodological issues are also addressed in this report. For example, issues discussed include human-machine interfaces, software development, hardware planning, and computer security.

The Subcommittee on Computer Assisted Survey Information Collection was chaired by Terry Ireland of the National Computer Security Center, Department of Defense.


CASIC Subcommittee Members

C. Terrence Ireland, Chair, National Computer Security Center (Defense)
Thomas Anastasio, National Computer Security Center (Defense)
Martin Baum, National Center for Health Statistics (Health and Human Services)
William Blackmore, Energy Information Administration (Energy)
Richard Clayton, Bureau of Labor Statistics (Labor)
Ann Ducca, Energy Information Administration (Energy)
Ralph Gillman, Energy Information Administration (Energy)
Maria E. Gonzalez, Ex officio, Office of Management and Budget (Executive Office of the President)
Stuart Katzke, National Institute of Standards and Technology (Commerce)
George Kraft, National Institute of Standards and Technology (Commerce)
Cathy Mazur, National Agricultural Statistics Service (Agriculture)
John Sietsema, National Center for Education Statistics (Education)


Acknowledgments

The idea to develop a Statistical Working Paper on the use of computers to support the collection of survey information was first put forward by Yvonne Bishop of the Energy Information Administration. Ms. Bishop has a special interest in data collection techniques that do not involve an interviewer.
With the advice of members of the Federal Committee on Statistical Methodology (FCSM), Maria Gonzalez organized a subcommittee with an expanded scope to examine a range of computer methodologies that supported the collection of information: the Subcommittee on Computer Assisted Survey Information Collection (CASIC). The members of the CASIC Subcommittee further expanded the report to include the three important methods of data collection: Computer Assisted Telephone Interviewing (CATI), Computer Assisted Personal Interviewing (CAPI), and Computer Assisted Self Interviewing (CASI). For each related technological area, from software interfaces to computer security, the CASIC Subcommittee investigated and wrote sections of the working paper that showed the application of these areas to CATI, CAPI, and CASI. The CASIC Subcommittee thanks the members of the FCSM for their advice and comments on several drafts of the working paper. Special thanks go to Charles Caudill (NASS) and Joe Garrett (Census) for their in-depth comments on the various drafts.


COMPUTER ASSISTED SURVEY INFORMATION COLLECTION (CASIC)

TABLE OF CONTENTS

Part I. Executive Summary
  A. Introduction
  B. Computer Assisted Survey Information Collection

Part II. Introduction
  A. Objectives, Scope, and Users
  B. Federal Information Processing Standards
  C. Organization of Report

Part III. Options for Automated Statistical Surveys
  A. Computer Assisted Telephone Interviewing (CATI)
  B. Computer Assisted Personal Interviewing (CAPI)
  C. Computer Assisted Self Interviewing (CASI)

Part IV. Methodological Issues
  A. Human-machine Interfaces
  B. Software Development
  C. Data Collection Programs
  D. System Interfaces For Data Conversion
  E. Computer Security
  F. Hardware Planning
  G. Network Planning

Part V. References

Part VI. Appendices
  A. Costs
  B. Quality Improvements Offered by CASIC
  C. Survey Examples
  D. Taxonomy
  E. Glossary


I. Executive Summary

I.A. Introduction

Surveys have used computers since the Bureau of the Census obtained the UNIVAC I. Since that breakthrough, the power of rapid calculating has been applied to almost every phase of the survey process, including sample design, sample selection, and estimation. The most important implication of these applications is that survey practitioners can now consider a growing range of techniques that were not affordable, or even thought of, before the availability of inexpensive and fast calculating capability.

The last major survey operation to benefit from automation is data collection. Computers were first applied to collection using mainframes to control certain aspects of telephone collection, and Computer Assisted Telephone Interviewing (CATI) was born. The first applications of CATI prompted a flood of research worldwide evaluating the impact of this technique on the survey error profile and costs. CATI is now used to help interviewers in all collection activities, including scheduling calls, controlling detailed interview branching, editing and reconciliation, thus providing much greater control over the collection process and reducing many sources of error. Simultaneously, a tremendous storehouse of information is captured by the computer to provide additional insight into the data collection process.

In just two decades, CATI has become a standard collection vehicle grounded strongly in a firm body of research.

The ongoing advances in computer technology, and particularly the arrival of microcomputers, continue to offer survey practitioners more fertile ground for improving the quality of published data. The first portable computers were quickly pressed into service to duplicate the advantages of CATI in a personal visit environment. Thus, Computer Assisted Personal Interviewing (CAPI) grew from the seeds of CATI.

While CATI and CAPI represent advances for surveys requiring interviewers, microcomputers are now finding important roles in self-administered questionnaires, where interviewers are not needed. These roles take advantage of more advanced technology and the widespread availability of technology to allow respondents to complete the questionnaire without the assistance of an interviewer. Prepared Data Entry (PDE) allows respondents who have a compatible microcomputer or terminal to access and complete the questionnaire directly on their screen.

Touchtone Data Entry (TDE) allows respondents to call and answer questions posed by a computer using the keypad of their touchtone telephone for well-controlled and inexpensive collection. As an extension of this approach, recently developed techniques in Voice Recognition Entry (VRE) allow respondents to answer questions by speaking directly into the telephone. The computer translates the respondent's answers into text for verification with the respondent and then stores the text in a data base.

These and other collection methods will continue to evolve out of the work now underway. New technology will assuredly bring more options for survey practitioners to consider.

The use of these collection methods, while bringing needed improvements in the quality of collected data, has created other challenges. These automated collection methods are made possible through the close interaction of statisticians, subject matter experts and colleagues in the computer sciences. To use these methods effectively, each profession must learn and use the models and techniques of the other professions. This close relationship will continue to grow, with advances in each field supporting advances in the others.

The goal of this report is to profile several automated survey collection methodologies and provide a glimpse of what future technological advances may offer to survey operations.

The selection of one or more of these collection methods depends on a clear understanding of computer applications. Software and hardware selection can be essential to success, as may be the use of networks for the computers. As with any survey method, the need to assure the confidentiality of the data gathered and stored by the computers is critical.

This report discusses several data collection methodologies now being used in Federal agencies in terms of procedures, impact on quality and costs. It also discusses the significant issues surrounding the use of advanced technologies to augment survey data collection.


I.B. Computer Assisted Survey Information Collection (CASIC)

For this report, the Subcommittee defines Computer Assisted Survey Information Collection as those information gathering activities using computers as a major feature in the collection of data from respondents, and in transmitting data to other sites for post-collection processing. It is in this area of survey operations that technology is now having the greatest impact.


II. Introduction

II.A. Objectives, Scope, and Users

The Subcommittee on Computer Assisted Survey Information Collection was established in October 1988 to document and discuss the status and potential use of advanced technology for collecting statistical data, for its transmittal to central processing sites, and the conceptual and practical issues surrounding implementation. High quality published data begins with collecting high quality data from respondents. Much of survey processing addresses, and compensates for, weaknesses in the quality of the collected data and the absence of uncollected data. The survey questionnaire, received on time, completely filled out and accurate, can reduce post-collection errors and their related costs.

The Computer Assisted Survey Information Collection Subcommittee of the Federal Committee on Statistical Methodology has studied the various implications of the vast computing power now available to support statistical surveys and is providing this information for use throughout the Federal Government.

Objectives

The primary objective is to describe emerging methods of interactive electronic data collection and transmission, potential benefits, and current examples of their use in Federal surveys. This report also covers techniques and appropriate references to the literature.

A secondary objective is to consider specific methodologies and related issues stemming from the use of computer assisted statistical surveys. Also addressed are other practical considerations involving human-machine interfaces, software design, hardware features, data transmission and computer security. The issues involve such factors as quality, costs, and respondent reaction to computerized surveys.

Some advantages of automated surveys are:

a. improved data quality from (1) the introduction of automated questionnaire branching, editing features, and computer utility support; and (2) a shorter processing path from data collection to data processing (e.g., reduced keying errors because keying of the paper questionnaire is no longer necessary).

b. improved timeliness of data capture by the elimination of some data entry steps and of extensive editing.

c. increased flexibility in data gathering (e.g., for conducting multiple version questionnaire surveys involving question reordering and different natural languages).

In deciding which collection method to use, quality is a relative idea that is affected by a tradeoff between cost and benefit. The choice of a data collection method is usually based on a combination of performance and cost factors. Together they determine affordable quality.
For traditional collection methods, these factors and the decision-making process are usually well-known. Now, as technology progresses, new methods are being tested that expand the array of potential collection tools and challenge the survey designer to reevaluate old cost/performance assumptions.

These semi-automated collection applications fall naturally into three areas: (1) Computer Assisted Telephone Interviewing (CATI), where the interviewer and respondent talk over a telephone, limiting their personal interactions while maintaining the substantial flexibility provided by a telephone; (2) Computer Assisted Personal Interviewing (CAPI), where the interviewer and respondent talk directly across the table, although this direct access comes with the cost of additional logistical problems; and (3) Computer Assisted Self Interviewing (CASI), a newly coined phrase to describe situations where the interviewer is replaced by interaction with the computer. Subcategories of CASI include Prepared Data Entry (PDE), where the respondent uses a computer terminal; and Touchtone Data Entry (TDE) and, more recently, Voice Recognition Entry (VRE), where the respondent interacts with a computer over a phone line.

However, computer applications are not limited to obtaining data from respondents. In addition, the prompt transmittal of reported data to the processing facility and the conversion of data to proper formats are important to the publication of timely and relevant information.

New options will encourage reconsideration of old assumptions about quality, cost, and technology. Decisions made years ago in an era of fewer alternatives should be reviewed periodically. Many factors can change in a short period. Only a few years ago, automation costs were driven by the scarcity of mainframe hardware capacity. Now the labor involved in developing specialized systems dominates automation costs.
Portable and desktop microcomputers were not widely available at the beginning of this decade. Now, widely available, inexpensive and powerful, they are an assumed part of the work environment. The tough questions involve the selection of the appropriate system configuration.

The general goal of this report is to challenge Federal survey managers to reconsider their operations in light of recent changes in survey methods available, or attainable through new technology, and to reassess their methods of providing information to the public that is accurate, timely and relevant.


Scope

Automated data collection includes three major groups of people: the respondents, the interviewers, and the designers and developers of the system and procedures for collection. This report covers the essential factors involved in successfully including the requirements of each group.

The survey operations considered in this report include the computer-related activities of design and development of the questionnaire, interviewing, data entry, editing and follow-up for nonresponse or edit reconciliation, data transmission and data conversion.

The critical activities of sample design, sample selection and estimation are not included in the scope of this report. Still, the choice of an automated collection method is important to these activities. This choice must be an integral part of the survey design. For example, the decision to use CATI to improve collection of time critical data may provide the sample designer with additional flexibility to consider techniques that require rigorous sample control or complex questionnaire branching logic.

Respondents

The respondent must be considered the primary user of any survey vehicle, whether automated or not, and all aspects of the response environment must be developed with the respondent in mind.
The cooperation of respondents is the single most critical factor in survey operations, and they must be treated with the greatest care. Even one-time surveys must strive to leave the respondent with the feeling of contribution and importance, and the willingness to participate in future surveys. Thus, our primary job is to consider computer-related techniques that allow the respondent to answer the survey completely and accurately in a natural environment.

Automated collection methods provide survey managers with opportunities to improve control and reduce sources of error. These methods also can be designed to capture workload and performance data in the background while interviews are conducted. However, these features must not interfere with the natural interactions during the interview.

The transition to automated surveys presents additional challenges. For example, in a switch from mail questionnaires to CATI, the surveyor must work with the respondents to remove their uncertainties about the transition in order to retain their continuing cooperation.

The arrival of a variety of automated self-response methods involving computerized questionnaires presents new challenges for ensuring that the respondent is sufficiently knowledgeable and comfortable dealing directly with the computer. As always, the respondent must be trained in the use of the collection process. Whether by simple instructions or more formal procedures manuals, the surveyor must work diligently to develop simple, clear directions for use, or risk losing the full cooperation of the respondent. For example, in the use of PDE, respondents must interact directly with computer displays. This requires understandable questions, adequate help facilities, and a clear set of allowable answers.
Finally, just as managers must worry about interviewers' illness, absence, vacations, and vacancies, designers of automated self-response systems must include emergency backup procedures to assure that respondents can complete the survey.

The design of the human-machine interface requires a clear understanding of what the respondent expects. Do people react to questions differently when presented on paper compared to telephone interviewers and still differently if posed from computerized displays or computerized voices? Also, what information is lost by changing from personal visits, where the interviewer can assess a variety of non-verbal clues, to telephone collection, or automated self-response where voices are not directly heard? What are the differences in application of these techniques in household versus establishment surveys?

While new automated methods provide many features attractive to survey designers, new responsibilities come with their use. The respondent must be assured of the confidentiality of the data provided. Confidentiality is the cornerstone of respondent cooperation, from the interview through final processing, estimation, and storage of microdata. Whereas face-to-face interviews provide an environment where the respondent can assess and control access by others, use of telephone collection and transmission of self-reported data creates new problems in confidentiality. The integrity and authenticity of the respondents' answers during the transmission process are a related issue. The ability to transmit large volumes of data from remote sites may only partially solve collection problems in some surveys that require actual signatures and protection of the transmitted data.


Interviewer

The second most important user is the interviewer. The systems provided to help in the interview process must be easy to use, must work consistently and must provide improvements in the interview environment.
Early use of CAPI required interviewers to carry the first generation of portable computers to the respondent's home. These heavy machines were often left in automobiles until the interviewer could determine that the respondent was home. The result was reduced productivity and higher costs.

Interviewers must believe that computer assistance will improve their effectiveness. They need to be convinced that the computer is simply a tool to speed and simplify their work. CATI, CAPI and CASI support specific wording for each question, and simplify moving to the next question, which is often dependent on previous answers. However, these systems can be over-developed so that interviewers are left little or no discretion for judgment or contribution. The result may be low morale, indifference, deviation from established procedures, and high turnover rates.

System Designers

The third important user is the system designer who may use the computer environment to design the survey and to lay out the procedures for its use. Besides the ease of use to both respondent and interviewer, the decisions made early in the development process carry over to the ongoing use and maintenance of the system for years. The design environment is similar to that used in any software development process. Software tools that support this "software engineering" process should give flexibility to the designer and provide for long-term maintenance of the survey.

System designers face difficult choices, such as building customized systems from scratch versus linking standardized "off the shelf" software packages. The inevitable limitations must be compared against reduced maintenance and lower start-up costs.


II.B. Federal Information Processing Standards

Today, more than ever, information is the force that drives the activities of the Federal Government and information processing systems are the mechanisms that process, store, and transfer this information.
Information processing standards play an increasingly important role in the strategies of Federal agencies to make more effective use of their information processing systems by providing needed interoperability of systems and equipment, portability of data and software, and methods for protecting data and computers from accidental and intentional harmful events. CASIC systems, like other Federal information processing systems, will be more effective if they implement standards that provide for interoperability, portability, and security.

Within the Federal Government, the National Institute of Standards and Technology (NIST) has the responsibility of promulgating Federal Information Processing Standards and Guidelines for hardware, software engineering, electronic document interchange, data management, ADP operations, computer security, and ADP-related telecommunications. In addition, NIST develops conformance tests for its standards where appropriate. Developers of computer assisted statistical survey systems should use NIST's standards and guidelines whenever possible during the design, implementation, and operation of their systems. A reference to NIST's standards program and available standards and guidelines can be found in Section V under the heading of "Standards." Additional information about NIST's program may be obtained from:

Program Coordination and Support Group
National Computer Systems Laboratory
Building 225, Room B151
National Institute of Standards and Technology
Gaithersburg, MD 20899
Telephone: (301) 975-2833


II.C. Organization of the Report

This report is intended to provide reference and guidance for survey practitioners across the Federal Government in planning and refining data collection methods. By sharing information and experiences, others may gain and add to the effectiveness of governmental survey activities. The potential audience is much broader than those involved in statistical surveys.
Many of the methods described and the technological issues discussed are applicable to any information collection activity, including the collection of management information, program cost, productivity, and workload data.

Part III covers the three major areas of CATI, CAPI, and CASI where the computer supports survey information collection. Each major application is defined and current survey application experiences are described. Each discussion describes the impact on specific survey error components and potential for future applications.

Part IV provides a discussion of broad technological and developmental issues in the use of computer assisted surveys. The areas selected for consideration are: the human-machine interface; software development; data collection systems; systems interfaces for data conversion; computer security; hardware planning; and network planning, which includes electronic mail.

Part V contains references organized by categories consistent with the organization of the report.

Part VI contains the appendices. Appendix VI.A provides a discussion of cost measurement relating to use of computers to collect survey information. Appendix VI.B provides a general discussion of the improvements of quality that can be expected with the use of computers. Appendix VI.C describes a series of survey efforts currently underway, with a point of contact for additional information. Appendix VI.D lays out a suggested classification model for surveys that depend on computer support. It is consistent with the various models in the body of this report. Appendix VI.E contains a glossary of words in active use where computers and surveys come together.


III. Options for Automated Statistical Surveys

III.A. Computer Assisted Telephone Interviewing (CATI)

Definition

Computer Assisted Telephone Interviewing or CATI is a computer assisted survey process that uses the telephone for voice communications between the interviewer and the respondent.

CATI replaces the traditional paper-and-pencil questionnaire interviewing. The computer displays the questionnaire to the interviewer, who then relays each question over the telephone to the respondent. The answers are given to the interviewer for entry into the computer. The collections of questions are structured so that computer examination of previous answers can be used to select the next question in sequence. Computer-generated help facilities can be initiated by the interviewer on command.

The interview environment can be computer generated or handled manually by the interviewer. As the CATI systems grow in sophistication, many manual functions will be taken over by the computer: sampling unit selection, scheduling of telephone calls, automatic dialing, and callbacks to respondents who are not reached on the initial call.

Data collected by CATI should have significantly fewer errors than data collected by manual methods because the interviewer can directly validate respondent data that fails internal and historical edit checks. Time and cost requirements for data collection, validation, and data conversion should be reduced. Computer controlled questionnaires make it possible to use more sophisticated designs than can be administered with paper-and-pencil forms. They can include complex logic structures and questions finely tailored to the circumstances associated with a specific sampling unit.


Examples of Current Use

The exact number of CATI installations throughout the world is unknown. It probably is more than 1,000 considering the number of countries, universities, and private sector vendors and survey research installations involved in surveys. In 1988, the U.S. Government had 51 cooperating CATI centers.

Both opinion and factual data are collected using CATI. Most questionnaires contain a mix of these data types.
Questionnaires range from several questions with very little data validation to several hundred questions customized for specific respondents, providing the ability to collect conveniently the same data in different respondent environments.

The National Agricultural Statistics Service (NASS) within the United States Department of Agriculture (USDA) executed its first CATI questionnaire (Multiple Frame Cattle Survey) during 1982 in California using four workstations and completing 100 interviews. The questionnaire consisted of 41 questions. Today the largest known CATI questionnaire is the December Agricultural Survey. It is used in 14 states with questionnaires customized for each state. This survey has over 200 questions with production items recorded in units convenient to the respondent and converted to a common unit for data validation and recording purposes.

Today, NASS conducts a total of nine recurring CATI surveys. The surveys are monthly, quarterly and annual. In 1988, NASS completed 125,000 CATI interviews using 183 data collection workstations in 14 remote sites located in state statistical offices. Besides the recurring CATI activity, NASS conducted three special data collections in 1988 and two already were scheduled for 1989. The questionnaires were developed over a very short period. Training time was short. The data collection period was somewhat short (3 days - 2 weeks). NASS found that CATI lends itself very well to applications with short implementation schedules. Field testing of the questionnaires is efficient because once a problem area is identified, the questionnaire can be modified and tested on another respondent in generally less than an hour.

Also, the Bureau of Labor Statistics (BLS) currently uses CATI in 17 States to collect monthly data on employment, hours and earnings from 6,000 respondents. BLS further uses CATI (1) to collect Consumer Price Index (CPI) housing data; (2) to collect hours at work and hours paid as an input to productivity measures; and (3) for special purpose studies to support Department of Labor initiatives. In addition, BLS uses CATI methods to conduct telephone record check surveys to improve data quality.

Computing Environment

The uses of CATI are limited only by the capability of telephone technology and the use of personal interviewers. CATI is one of several phases of the total data collection process. It can be used for nonresponse follow-up where initial contact is made by CATI, mail or CAPI.

The ability to use varied data collection techniques is contingent upon the ability to develop computer questionnaires with common software that can support the various data collection options. Common software is important to assure the same data is collected and the same validations are applied.

The computer has to be responsive in delivering sample units and questions to the interviewer. The computer response times for both interviewer and respondent must be less than what they would perceive as an unnecessary delay. For example, experience has shown that longer than a second between questions is too long for an impatient respondent. Longer than half a second wait for the display of the next question is too long for the interviewer. During this period the computer may be required to access several databases and do complex mathematical computations which would include logical decisions affecting subsequent questions.

The computer must deliver a different sampling unit in less than 10 seconds, and ideally in less than five. During this period the machine may have to query several potential respondent queues that relate to scheduled callbacks in different time zones; to previous busy signals to be retried every 15 minutes; to special handling of specific respondents by specific interviewers; to the generation of new sampling units; and to the disposition of the completed interview as correct.

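The queue handling described above can be illustrated with a short sketch. It is not taken from any system named in this report: the class `CaseQueue`, its method names, and the rule that due callbacks take priority over fresh sampling units are assumptions made for illustration. Only the 15-minute busy-signal retry interval comes from the text. Times are plain `datetime` values that are assumed to have already been converted to one reference clock, so time-zone handling is left out.

```python
import heapq
from collections import deque
from datetime import datetime, timedelta

# Retry interval for busy signals, as stated in the text above.
BUSY_RETRY = timedelta(minutes=15)


class CaseQueue:
    """Hypothetical case-delivery queue for one CATI site.

    Due callbacks and busy retries are served before fresh sampling units.
    """

    def __init__(self):
        self._due = []          # heap of (due_time, sequence, case_id)
        self._fresh = deque()   # sampling units that have never been dialed
        self._seq = 0           # tie-breaker so equal due times stay ordered

    def add_fresh(self, case_id):
        self._fresh.append(case_id)

    def schedule_callback(self, case_id, when):
        heapq.heappush(self._due, (when, self._seq, case_id))
        self._seq += 1

    def record_busy(self, case_id, now):
        # A busy signal becomes a callback 15 minutes later.
        self.schedule_callback(case_id, now + BUSY_RETRY)

    def next_case(self, now):
        """Return the next case to dial, or None if nothing is ready."""
        if self._due and now >= self._due[0][0]:
            return heapq.heappop(self._due)[2]
        if self._fresh:
            return self._fresh.popleft()
        return None
```

Because the due callbacks sit in a heap, finding the next ready case costs a single comparison at the top of the heap, which is one way such a lookup could stay well inside the delivery times quoted above. Special handling of specific respondents by specific interviewers would need an additional queue per interviewer and is not shown.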
The software that drives the questionnaire must be easy for the interviewer to use. Question paths through a questionnaire must be simple and easy for the interviewer to handle. Menus with abbreviated questions or questionnaire areas are desirable. Skipping back to an earlier question, changing that answer, and establishing another route through the questionnaire must be easy and quick to do. Commands must be standardized for use in related surveys to enable "second nature" reactions by the interviewer in any given situation.

The design of a CATI questionnaire poses problems beyond the design of standard questionnaires. If the designer has problems developing the questionnaire, the interviewers will almost surely find it difficult to use. The objectives of the survey questions in a computerized questionnaire may be no more complex than questions used in pencil-and-paper surveys. However, the flexibility provided by automated question paths makes their design more difficult as the possible sequences of questions must be worked out during design. Paths and branching must be worked out in advance and there may be significant differences in question wording and in their number. Automatic sampling unit management can pose some difficult logic problems for the automated survey designer. Data validation using historical or internal data correlations is a complex logic problem, but is essential for recurring surveys. Well designed computer environments provide the interviewer with the ability to review the respondent's answers for correctness and to annotate unusual circumstances.

Before the computer questionnaire designer can begin, the questions must be developed by the survey staff using knowledge of statistical theory and specific subject matter. This survey staff also must be well versed in face-to-face, self-administered, and telephone questionnaire design.
In the face-to-face interview the interviewer can offer explanations of the question, then probe for additional information; and, if necessary, provide the respondent with the paper version of the questionnaire. The respondent can study, read ahead, reflect, and finally answer with a clear understanding of the meaning of the question.

For a self-administered questionnaire, the respondent no longer has the benefit of the interviewer, but still can examine the questionnaire in detail. In telephone interviewing the respondent may not have the form in hand and thus may be missing the visual clues needed to understand the question. Therefore, questions used in telephone interviewing should be structured using single concept questions. Some simple applications rely less on posing very structured questions and more on a "forms-screen" approach. This approach replicates the survey form on the computer screen. Edit failures may be highlighted, perhaps with a different color, and the interviewer is trained to ask probing questions to reconcile suspected inconsistencies in the responses.

III.B. Computer Assisted Personal Interviewing (CAPI)

Definition

Computer Assisted Personal Interviewing (CAPI) is a personal interview conducted usually at the home or business of the respondent using a portable personal computer. In many respects it differs from CATI only in the presence in the same room of the interviewer and the respondent. As with CATI, the questionnaire is programmed into the computer with all the necessary logic to control the question path -- the logical flow of the questions based on such factors as previous answers -- and provides both for computer generated editing by pointing out inconsistencies to the interviewer and for direct editing by the interviewer. The system must be self-contained as the interviewer does not have immediate access to supervisory assistance or to other data sources.
The interviewer reads aloud each question as it appears on the screen and records the respondent's answer in the computer while providing interactive assistance to the respondent.

Examples of Current Use

CAPI is currently being used by the National Center for Health Statistics (NCHS) for the implementation of the National Health Interview Survey (NHIS). The Census Bureau is performing the field data collection for NCHS. The NHIS is a household survey conducted in approximately 50,000 households per year. CAPI has been used to collect a portion of the survey data: the AIDS supplement questionnaire that requires approximately 15 minutes to complete. The 1990 Health Promotion and Disease Prevention Questionnaire of the NHIS will be fielded in January 1990. Major tests of CAPI have been conducted by the Bureau of the Census and the Research Triangle Institute. National Analysts conducted a nationwide CAPI for the USDA sponsored 1987 Nationwide Food Consumption Survey. The Bureau of Labor Statistics used CAPI for establishment record check surveys. National Opinion Research Center also is experimenting with CAPI. In Europe, CAPI has been used by the Netherlands Central Bureau of Statistics to collect data for the Netherlands Labor Force Survey. The U.K. Office of Population Censuses and Surveys has also carried out a major test of CAPI. Most of these efforts are at an early stage of CAPI development.

Potential Uses

CAPI can be used for all household surveys and establishment surveys, and the software can be used for any of the other automated data collection mechanisms. As the technology improves to provide lighter computers with longer battery life and user friendly software, CAPI will be used more often, particularly for quick turnaround surveys. Procedures for developing CAPI questionnaires are similar to those for CATI. However, greater emphasis must be placed on help features because the CAPI interviewer cannot rely on nearby experts.

The type of resources and expertise needed to apply CAPI technology to a survey are dependent on the availability of a good authoring system. If an authoring system is readily available, the CAPI survey instrument can be prepared by the typical survey instrument designer with little or no computer experience. Computer programming assistance will be needed to write the case management and output portions of the software. Usually these portions of the survey vary with each survey or survey instrument; therefore they must be custom programmed. On the other hand, if an authoring system is not available, the entire CAPI instrument must be custom programmed with either a general purpose language or a special purpose CAPI language. In either case, computer programming expertise is required. The level of expertise is dependent on the language selected. In addition, the survey instrument preparation will require the services of a survey instrument designer who will need to work very closely with the computer programmers.

III.C. Computer Assisted Self Interviewing (CASI)

Definition

Computer Assisted Self Interviewing (CASI) has been introduced into this report as a category to cover a new but growing area of computer assisted surveys that involves data collection without the direct presence of an interviewer. CASI can take several different forms that are differentiated by the collection method. These include Prepared Data Entry (PDE) where the respondent answers questions displayed on a computer terminal; Touchtone Data Entry (TDE) where the respondent answers computer generated questions by pressing buttons on a telephone; and Voice Recognition Entry (VRE) where the respondent answers questions by speaking directly into a telephone. We consider each in turn.

Background

Self-response data collection has always been used for many surveys that are mailed out.
This form of self-response collection features simplicity in administration leading to low initial overhead when compared to CATI and CAPI. However, mail self-response necessarily involves a reduction in control over the collection process. It is difficult for the survey practitioner to assess the status of the collection effort, e.g., whether the responses are in transit or still in the respondents' hands. Extensive mail or telephone follow-up involves great costs, perhaps offsetting the original simplicity of mail, and risks ongoing cooperation, especially if the response is "in the mail."

In annual or quarterly surveys, mail may be the appropriate vehicle. In time critical surveys, the characteristics of mail collection leave wide gaps in control. Computer Assisted Self-Response methods now being introduced into surveys hold great promise to maintain the advantages of mail self-response, while improving control and the ability to intervene in the collection process.

Definition -- Prepared Data Entry (PDE)

Prepared Data Entry (PDE) places the respondent in direct contact with a computerized questionnaire through a computer terminal. In a sense the computer is acting as the interviewer in a manner similar to CATI or CAPI interviewers.

The respondent uses a personal computer or terminal to fill out interactively the survey questionnaire. As each item appears on screen, instructions and definitions for that item appear on a split screen or are accessible by pressing a help key. As data are entered, range and consistency checks are automatically applied and anomalies pointed out to the respondent. The response to previous items may control the question path of the questionnaire. Because of the lack of an interviewer to help the respondent, the guidance provided by the program must be substantial and the computer literacy of the respondent is essential, at least at this stage of development.

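As a rough illustration of the range checks, consistency checks, and answer-dependent question paths just described, consider the following sketch. The three items, the limits, and the function names are all hypothetical and are not taken from any actual PDE instrument; a real instrument would also need help text and a way for the respondent to correct flagged entries.

```python
def check_range(value, low, high):
    """Range check: True when low <= value <= high."""
    return low <= value <= high


def run_questionnaire(answers):
    """Walk a tiny hypothetical questionnaire and return (path, anomalies)."""
    path, anomalies = [], []

    path.append("has_employees")
    if answers["has_employees"] == "yes":
        # The question path depends on the previous answer.
        path += ["employees", "hours_paid"]
        if not check_range(answers["employees"], 1, 100000):
            anomalies.append("employees out of range")
        # Consistency check across two items: nobody can be paid for more
        # hours than a 31-day month contains (31 * 24 = 744).
        if answers["hours_paid"] > answers["employees"] * 744:
            anomalies.append("hours_paid inconsistent with employees")

    path.append("comments")
    return path, anomalies
```

A respondent who answers "no" to the first item is routed straight to the final item, while one who reports 10 employees and 10,000 paid hours is shown the consistency anomaly before the data leave the respondent's machine.
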
This category of automated data collection programs includes a rapidly expanding set of respondent initiated data entry and transmission methods. These methods are directly dependent upon the computer and telecommunications hardware available to the data providers. Individuals, small businesses, or reporting agents can enter data into a personal computer in response to pre-programmed floppy disks and mail the disks to the collecting agency. Firms with modems can transmit the data through telephone lines directly to the collecting agency's mainframe, or via an electronic mail service. Larger firms with mainframes can download the data to a PC, then either transmit directly from the PC over a modem to the agency's mainframe or place the data on a diskette and mail it to the agency.

These methods eliminate the need for rekeying the data and suffering the risk of data entry errors. The transmission methods using telephone lines save several days in each collection cycle by eliminating dependence on the physical transportation of machine-readable data whether by mail or special couriers. The data must be checked to detect and correct errors introduced during transmission.

Examples of Current Use

In the early 1980's, the Internal Revenue Service (IRS) decided that the electronic transmission of returns by tax preparers to IRS would be both a practical and cost-beneficial alternative to the mailing of paper tax returns when a refund is claimed. According to the Agency, the benefits of electronic filing would include: (1) reduced manual labor costs required to process, store, and retrieve returns; (2) faster processing and retrieval of tax data; and (3) reduced interest IRS must pay to taxpayers who file timely refund returns that are not processed on time by the IRS.

Further, IRS reports show that electronically transmitted returns are processed with significantly fewer errors than paper returns.
According to IRS figures for the 1988 filing season, as of April 29, 1988, 20 percent of paper returns processed by IRS had errors and only 5.5 percent of those filed electronically had errors. For taxpayers, electronic filing can mean refunds up to 3 weeks sooner, and because IRS can deposit these refunds directly into taxpayer bank accounts, refunds may arrive 3 to 4 days earlier than that. For tax preparers, the ability to provide electronic filing services to taxpayers promises a competitive business edge.

The Petroleum Supply Division (PSD) of the Energy Information Administration (EIA) decided in 1987 to investigate electronic forms submission to collect the Petroleum Supply Reporting System (PSRS) survey forms. Ten of the major petroleum companies who file the mandatory "Monthly Refinery Report" were contacted to assess their PC and communications capabilities. The respondents contacted showed interest in investigating the use of PC's to collect this data. Often they were already using PC's for business, personal or academic purposes. The respondents either had a PC in their office area or had access to one in another office. Software such as Lotus 1-2-3 and dBASE III could usually be found on these PC's. Some PC's were equipped with communications capabilities and those respondents were already using telephone lines for company reporting. It appeared to be the appropriate time for the PC to enter the PSRS data collection process.

Early in 1988, PSD developed the Petroleum Electronic Data Reporting option (PEDRO) and began providing its respondents with a software diskette by which they could create an electronic image of the form on a PC screen and enter their data in the appropriate cells. Firms having the necessary software capabilities can use their data base to feed data directly to the electronic survey form, eliminating keying and transcription errors.
User-friendly software with help functions has been added to data entry functions to provide quick reference to definitions, conversion factors or other information to speed the completion of the survey form. This eliminates the need to search hard-copy files for survey forms instructions, product definitions, conversion tables, etc.

Definition -- Touchtone Data Entry

Touchtone Data Entry (TDE) has been used for many years in the private sector for a growing range of applications. TDE, also known as voice response, is used for banking by telephone, call routing, college class registration and "talking yellow pages" to name just a few. The process is simple. The caller initiates a call to a computer which asks a series of questions. The caller answers using the touchtone keypad and the tones are recognized by the computer. The process offers inexpensive collection because there are few ongoing labor costs after development.

In a survey environment, TDE may be applied where the desired responses are numerical, or when responses can be linked to a numerical code, such as "yes" is "1" and "no" is "0." As in other applications, the respondent initiates the call to the collection computer which controls the flow of the interview. The computer asks questions in either a synthesized voice or from a file of digitized phrases prerecorded by a human speaker. After each question, the respondent keys the answer. The computer also repeats each entry for verification directly with the respondent, and an acknowledgement is required, such as "1" equals "correct."

TDE offers many advantages over other collection methods. In repetitive surveys, the respondent retains a single form for monthly or quarterly calls, reducing the costs of both postage and the labor involved in mail handling, both outgoing and incoming. Costs for data entry and data verification are eliminated. Most importantly, the uncertainty about sample status is minimized.
The status of the sample can be assessed through analysis of the received calls versus the list of active TDE respondents. Informed judgments can be made about the timing and extent of the nonresponse workload. No time is lost while survey forms are in the mail or waiting for data entry. This is especially important for time-critical surveys.

TDE also offers convenience for the respondent. The computer is always available to accept the calls. For busy respondents who are frequently out of the office or away from home, in meetings or traveling, this feature may be preferable to scheduling calls in advance and risking interruptions and repeated callbacks. TDE reporting may require less time than CATI.

TDE has some limitations that should be carefully addressed in each survey environment. First, not all respondents have touchtone phones. Thus, implementation of TDE would likely be in combination with other collection modes, adding to the complexity of survey management. As with mail collection, the respondent also may need to be reminded to call in, although a simple advance notice postcard has proven very successful when properly timed.

Examples of Current Use

The only known survey application of TDE is the Current Employment Statistics (CES) survey at the Bureau of Labor Statistics (BLS). The CES program covers over 300,000 non-farm business establishments monthly. The data items are few, essentially employment, hours paid, and earnings, and the CES is conducted by mail in conjunction with each state, the District of Columbia, Puerto Rico, and the Virgin Islands. Collection of CES data is time critical. Preliminary estimates are published after 2 weeks of collection. Thus, the time lost due to the variability of the mails has a severe impact on response rates.

Initial experiments were done using CATI.
Large scale tests of CATI collection, involving 13 states and over 5000 respondents monthly, successfully showed the ability to collect data from the vast majority of respondents in time for the first publication. More than half the CATI sample was drawn from chronically late respondents. Response rates are routinely 85 percent versus 50 percent for mail.

The higher costs of CATI stimulated interest in TDE self-response. The results of small scale tests in 4 states suggest that TDE can retain high response rates over a sustained period. Calls average less than 2 minutes, and about 25 percent of respondents are given short reminder calls just before the collection deadline. BLS is expanding TDE use to over 15 states during 1990.

Procedurally, the combination of advance notice postcards, timed to arrive during the reference period, and short nonresponse calls provides a strong, inexpensive collection process. TDE respondents receive a package of materials that explain the new collection method and how it differs from mail and telephone collection. First-time TDE users are requested to call the computer on a test basis using special codes before they are asked to submit real data. The machine readable data are uploaded to mainframes for further editing and reconciliation.

The respondents chosen for the first TDE tests were drawn from those under CATI collection. In this way the higher costs of CATI can be offset by savings from TDE. Other TDE tests targeted mail respondents who generally reported on time.

The widespread use of touchtone systems has spawned an industry-wide working group to standardize features (e.g., the key on the telephone) to simplify user access.

Definition -- Voice Recognition Entry

Voice Recognition Entry (VRE) is just developing as a technology. The characteristics of VRE are essentially the same as TDE.
The respondent initiates the call to the computer, but instead of using the touchtone keypad, the respondent answers by speaking, in this application using the digits 0 through 9 and "yes" and "no." Both "oh" and "zero" are recognized.

There are two essential features for VRE systems. First, they should provide speaker independent recognition, meaning that almost any voice can be recognized without any "training" of the system. Some systems require extensive training of the software for each voice. While this is used in some office dictation systems, it is probably impractical for survey operations. Also, systems should provide for rapid entry of responses using continuous or connected digits. These features are commercially available for both microcomputer and minicomputer applications.

VRE also has limitations in application. First, VRE is only applicable to respondents with access to a phone, a small but unavoidable problem. Recognition accuracy is the primary determinant of respondent acceptance. The system in use at the Bureau of Labor Statistics was designed using speech profiles drawn from the midwestern states. Dialects from other regions may reduce the accuracy of the recognition, leading to respondent frustration and low acceptance. Early test results suggest that recognition remains high in Maine, the home of a very difficult dialect for the speech interpreting algorithms. More testing is planned to decide the limits of current technology. Improving recognition accuracy is the primary objective of the companies involved in speech research and development.

Development of VRE is presently limited because there are few current applications to provide advance training and public acceptance. Early results suggest that respondents familiar with TDE and VRE prefer the latter as more "natural." This finding points out the differences in questionnaire design.
TDE questions ask respondents to "enter" data, whereas VRE respondents are asked questions in a manner similar to CATI because the responses are spoken. Recently, experiments using voice recognition have begun to appear, conveniently providing training for future survey respondents. Also, the similarities between TDE and VRE may minimize acceptance problems.

Both TDE and VRE applications at BLS use short questionnaires. These techniques may limit the length of the survey, but this requires testing. They provide convenience and low costs, but respondents may balk at long lists of questions and the current limitation on the range of allowable answers to numbers and a few words. VRE offers a variety of interesting research problems in speech recognition and natural language understanding. These systems have not yet come into widespread use.

Examples of Current Use

The BLS is now conducting tests of voice recognition in the CES survey. The procedures will parallel those used for TDE and will assess the effectiveness of VRE for the entire U.S. population. They will examine any limitations involving multiple telephone systems, geographic distances, and respondent acceptance. Acceptance by respondents has been high.

Potential Uses

These computer assisted self-response methods have wide potential applications. Ideal surveys are repetitive, short and numerical, especially if the data are entered into a computer before the call is made.

TDE has been considered for screening eligible respondents from the population. Since eligibility is usually determined by very few criteria, a mailed form could direct the respondent to call in the answers to one or two questions to a central computer. After entering the unique identification number, the respondent would answer these questions. Then the survey manager would use the machine readable file for nonresponse follow-up and subsequent sampling.

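The screening use just described amounts to comparing the file of received calls with the list of mailed identification numbers. The sketch below shows one way this could be done. The function name, the file layout, and the rule that a respondent is eligible only when every screening answer is "yes" are assumptions made for illustration; the coding of "1" for yes and "0" for no follows the TDE convention described earlier.

```python
def split_sample(mailed_ids, calls):
    """Partition a screening sample using the machine readable call file.

    mailed_ids: collection of identification numbers on the mailed forms.
    calls: dict mapping identification number -> list of keyed answers,
           coded "1" for yes and "0" for no.
    Returns (eligible, ineligible, nonresponse) as sorted lists.
    """
    mailed = set(mailed_ids)
    # Assumed rule: eligible only when every screening question was answered "1".
    eligible = sorted(i for i, keyed in calls.items()
                      if i in mailed and all(k == "1" for k in keyed))
    ineligible = sorted(i for i in calls if i in mailed and i not in eligible)
    # Anyone mailed a form who has not called in needs nonresponse follow-up.
    nonresponse = sorted(mailed - set(calls))
    return eligible, ineligible, nonresponse
```

The third list is the nonresponse follow-up workload, and the first is the frame for subsequent sampling.
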
BLS is considering TDE for pilot tests of survey supplements and other special one-time surveys to reduce costs and add valuable control, to augment or replace the traditional mail process, and to gain experience in the design and use of TDE systems. The logical extension of existing TDE and VRE technology is the linking of them into a single system. For example, respondents call the system which then asks the respondent to respond by touchtone. If the tone is not recognized, the respondent is automatically switched to a VRE component. A third feature would be available to record changes in the respondent's attributes (e.g., name or address), or to record open-ended responses for later transcription -- voice mail.

Self-response methods are not limited to survey applications. Any ongoing project that collects cost, workload or other management data could use self-response methods for inexpensive collection. For example, a large copier company uses TDE for collecting billing information. Equipment renters are required to call in the monthly usage levels by entering copier usage as touchtone data. The computer then generates a bill in response to the touchtone entry. Also, the U.S. Postal Service uses TDE to link callers to prerecorded tapes covering the most frequently asked questions. The BLS will begin using similar technology to answer routine inquiries for economic information.

Future

Voice technology is still being developed. "The NIST report argues that the most natural mode of data collection is not paper or keyboards, but speech" (William Nicholls, 1989). Recorded voices are currently being used in some surveys. Speech technology includes voice simulation which is useful today in TDE applications. While numerical and very limited vocabulary are being used in data collection, it will be some time before automated speech systems will be used to recognize free-form human speech in a telephone interview or in a personal interview setting.

Summary

Some items to consider when deciding between data collection methods are as follows:

1. CATI offers cost saving over the personal interview setting and would be useful for a large complex survey environment. However, it misses people without telephones.

2. CAPI retains the benefits of a personal interview setting where response rate is important, and does not require a telephone.

3. TDE is cheaper than CATI, but cannot handle the complex survey, and respondent acceptance is a concern.

4. PDE is typically used in an establishment survey. It does not require a separate key entry stage, but requires respondents to have access to a terminal, typically a PC.

5. VRE will see only specialized application in the medium term.

Whichever technique is selected, the integration of the electronic data collection method into a computer-based survey system should be considered. For example, address labels and other administrative items must be created from the sample database, then the interview proceeds, editing is done, and the resulting data are fed into the analysis or summary system.

Also, the decision maker should consider whether to use a single or mixed mode of data collection. Two examples of mixed modes are the Census' integrated CATI/CAPI design, or the BLS' integrated TDE/CATI design. William Nicholls comments that "In the long run, the best data collection strategy for establishment surveys may prove to be a readiness to accept whatever combination of methods the respondent finds most convenient." The creation of new technologies and improvements to existing technologies will continue to have an effect on data collection methodology.

IV. Methodological Issues

IV.A. Human-Machine Interfaces

Introduction

The design of the interface between a person and a computer can decide the success or failure of the interaction.
Although the situation is improving, there is generally too little attention paid to the effect of interface design on user performance. Interface design is often not considered until the last stages of software development when the total design has already been "locked-in."

Automated surveys will involve people with widely differing abilities using machines ranging from manual data-entry devices to powerful computers. Interface issues will reflect this diversity in people and machines. There is no one interface that will satisfy all needs. The relative importance of a given interface issue will depend entirely on the context of the person-machine environment. Nonetheless, there are some guiding principles of user interface design.

CASIC benefits from consideration of user-related factors in interactive systems, interaction styles, interaction devices, response time considerations, system messages, printed manuals, online help, tutorials, and development styles. Many of these topics involve detailed consideration of how to present the computer power to the user. For example, interaction styles can be broken down into command languages that the user must learn before using the computer, menus that guide the user through the necessary procedures, and the direct manipulation of objects whose icon representation appears on the screen. Similarly, interaction devices can take on many forms -- keyboards, function keys, pointing devices, speech recognition, displays, printers, etc.

Techniques for automated information collection include CATI, CAPI, computer assisted self-response surveys, and prepared data submission on tape. Except for tape submission, these techniques involve user interface design considerations. All must be successfully used with little or no training. The user interface must be "self-evident." Error recovery is important. The user must be protected from making errors wherever possible.
When it is possible for the user to err, the recovery procedures must be positive, helpful, and easy to follow.

User of the Interface

It is essential to determine who the user of the interface will be before designing the interface. In automated statistical surveys, a user may be a well trained and highly motivated survey professional. At the other end of the range, the user may be a first-time or only grudgingly cooperative survey respondent. Even within somewhat narrow user populations, there will be differences among users that can affect the usefulness of the interface. It may not even be possible to design an interface that perfectly suits a single user because the user is subject to changes over time due to personal factors, new experiences, and changing needs. A user-interface design team should include an applied psychologist to help determine the psychological profile and needs of the user. The personality, training, and experience of the potential users are large factors in determining the most appropriate interaction style or styles for the user interface.

Interaction Styles

The choice of interaction style is also affected by the hardware to be used in the survey. Survey techniques that make use of computers with standard input/output devices can use command languages, menus or direct manipulation. Command languages are used to interact directly with the operating system of the computer. They allow a wide range of system functions -- storage, deletion, copying and printing of files -- to be done. The cost is a steep learning curve to master the commands. Command languages, while hard to learn, are also easy to forget. They can be intimidating to novice users who realize that information can be lost or damaged by poorly chosen commands. On the other hand, a person familiar with command languages can work rapidly and effectively.
For some people, mastery of a command language is a source of pride which provides a sense of satisfaction and motivation for good job performance.

Menu selection represents another approach to interaction style. Menus present the user with a set of only those choices that are appropriate at a given time. The choices are often numbered or lettered so the user can choose by entering the appropriate number or letter from a keypad or keyboard. Sometimes the choices are keyed to the first letter of the line containing the choice. Then, the designer must be sure to avoid duplicate use of the starting letters. Some menus use pointing devices such as cursor keys, a trackball, a joystick, or a mouse to highlight choices. The user moves the pointing device to make a choice, then pushes a button to make the selection. Also, menus may offer only single-line choices. For example, a menu may ask for confirmation of a request by entry of y (for yes) or n (for no).

Menus are often organized hierarchically in graphs -- data structures used to represent relationships among objects. Family trees are a form of graph that show the relationships of a person to other family members. Airline route maps are graphs that show paths the airline follows in flying between locations. With menus, the user is essentially "flying" by making selections from the graph of menus (the technical term is "walking"). Selection of one item from a menu takes the user on a different path through the graph than does selection of another item. Graph structures can ease the design problem for complex user interfaces, but also can lead to user confusion. The user must be able to maintain a sense of location in relation to previous choices made. The user also must be given easy access to "escape hatches" if an unwanted path (undesired choices) has been walked on the graph. CATI and CAPI designs rely heavily on complex branching structures to control the interview.
The menus and list of allowable responses must be clear, exhaustive and enable the interviewer to retain effective control.

Direct manipulation (DM) interfaces offer a third approach to interaction style. In DM, the user is given the impression of directly interacting with the objects of interest. As an example of a DM interface, consider a modern word-processing system. The screen representation of the document is made to be as close to the appearance of the finished document as possible. This is sometimes called WYSIWYG (pronounced "whizzi-wig"), for "What You See Is What You Get." The user operates directly on the screen representation of the document and immediately sees the results of the operation. Many commercially available graphical interfaces show how far DM can go toward helping the user. A mouse is typically used as the device for pointing to objects on the screen. A typical screen object is an icon that symbolically represents the object. To delete a file, for instance, the user simply points to the file name and "drags" it over to a trashcan icon.

Menu selection and direct manipulation are important user interface techniques in situations that involve novice users with little opportunity for training. Although the interfaces must accommodate novice users, they also must be flexible enough to avoid frustrating more experienced users. Direct manipulation can accommodate novice and experienced users equally. Menu systems should allow experienced users to "select ahead" or to revert to a command language style of interaction.

Survey techniques that do not use more-or-less standard computers will raise unique interface issues. Alphabetic input, such as name entry, in telephone keypad-entry systems raises the question of letter assignment to keys that have multiple letters on them. Disambiguation may be possible when the entries can be compared to a fixed list of permissible entries.
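The keypad disambiguation just described can be illustrated with a minimal sketch. It assumes the now-common letter assignment in which Q shares the 7 key and Z shares the 9 key (older keypads may differ), and the names KEYPAD, to_digits, and disambiguate are hypothetical, chosen only for this illustration.

```python
# Assumed keypad layout: digit -> letters printed on that key.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Invert the layout so each letter maps to the single digit that carries it.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_digits(word):
    """Return the digits a respondent would press to spell `word`.

    Characters that are not on the keypad (spaces, hyphens) are ignored.
    """
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in word.upper() if ch in LETTER_TO_DIGIT
    )


def disambiguate(keyed_digits, permitted_entries):
    """Return every permitted entry whose keypad spelling matches the keys pressed."""
    return [
        entry for entry in permitted_entries
        if to_digits(entry) == keyed_digits
    ]
```

When exactly one permitted entry matches, the system can accept it directly. When several match, as with BOOK and COOL (both keyed as 2665), the respondent would still have to be asked to choose among the candidates.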
Speech recognition and synthesis devices have the potential for radically changing the preferred interaction style in user interfaces. Although speaker-independent recognition of free-form spoken natural language is still in the future, rapid technological advances are being made in the ability to recognize automatically a subset of articulated words. Advances are also being made in the ability to synthesize natural-sounding speech under computer control. The best form of human-machine interface in any given situation or for any specialized group of users is still a research question. This can lead to degradation of the quality of the survey due to user errors and frustration.

Some survey techniques are already speech based. In CATI and CAPI, the user interacts with a speaking and listening person who is visually and manually interacting with a computer. The person conducting the survey uses common sense to interact with the respondent. Although there are substantial efforts to imbue a computer with common sense, practical use of this research remains in the future. Thus, the effective replacement of the human interviewer by a computer also remains in the future.

Error Avoidance and Recovery

Whenever possible, interfaces should be designed so that errors are not possible. The nature of potential errors in a given interface must be thoroughly understood to lessen the probability of their occurrence and the cost of recovering from them. When a particular sequence of operations is necessary to do a complex operation, the interface should be designed to combine the entire sequence into a single operation. This will reduce the number of operations required of the user (who probably thinks of the sequence as one operation anyway). All displays must have consistent layouts so the user does not have to spend time and mental energy scanning the screen for information.

The interaction style can have a profound effect on errors.
Properly designed menu systems can reduce errors by simply not offering poor choices. Choices offered must be clearly labelled. The consequences of a choice must be shown before the choice is made. There must be consistency between menus. For example, a choice common to all menus (such as Cancel Menu) must appear in the same place in each menu and must have the same consequence (such as reversion to the previous menu).

Error messages should be designed to help the user. The messages should be specific, positive in tone, and constructive. They should tell the user what can be done to correct the error. Whenever an error is made, the user must have a clear and easily followed path to recovery. This not only reduces the seriousness of the consequences of the error, but increases the user's confidence even in the face of a few errors.

Adequate training can help to reduce errors and increase respondent acceptance. Certainly, respondents should be trained before using the system. Good training can be reinforced by providing on-line or telephone-accessible help and on-line tutorials. On-line or telephone-accessible help gives the user an immediate reminder about proper operation of the system. On-line tutorials allow the user to review the correct procedures.

Design of Automated Form

In general, automated forms should not be automated versions of the manual forms they replace. They should be designed from scratch to take account of the opportunities and limitations introduced by automation. Sometimes, it might be appropriate to maintain the same "look and feel" between a manual form and its automated counterpart. For instance, user training might be reduced by minimizing changes. In these cases, the form designers should compare the benefits of staying with the old form with the costs of designing a new form.

Automation provides opportunities for higher productivity, fewer errors, and greater user satisfaction over manual methods.
Repetitive information can be automatically filled in from one form to another. Automatic editing for internal consistency and logical consistency should help to lower error rates. Automated forms also can provide on-line help and tutorials for the user.

Automated forms need not even look like paper forms. The user can be led through an interactive dialogue while the computer does the data formatting. Form fill-in is just one interactive style. Menu selection has already been mentioned as another style. Form designers should consider using hypertext, a recent development in interactive systems which provides a browsing environment. For example, the reader can display a definition simply by pointing at a word or phrase with a mouse. Hypertext would allow non-linear traversal of forms, as appropriate for the data being filled in. For example, in surveying for medical information, gender data can be used to steer the user around inappropriate survey questions.

Form designers should have a repertoire of techniques for designing and testing forms. Expert systems might be developed to help in form design and interaction design. Effort placed in designing expert systems would pay off handsomely in easing individual design tasks. Such systems also should produce forms that are more consistent and complete than forms produced in a paper environment.

Quality Measures

It is critically important to test user interfaces before presenting them to the users. Professor Ben Shneiderman of the University of Maryland has identified five goals that lend themselves to precise measurement:

1. Time to learn - how long does a typical user take to learn to use the system?
2. Speed of performance - how long does it take to carry out a benchmark set of tasks?
3. Error rate - how many and what kinds of errors are made by typical users?
4. Subjective satisfaction - how much do users like using the system?
5. Retention over time - how well do users maintain their knowledge?

It is not enough to guess how well a system meets these quality measures. It is essential to test the system. A testing laboratory is essential for any significant design work. Design groups may build in-house laboratories, or may seek help from existing laboratories. It often happens that persons who are skilled in computer programming, data collection techniques, or statistical methods are not fully aware of the skills and deficiencies of the user population. It is not a good idea to concentrate the entire design effort in the hands of task specialists. The human factors role must be an integral part of every design team. Large teams might include psychologists, sociologists, and other human factors specialists. Smaller teams should at least assign one team member the role of human factors specialist. If nothing else, this person can play "devil's advocate" to be sure the appropriate questions are raised.

Data about user performance under current conditions must be collected before beginning new systems. It will not be possible to determine the relative quality of a new system unless quantitative measures of the quality of the old system are available. The first task of the design team must be to develop guidelines for the design. Such items as menu selection formats, terminology, screen layout, data entry formats, error messages and recovery procedures, on-line help, and training should be considered and decided upon before any other significant design work is begun.

Rapid prototyping is a powerful technique which allows iterative convergence to a design. Partial system implementations are made quickly, presented to potential users, and tested. Further development is based on these interim tests. Because each step in the development cycle is small, and tested incrementally, only small corrections in direction are needed at each step.
Conceptual errors are quickly uncovered and are easy to correct. Rapid prototyping methods contrast sharply with the more conventional "waterfall" design methodology. The waterfall method requires detailed up-front specification of the design, with a full-blown design flowing down to a full-blown implementation. While this method may be appropriate in situations where the goal is clearly understood at the start, it has the disadvantage that changes made in any phase of the design tend to be large and expensive. This usually discourages change and leads to acceptance of a lower-quality product or total abandonment of the design. A disadvantage of rapid prototyping is that formal specifications and documentation may never get produced in the flush of excitement over the rapidly evolving (and working) system. The waterfall methodology is appropriate as the final phase of a rapid prototype design. Because rapid prototyping quickly produces a working model and deep understanding of goals and tradeoffs, waterfalling can be effectively used to provide the missing rigor and discipline.

Evaluation must continue even after a design has been completed and fielded. On-line suggestion boxes and trouble reports, designed right into the survey forms, provide easy channels of communication between the user and the designers. A user who suggests improvements or reports trouble should receive prompt responses and fixes. Large surveys might consider the use of a commercial bulletin board system as the communications medium for problems, suggestions, and fixes.

IV.B. Software Development

Introduction

There are two types of software that will be discussed in this section: software that helps in the creation of a survey questionnaire and software that makes up the actual programming code to execute the survey questionnaire in the field.
This distinction is directly analogous to the usual notion of a high-level programming language (e.g., FORTRAN, COBOL) in which you describe the problem in terms that humans can understand. This high-level description is then passed to a compiler that translates the description into an application program the computer can understand. For convenience, this section refers to the use of the survey creation software as the survey definition process and to the use of the resulting application program as the survey application process.

Most of the discussion will relate to the creation software. Historically, software development for automated field data collection began with a mainframe application for CATI. As hardware technology progressed, CATI was moved first to a minicomputer and then to a microcomputer. The CAPI application became possible with the development of the "light weight" portable microcomputer. Software to produce an automated questionnaire is perhaps the most important and potentially the most costly ingredient in the automated field data collection equation. Ideally, such software should be available off-the-shelf. Although there have been several attempts to develop such software, success has been limited.

To date, the development of automated questionnaire software has been done in one of two ways. The questionnaires are custom programmed using one of a variety of general programming languages (e.g., Pascal, C, FORTRAN), or they are custom programmed using a specialized CAPI/CATI programming language.
The specialized languages generally provide a means to describe a variety of attributes: the question text; the answer text; the type of answer expected (e.g., single, multiple, fill-in, free text); question paths (simple, i.e., go to the next question in order, or complex, i.e., based on the answers to previous questions or some related calculation); response editing (e.g., restrictions to specific values or a range of values); and in some instances, screen layout design. In either case, the development of an automated questionnaire usually has required the skill of a computer programmer.

Flexibility

There are several issues that need to be considered in the development or purchase of existing software for automating field data collection of survey questionnaires. Among these considerations is the level of flexibility needed. Flexibility is defined in terms of the amount of control the automated questionnaire exercises over the conduct of the survey and in terms of the features available to design an automated questionnaire.

With respect to control, consideration must be given to the extent the automated questionnaire will allow the interviewer or respondent to exercise control over the conduct of the interview. That is, should the person controlling the interview have the same control as in a paper-and-pencil survey, with total freedom to roam anywhere in the questionnaire and change answers at any time, or should the automated questionnaire be designed to limit the person collecting the data to a specific process and skip patterns, or to some level in between? If so, what is that level? The answer to these questions is critical because the software selected, particularly if it is a specialized package, might not have the specific capabilities needed to implement the desired design. The design of the questionnaire software also will be affected dramatically by the level of flexibility chosen.

With respect to software flexibility, there are several capabilities that should be considered. These capabilities are:

1. The question types: open ended, closed ended, single value, multiple values.
2. Case management: administration of each questionnaire, e.g., status of completion, restart of an incomplete questionnaire.
3. Back-up: ability to back up to any question in the survey and change an answer, with the system thereafter automatically following the skip patterns implied by the changed answer.
4. Editing: ability to perform edits such as consistency, range, and specific value or values.
5. Screen manipulation: ability to create any screen design desired.
6. Comments: ability for the person recording answers to record comments associated with any question.
7. Skip patterns: simple and complex, e.g., skip based on answers to previous questions or some arithmetic calculation.
8. Context sensitive help: ability to get help based on place in survey.
9. Rostering: ability to handle household member enumeration, identification, and skip patterns based on the individuals.
10. Output format: form in which collected data are stored, e.g., a flat file.
11. Accessibility of collected data: how easy is it to access the data, e.g., for quality control.
12. Coding: ability to code collected data automatically or manually.
13. Authoring system: ability to create the questionnaire and the software to execute the survey questionnaire (program code) simultaneously with no computer programming skills.
14. Output reporting: reports about the functioning of the data collection process and about the actual data collected.

This list of features is not all-inclusive, but does contain the most important features determining the level of flexibility.

Range

There are several additional factors that are important to the decision of level of flexibility and software design.
These factors are the size and complexity of the survey questionnaire and the period between major changes in the questionnaire or the preparation of an entirely new questionnaire. Complexity is defined by the number of different question types, complexity of skip patterns, and need for rostering. Size and complexity are directly proportional to software development time. The shorter the period between major software developments, the greater the requirement for a user-friendly authoring system. An authoring system significantly decreases development time and decreases computer programmer dependency. The size of the questionnaire also may impact the hardware and software requirements. Several software packages have certain restrictions that may be affected by the size of the application.

Automated Forms Design

Unlike CAPI and CATI software, there are many off-the-shelf software packages that can produce automated forms for computer assisted data entry. Many CAPI and CATI specialized software packages also can be used for this function.

Training

The amount and type of training required to use selected survey questionnaire development software is dependent upon the level of user-friendliness of the software. For example, programming the questionnaire in Pascal would require considerably more skill and therefore more training than programming the questionnaire using an authoring system. Usually, it is necessary to have a skilled computer programmer working with the survey questionnaire designer in order to use the current software. Under these circumstances the questionnaire is most likely to be a pencil-and-paper questionnaire programmed for the computer rather than one designed for the computer. Computerized questionnaires will improve in quality as their designers come to understand and use the environment provided by the computer.
Software documentation for the specific survey questionnaire should be complete enough to ensure easy revision of the questionnaire by someone other than the original author. For the general programming languages there are many software packages available to help in such documentation. The liberal use of comments in the computer programming code also is a good way of providing additional documentation.

IV.C. Data Collection Programs

Introduction

When producing a survey, several factors will affect the selection of a data collection method. The three primary factors are cost of resources, the time available to collect, edit, and summarize the data, and the desired quality. Because it is unusual to have all three in abundance, trade-offs must be considered.

Several other important factors relate to the design and operation of the survey, and will affect the cost, timing, and quality factors. First, the survey may be one-time or ongoing. A one-time survey may want to maximize quality for a fixed cost, where an ongoing survey may want to maximize quality for a minimized cost. With ongoing surveys automated capabilities can evolve over extended periods, thereby spreading out the costs. The second factor is the target population, and whether it is a household or an establishment. The chance of finding PC's in establishments is greater than in households, although not all households have telephones. The third factor is the operational nature of the survey, that is, whether the setup should be centralized or decentralized, and whether the PC's would be networked. Lastly, the sample size and complexity of the questionnaire are relevant.

The remaining nine factors relate to the characteristics of the technology used to collect data.

1. The Speed at which data may be entered is determined by the technology's hardware (such as XT, AT, or 386 PC's, disk speeds, and phone lines) and software (the complexity of the questionnaire and therefore the length of the program).
2. The Size of the machine can refer to its weight or ungainliness (which is important in situations where it must be moved around) or its available memory (which limits the amount of data and the complexity of the program that can be stored on the machine).
3. The Portability of a computer's software is important in situations where data collection is carried out on different computer systems.
4. The Type of Display selected may be based on environmental factors (conditions may be indoors and usually fixed, or outdoors and variable, so screen color is important), and on the complexity of the questionnaire (and therefore screen size).
5. The Mode of Data Entry varies from keyboard, to push button phone, to voice data entry.
6. Data Verification is based on the importance of quality, the complexity of the data, and other factors such as hardware speed and available memory.
7. The Database Generation refers to the way in which the data is brought together and integrated with the rest of the survey system. This may mean using telecommunications, or simple computer tasks.
8. The Hardware selected is based on cost, amount of time available, data quality desired, power of the machine, amount of memory, and other available features.
9. Training is important in any survey, and the amount of time available and the background of the staff dictate the technology chosen.

The priorities of these factors and the relationships between them help to decide which data collection strategy to use. A discussion of these factors with regard to CATI, CAPI, and other methods follows.

CATI

Introduction

In a CATI interview, the interviewer is helped by an interactive computer system.
It provides data quickly and offers good reliability, but a substantial cost investment is required to purchase and set up the system. The cost investment may be greater than other electronic data collection techniques, but it saves money over face to face interviews, since data entry is combined with data collection. It also can be used for follow-up of nonrespondents or edit failures, or key-in of mail questionnaires. It can be used in a household or establishment survey with complex questionnaires (typically a new or infrequent survey where time series interruptions will not cause problems, and where sample size is large, or small and used over a longer period). It can be operated in a centralized or decentralized manner, but it requires the respondent to have a telephone.

Hardware: The first generation consisted mostly of mainframe based systems, but the current generation consists of either multiuser minicomputer systems, or distributed systems over a PC local area network (LAN). The minicomputers are often UNIX-based and used mainly in large centralized facilities that require greater resources to pay for specialized support staff. The PC's are mostly DOS-based and are used in multi-location facilities. An added benefit of PC's (even in large facilities) is that many clusters of networks can be used, and PC's can be added one at a time (lower initial cost).

Speed: With minicomputers, the speed between questions could slow as the number of interview stations increases, or if another computer intensive program is run. With PC's on a LAN, the speed between interviews could slow as more stations are added to the network. Eventually, faster computers will solve this problem.

Size: The organization of the system (centralized or decentralized) and the hardware (minicomputers or PC's) will affect size requirements. The system can range from a single stand-alone PC to 100 or more workstations on a mainframe system.
The PC's and minicomputers usually have from 5 to 60 networked workstations.

Portability: The software should run on multiple hardware platforms with different operating systems. It should be written in a portable language and use common user interface standards. Today, software costs are increasing while hardware costs are decreasing. Portable software should provide a cost savings across different hardware platforms.

Displays: The use of color can aid the interviewer, but the Color Graphics Adaptor (CGA) standard is not clear enough for use over a long time. Either the non-composite monochrome, the higher resolution Enhanced Graphics Adaptor (EGA), or the very high resolution Video Graphics Array (VGA) standard should be used. However, EGA and VGA are more expensive.

Data Entry: Screens can be item based, screen based, form based, or a combination of these. Movement between items can be forward only, or forward and backward. Most systems have question skipping and branching capabilities, interviewer notes can be added, and the interviewer can resume at the point where the previous session ended.

Data Verification: The data quality is improved by incorporating longitudinal (historical) editing, arithmetic calculations, range, and consistency checks.

Database Generation: Outputs consist of an audit trail and response data. Often numeric and open ended data is stored separately, then linked by respondent number. Some systems include cross-tabulation capabilities, and the ability to generate accurate and timely reports is a benefit.

Training: One benefit is that centralized supervision and monitoring is available (on-line and audio-visual). It helps the supervisor identify interviewers who need more training.

CAPI

Introduction

In CAPI, the equipment is less expensive than CATI, but travel costs are higher. It requires the same amount of time as personal interviews, but data quality is improved and the separate data entry step is eliminated.
One advantage of the personal interview setting is that it yields higher response rates.

Hardware: The following criteria can be used to evaluate potential portable computers: interview duration and complexity, memory capacity, weight, power source and duration, screen size and legibility, disk type and capacity, speed, serviceability (important because service centers might not be locally available), portability, durability, price, ease of use and software compatibility.

Speed: The speed depends on the computer hardware and complexity of the questionnaire.

Size: A larger portable computer would be needed to put a complex questionnaire in 2 languages. Even a small portable computer is not necessarily portable, as many have complained that they are too heavy to carry around for very long. Electrical outlets are not always available. The battery power required for additional memory and for disk drives can add substantially to the weight requirements. Although small portable computers can be used on a table top or in one's lap, interviews conducted on the doorstep require handheld computers. That technology is coming but has yet to arrive for general use. A smaller portable computer, or one with a different keyboard, would be needed for this environment.

Portability: As in CATI, the questionnaire writing software is often portable from one type of hardware to another.

Displays: Different portable computers have different size screens with various readability factors. The various lighting conditions that would be met in the field are also a factor. For example, a "back light" screen is required for dim lighting conditions. If the interviews are conducted outdoors, glare reflection is a problem.

Data Entry: Often the software that was designed for CATI is also used for CAPI. It provides forward and backward movement, and incorporates skipping and branching between questions.
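The forward and backward movement with skipping and branching described above can be sketched in a few lines. This is only an illustration under stated assumptions: the three-question instrument, the QUESTIONS table, and the run_interview function are hypothetical and do not represent any actual CATI or CAPI package.

```python
# Hypothetical three-question instrument. Each entry holds the question text
# and a rule that picks the next question from the answers recorded so far.
QUESTIONS = {
    "employed": {
        "text": "Did you work for pay last week? (y/n)",
        "next": lambda a: "hours" if a["employed"] == "y" else "looking",
    },
    "hours": {"text": "How many hours did you work?", "next": lambda a: "end"},
    "looking": {"text": "Did you look for work? (y/n)", "next": lambda a: "end"},
}


def run_interview(events, start="employed"):
    """Replay interviewer actions and return (answers, path).

    Each event is ("answer", value) or ("back",). Backing up reopens the
    previous question and discards its answer, so that the skip pattern is
    re-evaluated when a new answer is keyed.
    """
    answers = {}    # question id -> recorded answer
    path = [start]  # questions visited in order; the last one is current
    for event in events:
        if event[0] == "back":
            if len(path) > 1:
                path.pop()                   # leave the current question
                answers.pop(path[-1], None)  # reopen the previous question
        elif path[-1] != "end":
            current = path[-1]
            answers[current] = event[1]
            path.append(QUESTIONS[current]["next"](answers))
    return answers, path
```

Because answers on an abandoned branch are discarded when the interviewer backs up, a changed answer cannot leave stale data from questions that no longer apply. A production system might instead keep such answers in an audit trail.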
Data Verification: Similar to CATI, improved data quality results from reduced clerical and machine activities, and being able to incorporate various editing techniques.

Database Generation: Data output can be consolidated more rapidly due to reduced clerical and machine activities. Data transmission options are mail, courier, or phone lines. Data security and the quality of phone lines may be a factor against using phone lines.

Training: Basic interview skills are considered very important (even more so than computer knowledge). With this assumption, training should focus on the computer and questionnaire details. Training materials can include a tutorial (helps coordinate the different learning rates), self study materials, and hands on practice with interviews. Good software and manuals are also important.

CASI

Data collection using TDE requires the respondent to have a touchtone telephone, and a dedicated computer with a multiple phone line capability at the other end. One benefit to the respondent is the convenience of calling in at any time.

Existing TDE systems limit editing primarily because of limits on hardware capacity, lack of visual cues and restriction to push buttons on the telephone. However, the computer can synthesize the answer and play it back to the respondent, thereby providing the opportunity to verify or correct the answer. TDE offers lower cost than CATI (less labor and mail costs, with key-entry costs borne by the respondent), and the data quality is good. TDE has been able to retain very high response rates over long periods when coupled with appropriate nonresponse prompting.

VRE again requires only a telephone and carries a cost profile similar to TDE.

Surveys which use PDE require the respondent to have access to a microcomputer. Data can be entered using the keyboard or a file containing the data can be imported. Displays are typically an electronic image of the form on the screen.
Error checking and other edits can be included, after which the data is transmitted back to the required agency where it is combined with other data. Computer security issues are important here. Integrity checks to make sure the data received is the same as the data sent must be part of the system. Appropriate manuals and other training materials including on-line help should be provided. This type of data collection would be worthwhile in an establishment survey where respondents report data monthly, quarterly, or over a given period.

IV.D. System Interfaces for Data Conversion

Introduction

Automated submission of data has the benefit of reducing reporting errors because a keying step can be eliminated. Traditionally, respondents entered data onto paper forms which were mailed to a central site where they were keyed into a computer system. With automated data submissions, intermediate keying steps can be eliminated.

Automated data transmission requires hardware and software compatibility between the respondent site and the Federal site. In recent years the number and types of software and hardware options have greatly multiplied into the current myriad of products and technologies on the market. Due to these developments, Federal agencies are often looking at heterogeneous sources for data transmission.

Federal agencies conduct many surveys with many types of respondents. These data sources, such as state and local governments and businesses, will increasingly have capabilities for reporting data in an automated way. Many now have personal computers (PC's) while others have only mainframes available. Complexity arises as Federal agencies, looking at a mix of hardware and software technologies available at respondent sites, must select the best way to collect data from these heterogeneous sources.
Planning for System Interfaces

Managers of data collection projects can expect interface problems, but these problems can be minimized by good planning. Knowledge about the availability of communications capability, hardware, and software at respondent sites will aid managers in their planning for system interfaces for data collection.

Communications Capability

Perhaps the most important issue for system interfaces is communications. Communications may be thought of as networking or as linking technologies together. With networking capability, data can be transmitted across telephone lines or special private line arrangements such as local area networks (LAN's). See the section on Networks Planning in this report for a discussion of networking issues. A related issue is maintaining the confidentiality of data transmitted in such a manner. See the section on Computer Security in this report.

Hardware

Hardware is needed at both the respondent site and the Federal site for data transfer. The type of hardware available at the respondent site will often decide what options the Federal survey managers will offer for submitting data. It may be necessary for the Federal site to have hardware for data conversion available, for example, hardware to read both 5 1/4-inch and 3 1/2-inch diskettes. Also, communications may need to be set up between hardware devices. The section on Hardware Planning in this report discusses these issues further. Three common types of hardware links are discussed below.

Mainframe to Mainframe: Data can be transmitted from one mainframe to another via a communications network. Either the respondent or the Federal site can specify record layout and formatting instructions for data submission. Front-end processors can do data conversion before the data are sent to the host computer. Another option is submission of a computer tape in a specified format.
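Agreeing on a record layout, as described for mainframe-to-mainframe submission, amounts to fixing the position and width of each field in a record. The sketch below reads one fixed-width record against such a layout. The field names, column positions, record length, and sample record are hypothetical and are not taken from any survey in this report.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (field name, zero-based start column, width).
LAYOUT: List[Tuple[str, int, int]] = [
    ("respondent_id", 0, 8),
    ("period", 8, 6),
    ("employment", 14, 7),
]
RECORD_LENGTH = 21  # sum of the field widths above


def parse_record(line: str) -> Dict[str, str]:
    """Split one fixed-width record into named fields.

    Raises ValueError when the record length does not match the layout,
    which usually means the sender used a different layout.
    """
    record = line.rstrip("\r\n")
    if len(record) != RECORD_LENGTH:
        raise ValueError(
            f"expected {RECORD_LENGTH} characters, got {len(record)}"
        )
    return {
        name: record[start:start + width].strip()
        for name, start, width in LAYOUT
    }


# One record as it might arrive from a respondent site (hypothetical data).
example = parse_record("ESTAB0011990030000125\n")
```

Rejecting records of the wrong length at the point of receipt is one way a front-end conversion step can flag a layout mismatch before the data reach the host computer.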
PC to PC: A link between two PC's can be established using a network system. Another way to transmit data from one PC to another is to mail the data on diskette. The record layout and diskette format would be agreed upon by the respondent and the Federal site. Because diskette sizes vary, the Federal site may need conversion hardware and software to read diskettes of different sizes. Another option is to provide software on a diskette to the respondents.

Mainframe to PC: This type of hardware link combines the options described above. Again, a link can be established using a communications network. If the PC is at the respondent site, a diskette with software may be provided to set up the PC to send data over to the mainframe in the appropriate format.

Software Compatibility

Although Federal survey managers usually cannot provide hardware to respondent sites to use for data transmission, they often can provide software for this purpose. If the respondent's software is used, the Federal site must have the same software or be able to convert the data to the correct format. Not only can different software products be incompatible, but two versions of the same software product can be incompatible. One version may have a higher level of functionality than the other. Again, there must be planning for document transfer. See the section on Software Development in this report for more guidance on planning for software compatibility.

IV.E. Computer Security

Introduction

Computer security refers to the continued operation of computer applications at acceptable levels of risk to the organization(s) being supported by the applications. Risk is usually measured in terms of potential loss, specifically losses that occur from:

1. Disclosure of information to unauthorized parties (i.e., loss of confidentiality),

2. Modification or other adverse actions that affect the expected quality of information (i.e., loss of integrity), and

3.
Destruction or other adverse events that affect either the availability of the information when it is needed or the availability of the computer system to process that information (i.e., denial of service/loss of availability).

The types of losses described above can result from accidental and intentional events, as well as from natural hazards.

When estimating risk, it is important to consider direct losses (e.g., the cost to replace modified or destroyed information), as well as indirect losses (e.g., the inability of the organization to meet its mission, which can lead to public embarrassment, congressional wrath, loss of lives, legal actions, competitive disadvantage, etc.). After estimates of risk are derived, it is necessary to select and implement cost-effective safeguards (e.g., physical, administrative, technical, management) to reduce these risks to acceptable levels.

With respect to automated statistical surveys, the types of losses discussed above can occur during data entry from the respondent, during transmission of the survey information to the host computer system, and within the host system. While the ideas discussed below are generally applicable to all of the survey types addressed in Section III of this report, this section will focus on surveys collected through or with the use of a computer where the following occurs:

1. Data entry using a terminal or computer system to collect the response information (i.e., not directly applicable to response information collected over the telephone). The data entry process may "batch" the respondent's information for later transmission to the host computer for processing, or may have the respondent connected directly to the host system where the survey data is being captured in real time (and may be processed in real time).

2.
Transmission of the response information over telecommunications lines/circuits, including future ISDN networks discussed above, and transmission on magnetic media (e.g., floppy disk) through public and private mail delivery services, and

3. Receipt and processing of the survey information by a host computer system.

Problem Areas

Data Entry

During the data entry process, the following issues need to be addressed with respect to computer security.

Identification and Authentication: Respondents and other users of computer systems that are used to collect survey information must be positively identified and authenticated to assure the validity of the survey and to hold users accountable for their accidental or intentional actions. While passwords are still the most widely used method of authenticating the user's claim of identity, other methods such as biometrics and smartcards can be used when increased protection is desired--usually at increased cost. Passwords can be effective for authentication when used in accordance with FIPS 112, Password Usage Standard.

Access Control: Access to information on computer systems should be strictly controlled so that users only have access to information they are authorized to see or change. Most commercial computer systems provide mechanisms that support this function. Systems that appear on the National Computer Security Center's Evaluated Products List contain operating system level access controls that provide protection from unauthorized disclosure of information. Access controls are important on multi-user systems that are used to collect survey data in order to prevent the survey data from being intentionally or accidentally read, modified or destroyed.

Accountability: Unless computer systems contain mechanisms for recording and analyzing users' computer-security-relevant actions, it will not be possible to hold users accountable for actions that cause computer-related losses.
When users know that a computer system has an effective audit trail collection and processing mechanism, they are less likely to make mistakes or to attempt unauthorized access to information for fear of being caught. When survey data is collected on systems that provide accountability mechanisms, it will be easier to determine if the survey data have been tampered with or have been disclosed to unauthorized users.

Confidentiality: Besides access controls discussed above for preventing survey data from being disclosed to unauthorized individuals, cryptography can be used to protect data while it is being stored in a computer system or on other magnetic media such as floppy disk or magnetic tape. FIPS 46, Data Encryption Standard (DES), defines the only government-wide standard for encrypting and decrypting unclassified computer data. Since the DES has also been widely accepted by the commercial sector, there are many off-the-shelf products that can be purchased for implementing DES cryptographic protection.

Integrity: During data entry, the integrity of survey data can be affected by entering false/inaccurate data or by modifying data already entered. Approaches for addressing these issues include:

1. Editing through the use of error-detecting or error-correcting software that determines reasonableness of input data with respect to any number of criteria such as character composition of data input, numerical bounds checks, data dependent checks on previously entered data, etc.

2. Access control (see above) that prevents unauthorized users from gaining access to the survey data.

3. Cryptographic checksum as defined in FIPS 113, Data Authentication Standard, that places a cryptographic "seal" on the survey data for the purpose of detecting modification of the survey data from some initial state. This technique is useful when the survey data is stored in computer memory or on magnetic media such as floppy disk or magnetic tape.

4.
Accountability is the primary method for detecting modification to survey data by individuals who ARE AUTHORIZED (i.e., access controls do not apply) to access the data. While effective against both accidental and intentional modification, authorized users who intentionally modify data can subvert accountability controls if they have a high degree of technical knowledge about the computer system.

5. Software-engineering assurance techniques should be used in developing the data entry and other system software to preclude errors from being introduced into the survey data through faulty software.

Restart/Backup/Recovery: It is necessary to plan for restart/backup/recovery activities whenever the data entry process is interrupted or the survey data is destroyed. Techniques such as maintaining backup files, permitting restart points in the data entry process, and planning for an alternative data entry processing capability are all directed at maintaining continuity in the data entry process.

Transmission

During transmission, the respondent's survey data are sent from the data survey system to the host system that will process the survey data. While authentication applies primarily to transmission of survey data through telecommunications networks, confidentiality and integrity techniques are applicable to telecommunications networks and mail delivery of magnetic media.

Authentication of host computers (e.g., the host computer of the data entry system) to the transmission network is required by, and provided for by, most telecommunications networks to prevent unauthorized use of the network and to facilitate billing for network services. Sometimes, depending on the sensitivity of the survey data, it might be necessary to have the transmission network authenticate itself to the data entry host system before sending such data over the network.
In this way, the data entry system can be sure that the survey information is being sent over the actual network rather than being given to an intruder that is spoofing the data entry system into giving the intruder the survey data. If the network lacks capability for authenticating itself, then techniques used for confidentiality and integrity described below may be considered as alternative methods of protection.

Confidentiality: The most common technique for preventing disclosure of information within transmission networks is to use cryptography. As discussed above, the DES is the only government-wide standard for encrypting and decrypting unclassified computer data.

Integrity: Integrity with regard to transmission of survey data is the assurance that the survey data has not been altered, either accidentally or intentionally, during the transmission process. Cryptographic checksum techniques, as described above in the section on Data Entry Integrity, are effective in providing this protection.

Availability/Reliability of Network Services: Sometimes, particularly in real-time data collection and transmission, continuity of the transmission service can be very important to the success of the survey activity. Discontinuities due to the unavailability of the network or some of its intermediate nodes or due to noise in the transmission lines can result in survey data being lost, erroneous, or delayed. This could be particularly annoying to a respondent who has to keep repeating the survey data entry process or is unnecessarily prompted for nonresponse. It is possible to minimize such problems by using networks that provide error detecting/correcting procedures, dynamic routing around unavailable nodes, and other services that assure network availability and reliability.

Host Computer System

Computer security concerns at the host computer are similar to those at the data entry computer.
The reader should refer back to these discussions to supplement the material contained in the corresponding areas below.

Identification and Authentication: All users of the host system, including the respondent data entry system, should be required to identify and authenticate themselves to the host system to assure the validity of the survey and to hold users accountable for their accidental or intentional actions. The same authentication techniques that were discussed for the data entry system apply to the host system.

Access Control: Access to information on the host systems should be strictly controlled so that users only have access to information they are authorized to see or change; in particular, only authorized users should be permitted to access survey data on the host system.

Accountability: The host computer system should contain mechanisms for recording and analyzing users' computer-security-relevant actions in order to hold users accountable for actions that cause computer-related losses, particularly losses to the survey data.

Confidentiality: Besides access controls discussed above for preventing survey data from being read by unauthorized individuals, cryptography can be used to protect data while it is being stored in the host system or on other magnetic media such as a floppy disk or magnetic tape. As with the data entry system, the DES should be used for this purpose.

Integrity: On the host computer, the integrity of survey data can be affected by entering false/inaccurate data during the data