COMBINING REGISTER-BASED AND TRADITIONAL CENSUS PROCESSES AS A PRE-DEFINED STRATEGY IN CENSUS PLANNING

 

Olivia Blum

Israel Central Bureau of Statistics

 

Abstract

 

Traditional and register-based censuses are two options for carrying out censuses. The ability to maintain and update relevant registers leads some countries to conduct administrative censuses, while those that lack this ability have no option but to perpetuate traditional processes. Limited resources and the declining inclination of the public to cooperate impose a search for an efficient process that relies more on existing data. However, optimizing the use of resources does not mean striving for a pure register-based census. Rational decision-making demands a decomposition of the main theoretical and practical components of a census, and a cost/benefit analysis of each component WITHIN the established structure and with regard to the interdependence and interactions amongst its parts. The end result, in most cases, is a combination of both types of censuses rather than one or the other. Moreover, the established statistical system is not a mere census system but rather a census information system where the data are rich and detailed. Unlike the conventional notion of a census as a snapshot of a stock, performed at defined time intervals, it is a dynamic and continuously ‘breathing’ statistical body that monitors the flows and generates a sequence of frequent stock-snapshots.

 

 

1          Introduction

 

The process of population census-taking involves data collection from a defined population within defined geographic boundaries. These data, when processed, analyzed and transformed into statistical information, characterize individuals and households, demographically and socio-economically, in small population groups and within detailed geographic units.

 

Censuses, by definition, rely on data collected from all members of a relevant population. This is done either by direct interaction with the population, as in conventional censuses, or indirectly, by a secondary use of existing administrative files. Since there are no automatic processes of population data generation, an active operation of collection is always involved. Consequently, the question of direct or indirect data collection in reality addresses a broader issue: who is in control of, or responsible for, the data. The issue often translates into power relations when conflicting interests are introduced.

 

The objectives of statistical offices are formulated in terms of statistical information, while the carrying out of administrative roles leads to accumulated data produced in the form of reports and measurements as recorded during a specific, issue-oriented interaction. The resulting gap between the required statistical information and the raw, register or register-like data is the divergence point of a methodological fork with several paths: one path ignores the raw data entirely and initiates target-oriented projects (i.e. conventional census); another path is the option of statistical operations that rely solely on data collected by others (i.e. register-based census); while a third option includes hybrid operations that combine the two former options in varying degrees, lying on the continuum between these two extremes.

 

Which option to use, and when, depends on the interface between the interests and needs of the statistical and the administrative organizations, expressed in a set of attributes: population covered, type of data collected, time reference, frequency and timing of updating, maintenance and reliability of the produced files, their accessibility and their content flexibility. Furthermore, the available technology as well as the technological horizon are major factors in defining interests that create or eradicate conflicts engendered by human entanglement in data handling. The environmental background, i.e. legislation and public opinion, also plays a role in the overall ‘data-game’.

 

When the needs of the statistical and the administrative organizations coincide perfectly, the administrative apparatus is superior (Johansson, 1991). When this is not the case, it is possible to accommodate data collection to both purposes by an ‘integrated data-collection’, where the authorities collect data for statistical and administrative ends (Denmarks, 1995). However, this harmony is very fragile. It may lead to a bureaucratic constellation, where more people are needed to enable the integration, while the very same people cause the bureaucratic mechanism to be cumbersome and inefficient. Furthermore, administrative files cannot be maintained at the same quality level, with all the needed statistical parameters, unless the statistical office does it itself, not only because of different interests, but also because interests change at different rates (Laihonen & Thomsen, 1998). A corporate-integrated data management (Priest, 1996b), applied on the inter-organizational level, may tackle this problem, but even this is not a solution for the long run, since it blurs the boundaries between different public institutions in a political power structure.

Moreover, no matter what solution is applied to enable the statistical use of administrative data, it always invokes the need to control data collection and data quality by continuous evaluation, based to some extent on fieldwork operations rather than on “pure” data mining. Hence, administrative data as the only input for the production of census statistical information is an unfeasible utopia.

 

At the other end of the continuum, when no available administrative data are useful for statistical purposes, the statistical office is entrusted with all tasks associated with the production of census information. However, this distinct situation rarely exists. The magnitude, complexity and costs of population and housing censuses make it worthwhile to use existing administrative files to support, supplement or substitute a single element or sets of elements in the different phases of a census. A simple count of units, their partial demographic profile or any other attribute that can be linked partially or fully, on a micro or macro level, can be used functionally for more efficient and parsimonious statistical processes. This assertion may seem to embody a contradiction: although a census applies to the whole population, a partial use of registers as well as a use of partial registers is suggested. The tendency to hold a “sterile intellectual position”, in Scheuren’s (1992) terminology, makes it hard to see alternatives to conventional census taking where partial use of registers is concerned. However, the contradiction is only apparent. Registers can be used for a single task, like imputation to supplement data not collected directly, or for completing under-covered population groups like young males, thus improving the overall census quality in terms of coverage and reliability. Furthermore, the registers themselves do not have to be comprehensive and full, since they can supplement census data with or without complementing each other.

 

While several European countries have already performed or tried to perform a register-based census, most countries around the world have usually performed conventional censuses with little or no use of administrative files:

Denmark and Finland have built statistical registers functioning as a census database, based on their administrative registers, by means of evaluation and modeling of estimators. Norway, Sweden, Austria, Belgium, Luxembourg and Switzerland are in a transition from conventional to register-based census, while the Netherlands is looking for a solution other than a register-based one (Longva et al., 1998; Laihonen & Thomsen, 1998; Laihonen, 1999).

Germany and France, although attempting the transition to a register-based census, are still closer to a traditional one. A shift toward a register-based census as the main source of information is planned in Slovenia, while other countries, like Cyprus and the Baltic States, are trying to rationalize census taking by exploiting existing resources within the framework of a traditional census.

Past censuses in Israel, although conventional, have used the population register intensively. The next census is expected to be planned under the rational assumptions of mixed processes, without deciding in advance the precedence of the source of information.   

 

In the following sections I would like to discuss the alternatives and the main theoretical and practical components of combined census taking, and elaborate on the decision-making factors to be included in the utility function of the sources of information to be used.

 

 

2          Census-System or Census Information-System

 

2.1       Changing Goals in a Changing Reality

 

Censuses generate detailed information on population and housing, and as such serve two types of users: those whose research and analysis feed into policy planning and policy making, and those whose research is an end in itself. Both are conservative in their attitude to changes of census content, yet when expressing needs to be met, they would like to stretch the canopy and adapt to changing conditions. Although stability and consistency between censuses, as well as compliance with international recommendations, are maintained for the sake of intra-national and international comparison (longitudinal and cross-sectional), census content is somewhat flexible and tuned to local culture and needs. However, while census content has varied, census goals have not been asked to accommodate changing realities. Growing social differentiation and individualization imply growing complexity of the population, and a need for even more detailed information. Yet this changing reality has resulted in a decreasing willingness of the population to participate in common tasks (Germany Statistisches Bundesamt, 1992). Expecting the census to be the same good old friend, solid and true, in spite of its ‘ugly warts and wrinkles’ (Farnsworth-Riche and Marx, 1996), may lead to leaning on a broken reed. Some adaptations and alterations have to be considered.

 

Changes, in terms of census goals and objectives, can be either of overall strategy or of local tactics. In the supply and demand data-market, the need for change may arise from both sides. The demand as defined by the users is to be judged by their objectives and not by the data they would like to have in their offices. Sometimes they need less or different data than what they declare (Vliegen and Van de Stadt, 1989), and it is the role of the statistical office to identify the source, the type and the level of sensitivity and reliability of the data to be used for the declared purposes. Moreover, statistical offices are not only demand-followers; they may study past use and alter definitions of needs according to actual use. They may also anticipate potential future use. This means that statistical offices have the ability and the obligation to shape the demand curve according to society's needs, whether or not these have been detected or articulated by the users beforehand. Changes from the demand side are usually addressed by local revision rather than by global transformation. It is more a matter of changing tactics of investigation, by adding questions to the questionnaire or altering the configuration or substance of items that are already included in it.

However, when the data supply is problematic, a strategic change is called for. Limited resources and the declining inclination of the public to cooperate call for a search for alternatives to data collection and processing.

 

Population data in all sources of information derive directly or indirectly from the population. Indirect data are defined as such when they serve a secondary use, meaning that they have been collected for different purposes. In the census arena, indirect data are administrative files, subject-matter surveys, and censuses whose units are not individuals or households (agriculture census and such).

 

2.2       Indirect Data-Collection Supporting a Conventional Census

 

In most countries it would not be possible to retain a purely conventional census, based on direct data collection, because of the expanding needs for census information, increasing costs in absolute terms and per capita, and decreasing public cooperation (Scheuren et al., 1992; Laihonen & Thomsen, 1998). Furthermore, the census operation is lengthy, and the data are provided at long intervals, a problem compounded by additional time lags between collection and dissemination. Thus, data collected for non-census purposes are gradually introduced into the census process.

 

Registers have already been used to improve coverage before, during and after the enumeration. Addresses known beforehand serve to prepare maps, enumeration routes and mail-lists, and to control coverage during data collection. Individual records may help just the same, to allocate reasonable enumeration portions to each enumerator, to pre-print information on the questionnaires or as a check-list during data-collection.

Registers can be used to reduce the data-capture workload and to improve data capture in optical data-entry systems, by linking and comparing individual records in the register with the optically identified values of the census (Blum, 1997).
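The comparison of optically identified values with register records can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure described in Blum (1997): the field names, the normalization rule and the sample records are all hypothetical. Fields on which the two sources agree are accepted, and only the disagreeing fields are routed to manual keying.

```python
from dataclasses import dataclass


@dataclass
class FieldCheck:
    field: str
    ocr_value: str
    register_value: str
    agrees: bool


def normalize(value):
    """Uppercase and keep only letters and digits, so that trivial
    formatting differences do not count as disagreement."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def compare_with_register(ocr_record, register_record, fields):
    """Compare each requested field of an optically read census record
    with the linked register record (hypothetical record layout)."""
    checks = []
    for name in fields:
        ocr = ocr_record.get(name, "")
        reg = register_record.get(name, "")
        checks.append(FieldCheck(name, ocr, reg, normalize(ocr) == normalize(reg)))
    return checks


def fields_for_manual_keying(checks):
    """Only fields where the two sources disagree need a human keyer."""
    return [c.field for c in checks if not c.agrees]


# Hypothetical linked pair: same person identifier in both sources.
register = {"person_id": "012345678", "surname": "Cohen", "birth_year": "1961"}
ocr = {"person_id": "012345678", "surname": "COHEN", "birth_year": "1981"}

checks = compare_with_register(ocr, register, ["surname", "birth_year"])
to_key = fields_for_manual_keying(checks)
print(to_key)  # ['birth_year']
```

In this sketch a disagreement does not say which source is wrong; it only reduces the keying workload to the fields that actually need attention.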

 

Moreover, since in most censuses the socio-economic questions are addressed to a sample, and non-response tends to come from homogeneous groups, registers are used to improve the quality of the data by serving as a sampling frame and as an input to editing and imputation procedures. This type of use of registers reduces biases due to non-response and improves the estimates provided. It avoids the single-source output bias (Germany, 1989; Harala, 1996; Heihonen & Laihonen, 1987; Huggins & Fay, 1988; Priest, 1996a; Thompsen et al., 1996). Registers are also used to increase the number of observations needed for small-area and small population-group estimates (Schaafsma-Harteveld, 1999; Slagter, 1999; Leggieri, 1999).
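One simple form of register-supported imputation is to fill an item that is missing in the census record with the value held in the linked register record, and to flag the item as imputed. The sketch below assumes a common person identifier and invented variable names; it is an illustration of the idea, not a procedure taken from the cited sources.

```python
def impute_from_register(census_records, register, variables):
    """Fill missing census items (None) from the linked register record.

    Returns the completed records and, for each record, the list of
    variables that were imputed, so that imputed values stay traceable.
    """
    completed, flags = [], []
    for rec in census_records:
        reg = register.get(rec["person_id"], {})  # empty if no link exists
        new_rec = dict(rec)
        imputed = []
        for var in variables:
            if new_rec.get(var) is None and reg.get(var) is not None:
                new_rec[var] = reg[var]
                imputed.append(var)
        completed.append(new_rec)
        flags.append(imputed)
    return completed, flags


# Hypothetical register keyed by person identifier.
register = {
    "A1": {"birth_year": 1975, "sex": "F"},
    "A2": {"birth_year": 1990, "sex": "M"},
}
# Hypothetical census records with item non-response marked as None.
census = [
    {"person_id": "A1", "birth_year": 1975, "sex": None},
    {"person_id": "A2", "birth_year": None, "sex": "M"},
    {"person_id": "A3", "birth_year": None, "sex": "F"},  # not in the register
]

completed, flags = impute_from_register(census, register, ["birth_year", "sex"])
print(flags)  # [['sex'], ['birth_year'], []]
```

The third record shows the limit of a partial register: where no link exists, the item stays missing and has to be handled by another method.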

In post-census activities, administrative files are used for evaluation purposes, as one among several sources of comparable data.

The above uses enable the reduction of direct data collection by addressing fewer questions to fewer people.

 

 

2.3       Direct Data Collection Supporting a Register-Based Census

 

In most countries, a full register-based census is not a feasible option either. This is because of legislative constraints or the limited availability of sources of information, originating from and perpetuated by the lack of control over these sources.

 

Registers have to be evaluated on a constant basis, their content and coverage have to be adjusted to census purposes and their quality has to be maintained.

Harald (1999) suggests that in the pursuit of high quality, the main role of Statistics Norway is to identify errors in the registers and to inform the authorities. Twenty percent of the total costs of the 2000 census in Norway are allocated to the improvement of the registers.

Another aspect of the support needed for a register-based census, is the addition of variables that are not included in the existing registers. Several methods to collect the crucial variables are suggested by the Nordic and Benelux countries including:

integrated data collection in which data is collected for statistical reasons as well as administrative ones by the administrative authority,

random surveys with the possibility of attaching to ongoing surveys, designated surveys of target populations, and rolling sample surveys.

In addition, these countries suggest building new registers and conducting partial censuses, where limited issues are addressed to the whole population.

 

2.4       Toward a Census Information-System

 

In the second half of the 20th century, the evolving pattern of census alternatives that rely on indirect data collection has been planned and perceived as a replacement of the traditional census processes. This limited perception ignores the wide spectrum of new possibilities that multiple-source data enable. The new pattern should be considered a bedrock for potential change in census objectives while developing and extending statistical options, and not a mere replacement of the traditional census. Administrative censuses have not yet proven themselves to be a pure model of a secondary use of existing data. As a result, seeing the administrative census as a substitute is a source of new problems. Countries that have been trying to shift to a register-based census report problems of coverage and content deriving from the gap between interests: administrative files cover interest groups rather than the whole population, and variables of interest to the administrative authority are not necessarily variables of interest to the census information users.

 

The ideal situation is not to have a complete register-based census as a final objective, but rather to optimize the use of available sources in order to avoid the faults of each and to take advantage of the merits of each. In such a setting, the census becomes just one of several sources (Longva et al., 1998) in an all-embracing statistical system, whose life span depends on continuous activities of data collection, evaluation and processing. This takes place within an environment of accelerated transformation of society and social values, of the economy and economic capabilities, of policy and its implementation in the political arena, and of present technology and the technological horizon. It is a shift from a census system to a census information-system where the data are rich and detailed. Unlike the conventional notion of a census as a snapshot of a stock, performed at defined time intervals, it is a dynamic and continuously ‘breathing’ statistical body that monitors the flows and generates a sequence of frequent stock-snapshots.

 

The idea of a supported census, whether a conventional census supported by registers or a register-based census supported by fieldwork, means that although each process and sub-process of both options has its own merits and liabilities, decision making is usually based on choosing one type of census. Supporting elements from the alternative census process are recruited only when a problem is detected. This decision-making process cannot be defined as purely rational, but rather as a bounded rationality that was pre-selected as such. The ideal type of rational decision-making relies on the decomposition of the main theoretical and practical components of a census, and on a cost/benefit analysis of each, WITHIN the established structure and with regard to the interdependence and interactions amongst its parts.
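The component-by-component reasoning can be illustrated with a small sketch. Everything in it is hypothetical: the three components, the benefit and cost scores, and the interaction bonuses are invented numbers on an arbitrary scale, and exhaustive search is only one simple way of evaluating the alternatives. The point it shows is that once interactions between components are counted, the best plan may mix sources rather than follow one type of census throughout.

```python
from itertools import product

# Hypothetical (benefit, cost) scores per census component and data source.
COMPONENTS = {
    "address_frame": {"register": (8, 2), "field": (9, 6)},
    "demographics": {"register": (7, 1), "field": (8, 5)},
    "socio_economic": {"register": (4, 2), "field": (9, 6)},
}

# Hypothetical interaction bonuses: a pair of choices that reinforce each other.
INTERACTIONS = {
    # one linkage key serves both components
    (("address_frame", "register"), ("demographics", "register")): 1,
    # one household visit serves both components
    (("demographics", "field"), ("socio_economic", "field")): 2,
}


def net_utility(plan):
    """Sum of benefit minus cost over components, plus interaction bonuses."""
    total = sum(
        COMPONENTS[comp][src][0] - COMPONENTS[comp][src][1]
        for comp, src in plan.items()
    )
    for (first, second), bonus in INTERACTIONS.items():
        if plan[first[0]] == first[1] and plan[second[0]] == second[1]:
            total += bonus
    return total


def best_plan():
    """Evaluate every assignment of a source to each component."""
    names = list(COMPONENTS)
    candidates = (
        dict(zip(names, choice))
        for choice in product(*(COMPONENTS[name] for name in names))
    )
    return max(candidates, key=net_utility)


plan = best_plan()
print(plan, net_utility(plan))
```

With these invented scores the search selects registers for the address frame and the demographic items and fieldwork for the socio-economic items, with a net utility of 16, ahead of both the all-register plan (15) and the all-field plan (11).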

 

 

3          Building Blocks of the Decision Making Process

 

The main building blocks to be taken into account in the decision-making process are presented in the diagram, followed by a discussion of selected components.

The idea is that rational decision making means weighing the pros and cons of the use of different sources of data, on a micro as well as a macro level. This should be done while taking into account the different interests of data collectors and the way these interests diverge over time, and considering the alternatives not as enumeration of people vs. enumeration of files, but rather as a combination of both.

 

[Diagram: Building Blocks of the Decision Making Process]


3.1       Legislation

 

Statistical offices draw their legitimacy and power from laws specifically legislated for their functioning. However, these laws also seek to protect people from intrusion into their privacy and from violation of their basic human rights by the very same agencies that are empowered by law. This tension exists throughout the census activity and beyond: in data collection, processing, dissemination and actual use. When the use of multiple-source data for census purposes is introduced, a new set of legal questions and derived legislation follows suit. They involve the statistical bureau’s right to:

  1.      get and use, for statistical objectives, data collected for other purposes;

  2.      influence the data collected by other agencies;

  3.      build up and maintain registers;

  4.      add the same unique identification number to each record in all registers;

  5.      initiate designated statistical operations in the field (surveys or census-like);

  6.      inter-link different sources of information;

  7.      produce integrated statistical information;

  8.      pass on integrated information;

  9.      allow each individual access to his/her personal information;

10.      and address security and storage issues.

 

Answering the set of questions pertaining to legislation issues is a prerequisite for a multiple-source census. However, positive answers are not required en bloc; the questions can be resolved selectively. For example, aggregate statistical linkage is an option if identification numbers are missing, a complementary fieldwork operation is a valid alternative when registers are missing and are not allowed to be built, and so on.

Yet, this logic, where most components are intertwined with each other, possibly serving as partial alternatives to the same end, implies a complex inter-dependency in which changes in one component affect others, causing the need for contingency plans.
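The selective logic, and the need for contingency plans when one component changes, can be sketched as a simple mapping from the rights that have been granted to the methods that remain available. The labels for the rights and the fallback order are assumptions made for illustration; they loosely follow the examples given above and are not a statement of any country's legislation.

```python
def choose_methods(rights):
    """Map a set of granted legal rights (hypothetical labels) to the
    linkage method and the way missing variables can be obtained."""
    plan = {}

    # Record-level linkage needs a common identifier and the right to link.
    if "unique_id" in rights and "interlink" in rights:
        plan["linkage"] = "record-level linkage"
    elif "use_admin_data" in rights:
        plan["linkage"] = "aggregate statistical linkage"
    else:
        plan["linkage"] = "none"

    # Missing variables: build registers if allowed, otherwise go to the field.
    if "build_registers" in rights:
        plan["missing_variables"] = "build or extend registers"
    elif "field_operations" in rights:
        plan["missing_variables"] = "complementary fieldwork"
    else:
        plan["missing_variables"] = "unresolved"

    return plan


# Hypothetical situation: no common identifier, no right to build registers.
granted = {"use_admin_data", "field_operations"}
fallback_plan = choose_methods(granted)
print(fallback_plan)
```

Removing a single right from the set changes the resulting plan, which is the inter-dependency that contingency planning has to anticipate.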

 

Regulations have a stabilizing effect in a seemingly wobbly situation; however, the complexity of the system makes it sensitive to changes in the regulations themselves. Laws and regulations are needed not only to exploi