COMBINING REGISTER-BASED AND TRADITIONAL CENSUS PROCESSES AS
A PRE-DEFINED STRATEGY IN CENSUS PLANNING
Olivia Blum
Israel Central Bureau of Statistics
Abstract
Traditional and register-based censuses are two options for carrying out a
census. The ability to maintain and update relevant registers leads some
countries to conduct administrative censuses, while those that lack this
ability have no option but to perpetuate traditional processes. Limited
resources and the declining inclination of the public to cooperate impose a
search for an efficient process that relies more on existing data. However,
optimizing the use of resources does not mean striving for a purely
register-based census. Rational decision-making demands a decomposition of the
main theoretical and practical components of a census, and a cost/benefit
analysis of each component WITHIN the established structure and with regard to
the interdependence and interactions amongst its parts. The end result, in most
cases, is a combination of both types of census rather than one or the other.
Moreover, the established statistical system is not a mere census system
but rather a census information system where the data are rich and detailed.
However, unlike the conventional notion of a census as a snapshot of a stock,
performed at defined time intervals, it is a dynamic and continuously
‘breathing’ statistical body that monitors the flows and generates a sequence
of frequent stock-snapshots.
1 Introduction
The process of population census-taking involves data
collection from a defined population within defined geographic boundaries.
These data, when processed, analyzed and transformed into statistical
information, characterize individuals and households, demographically and
socio-economically, in small population groups and within detailed geographic
units.
Censuses, by definition,
rely on data collected from all members of a relevant population. This is done
either by a direct interaction with the population, as in conventional
censuses, or indirectly, by a secondary use of existing administrative files.
Since there are no automatic processes of population data generation, an active
operation of collection is always involved. Consequently, the question of
direct or indirect data collection in reality addresses the broader issue of
who controls, or who is responsible for, the data. The issue is often
translated into power relations when conflicting interests are introduced.
The objectives of
statistical offices are formulated in terms of statistical information, while
the carrying out of administrative roles leads to accumulated data produced in
the form of reports and measurements as recorded during a specific,
issue-oriented interaction. The derived gap between the required statistical
information and the raw, register or register-like data, is the divergence
point of a methodological fork with several paths: one path ignores the raw
data entirely and initiates target-oriented projects (i.e. a conventional
census); another path is the option of statistical operations that rely solely
on data collected by others (i.e. a register-based census); a third option
comprises hybrid operations that combine the two former options to varying
degrees, lying on the continuum between these two extremes.
Which option to use depends on the interface between the interests and needs
of the statistical and the administrative organizations, expressed in a set of
attributes: population covered, type of data collected, time reference,
frequency and timing of updating, maintenance and reliability of the produced
files, their accessibility and their content flexibility. Furthermore, the
available technology as well as the technological horizon are major factors in
defining interests that create or eradicate conflicts engendered by human
entanglement in data handling. The environmental background, i.e. legislation
and public opinion, also plays a role in the overall ‘data-game’.
When the needs of both the statistical and the administrative organizations
coincide perfectly, the administrative apparatus is superior (Johansson, 1991).
When this is not the case, it is possible to accommodate data collection to
both purposes by an ‘integrated data-collection’, where the authorities collect
data for statistical and administrative ends (Denmarks, 1995). However, this
harmony is very fragile. It may lead to a bureaucratic constellation, where
more people are needed to enable the integration, while the very same people
cause the bureaucratic mechanism to be cumbersome and inefficient. Furthermore,
administrative files cannot be maintained at the same quality level, with all
the needed statistical parameters, unless the statistical office does it
itself, not only because of different interests, but also because these
interests change differently over time (Laihonen & Thomsen, 1998). A
corporate-integrated data management (Priest, 1996b), applied on the
inter-organizational level, may tackle this problem, but even here it is not a
solution for the long run, since it blurs the boundaries between different
public institutions in a political power structure.
Moreover, no matter what solution is applied to enable the statistical use of
administrative data, it always invokes the need to control data collection and
data quality by continuous evaluation, based to some extent on fieldwork
operations rather than on “pure” data mining. Hence, administrative data as the
only input for the production of census statistical information is an
unfeasible utopia.
At the other end of the continuum, when no available administrative data are
useful for statistical purposes, the statistical office is entrusted with all
tasks associated with the production of census information. However, this
distinct situation rarely exists. The magnitude, complexity and costs of
population and housing censuses make it worthwhile to use existing
administrative files to support, supplement or substitute a single element or
sets of elements in the different phases of a census. A simple count of units,
their partial demographic profile or any other attribute that can be linked
partially or fully, on a micro or macro level, can be used functionally for
more efficient and parsimonious statistical processes. This assertion may seem
to embody a contradiction: although a census applies to the whole population, a
partial use of registers as well as a use of partial registers is suggested.
The tendency to hold a “sterile intellectual position”, in Scheuren’s (1992)
terminology, makes it hard to see alternatives to conventional census taking
where partial use of registers is concerned. However, the contradiction is only
a deceptive external cloak.
Registers can be used for a single task, like imputation to supplement data not
collected directly, or for completing under-covered population groups like
young males, thus improving the overall census quality in terms of coverage and
reliability. Furthermore, the registers themselves do not have to be
comprehensive and full, since they can supplement census data with or without
complementing each other.
While several European countries have already performed or tried to perform a
register-based census, most countries in the world have usually performed
conventional censuses with little or no use of administrative files.
Denmark and Finland have built statistical registers functioning as a census
database, based on their administrative registers, by means of evaluation and
modeling of estimators. Norway, Sweden, Austria, Belgium, Luxembourg and
Switzerland are in a transition from a conventional to a register-based census,
while the Netherlands is looking for a solution other than a register-based one
(Longva et al., 1998; Laihonen & Thomsen, 1998; Laihonen, 1999).
Germany and France, although attempting the transition to a register-based
census, are still closer to a traditional one. A shift toward a register-based
census as the main source of information is planned in Slovenia, while other
countries, like Cyprus and the Baltic States, are trying to rationalize census
taking by exploiting existing resources within the framework of a traditional
census.
Past censuses in Israel,
although conventional, have used the population register intensively. The next
census is expected to be planned under the rational assumptions of mixed
processes, without deciding in advance the precedence of the source of
information.
In the following sections I discuss the alternatives and the main theoretical
and practical components of combined census taking, and elaborate on the
decision-making factors to be included in the utility function of the sources
of information to be used.
2 Census-System or Census Information-System
2.1 Changing Goals in a Changing Reality
Censuses generate detailed information on population and housing, and as such
come to serve two types of users: those whose research and analysis are
implemented in policy planning and making, and those whose research is an end
in itself. Both are conservative in their attitude to changes of census
content, yet when expressing needs to be met, they would like to stretch the
canopy and adapt to changing conditions. Although stability and consistency
between censuses, as well as compliance with international recommendations, are
kept for the sake of intra-national and inter-national comparison
(longitudinal and cross-sectional), census content is somewhat flexible and
tuned to local culture and needs. However, while census content has varied,
census goals have not been asked to accommodate changing realities. Growing
social differentiation and individualization imply growing complexity of the
population, and a need for even more detailed information. Yet this changing
reality has resulted in decreasing willingness of the population to participate
in common tasks (Germany Statistisches Bundesamt, 1992). Expecting the census
to be the same good old friend, solid and true, in spite of its ‘ugly warts and
wrinkles’ (Farnsworth-Riche and Marx, 1996), may lead to leaning on a broken
reed. Some adaptations and alterations have to be considered.
Changes in census goals and objectives can be either of overall strategy or of
local tactics. In the supply and demand data-market, the need for change may
arise from either side. The demand as defined by the users is to be judged by
their objectives and not by the data they would like to have in their offices.
Sometimes they need less data, or different data, than they declare (Vliegen
and Van de Stadt, 1989), and it is the role of the statistical office to
identify the source, the type and the level of sensitivity and reliability of
the data to be used for the declared purposes. Moreover, statistical offices
are not only demand-followers; they may study past use and alter definitions of
needs according to actual use. They may also anticipate future potential use.
This means that the statistical offices have the ability and the obligation to
shape the demand curve according to society's needs, whether or not these have
been detected or articulated by the users beforehand. Changes from the demand
side are usually addressed by local revision rather than by global
transformation. It is more a matter of changing tactics of investigation, by
adding questions to the questionnaire or altering the configuration or
substance of items that are already included in the questionnaire.
However, when the data supply is problematic, a strategic change is called for.
Limited resources and the declining inclination of the public to cooperate call
for a search for alternatives to data collection and processing.
Population data in all sources of information derive directly or indirectly
from the population. Indirect data are defined as such when they serve a
secondary use, meaning that they have been collected for different purposes. In
the census arena, indirect data are administrative files, subject matter
surveys, and censuses whose units are not individuals or households (the
agriculture census and the like).
2.2 Indirect Data-Collection Supporting a Conventional Census
In most countries it would not be possible to retain a purely conventional
census, based on direct data collection, because of the expanding needs for
census information, increasing costs in absolute terms and per capita, and
decreasing public cooperation (Scheuren et al., 1992; Laihonen & Thomsen,
1998). Furthermore, the census operation is lengthy, and the data are provided
at long intervals, a problem compounded by additional time lags between
collection and dissemination. Thus, data collected for non-census purposes are
gradually introduced into the census process.
Registers have already
been used to improve coverage before, during and after the enumeration.
Addresses known beforehand serve to prepare maps, enumeration routes and
mail-lists, and to control coverage during data collection. Individual records
may likewise help to allocate a reasonable enumeration workload to each
enumerator, to pre-print information on the questionnaires, or to serve as a
check-list during data-collection.
Registers can be used to reduce the data capture workload and to improve data
capture in an optical data entry system, by linking and comparing individual
records in the register with the optically identified values of the census
(Blum, 1997).
Moreover, since in most censuses the socio-economic questions are addressed to
a sample, and non-response is characteristic of homogeneous groups, registers
are used to improve the quality of the data by serving as a sampling frame and
as an input to editing and imputation procedures. This type of use of registers
reduces biases due to non-response and improves the estimates provided. It
avoids the single-source output bias (Germany, 1989; Harala, 1996; Heihonen &
Laihonen, 1987; Huggins & Fay, 1988; Priest, 1996a; Thompsen et al., 1996).
Registers are also used to increase the number of observations needed for small
area and small population-group estimates (Schaafsma-Harteveld, 1999; Slagter,
1999; Leggieri, 1999).
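The use of a register as an input to imputation can be illustrated with a minimal sketch. It assumes, purely for illustration, that census and register records share a person identifier and that directly collected values take precedence over register values; the field names and records are hypothetical and are not taken from any actual census.

```python
def impute_from_register(census, register, fields):
    """Fill missing census values from the register record of the same person.

    Directly collected values are never overwritten; every imputed field is
    flagged so that later evaluation can distinguish the two sources.
    """
    completed = []
    for record in census:
        filled = dict(record)
        register_record = register.get(record["person_id"], {})
        imputed = []
        for field in fields:
            if filled.get(field) is None and register_record.get(field) is not None:
                filled[field] = register_record[field]
                imputed.append(field)
        filled["imputed_fields"] = imputed
        completed.append(filled)
    return completed


# Hypothetical records: None marks item non-response in the census.
census_records = [
    {"person_id": 1, "age": 34, "education": "tertiary"},
    {"person_id": 2, "age": None, "education": None},
    {"person_id": 3, "age": 61, "education": None},
]
register_records = {
    2: {"age": 27, "education": "secondary"},
    3: {"age": 60},  # partial register: no education variable
}

result = impute_from_register(census_records, register_records, ["age", "education"])
for row in result:
    print(row)
```

In this sketch a partial register still helps: person 3 keeps the directly collected age, and the missing education value simply stays missing because the register does not hold it.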
In post-census
activities, administrative files are used for evaluation purposes, as one among
several sources of comparable data.
The above uses enable the
reduction of direct data collection by addressing fewer questions to fewer
people.
2.3 Direct Data Collection Supporting a Register-Based Census
In most countries, a full register-based census is not a feasible option
either. This is because of legislative constraints or limited available sources
of information, originating from and perpetuated by the lack of control over
these sources.
Registers have to be evaluated on a constant basis, their content and coverage
have to be adjusted to census purposes, and their quality has to be maintained.
Harald (1999) suggests that in the pursuit of high quality, the main role of
Statistics Norway is to identify errors in the registers and to inform the
authorities. Twenty percent of the total costs of the 2000 census in Norway are
allocated to the improvement of the registers.
Another aspect of the support needed for a register-based census is the
addition of variables that are not included in the existing registers. Several
methods to collect the crucial variables are suggested by the Nordic and
Benelux countries, including:
integrated data collection, in which data are collected for statistical as well
as administrative reasons by the administrative authority;
random surveys, with the possibility of attaching them to ongoing surveys,
designated surveys of target populations, and rolling sample surveys.
In addition, these countries suggest building new registers and conducting
partial censuses, where limited issues are addressed to the whole population.
2.4 Toward a Census Information-System
In the second half of the 20th century, the evolving pattern of census
alternatives that rely on indirect data collection has been planned and
perceived as a replacement of the traditional census processes. This limited
perception ignores the wide spectrum of new possibilities that multiple-source
data enable. The pattern should be considered a bedrock for potential change in
census objectives, developing and extending statistical options, and not a mere
replacement of the traditional census. Administrative censuses have not yet
proven themselves to be a pure model of a secondary use of existing data. As a
result, seeing the administrative census as a substitute is a source of new
problems. Countries
that have been trying to shift to a register-based census report problems of
coverage and content deriving from the gap between interests: administrative
files cover interest groups rather than the whole population, and variables of
interest to the administrative authority are not necessarily variables of
interest to the census information users.
The ideal situation is
not to have a complete register-based census as a final objective, but rather
to optimize the use of available sources in order to avoid the faults of each
and to take advantage of the merits of each. In such a setting, the census
becomes just one of several sources (Longva et al,1998), in an all-embracing
statistical system, whose life span depends on continuous activities of
data-collection, evaluation and processing. This takes place within an
environment of accelerated transformation of society and social values, economy
and economic capabilities, policy and its implementation in the political
arena, and present technology and the technological horizon. It is a shift from
a census
system to a census information-system where the data are rich and detailed. But
unlike the conventional notion of a census as a snapshot of a stock, performed
in a defined time interval, it is a dynamic and continuously ‘breathing’
statistical body, that monitors the flows and generates a sequence of frequent
stock-snapshots.
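The relation between monitored flows and the resulting sequence of stock-snapshots can be sketched in a few lines. The sketch assumes the standard demographic balancing relation (next stock = current stock + births - deaths + in-migration - out-migration); the function name and the figures are illustrative only.

```python
def stock_snapshots(initial_stock, flows):
    """Return the sequence of population stocks implied by successive flows.

    Each period's stock is the previous stock plus natural change
    (births minus deaths) plus net migration.
    """
    snapshots = [initial_stock]
    for period in flows:
        net_change = (
            period["births"]
            - period["deaths"]
            + period["in_migration"]
            - period["out_migration"]
        )
        snapshots.append(snapshots[-1] + net_change)
    return snapshots


# Hypothetical flows for two consecutive periods in one small area.
flows = [
    {"births": 120, "deaths": 80, "in_migration": 40, "out_migration": 30},
    {"births": 110, "deaths": 85, "in_migration": 20, "out_migration": 50},
]
snapshots = stock_snapshots(10_000, flows)
print(snapshots)  # [10000, 10050, 10045]
```

The shorter the period over which flows are recorded, the more frequent the snapshots, which is the sense in which the system is continuous rather than tied to a census interval.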
The idea of a supported census, whether a conventional or a register-based
one, means that although each process and sub-process of both options has its
own merits and liabilities, decision making is usually based on choosing one
type of census. Supporting elements of the alternative census process are
recruited only when a problem is detected. This decision-making process cannot
be defined as purely rational, but rather as a bounded rationality that was
pre-selected as such. The ideal type of rational decision making relies on
the decomposition of the main theoretical and practical components of a census,
and on a cost/benefit analysis of each, WITHIN the established structure and
with regard to the interdependence and interactions amongst its parts.
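A component-wise cost/benefit analysis with interactions can be sketched as follows. This is only an illustration under stated assumptions: the component names, the cost and benefit scores and the interaction bonus are all hypothetical, and exhaustive enumeration of plans is used simply because the number of components is small.

```python
from itertools import product

# Hypothetical (cost, benefit) scores per census component and data source.
COMPONENTS = {
    "coverage": {"direct": (5.0, 9.0), "register": (1.0, 7.0)},
    "content": {"direct": (6.0, 8.0), "register": (2.0, 4.0)},
    "evaluation": {"direct": (3.0, 5.0), "register": (1.0, 4.0)},
}

# Hypothetical interaction: a register-based address frame makes direct
# collection of content more efficient, so the pair earns a bonus.
INTERACTIONS = {
    (("coverage", "register"), ("content", "direct")): 1.5,
}


def net_utility(plan):
    """Sum of (benefit - cost) over components plus any interaction bonuses."""
    total = 0.0
    for component, source in plan.items():
        cost, benefit = COMPONENTS[component][source]
        total += benefit - cost
    for (first, second), bonus in INTERACTIONS.items():
        if plan[first[0]] == first[1] and plan[second[0]] == second[1]:
            total += bonus
    return total


def best_plan():
    """Enumerate every assignment of a source to each component."""
    names = list(COMPONENTS)
    candidates = (
        dict(zip(names, choice))
        for choice in product(["direct", "register"], repeat=len(names))
    )
    return max(candidates, key=net_utility)


plan = best_plan()
print(plan, net_utility(plan))
```

With these illustrative scores the best plan mixes the two sources, which mirrors the argument that evaluating each component within the whole structure tends to yield a combined census rather than a pure one.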
3 Building Blocks of the Decision Making Process
The main building blocks to be taken into account in the decision making
process are presented in the diagram, followed by a discussion of selected
components.
The idea is that rational
decision making means weighing the pros and cons of the use of different
sources of data, on a micro as well as macro level. This should be done while
taking into account the different interests and the differentiation of
interests between data collectors, along time, and considering the alternatives
not as enumeration of people vs. enumeration of files, but rather as a
combination of both.
[Diagram: Building Blocks of the Decision Making Process]
3.1 Legislation
Statistical offices draw their legitimacy and power from laws specifically
legislated for their functioning. However, these laws also seek to protect
people from intrusion into their privacy and from the violation of their basic
human rights by the very same agencies that the law empowers. This tension
exists throughout the census activity and beyond: in data collection,
processing, dissemination and actual use. When the use of multiple-source data
for census purposes is introduced, a new set of legal questions, and the
legislation derived from them, follows suit.
They involve the statistical bureau’s right to:
1. get and use, for statistical objectives, data collected for other purposes;
2. influence the data collected by other agencies;
3. build up and maintain registers;
4. add the same unique identification number to each record in all registers;
5. initiate designated statistical operations in the field (surveys or census-like);
6. inter-link different sources of information;
7. produce integrated statistical information;
8. pass on integrated information;
9. allow each individual the access to his/her personal information;
10. and address security and storage issues.
Answering the set of questions pertaining to legislation issues is a
prerequisite for a multiple-source census. However, a positive answer is not
required en bloc; the questions can be resolved selectively. For example,
aggregate statistical linkage is an option if identification numbers are
missing, a complementary fieldwork operation is a valid alternative when
registers are missing and are not allowed to be built, and so on.
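The selective resolution described above can be read as a simple mapping from the rights actually granted to the methods that remain feasible. The sketch below is an assumption-laden illustration: the keys of the `rights` dictionary and the method labels are hypothetical names chosen to echo the list of rights, not terms defined in any statute.

```python
def feasible_methods(rights):
    """Map granted legal rights to the census methods that remain available.

    Missing keys are treated as rights that have not been granted.
    """
    methods = []
    if rights.get("use_administrative_data"):
        if rights.get("unique_identifier") and rights.get("interlink_sources"):
            # Records can be matched person by person across sources.
            methods.append("record-level linkage")
        else:
            # Without a common identifier, sources are combined in aggregate.
            methods.append("aggregate statistical linkage")
    if not rights.get("build_registers") and rights.get("field_operations"):
        # Missing registers that may not be built are replaced by fieldwork.
        methods.append("complementary fieldwork")
    return methods


# Hypothetical legal situation: administrative data may be used, but there
# is no common identification number and no right to build new registers.
granted = {
    "use_administrative_data": True,
    "unique_identifier": False,
    "interlink_sources": True,
    "build_registers": False,
    "field_operations": True,
}
methods = feasible_methods(granted)
print(methods)
```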
Yet this logic, where most components are intertwined with each other, possibly
serving as partial alternatives to the same end, implies a complex
inter-dependency in which changes in one component affect others, creating the
need for contingency plans.
Regulations have a stabilizing effect in a seemingly wobbly situation; however, the complexity of the system makes it sensitive to changes in the regulations themselves. Laws and regulations are needed not only to exploi