Survey Attrition:
A taxonomy and the search for valid instruments to correct for biases
Peter Dolton*
Maarten Lindeboom**
Gerard J. Van den Berg***
Most longitudinal surveys suffer from attrition, at least some of which may not occur at random. Attrition may bias estimates based only on data from respondents. We use a unique data set that combines panel survey information on individual workers with administrative records. The reasons for survey attrition are coded into different behavioural categories. The administrative records provide information on individual labour market behaviour and personal characteristics for the complete sample (i.e. the sample participants and the non-respondents). We show that attrition is heterogeneous in the sense that the separate reasons for attrition are clearly different, and we test how many distinct reasons there are and how they should be grouped. We examine the implications of attrition for the distributions of variables in the survey and explore the possibility of using interviewer information or an item non-response score (from the first wave of the survey) as valid instruments to correct for selection bias due to attrition.
Keywords: Attrition bias, instruments, unemployment duration, partial likelihood.
January, 2000
Please do not quote without permission of the authors.
* University of Newcastle-upon-Tyne
**Free University Amsterdam and Tinbergen Institute
***Free University Amsterdam, Tinbergen Institute, and CEPR.
Address for correspondence: Department of Economics, Free
University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands.
1 Introduction
Attrition is a commonly encountered phenomenon in longitudinal surveys. Whether attrition affects the statistical analysis of the survey data depends on the variables one is interested in. If one aims to use the survey to estimate the fraction in the population with a certain characteristic, then a systematically high or low attrition among those who have this characteristic biases the estimate. If one aims to estimate a model, and the only difference between the sample of respondents who remain in the survey and the initially selected sample is in the distribution of explanatory variables on which one conditions in the analysis, then attrition does not affect the estimation results. Of course, this requires that the process governing survey attrition is unrelated to unobserved determinants of the endogenous variable of interest (i.e. the variable whose values one aims to explain in the analysis). Indeed, attrition must be unrelated to measurement errors in the data on the endogenous variable.
If attrition behaviour is related to unobserved determinants of the variable of interest, and if this is ignored, then in general the estimation results are inconsistent. Empirical studies based on social survey data do not pay much attention to attrition, essentially because it is felt that there is nothing one can do about it. As Horowitz and Manski (1998) state, ``[With non-response,] the only way to identify population statistics is to make assumptions that determine the distribution of the missing data. A fundamental problem of empirical analysis is that such assumptions are untestable.'' Most studies merely provide the attrition rate or (sometimes) compare the marginal distributions of explanatory or endogenous variables among respondents to those in census data. The differences between the marginal distributions can be used to construct weights for the respondents, giving a higher weight to respondents who seem to be underrepresented. The underlying idea of this approach is that if the marginal distributions among respondents are similar to the corresponding population distributions then, hopefully, the conditional distributions of the endogenous variable given the explanatory variables are also similar.
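The weighting idea described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: the group labels and shares are hypothetical, and the function name `poststratification_weights` is introduced here only for the example. Each respondent in a group receives the ratio of the group's population share to its share among respondents, so that underrepresented groups are weighted up.

```python
def poststratification_weights(population_shares, respondent_shares):
    """Return one weight per group: population share / respondent share.

    Both arguments are dicts mapping a group label to a share in [0, 1].
    Groups that are underrepresented among respondents get a weight > 1.
    """
    return {
        group: population_shares[group] / respondent_shares[group]
        for group in population_shares
    }


# Hypothetical shares, chosen only to illustrate the arithmetic.
population_shares = {"under 30": 0.50, "30 and over": 0.50}
respondent_shares = {"under 30": 0.40, "30 and over": 0.60}

weights = poststratification_weights(population_shares, respondent_shares)

# After weighting, the respondent shares reproduce the population shares.
weighted_shares = {g: respondent_shares[g] * weights[g] for g in weights}
```

Matching the marginal distributions in this way does not by itself guarantee that the conditional distribution of the endogenous variable is recovered, which is the concern raised in the text.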
This paper examines the character of unit non-response in a panel survey in more detail, using an unusually informative sampling frame that combines administrative data with survey data. From administrative records, a random sample of long-term unemployed workers in the UK was taken, and a survey was conducted among these workers some months after that. The survey response rate is about 56%. Of the respondents who participated in the first wave, about 71% also participated in the second wave, held about a year later. We refer to Dolton, Lindeboom & Van den Berg (1999) for an analysis of the non-response to the first wave of the survey. The analysis in this paper concerns non-response to the second wave of the panel survey, conditional on participation in the first wave. In the remainder of this paper we use the terms ‘non-response’ and ‘attrition’ interchangeably to refer to non-participation in the second wave of this survey. Of interest for our analyses is that in the survey the reason for non-response to each wave was coded into 22 different categories. For example, the codes indicate whether the individual has moved to a different geographical location, or whether they refused to be interviewed.
The individual records in the survey data and the administrative data are linked. The administrative data contain useful information on some personal characteristics of the individual. This enables us to study the effect of these characteristics on the reason for not participating in the survey, and, by implication, the difference between the distributions of these characteristics in the administrative data and among respondents. This provides a taxonomy of the reason for attrition. In general, social surveys do not contain any information on the reason for non-response or on characteristics of non-respondents, so that their relation cannot be studied.
The reasons for non-response may be informative for the behaviour of non-respondents. In particular, it is plausible that they are related to individual labour market behaviour. For example, if someone has moved then this may be because he found a job somewhere else. If an individual refuses to cooperate with the survey interview then this individual may also be reluctant to apply for a job, or rigorous questioning regarding their availability for work could induce them to sign off from receiving unemployment benefit (UB).
The administrative data contain information on actual labour market behaviour of all individuals in the original sampling frame (i.e., respondents plus non-respondents). In particular, they supply the date at which the individual leaves unemployment. This allows us to study the relation between unemployment duration and (the reason for) leaving the survey. Basically, the administrative data provide us with a unique insight into the behaviour of the sample drop-outs and, in particular, allow us to see to what extent it differs from the behaviour of those who remain in the sample. The effect of non-response on the estimated unemployment duration distribution follows from a comparison of the estimate based on data from respondents who remain in the sample to the estimate based on data from the sample respondents and non-respondents. In the latter case we allow the distribution for non-respondents to vary with the reason for non-response, in order to detect whether the reason for non-response is related to unobserved determinants of unemployment duration.
We find that the duration distribution varies significantly with the reason of non-response. Needless to say, this may be of interest for agencies that run surveys, as well as for the researcher modeling the length of unemployment spells who is not so well endowed with data as in the present example. In particular, the results facilitate a categorisation of types of individuals who are likely to be a non-respondent in future waves of a survey, and, more innovatively, they enable a targeting of individuals whose non-response is likely to distort the empirical analysis of individual labour market behaviour with survey data.
Having established that an attrition taxonomy is valuable, there is a clear implication for handling attrition problems in other studies which are not so well endowed with data as ours. Our data allow us to check the validity of instruments that can be used to correct for attrition in behavioural models. For correction of the selection bias generated by attrition, it is essential to have instruments that affect attrition behaviour but that do not affect the distribution of the endogenous variable of interest. A candidate instrument that fails to satisfy either of these conditions is invalid. In the remainder of the paper we explore the possibility of using alternative identifiers for attrition. The candidates we examine for this role are (i) a score on item non-response to questions in earlier waves of the survey, which proxies for the latent propensity not to respond, and (ii) interview information. The interview information is of two kinds: firstly, we know how long the previous interview at wave I of the survey took, and secondly, we know the identity of each interviewer. The interview duration may act as a proxy for the disutility that the previous experience caused the respondent and hence influence their likelihood of responding at wave II. The identity of the interviewer is useful because individual differences in interviewer style and personality may have a bearing on the experience of the respondent and hence influence the likelihood of future response. We find that item non-response scores and interview duration are not valid instruments but that the interviewer identity information is a valid instrument. The next step is to use this instrument to correct for attrition bias.
One possibility is to use this instrument in the set-up of a sample
selection model and to compare the outcomes of this model with the outcomes
of the model that is estimated on the full (i.e. untruncated) sample. An
alternative is to devise sample weights based on this instrument, to correct
for the different types of attrition. The latter approach is far more attractive
to use in non-linear models, such as our duration model for unemployment
duration. Moreover, in the context of a weighting procedure, it is very
straightforward to deal with different types of attrition (such as in our
case). It will be very difficult to correct for different types of attrition
in a non-linear sample selection model.
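To make the weighting alternative concrete, the sketch below computes inverse response-rate weights within interviewer cells. It is only an illustration under stated assumptions: the weights actually used in this paper are constructed in Section 5 and may differ, and the function name `inverse_response_weights`, the interviewer labels and the records are all hypothetical.

```python
from collections import defaultdict


def inverse_response_weights(records):
    """Compute a weight per interviewer: assigned cases / responding cases.

    `records` is an iterable of (interviewer_id, responded) pairs, one per
    wave I participant, where `responded` says whether the individual also
    took part in wave II. Interviewers with no wave II respondents are
    skipped because no respondent exists to carry their weight.
    """
    assigned = defaultdict(int)
    responded_count = defaultdict(int)
    for interviewer_id, responded in records:
        assigned[interviewer_id] += 1
        responded_count[interviewer_id] += int(responded)
    return {
        interviewer_id: assigned[interviewer_id] / responded_count[interviewer_id]
        for interviewer_id in assigned
        if responded_count[interviewer_id] > 0
    }


# Hypothetical records: interviewer "A" retains 2 of 4, "B" retains 4 of 4.
records = [("A", True), ("A", True), ("A", False), ("A", False),
           ("B", True), ("B", True), ("B", True), ("B", True)]
weights = inverse_response_weights(records)

# Summing the weights over wave II respondents recovers the wave I sample size.
weighted_total = sum(weights[i] for i, responded in records if responded)
```

The same construction can be repeated separately for each type of attrition, which is the practical advantage of weighting mentioned above.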
The rest of the paper is organised as follows. The next section describes
the data sources. Section 3 discusses the non-response categories. The
way in which attrition varies with explanatory variables and the way in
which it affects the estimated distribution of unemployment durations is
presented in Section 4. In Section 5 we test whether interviewer information
is a valid instrument and we devise sample weights based on this instrument.
We subsequently explore whether a re-weighting of the truncated sample
with the weights based on the interviewer information helps in correcting
for the distortions in the distributions due to survey attrition. Section
6 summarises and concludes.
2 The Data
In 1989 the Policy Studies Institute in the UK was commissioned by the UK Employment Service to evaluate the impact of the ``Restart'' policy program for unemployed workers. At the time, the Restart program consisted of compulsory six-monthly meetings between the unemployed individual and a counsellor of the Employment Office, for each unemployment benefits claimant in the UK. During these interviews, the counsellor offers advice on job search, and he may place workers in contact with employers or training agencies. If the individual does not attend a Restart interview or is deemed not to be available for work then their case is referred to an adjudication officer and they may be faced with the possibility of having their benefits reduced or suspended. Over the period of July to September 1989 over 270,000 such adjudication decisions were made and in 57 were stopped. The main aim of the program is to reduce the amount of time people spend unemployed, and to reduce their dependency on unemployment benefits.
To avoid confusion, it must be stressed from the outset that the Restart interviews are not survey interviews. For the purposes of the present paper, the Restart interviews are only relevant in that the planned date of the first Restart interview (6 months after entry into unemployment) affects the sampling design. In particular, to evaluate the Restart program, a random sample of 8925 unemployed workers who would approach their 6th month of unemployment in the period March-July 1989 was identified in March 1989. Individuals were retained in the sample even if they subsequently did not attend a scheduled Restart interview. Every Employment Service office throughout Britain was contacted while constructing the sample, in order to eliminate regional biases. Individuals were selected for the sample from the inflow lists, on the basis of their National Insurance (NI) numbers. This is known to result in a random 5 percent sample. Of this set, a control group of 582 people was randomly chosen, again by means of previously specified NI digit sequences. Members of the control group, although eligible for a Restart interview, were not asked to attend the initial Restart interview. The existence of a random control group allows for the evaluation of the impact of the program without having to deal with the issue of self-selection.
For the sample of 8925 individuals, administrative information on a few personal characteristics, such as sex, age, and travel-to-work area, was retrieved from the Employment Services. The information on an individual's travel-to-work area was linked to the National Online Manpower Information System (NOMIS) data, in order to obtain data on local labour market conditions. In addition, the data are linked to the Joint Unemployment and Vacancies Operating System (JUVOS) Cohort database collected by the Employment Service. The JUVOS data provide accurate administrative records on the claimant's unemployment history from 1982 up to January 1995. Unfortunately, the administrative data do not record the destination state upon exit out of unemployment. This could be employment, a training programme or simply signing off the claiming of unemployment benefit (to obtain benefits, one needs to register at the Employment Service). However, by comparing the administrative data to the survey data for respondents, Dolton and O'Neill (1996b) show that most exits out of unemployment amount to a transition into employment.
After excluding individuals with spells substantially longer or shorter than 6 months in April 1989, and excluding those who lacked either JUVOS data or the travel-to-work area information, we are left with a sample of 8012. Of these, 512 are members of the control group.
About 6 months after the identification of the full sample (i.e., in September 1989), a survey organisation (Social and Community Planning Research, or SCPR) conducted a survey of these individuals. The survey was intended to supply additional information on background variables and job search behaviour of the individuals. Detailed information was obtained on subsequent work history, personal characteristics, the Restart interview, previous employment history, search behaviour and benefit income. This survey was conducted between September and October 1989. Of the original sample of 8925 individuals, 5200 individuals completed the survey. Of the sample of 8012 (see the previous paragraph), 4708 completed the survey. Of these 4708 respondents to the first wave, 3352 also participated in the second wave. We are interested in attrition between wave I and wave II of the survey. We refer to Dolton, Lindeboom & Van den Berg (1999) for an analysis of the non-response to the first wave of the survey.
Table 1 below presents means of variables for the individuals in our
sample. The first column reports means of the total sample of 8012 (i.e.
those who were invited to the Restart interview and the controls). The
second column refers to information of the respondents to the first wave
of the survey, the third column to the second wave. It is clear from this
table that relatively minor effects of non-response are found on the mean
of the variables. More specifically, the average age of the respondent
is higher than that of a non-respondent. The unemployment duration variable
is measured from a fixed point in time, September 1989, which is around the
time of the first interview. It is interesting to see that the unemployment
duration of individuals who will respond to the survey is higher than the
unemployment duration of the total sample. This indicates that, on average,
individuals with lower exit rates remain in the sample, implying that survey
non-response may be selective with respect to unemployment duration.
Table 1: Variable means in the total sample and among the respondents
Variable                                        Total sample    Wave I   Wave II
Age                                                    32.62     33.62     34.20
Female                                                  0.30      0.32      0.30
Local unemployment change (decline in U rate)           0.35      0.35      0.35
Living in an inner city area                            0.20      0.18      0.20
Member of control group                                 0.06      0.07      0.08
Unemployment duration beyond selection date           302.26    314.53     340.5
Uncensored cases                                        0.03      0.03      0.06
# cases                                                 8012      4708      3352
Although the information in this table may reveal some aspects of the
effect of attrition, it still provides at best a partial picture. Moreover,
sample non-response may occur for a variety of reasons. It is conceivable,
for instance, that individuals who changed address did so because they
found a job in another area. On the other hand, badly motivated people
may have difficulties finding a job and may be less inclined to participate
in a survey, especially when this survey is about job search behaviour
and labour market prospects. The first type of individuals experience on
average high exit rates out of unemployment, whereas the latter type of
individuals may have on average lower exit rates. For that reason it may
be interesting to have a closer look at different types of non-response
and the relative importance of this non-response typology in our sample.
We do this for the second wave of the survey, conditional on participation
in the first wave of the sample. This is the sample set-up which most researchers
are faced with.
3 A Taxonomy of Survey Attrition
3.1 Reasons for attrition
Table 2 provides a list of reasons for non-response as coded for each individual initially selected for the survey. The first column refers to the code associated with the type of non-response. The second and third columns give counts of individuals in each of the separate categories for both waves. Sample non-response is an absorbing state, i.e. individuals who do not participate in the first wave, do not return in the second wave.
The coding of 22 reasons for non-response may be informative, but it is inconvenient to use this information directly in a multivariate analysis. We therefore recode the information into four different categories. The motivation for this split follows directly from the work of Goyder (1987) and Waterton and Lievesley (1987). In addition, the organisation of the tracing process followed a logical sequence of stages, as presented in Figure 1. Hence we used the logic of this process, with the detailed reasoning provided in Table A0, for the categorisation of each reason.
We distinguish between non-response due to refusal of the individual to cooperate (labeled as REFUSER), non-response because the individual had moved residence (labeled as MOVER), non-response due to inability of the survey agency to contact a respondent, for reasons not directly associated with mobility (e.g. the respondent is never at home; labeled as NO CONTACT) and finally non-response for other reasons which are no fault of the non-respondent (e.g. ill health, respondent does not speak English; labeled as NO FAULTER).
REFUSER and MOVER are reasons for non-response that are to a large extent initiated by the individual. It can be expected that with respect to the unemployment duration those who left the sample for reasons which placed them in the MOVER group, will have, on average, higher exit rates. The REFUSER group may include less motivated individuals with poor labour market opportunities. It is conceivable, though perhaps unlikely, that this category also contains individuals who are actively searching, or perhaps even already have found a job and therefore are less inclined to participate in the survey.
The NO CONTACT group is defined from codes associated with a loss of contact between the agency running the survey and the individual respondent. Though the loss of contact could be initiated by the individual respondent, it is presumed that an agency can, to a large extent, affect such drop-out by providing more intensive tracing efforts. An example of one of the codes in this category is "not contacted, never in". It is unclear how many times the agency tried to contact the individual, but it is possible that the number of attempts to contact individuals determines the frequency in this category. "No trace of address" and "address vacant or derelict" are examples of other codes in the NO CONTACT group. The category NO FAULTER is simply defined as those non-respondents whose non-response was unavoidable due to circumstances beyond their control. Examples are "Ill health", "Could not speak adequate English" and "Lost in the postal system".
Table 3 provides counts of the attrition categories under the above
mentioned grouping definitions. From the table it can be seen that, for
instance, 211 of the non-respondents are movers and that 193 persons refused
to cooperate, 151 could not be contacted and 60 have reasons for non-response
which are no fault of their own.
Table 2: Reasons for attrition, conditional on participation in the first wave
Reason for non-response                              Code   Wave II
No trace of address                                     1         5
Address vacant or derelict                              2        58
Premises demolished                                     3         0
Business/industrial premises only                       4         1
Remote address (not issued to interviewer)              5         1
Mover - follow-up address given                        19        16
Mover - follow-up address not known                    20       216
Respondent deceased                                    21         7
No contact at address                                  22       111
Complete refusal to interview                          23         1
Address given is Benefit office                        24         0
Interview obtained                                     51      3352
Refusal to office                                      70         1
Not contacted (e.g. never in)                          71       191
Personally refused interview                           72       188
Broke appointment and not re-contacted                 73        82
Ill (at home) during survey period                     74         6
Away/in hospital during survey period                  75        38
Incapacitated                                          76         2
Refusal on behalf of respondent                        77        56
Named respondent could not speak adequate English      78         1
Other reason for non-response                          79       371
Lost in Postal system                                  80         2
Table 3: Counts for aggregated attrition categories
Category      Codes                                 Number
MOVER         19, 20, 22                               211
REFUSER       70, 72, 73, 77                           193
NO CONTACT    1, 2, 3, 4, 5, 23, 24, 25, 71            151
NO FAULTER    21, 74, 75, 76, 78, 79, 80                60
___________________________________________________________________________
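The recoding of the detailed non-response codes into the four categories can be written as a simple lookup. The sketch below takes the code lists exactly as printed in Table 3 and treats code 51 ("Interview obtained" in Table 2) as a response. The names `CATEGORY_CODES` and `attrition_category` are introduced here for illustration only.

```python
# Code lists copied from Table 3; names are illustrative, not from the paper.
CATEGORY_CODES = {
    "MOVER": {19, 20, 22},
    "REFUSER": {70, 72, 73, 77},
    "NO CONTACT": {1, 2, 3, 4, 5, 23, 24, 25, 71},
    "NO FAULTER": {21, 74, 75, 76, 78, 79, 80},
}
RESPONSE_CODE = 51  # "Interview obtained" in Table 2


def attrition_category(code):
    """Map a detailed outcome code to RESPONDENT or one of the four categories."""
    if code == RESPONSE_CODE:
        return "RESPONDENT"
    for category, codes in CATEGORY_CODES.items():
        if code in codes:
            return category
    raise ValueError(f"unknown outcome code: {code}")


# Example: classify a few detailed codes.
examples = {code: attrition_category(code) for code in (20, 72, 71, 74, 51)}
```

Keeping the mapping in one table makes it easy to re-run the analysis under an alternative grouping, such as merging NO CONTACT and NO FAULTER.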
In the empirical analyses we will relate these non-response categories to a range of exogenous variables and the unemployment duration. This will provide us more insight into the effect of non-response on a range of exogenous variables and a relevant endogenous variable such as unemployment duration. Before we do this, some remarks remain.
Firstly, our grouping of the attrition categories is to a large extent driven by our interest in the unemployment duration variable. The groups are constructed in such a way that the codes within each group can be expected to be relatively homogeneous in their effect (except of course for NO FAULTER). This should prevent the effects of separate codes within a category from offsetting each other. We therefore believe that this way of grouping the data helps us to better understand the forces that drive non-response and its effect on the variable of interest. Moreover, the "endogenously driven" grouping may help us in our search for suitable instruments to correct for non-response.
Secondly, it has to be noted that both MOVER and REFUSER are defined as individual-initiated attrition, and they may therefore be relevant for a specific group of unemployment benefit recipients. Self-selection effects are expected to play an important role in these groups, and it will in general be difficult to find good instruments to correct for this. The loss of contact between the agency and the respondent (NO CONTACT), on the other hand, can to a large extent be influenced by the agency running the survey and, to a certain extent, is expected to act more generally on the sample of unemployment benefit recipients. It has to be noted, however, that extra efforts of the agency may influence the composition of the sample directly.
Finally, since we have imposed this fourfold taxonomy from the outset,
we must test its validity. We do this formally in the next section,
where we present the estimation results.
3.2 Attrition and personal characteristics
The appropriate statistical model for the examination of the multiple reasons for attrition from the sample is the multinomial logit model. When using this model it is important to check whether it satisfies the Independence of Irrelevant Alternatives (IIA) assumption, which the model imposes. This can be checked using the test suggested by Hausman and McFadden (1984), which is a variant of the Durbin-Hausman-Wu test. In addition it is important to verify that the number of alternatives modeled is the most satisfactory representation of the data. This is particularly important if there is no good theoretical basis for a particular grouping of the outcome variable in the data or, as in this case, we are attempting to summarize 22 possible reasons into a much smaller but coherent typology. The various alternative configurations can be tested using the Cramer-Ridder (1991) test. When we applied this test to our data and our categorisation of the attrition reasons, we established that our a priori division of attrition into four basic reasons was too detailed. Indeed, the results of the Cramer-Ridder test suggest that the NO FAULTER and NO CONTACT groups should be combined. We label the combined group ALL OTHERS. Table 4 reports the final specification, which was accepted after a series of Cramer-Ridder tests (described in the Appendix). The alternative fourfold specification is reported in Table A4 in the Appendix.
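As a reminder of what the IIA property means in the multinomial logit, the following sketch computes choice probabilities with the respondents as reference category and shows that the odds of one attrition type relative to responding do not change when another alternative is removed. The coefficient values and the single covariate are hypothetical and are not taken from Table 4; the function name `mnl_probabilities` is introduced here for illustration.

```python
import math


def mnl_probabilities(x, betas, reference="RESPONDENT"):
    """Multinomial logit probabilities with a normalised reference category.

    `x` is a list of covariate values (including the constant) and `betas`
    maps each non-reference alternative to its coefficient list. The
    reference alternative has all coefficients fixed at zero.
    """
    index = {reference: 0.0}
    for alternative, coefficients in betas.items():
        index[alternative] = sum(b * v for b, v in zip(coefficients, x))
    denominator = sum(math.exp(value) for value in index.values())
    return {alt: math.exp(value) / denominator for alt, value in index.items()}


# Hypothetical values: a constant and one covariate.
x = [1.0, 0.5]
betas = {
    "MOVER": [-1.0, 0.4],
    "REFUSER": [-1.5, 0.2],
    "ALL OTHERS": [-2.0, -0.3],
}

full = mnl_probabilities(x, betas)
without_refuser = mnl_probabilities(
    x, {alt: b for alt, b in betas.items() if alt != "REFUSER"}
)

# IIA: the MOVER versus RESPONDENT odds equal exp(x'beta_MOVER) in both cases.
odds_full = full["MOVER"] / full["RESPONDENT"]
odds_reduced = without_refuser["MOVER"] / without_refuser["RESPONDENT"]
```

The Hausman-McFadden test exploits exactly this invariance by comparing estimates from the full choice set with estimates from a restricted choice set.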
Table 4 relates the reason for non-response at the second wave of the survey to a range of characteristics. Table A2 of the Appendix reports a logit analysis for survey attrition. In Table 4, the respondents serve as the reference group and hence the effects presented are relative to this group. For the MOVER and REFUSER groups, the age variables imply a quadratic relationship between non-response and age. These results imply that the probability of non-response rises up to the middle years and decreases thereafter. This effect is not significant for the non-response of the ALL OTHERS group. Apart from age, being female, being married, being in the control group and the local unemployment rate change between 1988 and 1990 seem to matter. With respect to the latter variable, large impacts are found on the probability of dropping out in the ALL OTHERS category. Females, married individuals and members of the control group remain longer in the sample. Under the REFUSER category there is also some evidence that self-declared problems with literacy may play a role in attrition.
Included in Table 4 are two additional variables that are labeled ‘Potential Attrition Identifiers’. The aim of including these variables is to try to find factors regarding the survey or interview conditions which might correlate with non-response type but be independent of our endogenous variable, namely the unemployment duration. These results are discussed further in Section 4 when we describe the search for valid instruments.
We also estimated a logit model in which all exogenous variables were
excluded. Comparing this model with the logit model that includes the
regressors allows a test of whether non-response is random. The likelihood
ratio test for joint significance of the included regressors yields a
statistic of 70.8, so the hypothesis of random non-response is strongly rejected.
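The arithmetic behind such a likelihood ratio test can be sketched as follows. The statistic is twice the difference between the unrestricted and restricted log-likelihoods and is compared with a chi-square distribution. The degrees of freedom are not stated in the text. The value 14 used below is an assumption, equal to the number of regressors other than the constant listed in Table 4. The helper uses the closed-form chi-square tail probability that holds for even degrees of freedom, so no external library is needed.

```python
import math


def likelihood_ratio_statistic(loglik_unrestricted, loglik_restricted):
    """LR = 2 * (log-likelihood with regressors - log-likelihood without)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper only handles positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Reported statistic from the text; df = 14 is an assumption (see above).
p_value = chi2_sf_even_df(70.8, 14)
```

Under this assumed number of restrictions the p-value is far below conventional significance levels, in line with the strong rejection reported in the text.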
Table 4: Non-response in the first wave and explanatory variables: results from a multinomial logit
Variable                          MOVER            REFUSER          ALL OTHERS
Constant                          1.786 (2.95)    -2.035 (3.26)    -1.644 (2.74)
Personal Characteristics
  Age                             0.061 (2.29)     0.034 (1.41)     0.027 (1.18)
  Age squared                    -0.003 (3.66)    -0.001 (2.08)    -0.001 (1.51)
  Female                         -0.296 (1.74)     0.114 (0.71)    -0.989 (4.82)
  Married                        -0.670 (2.91)    -0.034 (0.17)    -0.851 (3.87)
  Literacy Problems               0.066 (0.28)    -0.607 (1.90)     0.121 (0.54)
  Education                      -0.241 (1.55)    -0.168 (1.05)    -0.189 (1.23)
  Number of Children              0.042 (0.40)    -0.124 (1.14)     0.094 (0.94)
  Black                           0.644 (1.64)    -0.013 (0.02)     0.352 (0.82)
  Asian                           0.067 (0.18)     0.134 (0.34)     0.113 (0.31)
Exogenous Factors
  Local Unemp. change             0.198 (0.13)    -1.630 (1.07)     5.085 (2.09)
  Inner city area                 0.042 (0.23)     0.026 (0.14)    -0.087 (0.25)
  Control group                   0.763 (2.16)     0.196 (0.76)     0.106 (0.24)
Potential Attrition Identifiers
  Interview Duration             -0.002 (0.38)     0.007 (1.08)    -0.003 (0.65)
  Item Non-response              -0.436 (1.31)    -0.416 (1.20)    -0.169 (0.66)
_____________________________________________________________________________________
Explanatory note: The respondents are taken as reference group. The
absolute t-values are in parentheses. Coefficients are in bold if they
are significant at the 10% level.
3.3 Attrition and the Distribution of Unemployment Duration
The potential survey participants are a random sample of workers approaching their 6th month of unemployment duration in the period March--July 1989. The survey, which was conducted in the fall of 1989, was set up to analyze the effect of a Restart interview on job search outcomes. As we saw in the tables of Section 2, 4708 out of the original set of potential respondents participated in the survey, and of these 4708 respondents, 3352 also participated in the second wave held about 6 months later. This section examines the distortions in the distribution of unemployment duration due to attrition between the two waves. Our unemployment duration variable is defined as the duration of unemployment onwards from the first interview date. We take a fixed point in time, namely the first of September 1989. It is interesting to note that of the 4708 respondents, 1896 had already found a job by the time that they were interviewed (i.e. 2812 were still unemployed).
Basically, one can distinguish two reasons for a relation between non-response (and the reason for non-response) and unemployment duration. First of all, job search behaviour and the behaviour towards survey participation may be affected by the same underlying individual-specific characteristics. An individual with a relative dislike for social contacts may refuse to cooperate with the survey interview and may also be reluctant to apply for a job. An individual who spends a lot of his time searching for a job may not want to spend time with a survey interview. The second reason for a relation between non-response and unemployment duration is that the acceptance of a job makes it more difficult for the agency to contact the individual. Job acceptance may entail a movement of the individual to another geographical location - which could easily be out of the scope of the survey. Also, the individual may be away from home more often. This relation is fundamentally different from the first-mentioned explanation, as the causal effect runs directly from job acceptance to non-response, and this effect does not depend on the presence of unobserved characteristics.
Van den Berg, Lindeboom and Ridder (1994) examine the relation between unemployment duration and non-response in a model for attrition from a longitudinal survey, conditional on response in the first wave of the survey. Their data do not enable a distinction between the two reasons for such a relation. This is because it is not observed whether an individual accepts a job between the last wave in which he participates and the first wave in which he does not participate. In the present paper we are in a position to distinguish empirically between the two reasons for a relation. This is because we always observe the moment of exit out of unemployment. If most non-respondents exit unemployment in between the two waves then this indicates that there is a direct causal effect from exit out of unemployment to non-response. If, on the other hand, we observe a relation between non-response and exit out of unemployment at dates after the survey date, then this indicates that the relation works by way of unobserved individual-specific characteristics.
It is to be expected that the reason for attrition is also informative with respect to the relation between attrition and unemployment duration. For example, a person who drops out of the sample because they are a MOVER may have a direct causal relation between their unemployment duration and attrition, whereas an individual in the REFUSER category may have unobserved characteristics which play an important role.
A simple way to examine the consequence of survey non-response is to apply a basic non-parametric method such as the Kaplan-Meier estimate of the duration distribution. Our sample contains ongoing spells of unemployment, as it includes spells of individuals who are still unemployed at the date of the survey. We therefore have to adapt the Kaplan-Meier estimate of the exit rate (or hazard) accordingly. An appropriate way to deal with a sample of ongoing spells is to focus on the distribution of unemployment duration beyond the selection date (the residual duration, denoted as r), conditional on the duration elapsed up to the date of selection (the elapsed duration, denoted as p). In practice the adaptation of the Kaplan-Meier estimate of the hazard amounts to an appropriate redefinition of the risk sets at the observed times that a failure occurs.
With flow data, for instance, the risk set at a point, say $t$, includes durations that are equal to or exceed $t$. So the risk set $R_t$ satisfies $R_t = \sum_i I(t_i \geq t)$, where $I(\cdot)$ is the indicator function and $t_i$ the duration of individual/spell $i$. In the case of the distribution of the residual duration conditional on the elapsed duration, a spell ($t = r + p$) is included in the risk set if additionally $t$ exceeds the elapsed duration $p$. So the risk set for the residual duration conditional on the elapsed duration, $R_t^*$, satisfies in this case $R_t^* = \sum_i I(t_i \geq t,\; p_i < t)$. Table A3 of the appendix reports survival probabilities for the different types of non-response groups that we consider. In Figures 2a and 2b the survivor functions are plotted.
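The redefinition of the risk set can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not the code used for the estimates: the function name `km_left_truncated` and the toy spells are hypothetical, and each spell is represented as (elapsed duration p, total duration t, exit observed).

```python
from typing import List, Tuple

# (elapsed duration p at selection, total duration t = r + p, exit observed?)
Spell = Tuple[float, float, bool]


def km_left_truncated(spells: List[Spell]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survivor estimate with risk sets adapted for ongoing spells.

    A spell is at risk at time s only if it was already selected (p < s)
    and has not yet ended (t >= s).
    """
    exit_times = sorted({t for _, t, exited in spells if exited})
    survivor = 1.0
    curve = []
    for s in exit_times:
        at_risk = sum(1 for p, t, _ in spells if p < s <= t)
        exits = sum(1 for _, t, exited in spells if exited and t == s)
        survivor *= 1.0 - exits / at_risk
        curve.append((s, survivor))
    return curve


# Hypothetical toy data (durations in months); the third spell is right-censored.
toy_spells = [(6.0, 8.0, True), (6.0, 10.0, True), (7.0, 10.0, False), (9.0, 12.0, True)]
toy_curve = km_left_truncated(toy_spells)
```

In the toy data the fourth spell enters the risk set only after month 9, so it does not count towards the risk set at the first exit time (month 8).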
There are clear differences in the survivor functions. The survivor function for the group MOVERS is uniformly below all other functions, implying that the unemployment exit rates are higher for this group. Respondents who refuse to participate in the survey have substantially lower exit rates, though at the right tail of the distribution (for durations longer than a year) the survivor functions seem to come close to the survivor function of the survey participants. This is also the case for REFUSER. The group associated with attrition due to the loss of contact between the individual and the agency (NO CONTACT or NO FAULTER) has the lowest unemployment exit rates (i.e. the survival curve is above all others).
The log-rank test for equality of the survivor functions in Figure 2a yields a test statistic of 14.7, which exceeds the 95th percentile of the $\chi^2_4$ distribution, which equals 9.5. Equality of the survivor functions is therefore rejected.
Table 5 reports results of a partial likelihood analysis. Contributions to the partial likelihood function are based on the conditional probability that a spell $i$ ends, given the risk set $R_i$, defined as the set of spells having the same duration as spell/individual $i$ or longer. This conditional probability is a simple ratio of the hazard for $i$ relative to the sum of the hazards of all individuals that are exposed to the risk. In the case where the commonly used proportional hazard (PH) assumption is adopted, factors common to all individuals cancel from the expression. Consequently, under this PH assumption, no specific form of the baseline hazard is required and therefore in principle an unrestricted non-parametric baseline hazard is allowed for. For instance, if we define the hazard rate $\theta(t_i, x_i; \beta) = \theta_0(t_i)\,\theta_1(x_i; \beta)$, then the partial likelihood function can be written as:
$$L(\beta) = \prod_i \left[ \frac{\theta_1(x_i; \beta)}{\sum_{j \in R_i} \theta_1(x_j; \beta)} \right]^{\delta_i}$$
where $\delta_i$ is an indicator that equals 1 if $i$ is observed to make a transition. In our case, where we base our partial likelihood on the distribution of the residual duration conditional on the elapsed duration, the risk set needs to be modified. The modification is similar to the modification of the risk set for the Kaplan-Meier estimate. More specifically, $R_i^p$ is defined as the set containing all spells with a length of at least that of spell $i$ ($t_i$) and with an elapsed duration smaller than $t_i$. In a way the redefined risk sets allow for delayed entry of specific items, where the delay depends on the elapsed duration. An alternative is to use a stratified partial likelihood approach (see Ridder and Tunali, 1990) in which the stratification is on the elapsed duration. This procedure allows for a non-parametric baseline hazard for each stratum. We estimated these models as well. The main conclusions were not altered.
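To make the modified risk set concrete, the following Python sketch evaluates a log partial likelihood with delayed entry. It is an illustration under stated assumptions rather than the estimation code: the regressor part of the hazard is assumed to take the exponential form exp(x'b), tied exit times are not treated specially, and the names `log_partial_likelihood` and `toy_spells` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (elapsed duration p, total duration t, exit observed?, covariates x)
Spell = Tuple[float, float, bool, Sequence[float]]


def linear_index(beta: Sequence[float], x: Sequence[float]) -> float:
    """Inner product x'b."""
    return sum(b * v for b, v in zip(beta, x))


def log_partial_likelihood(beta: Sequence[float], spells: List[Spell]) -> float:
    """Log partial likelihood with risk sets adapted for ongoing spells.

    Assumption: the regressor function is exp(x'b). Spell j belongs to the
    risk set of spell i if t_j >= t_i and its elapsed duration p_j < t_i.
    """
    total = 0.0
    for _, t_i, exited_i, x_i in spells:
        if not exited_i:
            continue  # censored spells contribute only through risk sets
        denom = sum(
            math.exp(linear_index(beta, x_j))
            for p_j, t_j, _, x_j in spells
            if p_j < t_i <= t_j
        )
        total += linear_index(beta, x_i) - math.log(denom)
    return total


# Hypothetical toy data with a single dummy regressor.
toy_spells = [
    (6.0, 8.0, True, [1.0]),
    (6.0, 10.0, True, [0.0]),
    (7.0, 10.0, False, [1.0]),
    (9.0, 12.0, True, [0.0]),
]
value_at_zero = log_partial_likelihood([0.0], toy_spells)
```

At b = 0 every contribution reduces to minus the log of the size of the risk set, which gives a simple check on the risk-set construction.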
We included dummy variables for each type of attrition in the specification of the hazard. This is convenient, but it may be restrictive as it implies constant shifts between the hazard rates. Note that Figure 2a suggests that such a specification may be reasonable. Moreover, restrictiveness of the specification could be tested by comparing it to more flexible alternative specifications. We also used stratified partial likelihood methods, where stratification was taken with respect to the different attrition types. Estimates of regression coefficients of these methods were virtually identical to estimates following from a specification with attrition dummies.
All the tables below will also report sandwich estimators of the variance-covariance matrix as suggested by Lin and Wei (1989). The sandwich estimator of the variance covariance matrix is robust to misspecifications of the model. Consequently, a comparison of the standard errors from the conventional estimates of the variance-covariance matrix with the sandwich estimator provides an informal specification test. Table 5 below reports the results on the effect of non-response on the hazard rate of unemployment duration. We observe individual records from the start of the unemployment spell that led to the invitation of the Restart interview (1988--1989) up to 31 December 1994. Only 229 out of the 8012 cases (approximately 3%) are right-censored. We lack administrative information on the state of destination, so that uncensored cases include both job findings as well as transitions to out-of-the-labour force states. This limitation needs to be borne in mind when interpreting the results of Table 5.
The first column of Table 5 refers to the partial likelihood results for the complete sample (i.e. the respondents and the non-respondents) and non-response indicators. In general these indicators will not be available. Significant coefficients signal differences in the distribution of unemployment durations but also signal the presence of relevant (normally unobserved) heterogeneity. The second column concerns the results for the complete sample, but the non-response indicators are left out. A comparison of columns 1 and 2 reveals whether the exclusion of information on non-response distorts the other parameters of the model. The third column reports estimates based on the sample of individuals who participate in both waves of the survey. A comparison of this column with the two other columns provides us with insight in the extent of the bias in some coefficients, generated by survey non-response.
We start with a discussion of column 1. The coefficients on the variables Age and Age2 suggest initially declining exit rates that increase after middle age. Women are observed to have higher exit rates. This result, at face value, contrasts with the literature, but in reality the exits include exits to non-participation. The Restart interview cross-questions claimants about their availability for work, which induces many of those women who are not genuinely seeking a job to sign off UB. Married people and those with an educational qualification also experience higher unemployment exit rates. These variables imply monotonically decreasing exit rates. The local unemployment change variable has a large and significant positive effect on the exit rate. Better labour market conditions increase the exit rates of the unemployed. The inner city variable indicates lower exit rates for inner city inhabitants. Finally, note that the control group experiences no significantly different exit rates than those who had a Restart interview. Note that we measure the effect of the variable conditional on (still) being unemployed at the date of the first wave of the interview, a couple of months after the Restart interview took place. In previous analyses (Dolton, Lindeboom & Van den Berg (1999)), where we analysed non-response to the first wave, we found that the control group experienced significantly lower unemployment exit rates than those who had a Restart interview.
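The turning point implied by the Age and Age2 coefficients of Specification I can be worked out with a short calculation. The sketch below assumes that the age terms enter the log hazard as b1*a + b2*a^2, with a measured as age over 18 (the definition in Table A1). Because the Age2 coefficient is reported to three decimals only, the outcome is indicative at best; the function name is hypothetical.

```python
def quadratic_turning_point(b1: float, b2: float) -> float:
    """Value of a at which b1*a + b2*a**2 reaches its extremum."""
    if b2 == 0.0:
        raise ValueError("no turning point when the quadratic term is zero")
    return -b1 / (2.0 * b2)


# Coefficients of Age and Age2 in Specification I of Table 5.
AGE_COEF = -0.041
AGE2_COEF = 0.001

turning_point_over_18 = quadratic_turning_point(AGE_COEF, AGE2_COEF)  # about 20.5
turning_point_age = 18.0 + turning_point_over_18  # about 38.5 years
```

Under these assumptions the exit rate declines with age until the late thirties and rises thereafter, in line with the "middle age" reading given in the text.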
As expected MOVER has a large and significant positive coefficient.
Apparently, individuals who leave the sample for mobility reasons are also
the same individuals that experience higher unemployment exit rates. One
may argue that individuals change place of residence because they have
found a job, and that therefore a permanent shift in the exit rate (as
we modelled it) may not be the right way to capture this. If this were
the case, one would expect to find the larger part of the unemployment
durations to end prior to the interview date, and it would be appropriate
to model this relationship more directly. A glance at the data revealed
that this is not the case. Therefore, capturing the intrinsic mobility
behaviour of individuals with a permanent component, such as MOVER, (so
that it captures time constant individual heterogeneity) may be the most
appropriate way to do this. The coefficients relating to the ALL OTHERS
and REFUSERS are insignificant suggesting little bias may result in the
estimation of the unemployment duration equation from ignoring these kinds
of attrition.
A comparison of column 1 with column 2 reveals that few differences are to be found in the case where we omit the non-response indicators. In a way this tells us that omission of unobserved characteristics does not affect the other parameter estimates. Whether truncation of the sample with respect to non-response indicators (which is usually the case in surveys) is of influence can be judged from the parameter estimates in column 3. A direct comparison of the results shows that gender, marital status, the number of children, having a drivers licence and the "change in the local unemployment rate" are the variables that are affected the most. Note that, for instance, the effect of the local unemployment change variable is best seen in relation to the results of Table 4 of the previous section, where we found that survey response rates increase when local labour market conditions improve. If this variable has a positive effect on the exit rate, then indeed one would expect to find this effect to be exaggerated in the case where the model is estimated on survey participants alone.
One possible method of testing whether Specification I is significantly different from Specification II in Table 5 is to perform a Durbin-Hausman-Wu test. More specifically, we need to assume that our unemployment duration equation is consistently estimated even when attrition is present. In this case having instrumental variables for the attrition may induce more efficient estimates. This is the idea behind a test of Specification I against Specification II, where the former is more efficient than the latter. Formally we would test the hypothesis $H_0$ that the two sets of coefficients are not systematically different against the alternative $H_1$ that the coefficients are different. We get a test statistic of $\chi^2(14) = 6.16$. Since the critical value at the 95% level is 23.685, we cannot reject $H_0$ in favour of $H_1$, implying that there are no systematic differences in the coefficients, which suggests at face value that the data on unemployment duration can be adequately modelled without including attrition dummies. This result may not be surprising given how similar the sets of coefficients look in Table 5 between the alternative specifications. However, it should be remembered that part of the misspecification which is present (in excluding drop-outs) will be absorbed into the baseline hazard of the model. We investigated an intuitive test for this by estimating an exponential hazard model in place of the PH model. In the equivalent of Specifications I and III there was a greater difference in the corresponding coefficients.
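The mechanics of a Hausman-type statistic can be sketched in Python. This is an illustration only: the function names and the two-coefficient example are hypothetical and are not taken from Table 5, and the statistic is computed in the standard quadratic form d'(V_c - V_e)^{-1}d, where d is the difference between the consistent and the efficient coefficient vectors.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("difference of covariance matrices is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def hausman_statistic(b_cons: List[float], b_eff: List[float],
                      v_cons: Matrix, v_eff: Matrix) -> float:
    """d' (V_cons - V_eff)^{-1} d with d = b_cons - b_eff.

    In finite samples V_cons - V_eff need not be positive definite,
    so the result can be negative.
    """
    d = [c - e for c, e in zip(b_cons, b_eff)]
    v_diff = [[vc - ve for vc, ve in zip(row_c, row_e)]
              for row_c, row_e in zip(v_cons, v_eff)]
    return sum(di * zi for di, zi in zip(d, solve(v_diff, d)))


# Hypothetical two-coefficient example (not the estimates of Table 5).
h_example = hausman_statistic(
    [1.0, 2.0], [0.8, 2.1],
    [[0.05, 0.0], [0.0, 0.04]],
    [[0.01, 0.0], [0.0, 0.02]],
)  # approximately 1.5
```

The resulting value is compared with a chi-squared critical value whose degrees of freedom equal the number of coefficients compared; with the statistic of 6.16 and the critical value of 23.685 reported above, the null hypothesis is not rejected.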
Further differences in unemployment duration can be studied by estimating separate partial likelihoods for each of the attritor groups. These results are presented in Table 6. It should be reiterated that in the usual kind of statistical analysis of unemployment durations we would not observe any of the unemployment durations of the individuals in these equations. Hence these results should be interpreted with care. Specifically, these results are the unemployment hazards of those who have dropped out of the survey. The results show a clear difference between the MOVERS, REFUSERS and ALL OTHERS. Age and education are important factors in unemployment duration for MOVERS. Conversely women, married individuals and blacks have longer durations given they are in the REFUSER sample. The results also show that the unemployment durations of the ALL OTHERS group cannot be explained by any of the observable characteristics that we have in our data.
The implication of these results is that we should be cautious about estimating hazard models on samples with attrition in the hope that we can draw valid inferences about the whole sample of unemployed. The most specific form of the bias derives from those who drop out of the sample because they move away. This is not surprising as most people in this category may be making their geographical move for job related reasons. However, it seems quite clear that we may mistakenly believe that attrition would cause no substantive bias if we simply compare the coefficients of Specification II and III in Table 5. When we have the (usually unknown) information about unemployment duration by attrition type, we can see from Table 6 that the determinants of unemployment duration are quite different for these three groups. It may well be that erroneously joining these three groups together (into a group of any person who drops out of the sample) averages out these different influences and masks the heterogeneity which exists between attrition types. Our results also suggest that the attrition process is endogenous to the determination of unemployment duration.
Table 5: Unemployment duration with and without attrition IV estimates: results from a partial likelihood analysis
| Variable | Specification I | Specification II | Specification III |
| Personal characteristics | | | |
| Age | -0.041 (6.9/6.7) | -0.040 (6.9/6.6) | -0.045 (6.7/6.8) |
| Age2 | 0.001 (6.7/6.2) | 0.001 (6.6/6.0) | 0.001 (6.5/6.3) |
| Female | 0.400 (9.3/9.4) | 0.401 (9.4/9.5) | 0.408 (8.5/8.8) |
| Married | 0.383 (7.0/6.6) | 0.386 (7.2/6.7) | 0.421 (6.9/6.6) |
| Black | -0.120 (1.0/0.7) | -0.104 (0.7/0.5) | -0.168 (1.2/0.7) |
| Asian | 0.081 (0.8/0.6) | -0.007 (0.6/0.4) | -0.007 (0.6/0.4) |
| Literacy | 0.012 (0.2/0.0) | 0.012 (0.2/0.0) | 0.060 (0.8/0.7) |
| Education | 0.227 (5.6/5.0) | 0.223 (5.5/4.9) | 0.216 (4.6/4.2) |
| Total number of kids | -0.066 (2.6/2.7) | 0.068 (2.6/2.4) | -0.081 (2.8/2.5) |
| Drivers licence | -0.044 (0.7/0.7) | -0.044 (0.7/0.7) | 0.017 (0.3/0.2) |
| Mobile | 0.283 (3.9/4.7) | 0.282 (3.8/4.6) | 0.296 (3.6/4.5) |
| Exogenous Factors | | | |
| Local Unemp.change | 1.520 (4.0/3.7) | 1.488 (3.9/3.5) | 1.247 (2.9/2.6) |
| Inner city area | -0.118 (2.4/2.1) | -0.121 (2.4/2.2) | -0.118 (2.0/1.8) |
| Control group | -0.073 (1.0/1.1) | -0.077 (1.1/1.2) | -0.089 (1.1/1.2) |
| Non-response types | | | |
| MOVERS | 0.188 (2.7/2.7) | | |
| REFUSERS | 0.077 (1.0/1.3) | | |
| ALL OTHERS | -0.079 (1.1/2.9) | | |
| # cases | 2812 | 2812 | 2197 |
Explanatory note:
The partial likelihood is based on the distribution of
the residual duration conditional on the elapsed duration. In parentheses
we report t-values based on the sandwich estimator of the variance covariance
matrix (Lin and Wei, 1989) and t-values based on the inverse of the hessian,
respectively. In specification I, the model is estimated on the full sample,
and dummy variables are included for the different non-response types.
Specification II is estimated on the same sample, but the non-response
indicators are excluded. Specification III refers to estimates on the sample
of survey respondents at wave II.
Table 6: Unemployment duration by Attrition Type: results from a partial likelihood analysis
| Variable | ALL ATTRITORS | MOVERS | REFUSERS | ALL OTHERS |
| Personal characteristics | | | | |
| Age | -0.025 (2.01) | -0.054 (2.24) | -0.021 (0.97) | -0.024 (1.11) |
| Age2 | 0.001 (1.77) | 0.001 (1.76) | 0.003 (0.68) | 0.001 (1.61) |
| Female | 0.377 (3.86) | 0.284 (1.62) | 0.720 (4.65) | 0.078 (0.38) |
| Married | 0.262 (2.31) | -0.152 (0.74) | 0.503 (2.45) | 0.334 (1.50) |
| Black | 0.102 (0.46) | -0.285 (0.86) | 0.851 (2.96) | 0.460 (1.14) |
| Asian | 0.444 (2.25) | 0.494 (1.51) | 0.648 (1.80) | 0.214 (0.63) |
| Literacy | -0.183 (1.28) | -0.131 (0.51) | -0.002 (0.08) | -0.197 (0.76) |
| Education | 0.254 (2.98) | 0.429 (3.00) | 0.262 (1.70) | 0.151 (0.99) |
| Total number of kids | -0.018 (0.05) | -0.030 (0.43) | -0.086 (0.72) | -0.089 (0.77) |
| Drivers licence | -0.141 (3.66) | -0.105 (0.19) | -0.025 (0.66) | 0.726 (0.89) |
| Mobile | 0.194 (1.90) | 0.096 (0.18) | 0.217 (1.26) | 0.567 (0.71) |
| Exogenous Factors | | | | |
| Local Unemp.change | 2.840 (3.34) | 5.253 (4.2) | 2.110 (1.47) | 1.842 (1.19) |
| Inner city area | -0.097 (0.91) | -0.107 (0.53) | -0.108 (0.60) | -0.153 (0.86) |
| Control group | -0.007 (0.03) | -0.011 (0.04) | -0.170 (0.55) | 0.243 (0.621) |
| # cases | 615 | 211 | 193 | 211 |
Explanatory note:
The partial likelihood is based on the distribution of
the residual duration conditional on the elapsed duration. In parentheses
we report t-values. The model is estimated on the sample who drop out of
the sample at Wave II by the reason of their attrition.
4 The Search for Valid Instruments to correct for sample attrition
For correction of the selection bias generated by attrition, it is essential to have instruments that affect attrition behaviour but that do not affect the distribution of the variable of interest. One candidate for the role of an instrument which has been shown to be valid in other datasets is an item non-response score. The idea of this variable is that each person has a latent unobserved variable - propensity not to respond to surveys. One possible proxy for such a variable is the tendency not to respond to individual questions on a previous wave of the questionnaire as this may indicate the first signs of a growing tendency of the individual not to respond. We computed such a variable from neutral questions on wave I and developed a score. (Details are provided in the Appendix). It is this score we use as a regressor in Table 4. The results indicate that such a variable does not have sufficient explanatory power in this data set to act as a valid instrument.
A second candidate as an instrument is the time spent in the face-to-face interview in the previous wave of the survey. The argument for using such a variable is that a previous, time consuming, experience of being surveyed may make the individual less likely to agree to respond in the next survey. This variable has also been found to be significant in previous studies e.g. Dolton, Taylor and Werquin (1999). The results reported in Table 4 however show that this variable is insignificant in the Restart data and hence not a candidate as an instrument.
Our final candidate as an instrument to correct for sample attrition is information on the interviewer who performed the interview in the first wave of the survey. The most flexible way to incorporate interviewer characteristics is to use interviewer fixed effects (i.e. interviewer dummies). To test for the usefulness of this instrument, we include the set of interviewer dummies (170 interviewer dummies) as explanatory variables in the logit for attrition between wave I and II. The likelihood ratio test yields a test statistic of 311, which exceeds the 95th percentile of the $\chi^2_{170}$ distribution, which equals 207. The set of interviewer dummies is jointly significant. To establish that an instrument is truly valid one must also verify that the variable in question does not correlate with the relevant chosen outcome measure, so that the variable can act as an exclusion restriction. This can be tested by establishing whether this set of interviewer dummies also adds to the model for unemployment duration estimated on the full sample (so sample survivors and drop-outs). In the case of the partial likelihood model all 211 interviewer dummies can be added to the specification. The likelihood ratio statistic is 184, whereas the critical value of the $\chi^2_{211}$ distribution at the 5% significance level equals 253. This implies that joint significance of the interviewer dummies is rejected. The rejection probability is about 96%.
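The two-part reasoning above (relevance for attrition, exclusion from the outcome equation) can be summarised in a small Python sketch. The function name `instrument_check` is hypothetical; the likelihood ratio statistics and critical values passed to it are the ones reported in the text and are taken as given rather than recomputed.

```python
from typing import Dict


def instrument_check(lr_attrition: float, crit_attrition: float,
                     lr_outcome: float, crit_outcome: float) -> Dict[str, bool]:
    """Summarise the two likelihood ratio tests used to screen an instrument.

    relevant:   the candidate variables are jointly significant in the
                attrition model (statistic above the critical value).
    excludable: they are not jointly significant in the outcome model
                (statistic does not exceed the critical value).
    """
    relevant = lr_attrition > crit_attrition
    excludable = lr_outcome <= crit_outcome
    return {
        "relevant": relevant,
        "excludable": excludable,
        "candidate": relevant and excludable,
    }


# Values reported in the text for the interviewer dummies.
interviewer_dummies = instrument_check(
    lr_attrition=311.0, crit_attrition=207.0,
    lr_outcome=184.0, crit_outcome=253.0,
)
```

A variable that fails the first test lacks explanatory power for attrition, which is the reason given earlier for discarding the item non-response score and the interview duration in these data.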
5 Conclusions
Most longitudinal surveys suffer from attrition at least some of which may not occur at random from the sample. Attrition may cause a bias in estimates based on data from respondents. We use a unique dataset that combines panel survey information of individual workers with administrative records. The reasons for survey attrition are coded into different behavioural categories. The Cramer-Ridder test supports only three distinct categories: these are those who move away, those who refuse to respond and all others. The administrative records provide information on individual labour market behaviour and personal characteristics for the complete sample (i.e. the sample participants and the non-respondents). We examine the implications of attrition for the distributions of variables in the survey. We find that attrition affects the distribution of explanatory variables such as age and gender, but also that of unemployment duration. Most importantly a Durbin-Hausman-Wu test supports the treatment of different attrition categories in the unemployment duration equation which shows that studies of unemployment duration based only on respondents and ignoring those who drop out of a survey could be biased. Specifically the lack of those who move away between wave I and wave II of a survey is significant as this type of person has significantly shorter unemployment durations.
Finally we suggest that interviewer effects could act as valid identifying instruments for attrition. Our results on previous interview duration and item non-response differ from results in other surveys, and hence it is possible that attrition identifiers could be highly data specific. Hence what may be a valid exclusion restriction to identify attrition in one sample may not work in another data set.
References
Cramer, J.S. and Ridder, G. (1991) ‘Pooling States in the Multinomial Logit Model’, Journal of Econometrics, vol.47, pp.267-72.
Diggle, P. and Kenward, M.G. (1994) "Informative Drop-out in Longitudinal Data Analysis", Applied Statistics , Vol.43, pp.49 - 73.
Dolton, P., Lindeboom, M. and Van den Berg, G. (1999) ‘A Taxonomy of Survey Non-response and its Relation to Labour Market Behaviour’, in ‘The Creation and Analysis of Employer-Employee Matched Data’, Haltiwanger, J., Lane, J., Spletzer, J., Theeuwes, J., and Troske, K. (eds)
Dolton, P. and D. O'Neill (1995), "The impact of Restart on reservation wages and long-term unemployment", Oxford Bulletin of Economics and Statistics , 57, 451 - 70.
Dolton, P. and D. O'Neill (1996a), "Unemployment duration and the Restart effect: some experimental evidence", Economic Journal , 106, 387 - 400.
Dolton, P. and D. O'Neill (1996b), "Restart and roundabouts: experimental evidence on the long-term effects of the Restart unemployment program", Working paper, University of Newcastle-upon-Tyne.
Dolton, P. and D. O'Neill (1996c), "The Restart effect and the return to full-time stable employment", Journal of the Royal Statistical Society,Series A , 159, 275 - 88.
Dolton, P. and Taylor, R. (1999) ‘A Taxonomy of Survey Nonresponse: A Case Study of the Long Term Unemployed’, Newcastle University Discussion Papers in Economics, no.99-04.
Goyder, J. (1987) ‘The Silent Minority. Nonrespondents on Sample Surveys’. Polity Press.
Hausman, J. (1978) ‘Specification Tests in Econometrics’, Econometrica, vol.46, pp.1251-71.
Hausman, J. and McFadden, D. (1984) ‘Specification Tests for the Multinomial Logit Model’, Econometrica, vol.52, pp.1219-40.
Horowitz, J.L. and Manski, C.F. (1998), "Censoring of outcomes and regressors due to survey nonresponse: identification and estimation using weights and imputations", Journal of Econometrics , 84, 37 - 58.
Lin D.Y. and L.J. Wei (1989), "The robust inference for the Cox proportional hazards model", Journal of the American Statistical Association , 84,1074 - 1078.
Potthoff, R.F., K.G. Manton and M.A. Woodbury (1993), "Correcting for nonavailability bias in surveys by weighting based on number of callbacks", Journal of the American Statistical Association , 88, 1197 - 1207.
Ridder, G. (1987), "The sensitivity of duration models to misspecified unobserved heterogeneity and duration dependence", Working paper, Groningen University.
Ridder, G. and W. Verbakel (1988), "On the estimation of the proportional hazard model in the presence of unobserved heterogeneity", Research Memorandum, University of Amsterdam.
Ridder, G. and I. Tunali (1990), "Family-specific factors in child mortality: stratified partial likelihood estimation", Research Memorandum, Groningen University.
Van den Berg , G.J., M. Lindeboom, and G. Ridder (1994), "Attrition in longitudinal panel data, and the empirical analysis of dynamic labour market behaviour", Journal of Applied Econometrics , 9, 421- 435.
Wang, R., J. Sedransk and J.H. Jinn (1992), "Secondary data analysis when there are missing observations", Journal of the American Statistical Association , 87, 952 - 961.
Waterton, J. and Lievesley, D. (1987) ‘Attrition in a Panel Study of Attitude’, Journal of Official Statistics, vol.3, pp.267-82.
Table A1: Data Definitions and Means
| Variable | Mean | Definition |
| Age | 16.24 | Age over 18. |
| Female | 0.296 | 1 – if Female |
| Married | 0.454 | 1 if Married. |
| Literacy Problems | 0.106 | 1 – if the person has reading or writing problems |
| Education | 0.106 | 1 – if the person has an educational qualification. |
| Number of Children | 0.523 | Number of Children |
| Black | 0.023 | 1 – if Black. |
| Asian | 0.042 | 1 – if Asian. |
| Drivers Licence | 0.470 | 1 – if the person has a drivers licence. |
| Mobile | 0.474 | 1 – if the person has access to motorised transport. |
| Local Unemp.change | 0.346 | Change in the level of Unemployment Locally over last month. |
| Inner city area | 0.202 | 1 – if the person lives in an Inner City Area. |
| Control group | 0.079 | 1 – if the person is in the Restart control group. |
| Interview Duration | 16.24 | Length of restart interview in minutes. |
| Item Non-response | 0.069 | Item Non-response score calculated as below. |
Item Non-Response Score
The variable Item Non-response was calculated as a cumulative point score for questions not answered on the questionnaire.
i) Whether they live alone.
ii) What their ethnic origin is (where the alternative prefer not to say is not counted as non-response.)
iii) Do you have a telephone for receipt of incoming calls.
iv) Whether or not they can be willingly contacted in 6 months time.
v) Within your household are you or someone else responsible for owning or renting the property.
Hence our measure is a point score with a maximum value of 12. The frequency
distribution of this variable in our sample is:
| Item Non response | Frequency | Relative Frequency |
| 0 | 2644 | 94.03 |
| 1 | 152 | 5.41 |
| 2 | 13 | 0.46 |
| 3 | 1 | 0.04 |
| 5 | 1 | 0.04 |
| 9 | 1 | 0.04 |
| Total | 2812 | 100 |
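The frequency distribution above can be cross-checked against the mean of the Item Non-response variable reported in Table A1 (0.069). The short Python sketch below recomputes the relative frequencies and the mean score from the tabulated counts; the variable and function names are hypothetical.

```python
from typing import Dict, Tuple

# Item non-response score -> number of individuals, as tabulated above.
SCORE_COUNTS: Dict[int, int] = {0: 2644, 1: 152, 2: 13, 3: 1, 5: 1, 9: 1}


def summarise_scores(counts: Dict[int, int]) -> Tuple[int, Dict[int, float], float]:
    """Return the sample size, relative frequencies in percent, and mean score."""
    total = sum(counts.values())
    relative = {score: round(100.0 * n / total, 2) for score, n in counts.items()}
    mean_score = sum(score * n for score, n in counts.items()) / total
    return total, relative, mean_score


n_cases, relative_freq, mean_item_nonresponse = summarise_scores(SCORE_COUNTS)
```

The recomputed mean rounds to 0.069, which agrees with Table A1, and the sample size of 2812 matches the number of cases in Specifications I and II of Table 5.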
Table A2: Attrition between wave I and wave II and explanatory variables: results from a logit analysis for non-response
| Variable | Coefficient |
| Constant | 0.713 (1.82) |
| Age | -0.036 (2.12) |
| Age2 | 0.001 (3.42) |
| Black | -0.314 (1.09) |
| Asian | -0.039 (0.17) |
| Female | 0.356 (3.25) |
| Married | 0.488 (3.46) |
| Literacy | 0.042 (0.26) |
| Education | 0.197 (1.96) |
| Total Number of Children | 0.0009 (0.01) |
| Local Unempl.change | 0.447 (0.48) |
| Inner city area | -0.041 (0.36) |
| Control group | 0.318 (1.69) |
| Interview Duration | -0.002 (0.55) |
| Item Non-response | -0.436 (1.80) |
Table A3 Survivor Function for different types of attrition based on Kaplan Meier estimates
Table A4: Non-response in the first wave and explanatory variables: results from a multinomial logit
| Variable | MOVER | NO CONTACT | REFUSER | NO FAULTER |
| Constant | -1.779 (2.94) | -0.932 (1.31) | -2.034 (3.25) | -5.412 (5.06) |
| Personal Characteristics | | | | |
| Age | 0.061 (2.28) | 0.015 (0.54) | 0.034 (1.41) | 0.060 (1.50) |
| Age2 | -0.003 (3.66) | -0.001 (1.04) | -0.001 (2.08) | -0.001 (1.35) |
| Female | -0.296 (1.74) | -0.959 (4.02) | 0.114 (0.71) | -1.069 (2.77) |
| Married | -0.670 (2.91) | -0.766 (2.95) | -0.034 (0.17) | -1.049 (2.64) |
| Literacy Problems | 0.066 (0.23) | 0.118 (0.45) | -0.607 (1.90) | 0.144 (0.37) |
| Education | -0.241 (1.55) | -0.263 (1.46) | -0.168 (1.05) | -0.006 (0.02) |
| Number of Children | 0.043 (0.40) | 0.143 (1.24) | -0.124 (1.13) | -0.073 (0.36) |
| Black | 0.641 (0.39) | 0.073 (0.54) | -0.013 (0.03) | 0.919 (1.44) |
| Asian | 0.064 (0.37) | -0.369 (0.49) | 0.135 (0.34) | 1.042 (2.00) |
| Exogenous Factors | | | | |
| Local Unemp.change | 0.184 (0.13) | -2.326 (1.34) | -1.630 (1.07) | 5.085 (2.09) |
| Inner city area | 0.042 (0.23) | 0.328 (1.64) | 0.026 (0.14) | -0.087 (0.25) |
| Control group | -0.764 (2.16) | -1.054 (2.27) | 0.196 (0.76) | 0.106 (0.24) |
| Potential Attrition Identifiers | | | | |
| Interview Duration | -0.002 (0.01) | -0.007 (1.00) | 0.007 (1.08) | 0.003 (0.31) |
| Item Non-response | -0.436 (0.33) | -0.166 (0.54) | -0.416 (1.20) | -0.145 (0.33) |
Testing the Independence of Irrelevant Alternatives Assumption. (IIA).
A crucial assumption of the multinomial logit model is the IIA property. Testing whether a particular data set adheres to this assumption can be established by using the test suggested by Hausman and McFadden (1984). The suggested statistic is distributed as a chi squared distribution and has the form:
where
is the coefficient vector from
the consistent estimator.
is the coefficient vector from
the consistent estimator.
is the covariance matrix of
the coefficients from the consistent estimator.
is the covariance matrix of
the coefficients from the consistent estimator.
In this test statistic the difference between the two covariance matrices is
guaranteed to be positive definite only asymptotically, and in finite samples
no assurance can be given about its diagonal elements: negative values along
the diagonal are possible. In such a situation the test statistic H can be
negative. We interpret a negative value as strong evidence that we cannot
reject the null hypothesis that the coefficients are not affected by the
removal of one of the alternatives. In the two tables below we report the test
statistics for both the 5 and the 4 attrition regime models, dropping each
single alternative in turn. Our results suggest that the IIA assumption is
justified in both the 4 and the 5 alternative models.
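As an illustration of how a statistic of this quadratic form is computed, and of how a negative value can arise in a finite sample, the following is a minimal sketch in plain Python. It is not the code used for the results reported here; the function names and all numerical inputs are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance difference is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hausman_statistic(beta_c, beta_e, cov_c, cov_e):
    """Quadratic form d' (cov_c - cov_e)^{-1} d, with d = beta_c - beta_e.

    beta_c, cov_c: estimates after dropping one alternative (consistent).
    beta_e, cov_e: estimates from the full choice set (efficient),
                   restricted to the same coefficients as beta_c.
    """
    d = [bc - be for bc, be in zip(beta_c, beta_e)]
    diff = [[vc - ve for vc, ve in zip(row_c, row_e)]
            for row_c, row_e in zip(cov_c, cov_e)]
    weighted = solve_linear(diff, d)
    return sum(di * wi for di, wi in zip(d, weighted))


# Hypothetical two-coefficient example with a positive definite difference.
h_pos = hausman_statistic([0.5, -0.2], [0.4, -0.1],
                          [[0.05, 0.0], [0.0, 0.04]],
                          [[0.03, 0.0], [0.0, 0.03]])

# Hypothetical example where one diagonal element of the difference is
# negative, so the quadratic form itself turns negative.
h_neg = hausman_statistic([0.6, -0.2], [0.4, -0.1],
                          [[0.02, 0.0], [0.0, 0.04]],
                          [[0.03, 0.0], [0.0, 0.03]])
```

In the second call the first diagonal element of the covariance difference is negative, which is exactly the finite-sample situation described above; the sketch returns the negative value rather than truncating it at zero, so that the case remains visible to the analyst.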
Table A5: Hausman IIA Test – 5 Alternatives
| Dropping Alternative | H Chi sq | Df | H0 |
| NO FAULTER | 0.32 | 42 | Accept |
| REFUSER | -1.02 | 42 | Accept |
| NO CONTACT | -0.45 | 42 | Accept |
| MOVER | 0.01 | 42 | Accept |
Table A6: Hausman IIA Test – 4 Alternatives
| Dropping Alternative | H Chi sq | Df | H0 |
| NO FAULTER | -0.55 | 29 | Accept |
| ALL OTHERS | -1.63 | 29 | Accept |
| MOVER | -0.07 | 27 | Accept |
Testing How Many Attrition Regimes There Should Be.
The valid number of attrition types can be assessed by applying the Cramer-Ridder test sequentially. To introduce some notation, suppose we wish to test whether two of the alternatives, say j and k, should be grouped in the classification. Let $n_j$ denote the number of individuals in the j'th alternative and $n$ the total number of individuals in the sample. Cramer and Ridder (1991) showed that a valid likelihood ratio test can be constructed as:
$$LR = 2\,(\ln L_u - \ln L_r)$$
where $\ln L_r$ is the maximum loglikelihood if the estimates are constrained to satisfy $\beta_j = \beta_k$ (equal coefficients for the alternatives to be grouped) and $\ln L_u$ is the maximum loglikelihood of the original unconstrained model. Clearly the most appropriate way to
proceed is to start from the model with 5 sets of separate coefficients:
one set for the reference group and one for each of our 4 attrition types.
(Note that for identification we only actually estimate 4 sets relative
to the reference group, which is our sample respondents.) We then reduce
the number of attrition types by grouping them and testing each grouping
until we arrive at a preferred number of alternatives. This we did
systematically. A summary of the results is presented in Table A7. Our
conclusion from this analysis is that the NO CONTACT and NO FAULTER groups
should be joined together, leaving 4 groups in the data to be used in our
duration of unemployment analysis, namely: Respondents, MOVER, REFUSER and
ALL OTHER.
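The arithmetic behind a pooling test of this kind can be sketched as follows. This is a hedged illustration in Python, not the procedure actually used for Table A7. It assumes the usual form of the Cramer-Ridder restricted loglikelihood, in which the maximum loglikelihood of the model with alternatives s and t pooled is adjusted by a term that depends only on the group sizes; the function names, the group sizes and the loglikelihood values in the first example are hypothetical.

```python
import math


def restricted_loglik(loglik_pooled, n_s, n_t):
    """Restricted maximum loglikelihood when alternatives s and t are pooled.

    Assumed form (our reading of the Cramer-Ridder result):
        ln L_r = n_s ln n_s + n_t ln n_t - (n_s + n_t) ln(n_s + n_t) + ln L_p
    where ln L_p is the maximum loglikelihood of the pooled model.
    """
    n_r = n_s + n_t
    return (n_s * math.log(n_s) + n_t * math.log(n_t)
            - n_r * math.log(n_r) + loglik_pooled)


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood ratio statistic 2 (ln L_u - ln L_r)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Hypothetical example: two groups of 100 individuals each.
lnl_r = restricted_loglik(loglik_pooled=-900.0, n_s=100, n_t=100)
lr_hypothetical = lr_statistic(-1000.0, lnl_r)

# Arithmetic check against the first row of Table A7. The unrestricted
# value -2181.22 is NOT reported in the table; it is backed out here from
# the reported Log Lr (-2182.89) and LR (3.34) purely for illustration.
lr_first_row = lr_statistic(-2181.22, -2182.89)
```

The statistic is compared with a chi-squared critical value whose degrees of freedom equal the number of coefficient restrictions imposed by the grouping.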
Table A7: Summary of Cramer-Ridder Tests for the Valid Number of Attrition Types
| H0 | H1 | Log Lr | LR | Critical value: (df) = Chi sq | Accept |
| 5 regimes | 4 regimes (joining NO FAULTER and NO CONTACT) | -2182.89 | 3.34 | (15)=7.261 | H1 |
| 4 regimes | 3 regimes (joining REFUSER, NO CONTACT, OTHER) | -2204.98 | 47.52 | (30)=18.49 | H0 |
| 4 regimes | 3 regimes (joining MOVER with REFUSER, and NO CONTACT with OTHER) | -2328.02 | 293.6 | (30)=18.49 | H0 |
| 4 regimes | 3 regimes (joining MOVER, NO CONTACT and OTHER) | -2196.07 | 29.7 | (30)=18.49 | H0 |
| 4 regimes | 2 regimes (joining all attrition types) | -2101.58 | 71.335 | (45)=30.1 | H0 |
Table A8: Attrition between wave I and wave II and unemployment duration: results from a partial likelihood analysis
| Variable | Specification I | Specification II | Specification III |
Personal characteristics
Age -0.042 (7.1/6.9) -0.045