Survey Attrition:
A taxonomy and the search for valid instruments to correct for biases

  Peter Dolton*
Maarten Lindeboom**
Gerard J. Van den Berg***

Most longitudinal surveys suffer from attrition, at least some of which may not occur at random. Attrition may bias estimates based only on data from respondents. We use a unique data set that combines panel survey information on individual workers with administrative records. The reasons for survey attrition are coded into different behavioural categories. The administrative records provide information on individual labour market behaviour and personal characteristics for the complete sample (i.e. both the survey participants and the non-respondents). We show that attrition is heterogeneous, in the sense that the separate reasons for attrition are clearly distinct, and we test how many distinct reasons there are and how they should be grouped. We examine the implications of attrition for the distributions of variables in the survey and explore whether interviewer information or a score of item non-response to questions in the first wave of the survey can serve as valid instruments to correct for selection bias due to attrition.

Keywords: Attrition bias, instruments, unemployment duration, partial likelihood.
 

January, 2000
 

Please do not quote without permission of the authors.
 

* University of Newcastle-upon-Tyne

**Free University Amsterdam and Tinbergen Institute

***Free University Amsterdam, Tinbergen Institute, and CEPR.

Address for correspondence: Department of Economics, Free University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands.
 

1 Introduction

Attrition is a commonly encountered phenomenon in longitudinal surveys. Whether attrition affects the statistical analysis of the survey data depends on the variables one is interested in. If one aims to use the survey to estimate the fraction of the population with a certain characteristic, then systematically high or low attrition among those who have this characteristic biases the estimate. If one aims to estimate a model, and the only difference between the sample of respondents who remain in the survey and the initially selected sample lies in the distribution of the explanatory variables on which one conditions in the analysis, then attrition does not affect the estimation results. Of course, this requires that the process governing survey attrition is unrelated to unobserved determinants of the endogenous variable of interest (i.e. the variable whose values one aims to explain in the analysis). Likewise, attrition must be unrelated to measurement errors in the data on the endogenous variable.

If attrition behaviour is related to unobserved determinants of the variable of interest, and if this is ignored, then the estimation results are in general inconsistent. Empirical studies based on social survey data pay little attention to attrition, essentially because it is felt that nothing can be done about it. As Horowitz and Manski (1998) state, ``[With non-response,] the only way to identify population statistics is to make assumptions that determine the distribution of the missing data. A fundamental problem of empirical analysis is that such assumptions are untestable.'' Most studies merely report the attrition rate or (sometimes) compare the marginal distributions of explanatory or endogenous variables among respondents with those in census data. The differences between the marginal distributions can be used to construct weights for the respondents, giving a higher weight to respondents from groups that appear to be under-represented. The underlying idea of this approach is that if the marginal distributions among respondents are similar to the corresponding population distributions, then, hopefully, the conditional distributions of the endogenous variable given the explanatory variables are also similar.
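A minimal sketch of this marginal-distribution weighting, assuming a single categorical variable whose population distribution is known (for example from census data). Each respondent in cell c receives the weight (population share of c) / (respondent share of c). The function name and the example shares are hypothetical illustrations, not figures from this survey.

```python
from collections import Counter

def cell_weights(population_shares, respondent_cells):
    """Return a weight per cell: population share / share among respondents.

    population_shares: dict mapping cell label -> population share (e.g. from
        census data); the shares are assumed to sum to one.
    respondent_cells: list giving the cell label of each respondent.
    """
    counts = Counter(respondent_cells)
    n = len(respondent_cells)
    weights = {}
    for cell, pop_share in population_shares.items():
        resp_share = counts[cell] / n
        if resp_share == 0:
            # An empty cell cannot be reweighted; cells would need coarsening.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = pop_share / resp_share
    return weights

# Hypothetical figures: women are 30% of the population but 25% of respondents.
respondents = ["female"] * 25 + ["male"] * 75
weights = cell_weights({"female": 0.30, "male": 0.70}, respondents)
# The under-represented group gets a weight above one, the other below one.
```

Because such weights only align marginal distributions, they remove bias only if, within each cell, attrition is unrelated to the unobserved determinants of the outcome, which is precisely the kind of untestable assumption referred to in the quotation above.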

This paper examines the nature of unit non-response in a panel survey in more detail, using an unusually informative sampling frame that combines administrative data with survey data. From administrative records, a random sample of long-term unemployed workers in the UK was drawn, and a survey was conducted among these workers some months later. The survey response rate is about 56%. Of the respondents who participated in the first wave, about 71% also participated in the second wave, held about a year later. We refer to Dolton, Lindeboom & Van den Berg (1999) for an analysis of non-response to the first wave of the survey. The analysis in this paper concerns non-response to the second wave of the panel survey, conditional on participation in the first wave. In the remainder of this paper we use the terms ‘non-response’ and ‘attrition’ interchangeably to refer to non-participation in the second wave of this survey. Of particular interest for our analysis is that the survey coded the reason for non-response to each wave into 22 different categories. For example, the codes indicate whether the individual has moved, or whether they refused to be interviewed.

The individual records in the survey data and the administrative data are linked. The administrative data contain useful information on some personal characteristics of the individual. This enables us to study the effect of these characteristics on the reason for not participating in the survey and, by implication, the difference between the distributions of these characteristics in the administrative data and among respondents. This provides a taxonomy of the reasons for attrition. In general, social surveys contain no information on the reasons for non-response or on the characteristics of non-respondents, so the relation between the two cannot be studied.

The reasons for non-response may be informative about the behaviour of non-respondents. In particular, it is plausible that they are related to individual labour market behaviour. For example, if someone has moved, this may be because they found a job elsewhere. If an individual refuses to cooperate with the survey interview, then this individual may also be reluctant to apply for jobs, or rigorous questioning about their availability for work may have induced them to sign off unemployment benefit (UB).

The administrative data contain information on the actual labour market behaviour of all individuals in the original sampling frame (i.e., respondents plus non-respondents). In particular, they supply the date at which the individual leaves unemployment. This allows us to study the relation between unemployment duration and (the reason for) leaving the survey. Basically, the administrative data provide a unique insight into the behaviour of the sample drop-outs and, in particular, allow us to see to what extent it differs from the behaviour of those who remain in the sample. The effect of non-response on the estimated unemployment duration distribution follows from comparing the estimate based on respondents who remain in the sample with the estimate based on both respondents and non-respondents. In the latter case we allow the distribution for non-respondents to vary with the reason for non-response, in order to detect whether the reason for non-response is related to unobserved determinants of unemployment duration.
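To make the comparison concrete, one could compute a Kaplan-Meier survivor function for unemployment duration twice: once for respondents who remain in the survey and once for the full administrative sample. The plain-Python sketch below assumes independent right censoring; the function name and toy data are hypothetical, and it is not presented as the estimator used in this paper. Letting the distribution differ by reason for non-response would amount to computing a separate curve for each attrition category.

```python
def kaplan_meier(durations, exited):
    """Kaplan-Meier estimate of the survivor function.

    durations: spell lengths, e.g. days in unemployment after a fixed date
    exited: True if an exit from unemployment was observed, False if censored
    Returns a list of (t, S(t)) pairs at each distinct observed exit time.
    """
    data = sorted(zip(durations, exited))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        exits = censored = 0
        # Group all spells ending at t; censored spells count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                exits += 1
            else:
                censored += 1
            i += 1
        if exits:
            surv *= 1.0 - exits / at_risk
            curve.append((t, surv))
        at_risk -= exits + censored
    return curve

# Hypothetical toy data: everybody in the administrative sample, and the
# subset who respond at wave II.
durations = [1, 2, 2, 3, 4, 5]
exited = [True, True, False, True, False, True]
responded = [False, False, True, True, True, True]

full_sample = kaplan_meier(durations, exited)
stayers = kaplan_meier(
    [d for d, r in zip(durations, responded) if r],
    [e for e, r in zip(exited, responded) if r],
)
```

In this toy example the non-respondents leave unemployment quickly, so the curve for respondents lies above the full-sample curve, the pattern that selective attrition on duration would produce.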

We find that the duration distribution varies significantly with the reason for non-response. This may be of interest to agencies that run surveys, as well as to researchers modelling the length of unemployment spells who are not as well endowed with data as in the present example. In particular, the results facilitate a categorisation of the types of individuals who are likely to be non-respondents in future waves of a survey and, more innovatively, they enable the targeting of individuals whose non-response is likely to distort the empirical analysis of individual labour market behaviour with survey data.

Once an attrition taxonomy is established as valuable, it has a clear implication for handling attrition problems in other studies that are not as well endowed with data as ours. Our data allow us to check the validity of instruments that can be used to correct for attrition in behavioural models. To correct for the selection bias generated by attrition, it is essential to have instruments that affect attrition behaviour but do not affect the distribution of the endogenous variable of interest. A candidate instrument that fails either of these conditions is invalid. In the remainder of the paper we explore the possibility of using alternative identifiers for attrition. The candidates we examine are (i) a score on item non-response to questions in earlier waves of the survey, as a proxy for the latent propensity not to respond, and (ii) information on the interview. The interview information is of two kinds: first, we know how long the previous interview at wave I of the survey took, and second, we know the identity of each interviewer. The interview duration may act as a proxy for the disutility the previous interview caused the respondent, and hence influence their likelihood of responding at wave II. The identity of the interviewer is useful because individual differences in interviewer style and personality may shape the respondent's experience and hence the likelihood of future response. We find that item non-response scores and interview duration are not valid instruments, but that interviewer identity is a valid instrument. The next step is to use this instrument to correct for attrition bias.

One possibility is to use this instrument in a sample selection model and to compare the outcomes of this model with those of the model estimated on the full (i.e. untruncated) sample. An alternative is to devise sample weights based on this instrument to correct for the different types of attrition. The latter approach is far more attractive in non-linear models, such as our model for unemployment duration. Moreover, a weighting procedure deals very straightforwardly with different types of attrition (as in our case), whereas correcting for different types of attrition in a non-linear sample selection model would be very difficult.
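One common way to turn an attrition model into weights is inverse probability weighting: each wave II respondent is weighted by the reciprocal of their estimated probability of remaining in the panel. The sketch below assumes that this probability comes from a multinomial logit like the one in Table 4, with respondents as the reference outcome and a covariate vector that includes the instrument; the function names and coefficients are hypothetical, and the paper's own weighting scheme may differ in its details.

```python
import math

def response_probability(x, coefs):
    """P(respond | x) in a multinomial logit with respondents as the reference.

    x: covariate vector, first element 1.0 for the constant; it is assumed to
       include the instrument (e.g. interviewer dummies)
    coefs: dict mapping attrition category -> coefficient vector
    """
    # exp of the linear index for each attrition category; respondents have
    # index zero, so exp(0) = 1 enters the denominator.
    exp_index = [math.exp(sum(b * xi for b, xi in zip(beta, x)))
                 for beta in coefs.values()]
    return 1.0 / (1.0 + sum(exp_index))

def ipw_weight(x, coefs):
    """Inverse-probability weight for a respondent who stays in the panel."""
    return 1.0 / response_probability(x, coefs)

# Hypothetical coefficients (constant, one covariate) for three attrition types.
coefs = {"MOVER": [0.0, 0.0], "REFUSER": [0.0, 0.0], "ALL OTHERS": [0.0, 0.0]}
w = ipw_weight([1.0, 2.0], coefs)  # all four outcomes equally likely here
```

Each attrition type enters through its own coefficient vector, which illustrates why the weighting route accommodates several types of attrition more easily than a non-linear selection model does.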
 
 

The rest of the paper is organised as follows. The next section describes the data sources. Section 3 discusses the non-response categories. The way in which attrition varies with explanatory variables, and the way in which it affects the estimated distribution of unemployment durations, are presented in Section 4. In Section 5 we test whether interviewer information is a valid instrument and devise sample weights based on this instrument. We then explore whether re-weighting the truncated sample with these weights helps to correct the distortions in the distributions due to survey attrition. Section 6 summarises and concludes.
 
 
 
 

2 The Data

In 1989 the Policy Studies Institute in the UK was commissioned by the UK Employment Service to evaluate the impact of the ``Restart'' policy program for unemployed workers. At the time, the Restart program consisted of compulsory six-monthly meetings between the unemployed individual and a counsellor of the Employment Office, for each unemployment benefits claimant in the UK. During these interviews, the counsellor offers advice on job search, and he may place workers in contact with employers or training agencies. If the individual does not attend a Restart interview or is deemed not to be available for work then their case is referred to an adjudication officer and they may be faced with the possibility of having their benefits reduced or suspended. Over the period of July to September 1989 over 270,000 such adjudication decisions were made and in 57 were stopped. The main aim of the program is to reduce the amount of time people spend unemployed, and to reduce their dependency on unemployment benefits.

To avoid confusion, it must be stressed from the outset that the Restart interviews are not survey interviews. For the purposes of the present paper, the Restart interviews are only relevant in that the planned date of the first Restart interview (6 months after entry into unemployment) affects the sampling design. In particular, to evaluate the Restart program, a random sample of 8925 unemployed workers who would approach their 6th month of unemployment in the period March-July 1989 was identified in March 1989. Individuals were retained in the sample even if they subsequently did not attend a scheduled Restart interview. Every Employment Service office throughout Britain was contacted while constructing the sample, in order to eliminate regional biases. Individuals were selected for the sample from the inflow lists on the basis of their National Insurance (NI) numbers. This is known to result in a random 5 percent sample. From this set, a control group of 582 people was again randomly chosen by means of previously specified NI digit sequences. Members of the control group, although eligible for a Restart interview, were not asked to attend the initial Restart interview. The existence of a random control group allows the impact of the program to be evaluated without having to deal with the issue of self-selection.

For the sample of 8925 individuals, administrative information on a few personal characteristics, such as sex, age, and travel-to-work area, was retrieved from the Employment Services. The information on an individual's travel-to-work area was linked to the National Online Manpower Information System (NOMIS) data, in order to obtain data on local labour market conditions. In addition, the data are linked to the Joint Unemployment and Vacancies Operating System (JUVOS) Cohort database collected by the Employment Service. The JUVOS data provide accurate administrative records on the claimant's unemployment history from 1982 up to January 1995. Unfortunately, the administrative data do not record the destination state upon exit out of unemployment. This could be employment, a training programme or simply signing off the claiming of unemployment benefit (to obtain benefits, one needs to register at the Employment Service). However, by comparing the administrative data to the survey data for respondents, Dolton and O'Neill (1996b) show that most exits out of unemployment amount to a transition into employment.

After excluding individuals with spells substantially longer or shorter than 6 months in April 1989, and excluding those who lacked either JUVOS data or the travel-to-work area information, we are left with a sample of 8012. Of these, 512 are members of the control group.

About 6 months after the identification of the full sample (i.e., in September 1989), a survey organisation (Social and Community Planning Research, or SCPR) conducted a survey of these individuals. The survey was intended to supply additional information on background variables and job search behaviour of the individuals. Detailed information was obtained on subsequent work history, personal characteristics, the Restart interview, previous employment history, search behaviour and benefit income. This survey was conducted between September and October 1989. Of the original sample of 8925 individuals, 5200 completed the survey. Of the sample of 8012 (see the previous paragraph), 4708 completed the survey. Of these 4708 first-wave respondents, 3352 also participated in the second wave. We are interested in attrition between wave I and wave II of the survey. We refer to Dolton, Lindeboom & Van den Berg (1999) for an analysis of non-response to the first wave of the survey.

Table 1 below presents means of variables for the individuals in our sample. The first column reports means for the total sample of 8012 (i.e. those who were invited to the Restart interview and the controls). The second column refers to respondents to the first wave of the survey, the third to respondents to the second wave. The table shows that non-response has relatively minor effects on the means of most variables. Respondents are, however, on average older than non-respondents. The unemployment duration variable is measured from a fixed point in time, September 1989, which is around the time of the first interview. Interestingly, the mean unemployment duration among survey respondents is higher than in the total sample, and higher still among wave II respondents. This indicates that individuals with lower exit rates tend to remain in the sample, implying that survey non-response may be selective with respect to unemployment duration.
 
 


Table 1: Variable means in the total sample and among the respondents


Variable                                       Total sample    Wave I   Wave II

Age                                                   32.62     33.62     34.20
Female                                                 0.30      0.32      0.30
Local unemployment change (decline in U rate)          0.35      0.35      0.35
Living in an inner city area                           0.20      0.18      0.20
Member of control group                                0.06      0.07      0.08
Unemployment duration beyond selection date          302.26    314.53     340.5
Uncensored cases                                       0.03      0.03      0.06

Number of cases                                        8012      4708      3352



 

Although the information in this table may reveal some aspects of the effect of attrition, it provides at best a partial picture. Moreover, sample non-response may occur for a variety of reasons. It is conceivable, for instance, that individuals who changed address did so because they found a job in another area. On the other hand, poorly motivated people may have difficulty finding a job and may be less inclined to participate in a survey, especially one about job search behaviour and labour market prospects. The first type of individual will on average experience high exit rates out of unemployment, whereas the second type may on average have lower exit rates. It is therefore worth looking more closely at the different types of non-response and their relative importance in our sample. We do this for the second wave of the survey, conditional on participation in the first wave. This is the set-up that most researchers face.
 
 
 
 

3 A Taxonomy of Survey Attrition

3.1 Reasons for attrition
 
 

Table 2 lists the reasons for non-response to the second wave as coded for each first-wave respondent. The first column gives the reason, the second the associated code, and the third the number of individuals in each category at the second wave. Sample non-response is an absorbing state, i.e. individuals who do not participate in the first wave do not return in the second wave.

Although the coding of 22 reasons for non-response is informative, it is inconvenient to use in a multivariate analysis. We therefore recode the information into four categories. The motivation for this split follows directly from the work of Goyder (1987) and Waterton and Lievesley (1987). In addition, the tracing process was organised as a logical sequence of stages, as presented in Figure 1. We therefore followed the logic of this process; the detailed reasoning behind the categorisation of each reason is given in Table A0.

We distinguish between non-response due to refusal of the individual to cooperate (labeled as REFUSER), non-response because the individual had moved residence (labeled as MOVER), non-response due to inability of the survey agency to contact a respondent, for reasons not directly associated with mobility (e.g. the respondent is never at home; labeled as NO CONTACT) and finally non-response for other reasons which are no fault of the non-respondent (e.g. ill health, respondent does not speak English; labeled as NO FAULTER).
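The recoding step is essentially a lookup table. The sketch below transcribes the code lists of Table 3 into Python; the constant and function names are hypothetical. The optional pooling flag mirrors the merger of NO CONTACT and NO FAULTER into a single ALL OTHERS group that is adopted in Section 3.2.

```python
# Aggregation of the detailed codes of Table 2 into the categories of Table 3.
CATEGORY_CODES = {
    "MOVER": {19, 20, 22},
    "REFUSER": {70, 72, 73, 77},
    "NO CONTACT": {1, 2, 3, 4, 5, 23, 24, 25, 71},
    "NO FAULTER": {21, 74, 75, 76, 78, 79, 80},
}
INTERVIEW_OBTAINED = 51

# Invert the table so that each code points to its category.
CODE_TO_CATEGORY = {code: category
                    for category, codes in CATEGORY_CODES.items()
                    for code in codes}

def attrition_category(code, pool_no_fault=False):
    """Map a wave II outcome code to RESPONDENT or an attrition category.

    pool_no_fault=True merges NO CONTACT and NO FAULTER into ALL OTHERS.
    Raises KeyError for a code that is not listed in Table 3.
    """
    if code == INTERVIEW_OBTAINED:
        return "RESPONDENT"
    category = CODE_TO_CATEGORY[code]
    if pool_no_fault and category in ("NO CONTACT", "NO FAULTER"):
        return "ALL OTHERS"
    return category
```

For example, `attrition_category(72)` returns `"REFUSER"`, while `attrition_category(71, pool_no_fault=True)` returns `"ALL OTHERS"`.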

REFUSER and MOVER are reasons for non-response that are to a large extent initiated by the individual. With respect to unemployment duration, we expect those who left the sample for reasons that placed them in the MOVER group to have, on average, higher exit rates. The REFUSER group may include less motivated individuals with poor labour market opportunities. It is conceivable, though perhaps unlikely, that this category also contains individuals who are actively searching, or who have perhaps already found a job, and who are therefore less inclined to participate in the survey.

The NO CONTACT group is defined from codes associated with a loss of contact between the agency running the survey and the individual respondent. Although the loss of contact could be initiated by the respondent, it is presumed that the agency can to a large extent prevent such drop-out through more intensive tracing efforts. An example of one of the codes in this category is "not contacted, never in". It is unclear how many times the agency tried to contact the individual, but the number of attempts may well determine the frequency in this category. "No trace of address" and "address vacant or derelict" are examples of other codes in the NO CONTACT group. The category NO FAULTER is simply defined as those non-respondents whose non-response was unavoidable due to circumstances beyond their control. Examples are "Ill health", "Could not speak adequate English", "Lost in the postal system", etc.

Table 3 provides counts of the attrition categories under the above grouping. For instance, 211 of the non-respondents are movers, 193 refused to cooperate, 151 could not be contacted and 60 have reasons for non-response that are no fault of their own.
 
 
 
 

Table 2: Reasons for attrition, conditional on participation in the first wave


Reason for non-response                              Code    Wave II

No trace of address                                     1          5
Address vacant or derelict                              2         58
Premises demolished                                     3          0
Business/industrial premises only                       4          1
Remote address (not issued to interviewer)              5          1
Mover - follow-up address given                        19         16
Mover - follow-up address not known                    20        216
Respondent deceased                                    21          7
No contact at address                                  22        111
Complete refusal to interview                          23          1
Address given is Benefit office                        24          0
Interview obtained                                     51       3352
Refusal to office                                      70          1
Not contacted (e.g. never in)                          71        191
Personally refused interview                           72        188
Broke appointment and not re-contacted                 73         82
Ill (at home) during survey period                     74          6
Away/in hospital during survey period                  75         38
Incapacitated                                          76          2
Refusal on behalf of respondent                        77         56
Named respondent could not speak adequate English      78          1
Other reason for non-response                          79        371
Lost in postal system                                  80          2

Total number of cases                                           4708
___________________________________________________________________________
 
 

Table 3: Counts for aggregated attrition categories


Category      Codes                             Number

MOVER         19, 20, 22                           211
REFUSER       70, 72, 73, 77                       193
NO CONTACT    1, 2, 3, 4, 5, 23, 24, 25, 71        151
NO FAULTER    21, 74, 75, 76, 78, 79, 80            60

Total number of drop-outs                          615

___________________________________________________________________________
 
 
 
 
 
 

In the empirical analyses we relate these non-response categories to a range of exogenous variables and to unemployment duration. This provides more insight into the effect of non-response on the distribution of the exogenous variables and of a relevant endogenous variable, unemployment duration. Before doing so, we make three remarks.

Firstly, our grouping of the attrition categories is to a large extent driven by our interest in the unemployment duration variable. The groups are constructed so that the codes within each group can be expected to have relatively homogeneous effects (except, of course, for NO FAULTER). This should prevent the effects of separate codes within a category from offsetting one another. We therefore believe that this way of grouping the data helps us to better understand the forces that drive non-response and its effect on the variable of interest. Moreover, the "endogenously driven" grouping may help us in our search for suitable instruments to correct for non-response.

Secondly, since both MOVER and REFUSER are defined as individual-initiated attrition, they may be relevant for specific groups of unemployment benefit recipients. Self-selection effects are expected to play an important role in these groups, and it will in general be difficult to find good instruments to correct for them. The loss of contact between the agency and the respondent (NO CONTACT), on the other hand, can to a large extent be influenced by the agency running the survey and is expected to act more generally on the sample of benefit recipients. Note, however, that extra efforts by the agency may directly influence the composition of the sample.

Finally, since we have imposed this fourfold taxonomy from the outset, we must test its validity. We do this formally in the next subsection, where we present the estimation results.
 
 

3.2 Attrition and personal characteristics

The appropriate statistical model for examining multiple reasons for attrition from the sample is the multinomial logit. When using this model it is important to check whether the estimates satisfy the Independence of Irrelevant Alternatives (IIA) assumption that the model implies. This can be checked using the test suggested by Hausman and McFadden (1984), which is a variant of the Durbin-Hausman-Wu test. In addition, it is important to verify that the number of alternatives modelled gives the most satisfactory representation of the data. This is particularly important if there is no good theoretical basis for a particular grouping of the outcome variable or, as in this case, if we are attempting to summarise 22 possible reasons in a much smaller but coherent typology. Alternative configurations can be tested using the Cramer-Ridder (1991) test. Applying this test to our data and our categorisation of the attrition reasons, we found that our a priori division of attrition into four basic reasons was too fine: the results of the Cramer-Ridder test indicate that the NO FAULTER and NO CONTACT groups should be pooled. We label this pooled group ALL OTHERS. Table 4 reports the final specification, which was accepted after a series of Cramer-Ridder tests (described in the Appendix). The alternative fourfold specification is reported in Table A4 in the Appendix.
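For reference, the Cramer-Ridder statistic can be computed from two fitted models: the unrestricted multinomial logit and the model re-estimated with the two candidate alternatives merged. Under the null hypothesis the two alternatives share all slope coefficients and differ only in their constants; the restricted log likelihood then equals the pooled one plus n1 ln(n1/(n1+n2)) + n2 ln(n2/(n1+n2)), and the likelihood ratio statistic is compared with a chi-square distribution with K - 1 degrees of freedom, K being the number of coefficients per alternative including the constant. The sketch assumes both log likelihoods are already available; the function and argument names are hypothetical.

```python
import math

def cramer_ridder_lr(loglik_unrestricted, loglik_pooled, n1, n2, n_coef):
    """Cramer-Ridder LR statistic for pooling two multinomial logit outcomes.

    loglik_unrestricted: log likelihood with the two outcomes kept separate
    loglik_pooled: log likelihood after re-estimating with them merged
    n1, n2: numbers of observations in each of the two outcomes
    n_coef: coefficients per outcome, including the constant
    Returns (statistic, degrees_of_freedom).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("both outcomes must be observed")
    n12 = n1 + n2
    # Under H0 the outcomes share slopes and differ only in their constants;
    # the restricted log likelihood is the pooled one plus the log likelihood
    # of splitting the merged group according to its observed shares.
    loglik_restricted = (loglik_pooled
                         + n1 * math.log(n1 / n12)
                         + n2 * math.log(n2 / n12))
    statistic = 2.0 * (loglik_unrestricted - loglik_restricted)
    return statistic, n_coef - 1  # slopes restricted, constants left free
```

Testing whether NO CONTACT and NO FAULTER can be pooled, for instance, would use n1 = 151 and n2 = 60 from Table 3 together with the two fitted log likelihoods, which are not reported in this section.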

Table 4 relates the reason for non-response at the second wave of the survey to a range of characteristics. Table A2 of the Appendix reports a logit analysis of survey attrition. In Table 4 the respondents serve as the reference group, so the effects presented are relative to this group. For the MOVER and REFUSER groups, the age variables imply a quadratic relationship between non-response and age: the probability of non-response rises up to middle age and decreases thereafter. This effect is not significant for the ALL OTHERS group. Apart from age, being female, being married, being in the control group and the change in the local unemployment rate between 1988 and 1990 seem to matter. The latter variable has a large impact on the probability of dropping out in the ALL OTHERS category. Females, married individuals and members of the control group remain longer in the sample. For the REFUSER category there is also some evidence that self-declared problems with literacy may play a role in attrition.

Included in Table 4 are two additional variables that are labeled ‘Potential Attrition Identifiers’. The aim of including these variables is to try to find factors regarding the survey or interview conditions which might correlate with non-response type but be independent of our endogenous variable, namely the unemployment duration. These results are discussed further in Section 4 when we describe the search for valid instruments.

We also estimated a logit model from which all exogenous variables were excluded; comparing it with the full model provides a test of whether non-response is random. The likelihood ratio statistic for the joint significance of the included regressors is 70.8, so the hypothesis that non-response is random is strongly rejected.
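Under the null hypothesis of random non-response the restricted model contains only constants, and its maximised log likelihood follows directly from the outcome counts: each outcome probability is estimated by its sample share. A minimal sketch, assuming the full-model log likelihood is read from the estimation output; the function names are hypothetical. For a multinomial logit with J non-reference outcomes and K regressors besides the constant, the statistic would be compared with a chi-square distribution with J times K degrees of freedom.

```python
import math

def intercept_only_loglik(counts):
    """Maximised log likelihood of a (multinomial) logit with constants only.

    Without regressors the ML estimate of each outcome probability is its
    sample share n_j / N, so log L = sum_j n_j * log(n_j / N).
    """
    total = sum(counts)
    return sum(n * math.log(n / total) for n in counts if n > 0)

def lr_statistic(loglik_full, loglik_restricted):
    """Likelihood ratio statistic 2 * (log L_full - log L_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)
```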
 
 
 
 

Table 4: Non-response in the second wave and explanatory variables: results from a multinomial logit


Variable                         MOVER         REFUSER      ALL OTHERS

Constant                  1.786 (2.95)   -2.035 (3.26)   -1.644 (2.74)

Personal Characteristics

Age                       0.061 (2.29)    0.034 (1.41)    0.027 (1.18)
Age squared              -0.003 (3.66)   -0.001 (2.08)   -0.001 (1.51)
Female                   -0.296 (1.74)    0.114 (0.71)   -0.989 (4.82)
Married                  -0.670 (2.91)   -0.034 (0.17)   -0.851 (3.87)
Literacy problems         0.066 (0.28)   -0.607 (1.90)    0.121 (0.54)
Education                -0.241 (1.55)   -0.168 (1.05)   -0.189 (1.23)
Number of children        0.042 (0.40)   -0.124 (1.14)    0.094 (0.94)
Black                     0.644 (1.64)   -0.013 (0.02)    0.352 (0.82)
Asian                     0.067 (0.18)    0.134 (0.34)    0.113 (0.31)

Exogenous Factors

Local unemp. change       0.198 (0.13)   -1.630 (1.07)    5.085 (2.09)
Inner city area           0.042 (0.23)    0.026 (0.14)   -0.087 (0.25)
Control group             0.763 (2.16)    0.196 (0.76)    0.106 (0.24)

Potential Attrition Identifiers

Interview duration       -0.002 (0.38)    0.007 (1.08)   -0.003 (0.65)
Item non-response        -0.436 (1.31)   -0.416 (1.20)   -0.169 (0.66)

-log likelihood: 2065.91

_____________________________________________________________________________________

Explanatory note: The respondents are taken as the reference group. Absolute t-values are in parentheses. Coefficients with an absolute t-value above 1.645 are significant at the 10% level.
 
 

3.3 Attrition and the Distribution of Unemployment Duration

The potential survey participants are a random sample of workers approaching their sixth month of unemployment in the period March-July 1989. The survey, which was conducted in the fall of 1989, was set up to analyze the effect of a Restart interview on job search outcomes. As we saw in the tables of Section 2, 4708 of the original set of potential respondents participated in the survey, and 3352 of these 4708 respondents also participated in the second wave, held about six months later. This section examines the distortions in the distribution of unemployment duration caused by attrition between the two waves. Our unemployment duration variable is defined as the duration of unemployment from the first interview date onwards; for this purpose we take a fixed point in time, namely 1 September 1989, as the first interview date. Note that of the 4708 respondents, 1896 had already found a job by the time they were interviewed (i.e. 2812 were still unemployed).

One can distinguish two reasons for a relation between non-response (and the reason for non-response) and unemployment duration. First, job search behaviour and behaviour towards survey participation may be affected by the same underlying individual-specific characteristics. An individual with a relative dislike of social contacts may refuse to cooperate with the survey interview and may also be reluctant to apply for a job. An individual who spends a lot of time searching for a job may not want to spend time on a survey interview. The second reason for a relation between non-response and unemployment duration is that the acceptance of a job makes it more difficult for the agency to contact the individual. Job acceptance may entail a move to another geographical location, which could easily be outside the scope of the survey. The individual may also be away from home more often. This relation is fundamentally different from the first, as the causal effect runs directly from job acceptance to non-response and does not depend on the presence of unobserved characteristics.

Van den Berg, Lindeboom and Ridder (1994) examine the relation between unemployment duration and non-response in a model for attrition from a longitudinal survey, conditional on response in the first wave of the survey. Their data do not enable a distinction between the two reasons for such a relation. This is because it is not observed whether an individual accepts a job between the last wave in which he participates and the first wave in which he does not participate. In the present paper we are in a position to distinguish empirically between the two reasons for a relation. This is because we always observe the moment of exit out of unemployment. If most non-respondents exit unemployment in between the two waves then this indicates that there is a direct causal effect from exit out of unemployment to non-response. If, on the other hand, we observe a relation between non-response and exit out of unemployment at dates after the survey date, then this indicates that the relation works by way of unobserved individual-specific characteristics.

It is to be expected that the reason for attrition is also informative with respect to the relation between attrition and unemployment duration. For example, a person who drops out of the sample because they are a MOVER may have a direct causal relation between their unemployment duration and attrition, whereas an individual in the REFUSER category may have unobserved characteristics which play an important role.

A simple way to examine the consequences of survey non-response is to apply a basic non-parametric method such as the Kaplan-Meier estimate of the duration distribution. Our sample contains ongoing spells of unemployment, since individuals are included who are still unemployed at the date of the survey. We therefore have to adapt the Kaplan-Meier estimate of the exit rate (or hazard) accordingly. An appropriate way to deal with a sample of ongoing spells is to focus on the distribution of unemployment duration beyond the selection date (the residual duration, denoted r), conditional on the duration elapsed up to the date of selection (the elapsed duration, denoted p). In practice, adapting the Kaplan-Meier estimate of the hazard amounts to an appropriate redefinition of the risk sets at the observed failure times.

With flow data, for instance, the risk set at a point t includes the durations that are equal to or exceed t, so that the number of spells at risk is

$$R_t = \sum_i I(t_i \ge t),$$

where $I(\cdot)$ is the indicator function and $t_i$ the duration of individual/spell $i$. For the distribution of the residual duration conditional on the elapsed duration, a spell with $t_i = r_i + p_i$ is included in the risk set only if, additionally, $t$ exceeds its elapsed duration $p_i$. The risk set for the residual duration conditional on the elapsed duration therefore satisfies

$$R_t^* = \sum_i I(t_i \ge t,\ p_i < t).$$

Table A3 of the Appendix reports survival probabilities for the different types of non-response group that we consider, and the survivor functions are plotted in Figures 2a and 2b.
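The redefined risk set can be illustrated with a minimal product-limit (Kaplan-Meier) sketch in Python. The function name and the toy durations are hypothetical; the sketch assumes that each spell's elapsed duration p_i is strictly smaller than its total duration t_i and that exits are recorded at the total duration.

```python
def km_delayed_entry(durations, events, elapsed):
    """Product-limit estimate for spells sampled while still ongoing.

    durations: total spell lengths t_i (= elapsed p_i + residual r_i)
    events:    1 if the exit is observed, 0 if right-censored
    elapsed:   elapsed duration p_i at the selection date (assumed < t_i)
    Returns a list of (t, S(t)) pairs at the distinct observed exit times,
    where S is the survivor function conditional on survival up to the
    smallest elapsed duration in the sample.
    """
    exit_times = sorted({t for t, d in zip(durations, events) if d == 1})
    surv = 1.0
    curve = []
    for t in exit_times:
        # Risk set R*_t: spells still ongoing at t (t_i >= t) that had
        # already entered the sample before t (p_i < t).
        at_risk = sum(1 for ti, pi in zip(durations, elapsed) if ti >= t and pi < t)
        exits = sum(1 for ti, di in zip(durations, events) if ti == t and di == 1)
        surv *= 1.0 - exits / at_risk
        curve.append((t, surv))
    return curve


# Toy data (months): the fourth spell enters the sample at p = 9, so it is
# not at risk at t = 8 even though its total duration exceeds 8.
print(km_delayed_entry([8, 10, 12, 15], [1, 1, 0, 1], [6, 6, 6, 9]))
```

Dropping the condition `pi < t` turns this back into the ordinary flow-sample Kaplan-Meier estimate, which would wrongly count the late-entering spell at t = 8.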

There are clear differences between the survivor functions. The survivor function for the MOVERS group lies uniformly below all other functions, implying that unemployment exit rates are higher for this group. Respondents who refuse to participate in the survey have substantially lower exit rates, though in the right tail of the distribution (for durations longer than a year) their survivor function comes close to that of the survey participants. This is also the case for REFUSER. The group whose attrition is due to a loss of contact between the individual and the agency (NO CONTACT or NO FAULTER) has the lowest unemployment exit rates (i.e. its survivor function lies above all others).

The log-rank test for equality of the survivor functions in Figure 2a yields a test statistic of 14.7, which exceeds the 95th percentile of the χ²(4) distribution, 9.5. Equality of the survivor functions is therefore rejected.
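For readers who want to see the mechanics of the log-rank test, the sketch below computes the two-group version (one degree of freedom), counting a spell as at risk at t only if t_i ≥ t and p_i < t, as in the risk set R*_t defined above. This is a simplification: the test in the text compares five survivor functions, and the K-group version replaces the scalar observed-minus-expected sum and its variance with a vector and a covariance matrix, giving K-1 degrees of freedom. The function name and toy data are hypothetical.

```python
def logrank_two_groups(durations, events, group, elapsed=None):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    durations: total spell lengths t_i
    events:    1 if the exit is observed, 0 if right-censored
    group:     0/1 group label for each spell
    elapsed:   optional elapsed durations p_i; a spell counts as at risk at t
               only if t_i >= t and p_i < t (no delayed entry if omitted)
    """
    if elapsed is None:
        elapsed = [float("-inf")] * len(durations)
    exit_times = sorted({t for t, d in zip(durations, events) if d == 1})
    obs_minus_exp = 0.0  # sum over exit times of observed minus expected exits in group 1
    variance = 0.0       # hypergeometric variance of that sum
    for t in exit_times:
        at_risk = [g for ti, pi, g in zip(durations, elapsed, group) if ti >= t and pi < t]
        n = len(at_risk)
        n1 = sum(at_risk)  # spells of group 1 in the risk set
        d = sum(1 for ti, di in zip(durations, events) if ti == t and di == 1)
        d1 = sum(1 for ti, di, g in zip(durations, events, group)
                 if ti == t and di == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance


# Toy example: group 1 exits at t = 1, 2 and group 0 at t = 3, 4.
stat = logrank_two_groups([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(stat)  # about 2.88; compare with 3.84, the 5% critical value of chi-square(1)
```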

Table 5 reports results of a partial likelihood analysis. Contributions to the partial likelihood function are based on the conditional probability that spell $i$ ends, given the risk set $R_i$, defined as the set of spells with the same duration as spell/individual $i$ or longer. This conditional probability is simply the ratio of the hazard of $i$ to the sum of the hazards of all individuals exposed to the risk. When the commonly used proportional hazard (PH) assumption is adopted, factors common to all individuals cancel from this ratio. Consequently, under the PH assumption no specific form of the baseline hazard is required, and in principle an unrestricted non-parametric baseline hazard is allowed for. For instance, if the hazard rate is $\theta(t_i, x_i; \beta) = \theta_0(t_i)\,\theta_1(x_i; \beta)$, then the partial likelihood function can be written as

$$L(\beta) = \prod_i \left[ \frac{\theta_1(x_i;\beta)}{\sum_{j \in R_i} \theta_1(x_j;\beta)} \right]^{\delta_i}$$

where $\delta_i$ is an indicator that equals 1 if $i$ is observed to make a transition. Because our partial likelihood is based on the distribution of the residual duration conditional on the elapsed duration, the risk set needs to be modified in the same way as for the Kaplan-Meier estimate. More specifically, $R_i^p$ is defined as the set of all spells $j$ with $t_j \ge t_i$ whose elapsed duration $p_j$ is smaller than $t_i$. In effect, the redefined risk sets allow for delayed entry of spells, where the delay depends on the elapsed duration. An alternative is a stratified partial likelihood approach (see Ridder and Tunali, 1990) with stratification on the elapsed duration, which allows for a separate non-parametric baseline hazard for each stratum. We also estimated these models; the main conclusions were not altered.
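The log partial likelihood with the redefined risk sets $R_i^p$ can be sketched as follows. It assumes $\theta_1(x;\beta) = \exp(x'\beta)$ (the usual Cox specification; the text only requires a PH form) and the Breslow treatment of tied exit times, which the paper does not specify. The function name and toy data are hypothetical; in practice the function would be maximised numerically over $\beta$.

```python
import math


def partial_loglik(beta, x, durations, events, elapsed):
    """Log partial likelihood for a PH model with delayed-entry risk sets.

    beta:      list of coefficients
    x:         list of covariate vectors, one per spell
    durations: total durations t_i
    events:    1 if the exit is observed, 0 if right-censored
    elapsed:   elapsed durations p_i at the selection date
    Ties are handled with the Breslow approximation (an assumption).
    """
    # theta_1(x_j; beta) = exp(x_j' beta); the baseline hazard theta_0 cancels.
    index = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    ll = 0.0
    for i, (ti, di) in enumerate(zip(durations, events)):
        if di != 1:
            continue  # censored spells enter only through the risk sets
        # R^p_i: spells j with t_j >= t_i whose elapsed duration p_j < t_i
        denom = sum(math.exp(index[j])
                    for j, (tj, pj) in enumerate(zip(durations, elapsed))
                    if tj >= ti and pj < ti)
        ll += index[i] - math.log(denom)
    return ll


# Toy data: one binary covariate, the same spells as in the Kaplan-Meier sketch.
x = [[1.0], [0.0], [1.0], [0.0]]
print(partial_loglik([0.5], x, [8, 10, 12, 15], [1, 1, 0, 1], [6, 6, 6, 9]))
```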

We included a dummy variable for each type of attrition in the specification of the hazard. This is convenient, but it may be restrictive, as it implies that the hazard rates of the attrition types differ by a constant proportional factor. Figure 2a suggests that such a specification may be reasonable; moreover, its restrictiveness could be tested against more flexible alternative specifications. We also used stratified partial likelihood methods, with stratification on the different attrition types. The estimated regression coefficients from these methods were virtually identical to those from the specification with attrition dummies.

Table 5 below reports the results on the effect of non-response on the hazard rate of unemployment duration. Besides conventional t-values, it reports t-values based on the sandwich estimator of the variance-covariance matrix suggested by Lin and Wei (1989). The sandwich estimator is robust to misspecification of the model, so a comparison of the conventional standard errors with those from the sandwich estimator provides an informal specification test. We observe individual records from the start of the unemployment spell that led to the invitation to the Restart interview (1988-1989) up to 31 December 1994. Only 229 of the 8012 cases (approximately 3%) are right-censored. We lack administrative information on the state of destination, so uncensored cases include both job findings and transitions to states outside the labour force. This limitation needs to be borne in mind when interpreting the results of Table 5.

The first column of Table 5 refers to the partial likelihood results for the complete sample (i.e. the respondents and the non-respondents), including the non-response indicators. In general such indicators are not available. Significant coefficients on them signal differences in the distribution of unemployment durations, but also the presence of relevant (normally unobserved) heterogeneity. The second column concerns the complete sample with the non-response indicators left out. A comparison of columns 1 and 2 reveals whether excluding information on non-response distorts the other parameters of the model. The third column reports estimates based on the sample of individuals who participate in both waves of the survey. Comparing this column with the other two gives insight into the extent of the bias in some coefficients generated by survey non-response.

We start with a discussion of column 1. The coefficients on Age and Age2 suggest initially declining exit rates that increase after middle age. Women are observed to have higher exit rates. At face value this contrasts with the literature, but the exits here include exits to non-participation: by cross-questioning claimants about their availability for work, the Restart interview induces many women who are not genuinely seeking a job to sign off UB. Married people and those with an educational qualification also experience higher unemployment exit rates. These variables imply monotonically decreasing exit rates. The local unemployment change variable has a large and significant positive effect on the exit rate: better labour market conditions increase the exit rates of the unemployed. The inner city variable indicates lower exit rates for inner-city inhabitants. Finally, the exit rates of the control group do not differ significantly from those of individuals who had a Restart interview. Note that we measure this effect conditional on (still) being unemployed at the date of the first wave of the survey, a couple of months after the Restart interview took place. In previous analyses (Dolton, Lindeboom and Van den Berg, 1999), where we analysed non-response to the first wave, we found that the control group experienced significantly lower unemployment exit rates than those who had a Restart interview.

As expected, MOVER has a large and significant positive coefficient. Apparently, individuals who leave the sample for mobility reasons are also those who experience higher unemployment exit rates. One may argue that individuals change their place of residence because they have found a job, and that a permanent shift in the exit rate (as we modelled it) may therefore not be the right way to capture this. If this were the case, one would expect most of the unemployment durations to end prior to the interview date, and it would be appropriate to model this relationship more directly. A look at the data revealed that this is not the case. Therefore, capturing the intrinsic mobility behaviour of individuals with a permanent component such as MOVER (so that it captures time-constant individual heterogeneity) may be the most appropriate approach. The coefficients on ALL OTHERS and REFUSERS are insignificant, suggesting that ignoring these kinds of attrition may cause little bias in the estimation of the unemployment duration equation.
 
 

A comparison of column 1 with column 2 reveals few differences when the non-response indicators are omitted. In a way this tells us that omitting these indicators, which proxy characteristics that are normally unobserved, does not affect the other parameter estimates. Whether truncating the sample with respect to non-response (which is what usually happens in surveys) matters can be judged from the parameter estimates in column 3. A direct comparison shows that the coefficients on gender, marital status, the number of children, having a driver's licence and the change in the local unemployment rate are affected the most. The effect of the local unemployment change variable, for instance, is best understood in light of the results of Table 4 in the previous section, where we found that survey response rates increase when local labour market conditions improve. If this variable has a positive effect on the exit rate, one would indeed expect this effect to be exaggerated when the model is estimated on survey participants alone.

One possible method of testing whether Specification I differs significantly from Specification II in Table 5 is a Durbin-Hausman-Wu test. More specifically, we need to assume that our unemployment duration equation is consistently estimated even when attrition is present. In this case, having instrumental variables for attrition may yield more efficient estimates. This is the idea behind a test of Specification I against Specification II, where the former is more efficient than the latter. Formally, we test the null hypothesis H0 that the two sets of coefficients are not systematically different against the alternative H1 that they are. We obtain a test statistic of H = 6.16. Since the 5% critical value of the χ²(14) distribution (14 being the number of coefficients common to both specifications) is 23.685, we do not reject H0: there are no systematic differences in the coefficients, which suggests at face value that the data on unemployment duration can be adequately modelled without attrition dummies. This result may not be surprising, given how similar the sets of coefficients in Table 5 are across the specifications. However, it should be remembered that part of the misspecification present when drop-outs are excluded will be absorbed into the baseline hazard of the model. We investigated this informally by estimating an exponential hazard model in place of the PH model. In the equivalents of Specifications I and III there was a greater difference between the corresponding coefficients.
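The Durbin-Hausman-Wu statistic is a quadratic form in the coefficient differences, $H = d'(V_c - V_e)^{-1} d$ with $d = b_c - b_e$, where $b_e$ is the efficient and $b_c$ the consistent estimator. A minimal sketch, assuming the coefficient vectors and covariance matrices are already available as plain Python lists (the function name is hypothetical). It solves the linear system by Gauss-Jordan elimination rather than forming an explicit inverse, and it assumes that $V_c - V_e$ is positive definite, which need not hold in finite samples.

```python
def hausman(b_eff, b_cons, v_eff, v_cons):
    """H = d' (V_cons - V_eff)^{-1} d with d = b_cons - b_eff.

    Under H0 (no systematic difference) H is asymptotically chi-square
    with len(b_eff) degrees of freedom.
    """
    k = len(b_eff)
    d = [bc - be for bc, be in zip(b_cons, b_eff)]
    # Covariance of the difference under H0, augmented with d: [A | d].
    m = [[v_cons[i][j] - v_eff[i][j] for j in range(k)] + [d[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting solves A w = d.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    w = [m[i][k] for i in range(k)]
    return sum(di * wi for di, wi in zip(d, w))


# Toy two-coefficient example with diagonal covariance matrices.
print(hausman([0.0, 0.0], [0.5, 0.2],
              [[0.01, 0.0], [0.0, 0.02]],
              [[0.06, 0.0], [0.0, 0.03]]))
```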

The differences in unemployment duration can be studied further by estimating separate partial likelihoods for each of the attritor groups. These results are presented in Table 6. It should be reiterated that in the usual kind of statistical analysis of unemployment durations we would not observe the unemployment durations of any of the individuals in these equations, so these results should be interpreted with care: they are the unemployment hazards of those who have dropped out of the survey. The results show a clear difference between the MOVERS, REFUSERS and ALL OTHERS. Age and education are important factors in unemployment duration for MOVERS. In the REFUSER sample, by contrast, women, married individuals and blacks have higher exit rates and hence shorter durations. The unemployment durations of the ALL OTHERS group cannot be explained by any of the observable characteristics in our data.

The implication of these results is that we should be cautious about estimating hazard models on samples with attrition in the hope that the results are valid for the whole sample of unemployed. The most specific form of bias derives from those who drop out of the sample because they move away. This is not surprising, as most people in this category may be moving for job-related reasons. However, it seems clear that we might mistakenly conclude that attrition causes no substantive bias if we simply compared the coefficients of Specifications II and III in Table 5. When we have the (usually unavailable) information on unemployment duration by attrition type, Table 6 shows that the determinants of unemployment duration are quite different for the three groups. The net effect of erroneously pooling these three groups (into a single group of all those who drop out of the sample) may be to average out these different influences and mask the heterogeneity that exists between attrition types. Our results also suggest that the attrition process is endogenous to the determination of unemployment duration.
 
 
 

Table 5: Unemployment duration with and without attrition IV estimates: results from a partial likelihood analysis


Variable                 Specification I     Specification II    Specification III

Personal characteristics

Age                     -0.041 (6.9/6.7)    -0.040 (6.9/6.6)    -0.045 (6.7/6.8)
Age2                     0.001 (6.7/6.2)     0.001 (6.6/6.0)     0.001 (6.5/6.3)
Female                   0.400 (9.3/9.4)     0.401 (9.4/9.5)     0.408 (8.5/8.8)
Married                  0.383 (7.0/6.6)     0.386 (7.2/6.7)     0.421 (6.9/6.6)
Black                   -0.120 (1.0/0.7)    -0.104 (0.7/0.5)    -0.168 (1.2/0.7)
Asian                    0.081 (0.8/0.6)    -0.007 (0.6/0.4)    -0.007 (0.6/0.4)
Literacy                 0.012 (0.2/0.0)     0.012 (0.2/0.0)     0.060 (0.8/0.7)
Education                0.227 (5.6/5.0)     0.223 (5.5/4.9)     0.216 (4.6/4.2)
Total number of kids    -0.066 (2.6/2.7)     0.068 (2.6/2.4)    -0.081 (2.8/2.5)
Drivers licence         -0.044 (0.7/0.7)    -0.044 (0.7/0.7)     0.017 (0.3/0.2)
Mobile                   0.283 (3.9/4.7)     0.282 (3.8/4.6)     0.296 (3.6/4.5)

Exogenous Factors

Local Unemp. change      1.520 (4.0/3.7)     1.488 (3.9/3.5)     1.247 (2.9/2.6)
Inner city area         -0.118 (2.4/2.1)    -0.121 (2.4/2.2)    -0.118 (2.0/1.8)
Control group           -0.073 (1.0/1.1)    -0.077 (1.1/1.2)    -0.089 (1.1/1.2)

Non-response types

MOVERS                   0.188 (2.7/2.7)
REFUSERS                 0.077 (1.0/1.3)
ALL OTHERS              -0.079 (1.1/2.9)

-log likelihood          18736.06            18740.37            14096.14
# cases                  2812                2812                2197


Explanatory note:

The partial likelihood is based on the distribution of the residual duration conditional on the elapsed duration. In parentheses we report t-values based on the sandwich estimator of the variance-covariance matrix (Lin and Wei, 1989) and t-values based on the inverse of the Hessian, respectively. In Specification I, the model is estimated on the full sample, and dummy variables are included for the different non-response types. Specification II is estimated on the same sample, but the non-response indicators are excluded. Specification III refers to estimates on the sample of survey respondents at wave II.
 
 

Table 6: Unemployment duration by Attrition Type: results from a partial likelihood analysis


Variable               ALL ATTRITORS    MOVERS           REFUSERS         ALL OTHERS

Personal characteristics

Age                   -0.025 (2.01)    -0.054 (2.24)    -0.021 (0.97)    -0.024 (1.11)
Age2                   0.001 (1.77)     0.001 (1.76)     0.003 (0.68)     0.001 (1.61)
Female                 0.377 (3.86)     0.284 (1.62)     0.720 (4.65)     0.078 (0.38)
Married                0.262 (2.31)    -0.152 (0.74)     0.503 (2.45)     0.334 (1.50)
Black                  0.102 (0.46)    -0.285 (0.86)     0.851 (2.96)     0.460 (1.14)
Asian                  0.444 (2.25)     0.494 (1.51)     0.648 (1.80)     0.214 (0.63)
Literacy              -0.183 (1.28)    -0.131 (0.51)    -0.002 (0.08)    -0.197 (0.76)
Education              0.254 (2.98)     0.429 (3.00)     0.262 (1.70)     0.151 (0.99)
Total number of kids  -0.018 (0.05)    -0.030 (0.43)    -0.086 (0.72)    -0.089 (0.77)
Drivers licence       -0.141 (3.66)    -0.105 (0.19)    -0.025 (0.66)     0.726 (0.89)
Mobile                 0.194 (1.90)     0.096 (0.18)     0.217 (1.26)     0.567 (0.71)

Exogenous Factors

Local Unemp. change    2.840 (3.34)     5.253 (4.2)      2.110 (1.47)     1.842 (1.19)
Inner city area       -0.097 (0.91)    -0.107 (0.53)    -0.108 (0.60)    -0.153 (0.86)
Control group         -0.007 (0.03)    -0.011 (0.04)    -0.170 (0.55)     0.243 (0.621)

-log likelihood        3238.22          896.46           796.19           887.76
# cases                615              211              193              211


Explanatory note:

The partial likelihood is based on the distribution of the residual duration conditional on the elapsed duration. In parentheses we report t-values. The model is estimated separately for each group of individuals who drop out of the sample at Wave II, by reason of attrition; the ALL ATTRITORS column pools the three groups.
 
 

4 The Search for Valid Instruments to correct for sample attrition
 
 

To correct for the selection bias generated by attrition, it is essential to have instruments that affect attrition behaviour but do not affect the distribution of the variable of interest. One candidate instrument, which has been shown to be valid in other datasets, is an item non-response score. The idea behind this variable is that each person has a latent, unobserved propensity not to respond to surveys. One possible proxy for this propensity is the tendency not to respond to individual questions in a previous wave of the questionnaire, as this may be a first sign of a growing reluctance to respond. We computed such a score from neutral questions in wave I (details are provided in the Appendix), and it is this score that we use as a regressor in Table 4. The results indicate that the score does not have sufficient explanatory power in this data set to act as a valid instrument.

A second candidate instrument is the time spent in the face-to-face interview in the previous wave of the survey. The argument for using this variable is that a previous, time-consuming experience of being surveyed may make the individual less willing to respond to the next survey. This variable has been found to be significant in previous studies, e.g. Dolton, Taylor and Werquin (1999). The results reported in Table 4, however, show that it is insignificant in the Restart data and hence not a candidate instrument.

Our final candidate instrument to correct for sample attrition is information on the interviewer who conducted the interview in the first wave of the survey. The most flexible way to incorporate interviewer characteristics is to use interviewer fixed effects (i.e. interviewer dummies). To test the usefulness of this instrument, we include the set of 170 interviewer dummies as explanatory variables in the logit for attrition between waves I and II. The likelihood ratio test yields a statistic of 311, which exceeds the 95th percentile of the χ²(170) distribution (approximately 201), so the set of interviewer dummies is jointly significant. To establish that an instrument is truly valid, one must also verify that it does not correlate with the chosen outcome measure, so that it can act as an exclusion restriction. This can be tested by checking whether the set of interviewer dummies also adds to the model for unemployment duration estimated on the full sample (i.e. both sample survivors and drop-outs). In the partial likelihood model all 211 interviewer dummies can be added to the specification. The likelihood ratio statistic is 184, whereas the 5% critical value of the χ²(211) distribution is approximately 246. The hypothesis that the interviewer dummies have no effect on unemployment duration is therefore not rejected; the associated p-value is roughly 0.9.
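The chi-square critical values and tail probabilities used in this section can be checked with the Wilson-Hilferty cube-root approximation, which needs only the Python standard library and is accurate to roughly a few hundredths at these degrees of freedom. This is a sketch under that approximation (the function names are hypothetical); an exact chi-square routine from a statistics library would be preferable for reporting. The test statistics and degrees of freedom in the loop are the ones quoted in the text.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_quantile(q, k):
    """Approximate q-quantile of chi-square(k) (Wilson-Hilferty)."""
    c = 2.0 / (9.0 * k)
    z = _STD_NORMAL.inv_cdf(q)
    return k * (1.0 - c + z * math.sqrt(c)) ** 3


def chi2_pvalue(stat, k):
    """Approximate upper-tail probability P(X > stat), X ~ chi-square(k)."""
    c = 2.0 / (9.0 * k)
    z = ((stat / k) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 1.0 - _STD_NORMAL.cdf(z)


for label, stat, df in [
    ("log-rank, Figure 2a", 14.7, 4),
    ("Durbin-Hausman-Wu, Table 5", 6.16, 14),
    ("interviewer dummies, attrition logit", 311.0, 170),
    ("interviewer dummies, partial likelihood", 184.0, 211),
]:
    print(f"{label}: 5% critical value ~ {chi2_quantile(0.95, df):.1f}, "
          f"p ~ {chi2_pvalue(stat, df):.3f}")
```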
 
 
 
 

5 Conclusions
 
 

Most longitudinal surveys suffer from attrition, at least some of which may not occur at random. Attrition may bias estimates based only on data from respondents. We use a unique dataset that combines panel survey information on individual workers with administrative records. The reasons for survey attrition are coded into different behavioural categories, and the Cramer-Ridder test supports only three distinct categories: those who move away, those who refuse to respond and all others. The administrative records provide information on individual labour market behaviour and personal characteristics for the complete sample (i.e. the sample participants and the non-respondents). We examine the implications of attrition for the distributions of variables in the survey. We find that attrition affects the distribution of explanatory variables such as age and gender, but also that of unemployment duration. Most importantly, a Durbin-Hausman-Wu test supports the treatment of different attrition categories in the unemployment duration equation, which shows that studies of unemployment duration based only on respondents and ignoring those who drop out of a survey could be biased. In particular, the loss of those who move away between wave I and wave II of a survey is significant, as this type of person has significantly shorter unemployment durations.

Finally, we suggest that interviewer effects could act as valid identifying instruments for attrition. Our finding that previous interview duration and item non-response do not predict attrition differs from results for other surveys, so attrition identifiers may well be highly data specific: what is a valid exclusion restriction to identify attrition in one data set may not work in another.

References

Brehm, J. (1987), "Who's missing? An analysis of non-response and under coverage in the 1986 national election studies post-election survey", Working paper, National Election Studies.

Cramer, J.S. and G. Ridder (1991), "Pooling states in the multinomial logit model", Journal of Econometrics, 47, 267-272.

Diggle, P. and M.G. Kenward (1994), "Informative drop-out in longitudinal data analysis", Applied Statistics, 43, 49-73.

Dolton, P., M. Lindeboom and G.J. Van den Berg (1999), "A taxonomy of survey non-response and its relation to labour market behaviour", in J. Haltiwanger, J. Lane, J. Spletzer, J. Theeuwes and K. Troske (eds.), The Creation and Analysis of Employer-Employee Matched Data.

Dolton, P. and D. O'Neill (1995), "The impact of Restart on reservation wages and long-term unemployment", Oxford Bulletin of Economics and Statistics, 57, 451-470.

Dolton, P. and D. O'Neill (1996a), "Unemployment duration and the Restart effect: some experimental evidence", Economic Journal, 106, 387-400.

Dolton, P. and D. O'Neill (1996b), "Restart and roundabouts: experimental evidence on the long-term effects of the Restart unemployment program", Working paper, University of Newcastle-upon-Tyne.

Dolton, P. and D. O'Neill (1996c), "The Restart effect and the return to full-time stable employment", Journal of the Royal Statistical Society, Series A, 159, 275-288.

Dolton, P. and R. Taylor (1999), "A taxonomy of survey nonresponse: a case study of the long term unemployed", Newcastle University Discussion Papers in Economics, no. 99-04.

Goyder, J. (1987), The Silent Minority: Nonrespondents on Sample Surveys, Polity Press.

Hausman, J. (1978), "Specification tests in econometrics", Econometrica, 46, 1251-1271.

Hausman, J. and D. McFadden (1984), "Specification tests for the multinomial logit model", Econometrica, 52, 1219-1240.

Horowitz, J.L. and C.F. Manski (1998), "Censoring of outcomes and regressors due to survey nonresponse: identification and estimation using weights and imputations", Journal of Econometrics, 84, 37-58.

Lin, D.Y. and L.J. Wei (1989), "The robust inference for the Cox proportional hazards model", Journal of the American Statistical Association, 84, 1074-1078.

Potthoff, R.F., K.G. Manton and M.A. Woodbury (1993), "Correcting for nonavailability bias in surveys by weighting based on number of callbacks", Journal of the American Statistical Association, 88, 1197-1207.

Ridder, G. (1987), "The sensitivity of duration models to misspecified unobserved heterogeneity and duration dependence", Working paper, Groningen University.

Ridder, G. and W. Verbakel (1988), "On the estimation of the proportional hazard model in the presence of unobserved heterogeneity", Research Memorandum, University of Amsterdam.

Ridder, G. and I. Tunali (1990), "Family-specific factors in child mortality: stratified partial likelihood estimation", Research Memorandum, Groningen University.

Van den Berg, G.J., M. Lindeboom and G. Ridder (1994), "Attrition in longitudinal panel data, and the empirical analysis of dynamic labour market behaviour", Journal of Applied Econometrics, 9, 421-435.

Wang, R., J. Sedransk and J.H. Jinn (1992), "Secondary data analysis when there are missing observations", Journal of the American Statistical Association, 87, 952-961.

Waterton, J. and D. Lievesley (1987), "Attrition in a panel study of attitude", Journal of Official Statistics, 3, 267-282.
 
 
 
 

Appendix
 
 
 
 

Table A1: Data Definitions and Means
 
 
 
 
 
Variable               Mean    Definition
Age                    16.24   Age over 18.
Female                 0.296   1 if female.
Married                0.454   1 if married.
Literacy Problems      0.106   1 if the person has reading or writing problems.
Education              0.106   1 if the person has an educational qualification.
Number of Children     0.523   Number of children.
Black                  0.023   1 if black.
Asian                  0.042   1 if Asian.
Driver's Licence       0.470   1 if the person has a driver's licence.
Mobile                 0.474   1 if the person has access to motorised transport.
Local Unemp. change    0.346   Change in the local unemployment level over the last month.
Inner city area        0.202   1 if the person lives in an inner city area.
Control group          0.079   1 if the person is in the Restart control group.
Interview Duration     16.24   Length of the Restart interview in minutes.
Item Non-response      0.069   Item non-response score, calculated as described below.

 
 
 

Item Non-Response Score

The variable Item Non-response was calculated as a cumulative point score: one point for each of the following questions that was not answered on the questionnaire.

i) Whether they live alone.

ii) What their ethnic origin is (the alternative "prefer not to say" is not counted as non-response).

iii) Do you have a telephone for receipt of incoming calls?

iv) Whether or not they can be willingly contacted in 6 months' time.

v) Within your household, are you or someone else responsible for owning or renting the property?

vi) Do you have any health problems?

vii) What age were you when you left school?

viii) Could you say if you have any trouble in everyday life with reading in English?

ix) Could you say if you have any trouble in everyday life with writing in English?

x) Could you say if you have any trouble in everyday life with working with numbers?

An additional point was awarded for each of the two supplementary questionnaires that was not returned.
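The scoring rule amounts to a simple count. The sketch below is illustrative only: it assumes each of the ten questions is stored as `None` when unanswered and each supplementary questionnaire as a returned/not-returned flag (one point each, which is how we read the stated maximum of 12). The function name and encoding are hypothetical, not the survey's actual coding.

```python
from typing import Optional, Sequence

N_ITEMS = 10          # questions i) to x)
N_SUPPLEMENTARY = 2   # supplementary questionnaires


def item_nonresponse_score(answers: Sequence[Optional[str]],
                           supplementary_returned: Sequence[bool]) -> int:
    """Cumulative item non-response score (maximum N_ITEMS + N_SUPPLEMENTARY = 12).

    Hypothetical encoding: an unanswered question is None. "Prefer not to say"
    on ethnic origin is stored as an ordinary answer, so it scores no point.
    """
    if len(answers) != N_ITEMS or len(supplementary_returned) != N_SUPPLEMENTARY:
        raise ValueError("expected 10 answers and 2 supplementary flags")
    unanswered = sum(1 for a in answers if a is None)
    not_returned = sum(1 for returned in supplementary_returned if not returned)
    return unanswered + not_returned


# Two questions skipped and one supplementary questionnaire not returned.
answers = ["no", "prefer not to say", "yes", None, "owner",
           "no", None, "no", "no", "no"]
print(item_nonresponse_score(answers, [True, False]))  # 3
```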

Hence our measure is a point score with a maximum value of 12. The frequency distribution of this variable in our sample is:
 
Item non-response score   Frequency   Relative frequency (%)
0                         2644         94.03
1                          152          5.41
2                           13          0.46
3                            1          0.04
5                            1          0.04
9                            1          0.04
Total                     2812        100.00
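As an arithmetic check, the relative frequencies and the sample mean of the score can be recovered from the counts in the frequency table. The mean is (152·1 + 13·2 + 3 + 5 + 9)/2812 = 195/2812 ≈ 0.069, which agrees with the mean reported for Item Non-response in Table A1. A minimal sketch:

```python
# Frequency distribution of the item non-response score (Appendix, n = 2812).
freq = {0: 2644, 1: 152, 2: 13, 3: 1, 5: 1, 9: 1}

n = sum(freq.values())
relative = {score: 100 * count / n for score, count in freq.items()}
mean_score = sum(score * count for score, count in freq.items()) / n

print(n)                        # 2812
print(round(relative[0], 2))    # 94.03
print(round(mean_score, 3))     # 0.069
```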

 
 
 
 

Table A2: Attrition between wave I and wave II and explanatory variables: results from a logit analysis for non-response



Variable                    Coefficient (t-value)

Constant                     0.713   (1.82)
Age                         -0.036   (2.12)
Age squared                  0.001   (3.42)
Black                       -0.314   (1.09)
Asian                       -0.039   (0.17)
Female                       0.356   (3.25)
Married                      0.488   (3.46)
Literacy                     0.042   (0.26)
Education                    0.197   (1.96)
Total Number of Children     0.0009  (0.01)
Local Unempl. change         0.447   (0.48)
Inner city area             -0.041   (0.36)
Control group                0.318   (1.69)
Interview Duration          -0.002   (0.55)
Item Non-response           -0.436   (1.80)

-Log likelihood             1381.30


Explanatory note:
In parentheses we report t-values based on the inverse of the Hessian and t-values based on the sandwich estimator of the variance-covariance matrix.
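The note refers to two variance estimators. As a minimal sketch (numpy, simulated data, not the authors' code or data), a binary logit fitted by Newton-Raphson yields both the inverse of minus the Hessian and the sandwich (robust) covariance H⁻¹ (Σᵢ sᵢsᵢ′) H⁻¹, where sᵢ is observation i's contribution to the score:

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Binary logit by Newton-Raphson.

    Returns the coefficient vector together with two covariance estimates:
    the inverse of minus the Hessian, and the sandwich (robust) estimator.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X  # Hessian (negative definite)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                            # Newton update
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = (X * (p * (1.0 - p))[:, None]).T @ X       # minus the Hessian at beta
    cov_hessian = np.linalg.inv(info)
    scores = X * (y - p)[:, None]                     # one score contribution per row
    meat = scores.T @ scores                          # sum of outer products s_i s_i'
    cov_sandwich = cov_hessian @ meat @ cov_hessian
    return beta, cov_hessian, cov_sandwich


# Simulated data (not the survey data): constant, a continuous and a binary regressor.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.binomial(1, 0.3, size=n)])
true_beta = np.array([-0.5, 0.4, 0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta, cov_h, cov_s = fit_logit(X, y)
t_hessian = np.abs(beta) / np.sqrt(np.diag(cov_h))
t_sandwich = np.abs(beta) / np.sqrt(np.diag(cov_s))
print(np.round(beta, 3))
print(np.round(t_hessian, 2), np.round(t_sandwich, 2))
```

When the model is correctly specified the two sets of t-values should be close; a large gap between them is a warning sign of misspecification.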
 
 

Table A3: Survivor functions for different types of attrition, based on Kaplan-Meier estimates



   Time   RESPONDENT  MOVER    NO CONTACT  REFUSER   NO FAULTER

      0   1.0000     1.0000   1.0000    1.0000    1.0000
     30   0.7782     0.8333   1.0000    1.0000    1.0000
     60   0.6446     0.8333   0.7197    0.6667    0.8000
     90   0.5634     0.6250   0.6488    0.5833    0.8000
   120   0.5050     0.4375   0.5653    0.4466    0.5444
   150   0.4642     0.3750   0.5161    0.3190    0.3889
   180   0.3986     0.3750   0.4915    0.2552    0.3889
   210   0.3644     0.2500   0.4424    0.2552    0.3889
   240   0.3146     0.1875   0.3687    0.2552    0.3889
   270   0.2635     0.1644   0.3002    0.2194    0.2722
   300   0.2234     0.1480   0.2720    0.2032    0.1810
   330   0.1975     0.1268   0.2462    0.1927    0.1670
   360   0.1727     0.1129   0.2228    0.1681    0.1494
   390   0.1412     0.0985   0.1859    0.1326    0.1379
   420   0.1229     0.0827   0.1686    0.1245    0.1073
   450   0.1096     0.0719   0.1512    0.0970    0.0881
   480   0.0971     0.0590   0.1375    0.0841    0.0728
   510   0.0865     0.0546   0.1237    0.0776    0.0613
   540   0.0801     0.0482   0.1136    0.0631    0.0460
   570   0.0723     0.0431   0.1035    0.0566    0.0383
   600   0.0672     0.0388   0.0933    0.0501    0.0306
   630   0.0603     0.0338   0.0839    0.0469    0.0230
   660   0.0556     0.0288   0.0789    0.0420    0.0230
   690   0.0531     0.0266   0.0752    0.0404    0.0230
   720   0.0500     0.0244   0.0709    0.0404    0.0230
   750   0.0468     0.0209   0.0687    0.0404    0.0192
   780   0.0439     0.0201   0.0658    0.0372    0.0192
   810   0.0414     0.0187   0.0637    0.0340    0.0192
   840   0.0395     0.0187   0.0593    0.0307    0.0192
   870   0.0377     0.0180   0.0579    0.0275    0.0192
   900   0.0365     0.0180   0.0572    0.0259    0.0192
   930   0.0351     0.0173   0.0557    0.0243    0.0192
   960   0.0341     0.0165   0.0557    0.0226    0.0192
   990   0.0329     0.0165   0.0521    0.0210    0.0153
  1020   0.0317     0.0158   0.0506    0.0210    0.0153
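Table A3 reports Kaplan-Meier estimates, Ŝ(t) = ∏ over exit times tᵢ ≤ t of (1 − dᵢ/nᵢ), where dᵢ is the number of exits at tᵢ and nᵢ is the number still at risk just before tᵢ. A minimal pure-Python sketch for a single group, assuming each spell is recorded with a flag for whether it ended in an exit or was right-censored; the toy data are invented for illustration:

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(durations: List[float], exited: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survivor estimate) at each distinct exit time.

    durations: observed spell lengths; exited: True if the spell ended in an
    exit, False if it was right-censored at that duration.
    """
    exits = Counter(t for t, e in zip(durations, exited) if e)
    leaving = Counter(durations)   # everyone whose spell ends (exit or censoring) at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = exits.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]      # exits and censored cases leave the risk set after t
    return curve


curve = kaplan_meier([30, 60, 60, 90, 120, 150], [True, True, False, True, False, True])
print([(t, round(s, 4)) for t, s in curve])
# [(30, 0.8333), (60, 0.6667), (90, 0.4444), (150, 0.0)]
```

Exits at a given time are counted before censorings at that same time, which is the usual convention.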


 
 

Table A4: Non-response in the first wave and explanatory variables: results from a multinomial logit


Variable                   MOVER            NO CONTACT       REFUSER          NO FAULTER

Constant                   -1.779 (2.94)    -0.932 (1.31)    -2.034 (3.25)    -5.412 (5.06)

Personal Characteristics

Age                         0.061 (2.28)     0.015 (0.54)     0.034 (1.41)     0.060 (1.50)
Age squared                -0.003 (3.66)    -0.001 (1.04)    -0.001 (2.08)    -0.001 (1.35)
Female                     -0.296 (1.74)    -0.959 (4.02)     0.114 (0.71)    -1.069 (2.77)
Married                    -0.670 (2.91)    -0.766 (2.95)    -0.034 (0.17)    -1.049 (2.64)
Literacy Problems           0.066 (0.23)     0.118 (0.45)    -0.607 (1.90)     0.144 (0.37)
Education                  -0.241 (1.55)    -0.263 (1.46)    -0.168 (1.05)    -0.006 (0.02)
Number of Children          0.043 (0.40)     0.143 (1.24)    -0.124 (1.13)    -0.073 (0.36)
Black                       0.641 (0.39)     0.073 (0.54)    -0.013 (0.03)     0.919 (1.44)
Asian                       0.064 (0.37)    -0.369 (0.49)     0.135 (0.34)     1.042 (2.00)

Exogenous Factors

Local Unemp. change         0.184 (0.13)    -2.326 (1.34)    -1.630 (1.07)     5.085 (2.09)
Inner city area             0.042 (0.23)     0.328 (1.64)     0.026 (0.14)    -0.087 (0.25)
Control group              -0.764 (2.16)    -1.054 (2.27)     0.196 (0.76)     0.106 (0.24)

Potential Attrition Identifiers

Interview Duration         -0.002 (0.01)    -0.007 (1.00)     0.007 (1.08)     0.003 (0.31)
Item Non-response          -0.436 (0.33)    -0.166 (0.54)    -0.416 (1.20)    -0.145 (0.33)

-Log likelihood            2181.22

Explanatory note: The respondents are taken as the reference group. Absolute t-values are in parentheses.
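With the respondents as reference group, their coefficients are normalised to zero, so that P(j | x) = exp(x′βⱼ) / (1 + Σₖ exp(x′βₖ)) for each attrition type j and P(respondent | x) = 1 / (1 + Σₖ exp(x′βₖ)). A minimal sketch of that mapping; the function name is hypothetical and the linear indices in the example are arbitrary numbers, not values computed from Table A4:

```python
import math
from typing import Dict


def mnl_probabilities(index: Dict[str, float], reference: str = "RESPONDENT") -> Dict[str, float]:
    """Multinomial logit probabilities with a reference alternative.

    index maps each non-reference alternative j to its linear index x'beta_j.
    The reference alternative's coefficients are normalised to zero, so its
    linear index is 0.
    """
    # Subtracting the largest index guards against overflow; it cancels in every ratio.
    shift = max([0.0, *index.values()])
    denom = math.exp(-shift) + sum(math.exp(v - shift) for v in index.values())
    probs = {reference: math.exp(-shift) / denom}
    for j, v in index.items():
        probs[j] = math.exp(v - shift) / denom
    return probs


# Arbitrary linear indices chosen for illustration (not computed from Table A4).
p = mnl_probabilities({"MOVER": -1.0, "NO CONTACT": -1.5, "REFUSER": -0.5, "NO FAULTER": -3.0})
print({k: round(v, 3) for k, v in p.items()})
print(round(math.log(p["MOVER"] / p["RESPONDENT"]), 6))  # -1.0
```

Because log(Pⱼ/Pₖ) = x′(βⱼ − βₖ) involves only the two alternatives concerned, this specification imposes the IIA property, which is why that assumption needs to be tested.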
 
 

Testing the Independence of Irrelevant Alternatives (IIA) Assumption.

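A crucial assumption of the multinomial logit model is the IIA property. Whether a particular data set satisfies this assumption can be tested using the statistic suggested by Hausman and McFadden (1984), which under the null hypothesis is asymptotically chi-squared distributed and has the form:

$$H = \left(\hat{\beta}_s - \hat{\beta}_f\right)' \left[\hat{V}_s - \hat{V}_f\right]^{-1} \left(\hat{\beta}_s - \hat{\beta}_f\right)$$

where

$\hat{\beta}_s$ is the coefficient vector from the consistent estimator (the model estimated after removing one alternative from the choice set),

$\hat{\beta}_f$ is the coefficient vector from the efficient estimator (the model estimated on the full choice set),

$\hat{V}_s$ is the covariance matrix of the coefficients from the consistent estimator,

$\hat{V}_f$ is the covariance matrix of the coefficients from the efficient estimator.

Only coefficients estimated in both models are compared, and the degrees of freedom equal the number of coefficients compared.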

The difference between the two covariance matrices in this test statistic is guaranteed to be positive definite only asymptotically; in finite samples no assurance can be given about its diagonal elements, and negative values along the diagonal are possible. Such a situation can cause the test statistic H to be negative. In this case we interpret the result as providing strong evidence that we cannot reject the null hypothesis that the coefficients are not affected by the removal of one of the alternatives. In Tables A5 and A6 we report the test statistics for the five-alternative and the four-alternative models, dropping each alternative in turn. Our results suggest that the IIA assumption is justified in both the four- and the five-alternative model.
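The Hausman-McFadden statistic can be computed directly once both models have been estimated. A minimal numpy sketch, assuming the inputs are the coefficient vectors and covariance matrices already restricted to the coefficients common to both fits; the function name and the numbers in the example are made up. Using a pseudo-inverse is our own choice for robustness; it coincides with the ordinary inverse whenever the covariance difference is nonsingular.

```python
import numpy as np


def hausman_mcfadden(b_consistent, b_efficient, v_consistent, v_efficient):
    """Hausman-McFadden IIA statistic.

    b_consistent, v_consistent: coefficients and covariance from the model with
    one alternative dropped from the choice set.
    b_efficient, v_efficient: the same coefficients, in the same order, from the
    model estimated on the full choice set.
    Returns (H, number of coefficients compared).
    """
    diff = np.asarray(b_consistent, dtype=float) - np.asarray(b_efficient, dtype=float)
    v_diff = np.asarray(v_consistent, dtype=float) - np.asarray(v_efficient, dtype=float)
    # v_diff is only asymptotically positive definite; in finite samples it can
    # have negative diagonal elements, in which case H may come out negative.
    h = float(diff @ np.linalg.pinv(v_diff) @ diff)
    return h, diff.size


# Made-up inputs for two common coefficients.
b_s, b_f = [0.50, -0.20], [0.45, -0.25]
h, df = hausman_mcfadden(b_s, b_f, np.diag([0.04, 0.03]), np.diag([0.03, 0.01]))
print(round(h, 3), df)  # 0.375 2

# A covariance difference with a negative diagonal element gives a negative H.
h_neg, _ = hausman_mcfadden(b_s, b_f, np.diag([0.04, 0.03]), np.diag([0.05, 0.01]))
print(round(h_neg, 3))  # -0.125
```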
 

Table A5: Hausman IIA Test – 5 Alternatives
 
Dropped alternative   H (χ²)   df   H0
NO FAULTER             0.32    42   Accept
REFUSER               -1.02    42   Accept
NO CONTACT            -0.45    42   Accept
MOVER                  0.01    42   Accept

 

Table A6: Hausman IIA Test – 4 Alternatives
 
Dropped alternative   H (χ²)   df   H0
NO FAULTER            -0.55    29   Accept
ALL OTHERS            -1.63    29   Accept
MOVER                 -0.07    27   Accept

 
 

Testing How Many Attrition Regimes There Should Be.

The appropriate number of attrition types can be assessed by applying the Cramer-Ridder test sequentially. To introduce some notation, suppose we wish to test whether the alternatives j belonging to a set s should be grouped together in the classification. Assume there are $n_j$ individuals in the j'th alternative and denote the total number of individuals in the set s by $n_s = \sum_{j \in s} n_j$. Cramer and Ridder (1991) showed that a valid likelihood ratio test can be constructed as

$$LR = 2\left(\log L_U - \log L_R\right), \qquad \log L_R = \log L_P + \sum_{j \in s} n_j \log n_j - n_s \log n_s$$

where $\log L_R$ is the maximum log-likelihood if the estimates are constrained to satisfy $\beta_j = \beta_k$ for all $j, k \in s$ (equal slope coefficients, with the intercepts left free), $\log L_P$ is the maximum log-likelihood of the model in which the alternatives in s are pooled into a single alternative, and $\log L_U$ is the maximum log-likelihood of the original unconstrained model. Under the null hypothesis that the alternatives in s can be pooled, LR is asymptotically chi-squared distributed with degrees of freedom equal to the number of restrictions imposed.

The most appropriate way to proceed is to start from the model with 5 sets of separate coefficients: one set for the reference group and one for each of our 4 attrition types. (For identification we actually estimate only 4 sets, relative to the reference group, which is our sample respondents.) We then reduce the number of attrition types by grouping them and testing each grouping until we arrive at a preferred number of alternatives. We did this systematically, and a summary of the results is presented in Table A7. Our conclusion from this analysis is that the NO CONTACT and NO FAULTER groups should be joined together, leaving 4 groups in the data to be used in our analysis of unemployment duration: Respondents, MOVER, REFUSER and ALL OTHER.
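As shown by Cramer and Ridder (1991), the restricted log-likelihood (equal slopes, free intercepts for the alternatives in the set s) does not need a separate constrained estimation: it equals the log-likelihood of the model with those alternatives pooled into one category, log L_P, plus Σⱼ nⱼ log nⱼ − n_s log n_s. A minimal sketch of that calculation and of LR = 2(log L_U − log L_R); the function names are hypothetical and the counts and pooled log-likelihood in the first example are invented. The second call reproduces the LR value in the first row of Table A7 from the unrestricted log-likelihood of Table A4 (−2181.22) and the restricted value −2182.89.

```python
import math
from typing import Sequence


def cramer_ridder_restricted_loglik(loglik_pooled: float, counts: Sequence[int]) -> float:
    """Log-likelihood of the model in which the alternatives in s share slope
    coefficients (intercepts free), obtained from the pooled model.

    counts: n_j for each alternative j in the set s that is being pooled.
    """
    n_s = sum(counts)
    return loglik_pooled + sum(n * math.log(n) for n in counts) - n_s * math.log(n_s)


def lr_statistic(loglik_unrestricted: float, loglik_restricted: float) -> float:
    """LR = 2 (log L_U - log L_R); non-negative when both are true maxima."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Invented pooled fit: 40 and 60 individuals in the two alternatives being pooled.
log_l_r = cramer_ridder_restricted_loglik(-1500.0, [40, 60])
print(round(log_l_r, 2))

# First row of Table A7: unrestricted -2181.22 (Table A4), restricted -2182.89.
print(round(lr_statistic(-2181.22, -2182.89), 2))  # 3.34
```

The correction term is never positive, so the restricted log-likelihood is at most the pooled one.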
 

Table A7: Summary of Cramer-Ridder Test for the Valid number of attrition types.
 
H0          H1                                                                 Log L_R    LR       Critical value    Accept
5 regimes   4 regimes: joining NO FAULTER and NO CONTACT                       -2182.89   3.34     χ²(15) = 7.261    H1
4 regimes   3 regimes: joining REFUSER, NO CONTACT, OTHER                      -2204.98   47.52    χ²(30) = 18.49    H0
4 regimes   3 regimes: joining MOVER with REFUSER, and NO CONTACT with OTHER   -2328.02   293.6    χ²(30) = 18.49    H0
4 regimes   3 regimes: joining MOVER, NO CONTACT and OTHER                     -2196.07   29.7     χ²(30) = 18.49    H0
4 regimes   2 regimes: joining all attrition types                             -2101.58   71.335   χ²(45) = 30.1     H0

 

Table A8: Attrition between wave I and wave II and unemployment duration: results from a partial likelihood analysis


Variable                             Specification I                     Specification II                     Specification III



 

Personal characteristics

Age                                -0.042 (7.1/6.9)                    -0.045