Imputing Missing Occupation Codes in Administrative Data

This methodological paper addresses the structural break in occupational information in Swedish register data. Its aim is to find a method that bypasses the severe problems such breaks cause in occupational data. It is well understood that the coverage rate of employees in microdata always varies somewhat over time. However, the transition from the Swedish Standard Classification of Occupations (SSYK) 1996 to SSYK 2012 resulted in a significant loss of information on occupations. The count of missing occupation records surged by 1.5 million in 2014 and did not come close to recovering until 2020. In this paper, we aim to fill this gap. To do so, we primarily use a logistic regression model to estimate the likelihood that an individual changes occupation and, based on this estimate, impute missing occupation codes.
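To make the general idea concrete, the following is a minimal sketch of imputation driven by a logistic model of occupational change. It is an illustration only, not the specification used in the paper: the two predictors (an employer-change indicator and tenure in decades), the toy data, the 0.5 threshold, the carry-forward rule, and the function names `fit_logistic`, `predict_change` and `impute` are all assumptions introduced here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_change(w, x):
    """Estimated probability that the individual changed occupation."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def impute(last_code, x, w, threshold=0.5):
    """Carry the last observed code forward when a change is unlikely.

    Returns None (still missing) when the predicted probability of an
    occupational change reaches the threshold. This rule is an assumption.
    """
    return last_code if predict_change(w, x) < threshold else None

# Hypothetical training data from years where the occupation is observed:
# features = [changed_employer (0/1), tenure in decades]; label = changed occupation.
X = [[0, 0.8], [0, 0.5], [0, 1.2], [1, 0.1],
     [1, 0.2], [1, 0.3], [0, 0.9], [1, 0.1]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

w = fit_logistic(X, y)

# A stayer with long tenure keeps the previous code; a recent employer
# switcher is left missing for separate treatment.
stayer = impute("2221", [0, 0.9], w)
mover = impute("2221", [1, 0.1], w)
```

In this sketch, a record is imputed only when the model suggests the individual most likely stayed in the same occupation. Any real application would need richer predictors and validation against years in which the occupation code is actually observed.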

We thus offer a methodological contribution that we hope will facilitate more accurate longitudinal studies of occupations over time, specifically where the ISCO nomenclature is the underpinning framework. Our paper is therefore relevant not only for analyses of Swedish data but also internationally.

Njekwa Ryberg, Peter and Bjerke, Lina, Imputing Missing Occupation Codes in Administrative Data (March 05, 2025). Available at SSRN: https://ssrn.com/abstract=5166451 or https://dx.doi.org/10.2139/ssrn.5166451