About CPF

The idea of CPF

The Comparative Panel File (CPF) harmonises the world’s largest and longest-running household panel surveys from seven countries: Australia (HILDA), Germany (SOEP), Great Britain (BHPS and UKHLS), Korea (KLIPS), Russia (RLMS), Switzerland (SHP), and the United States (PSID). The project aims to support the social science community in the analysis of comparative life course data. The CPF is not a data product but open-source code that integrates individual and household panel data from all seven surveys into a harmonised three-level data structure. Because the code is open source, users can develop it further and extend its areas of application.

CPF is an open-science project that responds to the growing need for cross-nationally comparative longitudinal data in the social sciences. It also contributes to open and replicable science by providing access to data resources and enabling collaborative improvement of research tools.

Currently, CPF is developed by Konrad Turek and Matthijs Kalmijn at the Netherlands Interdisciplinary Demographic Institute (NIDI-KNAW) and Thomas Leopold at the University of Cologne [see The CPF Team]. The CPF code was designed and prepared by Konrad Turek and will be continuously developed and improved by the CPF team and the community of users.

How did it start?

The first version of CPF (1.0) was published in December 2020. The idea of the comparative dataset originated in 2019 among a group of sociologists from the Netherlands Interdisciplinary Demographic Institute, the University of Cologne, and the University of Amsterdam. They were involved in “CRITEVENTS”, a research project on critical life events and the dynamics of inequality over the life course, funded by the NORFACE/DIAL programme*. CPF was developed in an attempt to extend and popularise the approach implemented in the Cross-National Equivalent File (CNEF). CNEF is a long-running and well-established project that harmonises international longitudinal household surveys. It is an extraordinary endeavour; however, it has some limitations, such as the range of topics covered, the lack of options to include new variables, and complex application procedures. Building on the CNEF approach, CPF was an attempt to overcome these limitations for users who require more flexibility and control over the data management process.

* “Critical Life Events and the Dynamics of Inequality: Risk, Vulnerability, and Cumulative Disadvantage” (CRITEVENTS) was funded by NORFACE through the transnational research programme “Dynamics of Inequality Across the Life-Course: Structures and Processes” (DIAL), which is co-funded by the European Commission through Horizon 2020 under grant agreement No 724363.

Basic information about CPF data

CPF version 1.0 covers the period up to 2018, combines seven countries, and includes in total around 2.7 million observations from almost 360,000 respondents. The oldest survey is PSID, which started in 1968 and has collected 40 waves. The second oldest is SOEP, which started in 1984 and has collected 35 waves. CPF includes four countries from 1994 onwards and five from 1999. The youngest panel study in CPF is HILDA, with 18 waves since 2001.

Number of waves, observations and respondents (CPF ver. 1.0)

Number of waves in which individuals participated: exact number by survey (left axis) and minimum number for the total sample (right axis) (CPF ver. 1.0)

Timeline of the data and number of observations by wave (CPF ver. 1.0)

Distribution of birth cohorts (year of birth) by survey

The team

Assistant Professor of Sociology (work and ageing, labour markets, life course inequalities) at Tilburg University

Professor of Methods of Empirical Social Research at the Institute of Sociology and Social Psychology at the University of Cologne

Professor of demography and sociology at the Netherlands Interdisciplinary Demographic Institute (NIDI-KNAW)

Research Assistant for CPF at NIDI-KNAW.
MSc double degree in Sociology and Population Studies (Tilburg University / Universitat Pompeu Fabra)