Latest Update at ceprDATA.org

March 19, 2015

Ever wonder where you can get access to the data we use for our papers? Look no further than ceprDATA.org, a website where we provide consistent, (relatively!) user-friendly versions of the Current Population Survey (CPS), American Community Survey (ACS), and other datasets used at CEPR.

Version 2.0 of our CPS Outgoing Rotation Group extract for 1979-2014 was posted last week. You can download the data or view the program files we used to create the extracts we use in-house. CEPR CPS Basic Monthly program files for 1994-2014 have also been updated and are available for download.

This update is a major overhaul of our extract. We have made many minor coding corrections across a range of variables and have dropped some variables that appeared in earlier editions of the extract. A full list of changes is in the changelog at the end of the master program.

If you’ve used our CPS ORG extract before, the biggest change is that we now use the Basic CPS data as the sole source for our extract from 1994 to the present. Previously we used NBER’s MORG extract as the underlying source for 1979-2002 and merged some variables from the Basic CPS into it. With this update, we continue to use the NBER MORG extract for 1979-1993, but for 1994 to the present we use the raw CPS Basic data directly from the Census Bureau.

We have also moved the 1979-1993 portion of our extract to the most recent available version of the NBER MORG extract (accessed July 2014).

We have also made significant changes to our wage variables. Most importantly, we went from carrying more than 25 hourly wage variables to carrying just six: wage1, wage2, wage3, wage4, rw, and rw_ot. We believe these variables are more straightforward and do a better job of measuring overtime, tips, commissions, and bonuses (otc) for hourly workers.

Full details on the new wage variables are available in cepr_org_wages.do.

Briefly, wage1 is hourly earnings for workers paid by the hour. It excludes otc and is available only for hourly workers.

wage2 is usual hourly earnings, including otc, for nonhourly workers. It is available only for nonhourly workers.

wage3 combines usual hourly earnings for hourly workers (excluding otc) from wage1 with usual hourly earnings for nonhourly workers (including otc) from wage2. It is available for all workers and attempts to follow the NBER’s recommendation for the most consistent hourly wage series from 1979 to the present.
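As a rough illustration of the combination rule (this is not the code in cepr_org_wages.do, and the function and argument names are hypothetical), the logic can be sketched in Python:

```python
from typing import Optional


def wage3(paid_hourly: bool,
          wage1: Optional[float],
          wage2: Optional[float]) -> Optional[float]:
    """Pick the series that applies to this worker.

    Hourly workers take wage1 (excludes otc); nonhourly workers
    take wage2 (includes otc). None stands in for a missing value.
    """
    return wage1 if paid_hourly else wage2
```

For example, `wage3(True, 12.50, None)` returns the hourly worker's wage1 value, and `wage3(False, None, 20.00)` returns the nonhourly worker's wage2 value.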

wage4 is usual hourly earnings, including otc, for both hourly and nonhourly workers. From 1994 to the present, this series uses hourly workers’ reported usual amounts of overtime, tips, commissions, and bonuses to estimate a wage for hourly workers that includes otc. From 1979 to 1993, this series attempts to estimate otc for hourly workers from the difference between reported weekly pay and the weekly pay implied by usual hours at the straight-time hourly rate. We do not place great faith in the wage4 series before 1994.
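The two approaches for hourly workers can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure in cepr_org_wages.do: the function names are hypothetical, otc is assumed to be a weekly amount spread evenly over usual weekly hours, and the pre-1994 estimate is assumed to be floored at zero.

```python
def wage4_hourly_1994_on(hourly_rate: float,
                         weekly_otc: float,
                         usual_hours: float) -> float:
    """Hourly wage including otc, using a reported weekly otc amount."""
    return hourly_rate + weekly_otc / usual_hours


def wage4_hourly_pre_1994(hourly_rate: float,
                          weekly_pay: float,
                          usual_hours: float) -> float:
    """Hourly wage including otc, with otc inferred from weekly pay.

    otc is estimated as the gap between reported weekly pay and the
    weekly pay implied by usual hours at the straight-time rate.
    Flooring the gap at zero is an assumption of this sketch.
    """
    implied_weekly_pay = hourly_rate * usual_hours
    estimated_otc = max(weekly_pay - implied_weekly_pay, 0.0)
    return hourly_rate + estimated_otc / usual_hours
```

The pre-1994 version depends on weekly pay and usual hours being reported consistently, so any error in either one passes straight into the otc estimate.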

(The names wage1, wage2, wage3, and wage4 are borrowed from Economic Policy Institute terminology.)

We have retained a slightly modified version of the rw variable, which is based on wage3 with a number of adjustments. First, rw converts hourly wages to constant 2014 dollars using the CPI-U-RS. Second, for workers who report top-coded weekly earnings, we assign our estimate of the mean above the top-code, rather than the top-coded value, in order to calculate hourly earnings; our procedure uses a lognormal approximation and is applied separately by gender (see cepr_org_topcode_lognormal.do and cps_basic_topcode_lognormal.do). We do not adjust earnings for the very small number of hourly workers whose hourly pay is top-coded.
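To make the top-code adjustment concrete: if weekly earnings are lognormal with log-mean mu and log-standard-deviation sigma, the mean of earnings above a top-code T has a closed form. The Python sketch below evaluates that standard formula; it assumes mu and sigma have already been estimated (for example, separately for men and women) and does not reproduce the estimation steps in the .do files. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_mean_above(topcode: float, mu: float, sigma: float) -> float:
    """E[X | X > topcode] when ln(X) is Normal(mu, sigma**2).

    Standard result: exp(mu + sigma^2 / 2) * Phi(sigma - z) / Phi(-z),
    where z = (ln(topcode) - mu) / sigma and Phi is the standard
    normal CDF.
    """
    phi = NormalDist().cdf
    z = (math.log(topcode) - mu) / sigma
    return math.exp(mu + sigma ** 2 / 2.0) * phi(sigma - z) / phi(-z)
```

With purely illustrative inputs, `lognormal_mean_above(2000.0, 6.5, 0.7)` returns a value above 2000, which would replace the top-coded amount before weekly earnings are converted to an hourly wage.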

Third, rw includes respondents who report that their weekly “hours vary”. For these workers, we use reported hourly pay or, if necessary, weekly pay together with imputed usual weekly hours; for details, see cepr_basic_hours.do. Finally, we trim observations where the hourly wage, expressed in 1989 dollars, is below $0.50 or above $200. (For a longer, somewhat dated, discussion of the top-coding, “hours vary”, and trimming procedures, see this 2003 paper.)
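The “hours vary” fallback and the trimming rule can be sketched in Python as below. The function names and the price-index arguments are hypothetical, the imputation of usual hours itself (cepr_basic_hours.do) is not shown, and treating the $0.50 and $200 bounds as inclusive is an assumption of this sketch.

```python
from typing import Optional


def hourly_wage_hours_vary(hourly_pay: Optional[float],
                           weekly_pay: float,
                           imputed_hours: float) -> float:
    """Use reported hourly pay if present, else weekly pay / imputed hours."""
    if hourly_pay is not None:
        return hourly_pay
    return weekly_pay / imputed_hours


def to_1989_dollars(nominal_wage: float,
                    index_this_year: float,
                    index_1989: float) -> float:
    """Deflate a nominal wage to 1989 dollars with a price index."""
    return nominal_wage * index_1989 / index_this_year


def keep_after_trim(wage_1989: float,
                    low: float = 0.50,
                    high: float = 200.0) -> bool:
    """True if the wage in 1989 dollars is within the trimming bounds."""
    return low <= wage_1989 <= high
```

An observation is dropped when `keep_after_trim` returns False for its wage in 1989 dollars.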

rw_ot is based on wage4 (which includes otc for all workers) and otherwise makes the same adjustments as rw.

Internally, we generally use rw_ot when an analysis uses only data from 1994 to the present and rw when an analysis includes data before and after 1994.
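That rule of thumb is simple enough to write down as a small Python helper. The function name is hypothetical, and returning rw for a sample that ends before 1994 is an assumption, since the rule above covers only samples that start in 1994 or later and samples that span 1994.

```python
def preferred_wage_series(first_year: int, last_year: int) -> str:
    """Name of the wage variable suggested for a sample's year range."""
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    # rw_ot only when every year in the sample is 1994 or later.
    return "rw_ot" if first_year >= 1994 else "rw"
```

For example, `preferred_wage_series(1994, 2014)` gives `"rw_ot"`, while `preferred_wage_series(1979, 2014)` gives `"rw"`.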

We encourage you to take a look at our data. If you find any problems or have any questions, please let us know.
