Dear all,

I looked for similar queries and couldn't find any, but of course, if you know of other "threads" let me know.
It's been months that I've been working on the following, I'd be very very grateful if you could help me with this. I'm using PSID data, that is known for its "chaos" (there are no recognizable patterns on any dimension: name, labels, time..., in fact, someone developed a full package of commands in order to handle this specific database). So here's a brief description of my dataset:
  • It's a demographic survey that runs from 1968 to 2017 (frequency: annual up to 1999, then biannual).
  • It's in wide format: time runs horizontally.
  • The first two columns identify individuals for the whole duration of the dataset, namely there are values also for individuals that were not born in 1968 and were interviewed later.
  • KEY: the number of variables for each year CHANGE.
Now, the goal is to make this dataset LONG as in a panel format: individual // all years // all variables.
Code:
 
reshape long **ReadyData**, i(id) j(year) (1)
I've placed the year suffix _#### for each variable by a forced loop and spent tons of hours tracking down each variable path, reconstruct coherent lists with DS and rename them consistently.

I'm done with this phase, the dataset is renamed decently in the format (example):

FAMID_1968 INDIVID_1968 // INCOME_1968 NCHILDREN_1968 EMPLSTATUS_1968 // INCOME_1969 NCHILDREN_1969 EMPLSTATUS_1969 [+N: PENSION_1969...] // ...

Now there's my problem: the number of columns per year is not constant, and I don't think reshape is built for dealing with this situation. I wouldn't know how to implement (1) with variables emerging (for instance) the first time in 2013.

It's been a while I didn't post so if you need more details or the actual scripts just let me know and I'll promptly get back to you.
Many many thanks in advance,

Francesco