Hello,
here is a small part of my dataset. The full dataset is about 50million observations:
You can see that some observations are missing due to holidays, weekends etc..
LPERMNO datadate ajexdi prccd
10001 03jan1989 3 6.313
10001 04jan1989 3 6.25
10001 05jan1989 3 6.5
10001 06jan1989 3 6.125
10001 09jan1989 3 6.5
10001 10jan1989 3 6.5
10001 11jan1989 3 6.25
10001 12jan1989 3 6.25
10001 13jan1989 3 6.5
10001 16jan1989 3 6.5
10001 17jan1989 3 6.25
10001 18jan1989 3 6.5
10001 19jan1989 3 6.375
10001 20jan1989 3 6.25
10001 23jan1989 3 6.5
10001 24jan1989 3 6.25
Now i want this dataset to merge with another dataset by the unique identifier (LPERMNO) and by the date (datadate).
Now in my second dataset, there is always one date per year per LPERMNO.
The issue is that this date is sometimes on a holiday or weekend so i cant merge it fully with this dataset.
In this dataset i would like to generate new observations for the missing ones and just take the values from the next available date per LPERMNO.
So for example i would want the missing values 21jan1989 and 22jan1989 to be generated and to take on the values for ajexdi and prccd from 23jan1989.
So the newly, fully complete dataset would look like this:
LPERMNO datadate ajexdi prccd
10001 03jan1989 3 6.313
10001 04jan1989 3 6.25
10001 05jan1989 3 6.5
10001 06jan1989 3 6.125
10001 07jan1989 3 6.5
10001 08jan1989 3 6.5
10001 09jan1989 3 6.5
10001 10jan1989 3 6.5
10001 11jan1989 3 6.25
10001 12jan1989 3 6.25
10001 13jan1989 3 6.5
10001 14jan1989 3 6.5
10001 15jan1989 3 6.5
10001 16jan1989 3 6.5
10001 17jan1989 3 6.25
10001 18jan1989 3 6.5
10001 19jan1989 3 6.375
10001 20jan1989 3 6.25
10001 21jan1989 3 6.5
10001 22jan1989 3 6.5
10001 23jan1989 3 6.5
10001 24jan1989 3 6.25
To ensure that the newly generated observations are actually quite close to the next one and not years away (As this is stock data and they sometimes start trading again after stopping for years) i would also like to limit the "next available date" to be maximimum 30 days away from the missing date
How do i accomplish this? Ideally with a command that doesnt take hours as the dataset is quite huge but it doesn't matter if it isn't possible otherwise. Feel free to ask any clarifying questions or you might have a better solution to my problem.
Thanks
Nick
Related Posts with Generating missing observations with the next available observation
How not to write "replace ..." a thousand times when using egen byte?Hello everyone, Stata beginner here. I am currenly doing research on the trade effects of currency …
twopt title with for values loop--how to display range of values?Hi, This is more of a forvaues question, but since I am using it with maptile, I figured I would pr…
Storing cell proportions of survey dataHello, I am trying to store cell proportions using: Code: svy: tab gender2 symptoms, row matcell(c…
panel data - averages in new groupHi I have a large data set with 9 different balance sheet variables for 57 Companies, over ca. 16 y…
Dominance Analysis Command for Multilevel Zero-Inflated Negative Binomial RegressionDear everyone, Could anyone assist with the Stata command for performing a dominance analysis on a …
Subscribe to:
Post Comments (Atom)
0 Response to Generating missing observations with the next available observation
Post a Comment