Dear all,
I'm restructuring my historical datasets to perform survival analysis. My data is in a non-standard historical format, and I would be very grateful for your help with it. I have a master dataset that contains information on each company in its year of founding, such as its unique identifier, capital, location of headquarters, etc. It covers companies founded roughly between 1800 and 1914. So it is cross-sectional data in which the DATE column is the founding date of the firm:
1st variable - a unique identifier
2nd variable - DATE, the founding date in YEAR-MONTH-DAY format
3rd variable - location of headquarters, etc.
Then I have 6 separate cross-sectional datasets on these companies for the years 1847, 1869, 1874, 1892, 1905 and 1914. If a company is listed in the 1847 dataset, it survived until 1847, and similarly for the other years. The variables in the year datasets partially overlap with those in the master dataset: each contains the companies' unique identifier and some variables such as capital. None of the datasets contains a dummy variable for survival.
I tried to put the data into survival-analysis form. I created a variable failure that equals 0 for every company listed in 1847, 1869, 1874, etc. Then I merged the founding-year variable into the year datasets. By comparing the founding year with 1847, 1869, 1874, 1892, 1905 and 1914, I created year0 and year1 for each company, and then I appended all the year datasets into one dataset. However, because the year datasets include only surviving companies, for each company I know the spells in which failure equals 0 but not the spell in which failure equals 1. Here is what my dataset currently looks like:
Id year0 year1 failure
1 1836 1847 0
1 1847 1869 0
1 1869 1874 0
2 1836 1847 0
2 1847 1869 0
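A minimal sketch of how rows like these can be built, written in Python only for illustration. It assumes a hypothetical helper `build_spells` that receives one firm's founding year and the list of wave years in which that firm is listed. It also assumes the firm was founded before its first listing and appears in consecutive waves. Each listing closes one spell with failure = 0, and the next spell starts at that wave.

```python
# Wave years of the six year datasets (from the post).
WAVES = (1847, 1869, 1874, 1892, 1905, 1914)


def build_spells(firm_id, founding_year, listed_waves):
    """Return (id, year0, year1, failure) rows, failure = 0 for each listing.

    Hypothetical helper: assumes `listed_waves` contains the wave years in
    which the firm appears and that the firm was founded before its first
    listing. The first spell starts at the founding year; every later spell
    starts at the previous wave.
    """
    rows = []
    start = founding_year
    for wave in sorted(listed_waves):
        rows.append((firm_id, start, wave, 0))  # survived until this wave
        start = wave
    return rows


print(build_spells(1, 1836, [1847, 1869, 1874]))
```

Calling `build_spells(1, 1836, [1847, 1869, 1874])` reproduces the three rows for firm 1 in the table above.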
From this, we know that firm 1 was founded in 1836 and was last listed in 1874, which means it died somewhere in the interval 1874-1892. Firm 2 died somewhere in the interval 1869-1874. Since a firm that is not listed in one of the year datasets must have failed during the corresponding interval, how do I add one more observation per firm with failure equal to 1? What I want is:
Id year0 year1 failure
1 1836 1847 0
1 1847 1869 0
1 1869 1874 0
1 1874 1892 1
2 1836 1847 0
2 1847 1869 0
2 1869 1874 1
where the rows with failure equal to 1 (1874-1892 for firm 1 and 1869-1874 for firm 2) are the new observations, one per firm, that record the interval in which the firm failed.
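One way to generate the missing rows, sketched in Python under stated assumptions. The function `add_failure_rows` is hypothetical. It assumes the current dataset is a list of (id, year0, year1, failure) tuples, all with failure = 0, and that each firm appears in consecutive waves. For each firm it takes the last wave in which the firm was listed. If a later wave exists, it appends one row from that last listing to the next wave with failure = 1. Firms still listed in 1914 get no extra row, so they remain right-censored.

```python
from itertools import groupby

# Wave years of the six year datasets (from the post).
WAVES = (1847, 1869, 1874, 1892, 1905, 1914)


def add_failure_rows(spells, waves=WAVES):
    """Append one failure = 1 row per firm whose last listing precedes the final wave.

    Hypothetical helper. `spells` is a list of (id, year0, year1, failure)
    tuples with failure = 0. A firm last listed in wave w is assumed to have
    failed between w and the next wave; a firm listed in the final wave is
    right-censored and gets no extra row.
    """
    out = []
    # Sort by firm, then by the end of each spell, so the last row is the last listing.
    ordered = sorted(spells, key=lambda r: (r[0], r[2]))
    for firm_id, group in groupby(ordered, key=lambda r: r[0]):
        rows = list(group)
        out.extend(rows)
        last_seen = rows[-1][2]
        later = [w for w in waves if w > last_seen]
        if later:
            out.append((firm_id, last_seen, later[0], 1))  # failed before next wave
    return out


current = [
    (1, 1836, 1847, 0), (1, 1847, 1869, 0), (1, 1869, 1874, 0),
    (2, 1836, 1847, 0), (2, 1847, 1869, 0),
]
for row in add_failure_rows(current):
    print(*row)
```

Applied to the five rows of the current dataset, this produces the seven rows of the desired table. Firms in the master dataset that never appear in any year dataset are not covered by this sketch and would need their own row from the founding year to the first wave after founding.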
I'm just starting my career as a researcher, so my description of the problem may not be clear; feel free to ask any questions. I really appreciate any help or comments.
Thank you!