Dear Stata experts,


I'm working on an interrupted time series (ITS) analysis.

I have individual-level data on over 30,000 subjects. My variables of interest are the year of diagnosis (YEAR_OF_DIAGNOSIS), a treatment indicator that defines two groups (expand, coded 1/0), and the outcome of interest (uninsured, coded 1/0).
YEAR_OF_DIAGNOSIS is stored as a double.

If I sort the data by expand and year of diagnosis and then try to declare the data as a time series, I get the error "repeated time values within panel."
I assume this is because the data are at the individual level, so many subjects share the same group and year.

Code:
sort expand YEAR_OF_DIAGNOSIS

tsset expand YEAR_OF_DIAGNOSIS
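As a rough check of that assumption (a sketch only; I am not showing output here), something like the following should report how many observations share the same combination of group and year. Any surplus copies would mean the pair does not uniquely identify observations, which is what tsset requires.

Code:
* count observations that repeat the same (expand, YEAR_OF_DIAGNOSIS) pair
duplicates report expand YEAR_OF_DIAGNOSIS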
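I was able to get around this error (I think) by collapsing the data to one observation per group and year, so that uninsured becomes the proportion uninsured in each group-year cell.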

Code:
sort expand YEAR_OF_DIAGNOSIS

* mean of the 0/1 outcome = proportion uninsured in each group-year cell
collapse (mean) uninsured, by(YEAR_OF_DIAGNOSIS expand)
list

tsset expand YEAR_OF_DIAGNOSIS
This appears to work and allows me to proceed with the ITS analysis.

However, I'm not sure this is the correct approach. Is there a better way that uses the individual-level data directly?

I would appreciate any help.