Hi there,

I am using Stata/MP 15.1.

I would like to estimate the sample attrition in my panel dataset.

I have observations for a large pool of individuals across four years: 2015, 2016, 2017 and 2018.

I have seen that xtdescribe presents this nicely. However, because I declared my data to be panel data without specifying a time variable, xtdescribe does not work here.

I could not specify a time variable because I have multiple observations per individual in each year. As I do not intend to use time-series operators such as lags and leads, I was advised that xtset id without a timevar would be fine.

I will attach an example of my data below:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float(date diary_day tran tran_freq) double amnt int pi
100001 2016 0 2 2 1000 .
100001 2016 0 . 2 . .
100001 2016 0 1 2 5 .
100001 2016 1 . . . .
100001 2016 2 3 4 127 2
100001 2016 2 2 4 820 3
100001 2016 2 1 4 30.5 2
100001 2016 2 . 4 . .
100001 2016 2 4 4 30.400000000000002 2
100001 2016 3 1 1 820 3
100001 2016 3 . 1 . .
100001 2017 0 . . . .
100001 2017 1 1 1 127 3
100001 2017 2 1 1 40 3
100001 2017 3 1 1 35 1
100001 2018 0 . . . .
100001 2018 1 . . . .
100001 2018 2 1 2 10 1
100001 2018 2 2 2 89.23 3
100001 2018 3 . . . .
100002 2017 0 . . . .
100002 2017 1 1 1 25.35 4
100002 2017 2 1 1 120 4
100002 2017 3 1 1 6.140000000000001 1
100003 2016 0 1 1 1623 .
100003 2016 0 . 1 . .
100003 2016 1 . 8 . .
100003 2016 1 3 8 500 10
100003 2016 1 1 8 20 11
100003 2016 1 6 8 20 .
100003 2016 1 5 8 150 .
100003 2016 1 2 8 2 1
100003 2016 1 4 8 34.15 4
100003 2016 1 7 8 25 10
100003 2016 1 8 8 20 .
100003 2016 2 . 2 . .
100003 2016 2 2 2 17.5 10
100003 2016 2 1 2 61 4
100003 2016 3 . 1 . .
100003 2016 3 1 1 2 1
100003 2017 0 . . . .
100003 2017 1 2 3 7.45 4
100003 2017 1 1 3 12.99 4
100003 2017 1 3 3 15 4
100003 2017 2 . . . .
100003 2017 3 2 4 19.72 4
100003 2017 3 1 4 93.97 3
100003 2017 3 4 4 1376.33 .
100003 2017 3 3 4 23.89 4
100003 2018 0 . . . .
100003 2018 1 5 8 3 .
100003 2018 1 2 8 19.150000000000002 4
100003 2018 1 6 8 40 .
100003 2018 1 8 8 20 1
100003 2018 1 3 8 107.92 4
100003 2018 1 1 8 13.71 4
100003 2018 1 7 8 6 1
100003 2018 1 4 8 28 4
100003 2018 2 1 3 94.2 6
100003 2018 2 3 3 41.51 3
100003 2018 2 2 3 22.87 3
100003 2018 3 2 2 1696.1000000000001 .
100004 2017 0 . . . .
100004 2017 1 1 1 3.48 4
100004 2017 2 3 4 579 6
100004 2017 2 4 4 505 .
100004 2017 2 1 4 597 2
100004 2017 3 2 5 74.84 4
100004 2017 3 4 5 389.74 .
100004 2017 3 3 5 92.01 6
100004 2017 3 1 5 92.01 2
100004 2017 3 5 5 389.73 .
100004 2018 0 . . . .
100004 2018 1 1 2 123 2
100004 2018 1 2 2 123 2
100004 2018 2 1 2 12 4
100004 2018 2 2 2 7 4
100004 2018 3 1 2 5 4
100004 2018 3 2 2 40 4
100005 2015 0 . . . .
100005 2015 1 1 1 100.41 3
100005 2015 1 . 1 . .
100005 2015 2 1 1 35.81 3
100005 2015 2 . 1 . .
100005 2015 3 1 4 6.53 3
100005 2015 3 3 4 14 .
100005 2015 3 2 4 3 1
100005 2015 3 4 4 37 3
100005 2015 3 . 4 . .
100005 2016 0 . . . .
100005 2016 1 1 1 516.5 3
100005 2016 1 . 1 . .
100005 2016 2 . . . .
100005 2016 3 . . . .
100007 2015 0 . . . .
100007 2015 1 1 1 5 1
100007 2015 1 . 1 . .
100007 2015 2 . 1 . .
100007 2015 2 1 1 30 1
100007 2015 3 . 2 . .
end
label values pi pi_l
label def pi_l 1 "1 Cash", modify
label def pi_l 2 "2 Check", modify
label def pi_l 3 "3 Credit card", modify
label def pi_l 4 "4 Debit card", modify
label def pi_l 6 "6 Bank account number payment", modify
label def pi_l 10 "10 PayPal", modify
label def pi_l 11 "11 Account-to-account transfer", modify

This is transaction-level data: each observation represents an individual reporting a payment on a specific 'diary_day' (from 0 to 3) in a given year.

Each individual has an identifier "id" and the year variable is "date".

Is there a way to estimate the attrition rate, perhaps via dummy variables?

Even if I cannot see it across the whole panel as in xtdescribe, perhaps there is a way to estimate the attrition rate from 2015 to 2016, then from 2016 to 2017, and so on. However, this would presumably not be helpful if some individuals dropped out of the survey in a 'middle' year and then came back at a later date.
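
To make the question concrete, here is a rough, untested sketch of the kind of thing I have in mind. It assumes only the variables id and date from the example above; the variable name dropped and the local macro last are placeholders I made up. The idea is to reduce the data temporarily to one row per individual per year, so that id and date together identify each row and xtset accepts date as the time variable.

```stata
* Sketch only, not a tested solution: assumes id and date as in the example.
preserve

* Keep one row per individual per survey year.
keep id date
duplicates drop

* id and date now uniquely identify rows, so a time variable can be declared.
xtset id date
xtdescribe

* Hypothetical dummy: 1 if the individual is not observed in the following year.
* Note that this also flags people who skip a year and return later.
bysort id (date): generate byte dropped = date[_n+1] != date + 1

* The final survey year has no following year, so exclude it from the table.
summarize date, meanonly
local last = r(max)
tabulate date dropped if date < `last', row

restore
```

If something like this is reasonable, the pattern table from xtdescribe should show individuals who leave and re-enter, while the row percentages from tabulate would give a year-to-year rate that mixes temporary and permanent exits.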

Thank you in advance for any help.

Jack