Dear all,

I have a panel dataset where individuals are assigned a "success" variable over time. Once success = 1, it remains 1 until the end of the sample period.
I would like to compute, for each individual and each period t, the conditional probability of achieving success given that success was 0 in all previous periods, i.e. Pr(success(t)=1 | success(t-1)=0, controls). Because success never reverts to 0, success(t-1)=0 implies that success was 0 in every earlier period as well.
Since I also have a couple of control variables, these should also be used as predictors of success.
Below is a sample dataset with one randomly generated control variable and a first_success dummy indicating the first occurrence of success.
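To make the target quantity precise, here it is written out in LaTeX. The symbols h_{it}, y_{it} (the success variable) and x_{it} (the controls) are my own shorthand, not variable names in the dataset, and I use the convention y_{i0} = 0 so that the first period is covered as well:

```latex
% per-period conditional probability of first success
h_{it} = \Pr\left(y_{it} = 1 \mid y_{i,t-1} = 0,\; x_{it}\right)
% since success is absorbing, conditioning on the last period
% is the same as conditioning on the whole history of zeros
       = \Pr\left(y_{it} = 1 \mid y_{i1} = \dots = y_{i,t-1} = 0,\; x_{it}\right)
```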

What I tried (actually, I tried out quite a few ideas):
1. Use logit to compute the probability of success for each separate time period (see "1st TRY")
2. Use logit to compute the probability of success for all time periods simultaneously (see "2nd TRY")

My problem:
When I calculate the cumulative probabilities for both approaches, I get values greater than 1 for some ids (see cum_pr_cond and cum_pr_SUCC). I guess that I am doing something wrong.
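For reference, and assuming the textbook discrete-time hazard setup applies to my data (I am not certain it does), the cumulative probability of having succeeded by period t is built from the per-period conditional probabilities as a product of "no success yet" terms rather than as a running sum. Here h_{is} is my shorthand for Pr(success(s)=1 | success(s-1)=0, controls) and y_{it} stands for success. This may be why my sum() lines can exceed 1:

```latex
% probability of having succeeded at some point up to and including t
\Pr\left(y_{it} = 1 \mid x_i\right) = 1 - \prod_{s=1}^{t} \left(1 - h_{is}\right)
```

Each factor (1 - h_{is}) lies between 0 and 1, so this expression cannot leave the unit interval, unlike a sum of the h_{is}.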

Do you have any suggestions on how I could compute the conditional probabilities Pr(success(t)=1 | success(t-1)=0, controls) correctly and more elegantly?

Thank you,
Max


Code:
clear
input id    t    success
1    1    1
1    2    1
1    3    1
2    1    0
2    2    1
2    3    1
3    1    0
3    2    0
3    3    1
4    1    0
4    2    0
4    3    1
5    1    0
5    2    0
5    3    0
6    1    1
6    2    1
6    3    1
end
set seed 100
gen control1 = runiform()

*dummy indicating first occurrence of success
bysort id success (t): gen success_cnt = _n if success ==1
gen first_success = success_cnt == 1
drop success_cnt
*----------
* 1st TRY
*----------
*Compute Pr(Success(t=1) = 1 | controls):
logit first_success control1 if t ==1
predict pr_SUCCt1e1 if t ==1, pr
label variable pr_SUCCt1e1 "Pr(Success(t=1) = 1 | controls)"

*Compute Pr(Success(t=2) = 1 | controls, Success(t=1) = 0):
*dummy variable for no success in the previous period:
bysort id (t): gen SUCCt1e0 = 1 if success[_n-1] ==0 & t == 2
replace SUCCt1e0 = 0 if SUCCt1e0 ==.
label variable SUCCt1e0 "Dummy(Success(t=1) = 0)"

logit first_success control1 SUCCt1e0 if t ==2
predict pr_SUCCt2e1_IF_SUCCt1e0 if t ==2, pr
label variable pr_SUCCt2e1_IF_SUCCt1e0 "Pr(Success(t=2) = 1 | controls, Success(t=1) = 0)"

*Compute Pr(Success(t=3) = 1 | controls, Success(t=2) = 0, Success(t=1) = 0):
*dummy variable for no success in the previous two periods:
bysort id (t): gen SUCCt2e0_SUCCt1e0 = 1 if success[_n-1] == 0 & success[_n-2] == 0 & t == 3
replace SUCCt2e0_SUCCt1e0 = 0 if SUCCt2e0_SUCCt1e0 ==.
label variable SUCCt2e0_SUCCt1e0 "Dummy(Success(t=2) = 0 AND Success(t=1) = 0)"

logit first_success control1 SUCCt2e0_SUCCt1e0 if t ==3
predict pr_SUCCt3e1_IF_SUCCt2e0_SUCCt1e0 if t ==3, pr
label variable pr_SUCCt3e1_IF_SUCCt2e0_SUCCt1e0 "Pr(Success(t=3) = 1 | controls, Success(t=2) = 0, Success(t=1) = 0)"

*Final conditional probability:
egen pr_cond = rowfirst(pr_*)
*cumulative conditional probability:
*(sort by t within id so the running sum follows time order)
bysort id (t): gen cum_pr_cond = sum(pr_cond)

*----------
* 2nd TRY
*----------
*Alternative approach:
logit first_success control1
predict pr_SUCC, pr
label variable pr_SUCC "Pr(Success(t) = 1 | controls)"
*cumulative probability:
bysort id (t): gen cum_pr_SUCC = sum(pr_SUCC)
bro id t success pr_cond cum_pr_cond pr_SUCC cum_pr_SUCC