I have a longitudinal dataset, shown below, on students' grade retention in the K-12 education system in the US.
clear
input str10 id byte (grade state)
1 1 0
1 2 0
1 3 0
1 4 0
1 5 0
1 6 0
1 7 0
1 8 0
1 9 0
1 10 0
1 11 0
1 12 0
2 1 0
2 2 0
2 . .
2 4 0
2 . .
2 6 0
2 7 0
2 8 0
2 . .
2 10 0
2 11 0
2 12 0
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
3 6 0
3 7 0
3 8 0
3 9 0
3 10 0
3 . .
3 11 0
4 1 0
4 . .
4 3 0
4 4 0
4 5 0
4 6 0
4 . .
4 . .
4 . .
4 . .
4 . .
4 12 0
5 1 0
5 2 0
5 . .
5 . .
5 . .
5 . .
5 . .
5 . .
5 . .
5 . .
5 . .
5 9 0
6 1 0
6 1 1
7 1 0
7 . .
7 . .
7 . .
7 . .
7 4 0
7 5 0
7 6 0
7 7 0
7 7 1
end
1-The dataset has 3 variables: id (student id); grade (the grade the student is currently in, ranging from 1 to 12); and state (whether the student is retained in that grade: 0 = no grade retention, 1 = grade retention).
2-As can be seen above, the dataset has the following patterns,
1) The student has a complete grade record (there is no missing value on the variable "grade" from grade 1 to grade 12, so the student progresses normally, like the student with id==1).
2) The student has grade retention only once and the variable "grade" has no missing values at all (like the student with id==6).
3) The "grade" variable looks OK at first glance; however, the student turns out to be a demoted student (like the student with id==5).
More specifically, for the student with id==5, there is no occurrence of "state"==1 anywhere in the record.
However, if you fill the gap in the sequence (2,.,.,.,.,.,.,.,.,.,9) with the numbers 3,4,5,6,7,8,9,10,11, the last value, "grade"==9, comes after 11,
which means that the student is a demoted student. (The student with id==3 is in the same situation as the student with id==5.)
4) The student (like the student with id==4) has missing values on the variable "grade"; however, if you count from grade==1 to grade==12 and fill the missing values with consecutive numbers,
you will find that the student has no grade retention issue.
For example, if the sequence with missing values is 1,2,.,.,5 and you fill in 3,4, then the student is regarded as having normal grade progression. (The student with id==2 is in the same situation as the student with id==4.)
5) The student has grade retention only once, and was demoted before the first recorded grade retention (like the student with id==7).
What I need to do is as follows,
1-Count how many students have complete records on the "grade" variable (no grade retention at all from grade 1 to grade 12, like the student with id==1). Nothing needs to be done to the variable "state" for them.
2-Count how many students have missing values on the "grade" variable but still make normal progression (like the student with id==2 or id==4), and set the variable "state" to 0 for all of their observations. (We can call these students "normal gap students".)
3-Count how many students were simply demoted (like the student with id==3 or id==5), and set the variable "state" to 1 at the observation where they were demoted. (We can call these students "abnormal gap students".)
4-Count how many students have a problematic gap on the "grade" variable before the first occurrence of grade retention (like the student with id==7). For these students, set state=1 at the observation where the student was demoted and delete that student's data after the demotion.
Can someone help me with Stata code? The original dataset, for the purpose of illustration, is listed above.
Thank you very much!
(Note: each person with grade retention in the original dataset has just one observed grade retention, the first occurrence.)
The expected dataset, after correcting the variable "state", should look like this:
clear
input str10 id byte (grade cr_state)
1 1 0
1 2 0
1 3 0
1 4 0
1 5 0
1 6 0
1 7 0
1 8 0
1 9 0
1 10 0
1 11 0
1 12 0
2 1 0
2 2 0
2 . 0
2 4 0
2 . 0
2 6 0
2 7 0
2 8 0
2 . 0
2 10 0
2 11 0
2 12 0
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
3 6 0
3 7 0
3 8 0
3 9 0
3 10 0
3 . 0
3 11 1
4 1 0
4 . 0
4 3 0
4 4 0
4 5 0
4 6 0
4 . 0
4 . 0
4 . 0
4 . 0
4 . 0
4 12 0
5 1 0
5 2 0
5 . 0
5 . 0
5 . 0
5 . 0
5 . 0
5 . 0
5 . 0
5 . 0
5 . 0
5 9 1
6 1 0
6 1 1
7 1 0
7 . 0
7 . 0
7 . 0
7 . 0
7 4 1
end
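One possible way to approach the four tasks is sketched below in Stata. It is not a definitive solution. It assumes the first dataset above (with the variables id, grade and state) is in memory in its original row order, and it assumes one rule that the question implies but does not state outright: every missing row stands for the next consecutive grade, and an observed grade that is not higher than the previous (filled) grade marks a retention or demotion. All new variable names (obs, seq, fill, back, firstback, firstrec, anymiss, type, tag, cr_state) are made up for illustration.

```stata
* Keep the original row order: grade has gaps, so it cannot be used for sorting
gen long obs = _n
bysort id (obs): gen seq = _n

* Fill each gap by assuming one grade per row (replace runs row by row,
* so a run of missing values is filled cumulatively)
by id (obs): gen fill = grade
by id (obs): replace fill = fill[_n-1] + 1 if missing(fill) & _n > 1

* Flag an observed grade that does not exceed the previous filled grade
by id (obs): gen byte back = _n > 1 & !missing(grade) ///
    & !missing(fill[_n-1]) & grade <= fill[_n-1]

* Per student: first flagged row, first recorded retention, any missing grade
egen firstback    = min(cond(back, seq, .)), by(id)
egen firstrec     = min(cond(state == 1, seq, .)), by(id)
egen byte anymiss = max(missing(grade)), by(id)

* Classify students (type 5 is the id==6 pattern: recorded retention, no gap problem)
gen byte type = 1 if !anymiss & missing(firstback) & missing(firstrec)
replace type = 2 if  anymiss & missing(firstback) & missing(firstrec)
replace type = 3 if !missing(firstback) & missing(firstrec)
replace type = 4 if !missing(firstrec) & firstback <  firstrec
replace type = 5 if !missing(firstrec) & firstback == firstrec
label define typelbl 1 "complete" 2 "normal gap" 3 "abnormal gap" ///
    4 "gap before retention" 5 "recorded retention only"
label values type typelbl

* Corrected state: 1 at the first flagged row, 0 elsewhere;
* rows after that point are dropped (nothing is dropped if firstback is missing)
gen byte cr_state = (seq == firstback)
drop if seq > firstback

* Count students, not rows, per type
egen byte tag = tag(id)
tabulate type if tag
```

Working through the example data by hand, this should put id 1 in type 1, ids 2 and 4 in type 2, ids 3 and 5 in type 3, id 7 in type 4, and id 6 in type 5, with id 7 cut off after the row where grade==4. A student whose recorded retention comes before any flagged row would be left unclassified (type missing) and would need a separate look.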