Generating new variable as a function of most common observation in another variable

Hey everyone! I am trying to create a concentration index for occupation across states for different caste groups in India. As a first step, I am trying to find the most common occupation for a particular caste group in a particular state and time. I am trying to use the following code, but it is not working with my dataset:

Code:

gen common_state_occ = .  
bysort STATEID time CASTE (Count): replace common_state_occ =occupation[_N]

An example of my dataset is as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int STATEID double OCCUPATION float(CASTE time)
 1  . 6 1
 1  . 6 1
 1  . 6 1
 1  . 6 1
 2  . 4 0
 3 63 3 0
 3  . 7 0
 3  8 4 1
 3  . 7 1
 3  . 3 1
 4  . 2 0
 5  . 5 1
 6  . 1 0
 6 15 3 0
 6  . 3 1
 6  . 3 1
 6  . 4 1
 7  . 6 0
 8  . 2 0
 8  . 3 0
 8  . 3 0
 8  . 2 0
 8  . 4 0
 8 99 5 0
 8  . 2 1
 8  . 4 1
 8  . 3 1
 8  . 4 1
 8  . 5 1
 8 95 5 1
 8 83 3 1
 9  . 4 0
 9  . 1 0
 9  . 6 1
 9  . 6 1
 9  . 3 1
 9 95 3 1
 9  . 3 1
10  . 3 0
10  . 6 0
10  . 6 0
10  . 3 0
10 63 3 1
10  . 6 1
10  . 3 1
10  . 3 1
11  . 6 0
11  . 3 0
19 63 4 0
19 98 6 0
19  . 2 0
19  . 3 0
19 84 2 1
19  . 6 1
19  . 4 1
20  . 3 0
21  . 5 0
22  . 1 0
22  . 3 1
22  . 5 1
23  . 3 0
23 63 3 0
23 65 4 0
23  . 4 0
23 63 5 0
23  . 3 0
23  . 4 0
23 95 4 1
23  . 6 1
23  . 4 1
23  . 5 1
24  . 2 0
24  . 2 0
24  . 5 0
24  . 2 1
24  . 3 1
27  . 3 0
27  . 2 0
27  . 2 0
27  . 2 0
27 63 3 0
27 63 5 1
27  . 6 1
27  . 3 1
27  . 2 1
27  . 4 1
28  . 3 0
28  . 4 0
28  . 3 0
28  . 3 1
29  . 1 0
29 78 6 0
29  . 3 0
29  . 6 0
29 63 5 1
29  . 4 1
29 57 7 1
29 63 3 1
33  . 3 0
33  . 3 0
end
label values STATEID STATEID
label def STATEID 1 "Jammu & Kashmir 01", modify
label def STATEID 2 "Himachal Pradesh 02", modify
label def STATEID 3 "Punjab 03", modify
label def STATEID 4 "Chandigarh 04", modify
label def STATEID 5 "Uttarakhand 05", modify
label def STATEID 6 "Haryana 06", modify
label def STATEID 7 "Delhi 07", modify
label def STATEID 8 "Rajasthan 08", modify
label def STATEID 9 "Uttar Pradesh 09", modify
label def STATEID 10 "Bihar 10", modify
label def STATEID 11 "Sikkim 11", modify
label def STATEID 19 "West Bengal 19", modify
label def STATEID 20 "Jharkhand 20", modify
label def STATEID 21 "Orissa 21", modify
label def STATEID 22 "Chhattisgarh 22", modify
label def STATEID 23 "Madhya Pradesh 23", modify
label def STATEID 24 "Gujarat 24", modify
label def STATEID 27 "Maharashtra 27", modify
label def STATEID 28 "Andhra Pradesh 28", modify
label def STATEID 29 "Karnataka 29", modify
label def STATEID 33 "Tamil Nadu 33", modify
label values CASTE GROUPS
label def GROUPS 1 "Brahmin 1", modify
label def GROUPS 2 "Forward caste 2", modify
label def GROUPS 3 "OBC 3", modify
label def GROUPS 4 "Dalit 4", modify
label def GROUPS 5 "Adivasi 5", modify
label def GROUPS 6 "Muslim 6", modify
label def GROUPS 7 "Christian, Sikh, Jain 7", modify

I was hoping for suggestions on how to go about doing this! After creating the most common occupation variable, I will generate another variable that identifies whether an individual is working in the occupation that is most common for his/her caste in that state. For that I will be using this:

Code:

gen works_common_occ = occupation == common_state_occ
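
For reference, one possible way to get the most common (modal) occupation within each state, time and caste cell is the mode() function of egen. This is only a sketch, and it rests on a few assumptions. The variable is spelled OCCUPATION in capitals, as in the dataex example (Stata is case-sensitive, so the lowercase occupation in the posted code would not be found). The variable Count used in the posted code does not appear in the example, so it is not needed here. Ties are broken by taking the lowest occupation code, which is an arbitrary choice.

```stata
* Sketch only: variable names follow the -dataex- example (OCCUPATION in capitals).
* Modal occupation within each state-time-caste cell; missing values are ignored.
* minmode breaks ties by taking the lowest occupation code (an arbitrary choice).
egen common_state_occ = mode(OCCUPATION), by(STATEID time CASTE) minmode

* 1 if the person works in the modal occupation of the cell, 0 otherwise;
* left missing when the person's own occupation is missing.
gen byte works_common_occ = OCCUPATION == common_state_occ if !missing(OCCUPATION)
```

Without minmode (or maxmode), egen returns missing for any cell with more than one mode, so it may be worth checking how many cells have ties before settling on a rule. Cells in which every OCCUPATION value is missing get a missing common_state_occ.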
