Hi all, I am working on a study of grant applications and selections. I have about 8 years’ worth of data and am looking at Principal Investigators (PIs) only. The goal is to see whether characteristics of the PI predict the likelihood of getting a grant.
It is common for the same person to apply for multiple grants and to be selected for multiple grants. The same person can also apply for several grants within the same year, but I don’t have a more refined time unit than year, so I don’t know the time order of different applications within the same year.
Sample sizes:
about 8,000 applications
about 3,000 unique applicants
I am interested in a model predicting the probability of being awarded the grant. Currently I am running a model of applications clustered within applicants using the following code:
xtlogit awarded yearvariables independentvariables, i(applicant_id)
My questions:
1) Does this sound like the correct model for the data structure, with applications clustered within applicants? Specifically, I’m wondering whether I need to account for the particular grant topic/area the person applied to, since some areas will have higher award rates than others. Within the same grant area, the same person can apply multiple times and be selected multiple times, so one person can hold multiple awards in one area. However, across the 8 years there are about 150 year-grant area combinations, which seems like too many to include as dummy variables in the model. Could this instead be a cross-classified model, with applications nested in applicants but a given applicant linked to multiple grants?
2) Is it a problem that the same person can apply multiple times in a year, given that I don’t have a more refined time unit than year?
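To make question 1 concrete, here is a rough sketch of what I imagine a cross-classified version might look like in Stata. The names year, grant_area, pi_char1 and pi_char2 are placeholders made up for this post, not my real variables, and I have not confirmed that this is the right specification for my data:

```stata
* Sketch only: year, grant_area, pi_char1, pi_char2 are placeholder names.
* Build one identifier per year-by-grant-area combination (about 150 levels).
egen area_year = group(year grant_area)

* Crossed random intercepts: the year-area factor enters through _all: R.,
* and applicant_id is specified as the second grouping level.
melogit awarded i.year pi_char1 pi_char2 || _all: R.area_year || applicant_id:
```

As I understand it, this treats the year-grant area combinations as random intercepts rather than as 150 dummy variables, but with about 3,000 applicants it may be slow to estimate.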
Any advice would be much appreciated!
Thank you!!
MJ