Hi all, I am working on a study of grant applications and selections. I have about 8 years' worth of data and am looking at principal investigators (PIs) only. The goal is to see whether characteristics of the PI predict the likelihood of receiving a grant.

It is common for the same person to apply for multiple grants and to be selected for multiple grants. The same person can also apply for several grants within the same year, but I don't have a finer time unit than year, so I don't know the order of the different applications within a year.

Sample sizes:

about 8,000 applications
about 3,000 unique applicants

I want a model that predicts the probability that an application is awarded. Currently I am fitting a random-effects logit model with applications clustered within applicants, using the following code:

xtlogit awarded yearvariables independentvariables, i(applicant_id)

My questions:

1) Does this sound like the correct model for this data structure (applications clustered within applicants)? Specifically, I'm wondering whether I need to account for the grant topic/area the person applied to, since some areas have higher award rates than others. Within the same grant area, the same person can apply multiple times and be selected multiple times, so one person can receive several awards in the same area. However, across the 8 years there are about 150 year-by-grant-area combinations, which seems like too many to include as dummy variables. Could this instead be a cross-classified model, with applications nested in applicants but each applicant linked to multiple grants?

2) Is it a problem that the same person can apply multiple times in a year, given that I don't have a finer time unit than year?
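For question 1, this is roughly the cross-classified specification I have in mind, assuming Stata's melogit with crossed random intercepts written using the _all: R. notation. The names year, grantarea, yeararea, x1, and x2 are hypothetical placeholders for my actual variables, and this is a sketch rather than something I have confirmed is the right model.

```stata
* Hypothetical names: year, grantarea, x1, x2 stand in for the real variables
* One identifier for each of the ~150 year-by-grant-area cells
egen yeararea = group(year grantarea)

* Crossed random intercepts for year-area cell and for applicant.
* The factor with fewer levels (yeararea) goes in the _all: R. term,
* which is the cheaper arrangement to estimate.
melogit awarded i.year x1 x2 || _all: R.yeararea || applicant_id:
```

This would treat the 150 year-area cells as draws from one distribution (a single variance parameter) instead of entering 150 dummy variables.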

Any advice would be much appreciated!

Thank you!!

MJ