I’m doing an analysis of applicants for grants over several years. In a given year, duplicate people have been dropped (those who submit more than one application in a given year). However, it is quite common for the same person to be found in multiple years, and the number of years in the data can vary across people.
The question I am trying to answer: is there a "statistically significant" linear trend over time in the percentage of females, the percentage of males, and the percentage of unknown gender? I'd also like to show the regression trends in a graph with confidence intervals. I realize that, because there is an unknown category, increases in the % female and % male over time need to be interpreted with caution.
Proposed setup: three separate logistic regression models. The outcomes are 1) female (vs. not female), 2) male (vs. not male), and 3) unknown (vs. not unknown).
The explanatory variable is year, coded as 1, 2, 3, etc. (used to estimate the linear trend).
Question:
1) Does one need to account for the fact that the same person can appear in different years? For my purposes, I just want to know whether the overall percentage increased over the years, regardless of whether some applicants were the same people. Also, the outcome (gender) does not change over time within a given person. So it seems my goal is to treat the observations as independent, even though the data contain some of the same people in multiple years. Can one use a regular logit, or does one need something like a GEE that accounts for the panel structure?
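To make the comparison concrete, here is a minimal Python sketch (not Stata, and not a recommendation of either approach) that fits the female-vs-not-female logit on year and reports both the usual model-based standard error and a cluster-robust (sandwich) standard error that groups observations by person. Everything in it is an assumption for illustration: the function name `fit_logit_trend`, the simulated applicants, and the application probabilities are all made up, and no finite-sample correction is applied, so the clustered standard error would differ slightly from what a package such as Stata reports.

```python
import math
import random
from collections import defaultdict


def fit_logit_trend(years, y, ids, iters=25):
    """Logit of y on year (intercept + slope), fitted by Newton-Raphson.

    Returns (b0, b1, naive_se_b1, cluster_se_b1), where the clustered
    standard error uses a sandwich estimator with scores summed per id.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, yi in zip(years, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * t      # score for the slope
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    # Recompute the information matrix and per-person scores at the estimate.
    h00 = h01 = h11 = 0.0
    score = defaultdict(lambda: [0.0, 0.0])
    for t, yi, pid in zip(years, y, ids):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
        w = p * (1.0 - p)
        h00 += w
        h01 += w * t
        h11 += w * t * t
        score[pid][0] += yi - p
        score[pid][1] += (yi - p) * t
    det = h00 * h11 - h01 * h01
    a01, a11 = -h01 / det, h00 / det  # second row of the inverse information
    naive_se = math.sqrt(a11)
    cluster_var = sum((a01 * s0 + a11 * s1) ** 2 for s0, s1 in score.values())
    return b0, b1, naive_se, math.sqrt(cluster_var)


# Hypothetical data: gender is fixed per person; people appear in a varying
# number of years, and women become more likely to apply in later years.
random.seed(1)
years, female, ids = [], [], []
for pid in range(2000):
    is_female = 1 if random.random() < 0.4 else 0
    for t in range(1, 6):
        p_apply = 0.2 + 0.1 * t if is_female else 0.5
        if random.random() < p_apply:
            years.append(t)
            female.append(is_female)
            ids.append(pid)

b0, b1, se_naive, se_cluster = fit_logit_trend(years, female, ids)
print(f"slope on year: {b1:.3f}")
print(f"model-based SE: {se_naive:.4f}, cluster-robust SE: {se_cluster:.4f}")
```

The point estimate of the trend is the same under both approaches; only the standard error (and hence the significance test and the width of the confidence band) changes when observations are grouped by person.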
Note: this question is cross-posted here (no replies as of now): https://stats.stackexchange.com/ques...-for-clusterin