Dear Statalist Community,
I am looking for a command that tells me how many observations are lost with each variable that my regression contains.
Let's say my regression would look like this:
reg goals rankdiff teamvalue coachexperience weather
Assume I am working with a dataset containing several similarly defined variables, and I am looking for the ones that leave me with the highest number of observations.
So far, I have tried excluding single variables one at a time to see how the number of observations changes and which combination of variables in the regression causes the largest drop.
I am imagining a command that gives me something like:
------------------
1. goals - 300k observations left (=100%)
2. rankdiff - 280k observations left
3. teamvalue - 270k observations left
4. coachexperience - 110k observations left
5. weather - 100k observations left
------------------
In this scenario, I would then proceed to look for a good replacement for "coachexperience", as it seems to have too many missing values in data rows where the other variables contain values.
The real dataset and the regression are bigger than this example, so finding out which variables reduce the number of observations the most is much more tedious.
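To make the request concrete, the manual check can be sketched as a loop that adds one variable at a time and counts the rows in which all variables so far are non-missing. This is only a rough sketch, not a dedicated command: the variable list is the one from the example regression, the local macro names (vars, sofar, commalist) are made up for illustration, and it assumes that observations drop out only because of missing values.

```stata
* Sketch: cumulative number of complete cases as each variable is added
local vars goals rankdiff teamvalue coachexperience weather
local sofar
foreach v of local vars {
    * extend the list of variables considered so far
    local sofar `sofar' `v'
    * missing() needs comma-separated arguments, so convert the list
    local commalist : subinstr local sofar " " ", ", all
    * count rows where none of the variables so far is missing
    quietly count if !missing(`commalist')
    display "`v' - " r(N) " observations left"
}
```

The counts are cumulative, so they depend on the order in which the variables are listed. Reordering the list changes which variable appears to cause the drop.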
I would appreciate any help regarding this matter.
Thank you very much,
Björn.