Problem with highly skewed independent variables and many zeros

Dear members,

I have a binary dependent variable and want to estimate either a logistic regression or a linear probability model (LPM). My key explanatory variable (a measure of exposure to specific media content) has many zeros, some medium values, and a few extremely high values. The summary statistics of the explanatory variable illustrate this:

Code:

      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs              21,169
25%            0              0       Sum of Wgt.      21,169

50%            0                      Mean           .3175061
                        Largest       Std. Dev.      1.190602
75%            0             21
90%     .8571429             23       Variance       1.417533
95%            2       25.28572       Skewness       8.337951
99%     5.428571       31.57143       Kurtosis       110.2057

As you can see, the maximum value is more than 20 times the standard deviation of the variable. Because there are many zeros, I cannot log-transform the variable. Do you have any ideas about how far this could be a problem? Should I apply a transformation to the variable? Does it affect the choice between the LPM and the logistic model? For example, is logistic regression more robust to skewed distributions?
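
To make the transformation question concrete, here is a minimal sketch in Python (standard library only, not Stata) of three candidate regressors that stay defined at zero: an indicator for any exposure, log(1 + x), and the inverse hyperbolic sine. The name `exposure` and the helper `transforms` are hypothetical, and the sample values are simply the percentiles and maximum from the summary above. This only shows how each option compresses the right tail; it is not a recommendation of any one of them.

```python
import math

# Illustrative values taken from the summary above: zeros, the 90th, 95th
# and 99th percentiles, and the maximum. Not the actual data set.
exposure = [0.0, 0.0, 0.8571429, 2.0, 5.428571, 31.57143]


def transforms(x):
    """Return candidate regressors for a non-negative value x (hypothetical helper)."""
    return {
        "is_positive": 1 if x > 0 else 0,  # indicator: any exposure at all
        "log1p": math.log1p(x),            # log(1 + x), equals 0 at x = 0
        "asinh": math.asinh(x),            # inverse hyperbolic sine, equals 0 at x = 0
    }


rows = [transforms(x) for x in exposure]
for x, r in zip(exposure, rows):
    print(f"{x:9.4f}  pos={r['is_positive']}  "
          f"log1p={r['log1p']:.3f}  asinh={r['asinh']:.3f}")

# Ratio of the reported maximum to the reported standard deviation,
# i.e. the "more than 20 times" figure mentioned above.
max_over_sd = 31.57143 / 1.190602
print(f"max / sd = {max_over_sd:.1f}")
```

Both continuous transformations map zero to zero and pull the largest value in toward the rest of the distribution, while the indicator separates the mass of zeros from the positive values so that the two can enter the model as separate terms.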

BJ Data Tech Solution
