Dear all, I have calculated a “Diversity Index” (DI) for a given population. Per the Census website: “the DI tells us the chance that two people chosen at random will be from different racial and ethnic groups…. The DI is bounded between 0 and 1, with a zero-value indicating that everyone in the population has the same racial and ethnic characteristics, while a value close to 1 indicates that everyone in the population has different characteristics.” (The full equation is at the end of this post.)

I’m running a fractional logit model with the DI as the dependent variable and year as the independent variable. I’d like to plot the trend line and 95% CIs around the trend. Here is my code. First, I use dataex to show the data; then I show the model with continuous year; then the model with categorical year.

Questions:

1) Any obvious problems with this approach? In particular, I wasn’t sure whether I need to adjust the fractional logit code for the fact that this is the same group of individuals observed over time, or whether I should use a different approach to the fractional logit.

2) Is it better to include year as c.year_r or i.year_r? The plots look quite different.

I am using Stata 14.

Thank you!!!

Code:
******************************DATA
 dataex di_rev year_r

----------------------- copy starting from the next line -----------------------
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(di_rev year_r)
.34123 1
.35147 2
.36345 3
.37255 4
.39094 5
.39714 6
.39895 7
end

------------------ copy up to and including the previous line ------------------

Listed 7 out of 7 observations
Code:
************************************OPTION 1: WITH CONTINUOUS YEAR

. fracreg logit di_rev c.year_r

Iteration 0:   log pseudolikelihood = -5.3012582  
Iteration 1:   log pseudolikelihood = -4.6198733  
Iteration 2:   log pseudolikelihood = -4.6196722  
Iteration 3:   log pseudolikelihood = -4.6196722  

Fractional logistic regression                  Number of obs     =          7
                                                Wald chi2(1)      =     163.74
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -4.6196722               Pseudo R2         =     0.0014

------------------------------------------------------------------------------
             |               Robust
      di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      year_r |   .0446101   .0034862    12.80   0.000     .0377772     .051443
       _cons |  -.6959255   .0097576   -71.32   0.000    -.7150501   -.6768008
------------------------------------------------------------------------------

. quietly margins, at(year_r=(1(1)7))

. marginsplot

  Variables that uniquely identify margins: year_r
[Graph: marginsplot output for the continuous-year model]
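
For reference, a minimal sketch of how the plotted values relate to the coefficients, assuming di_rev and year_r from the dataex block are in memory. With the estimates shown above, the fitted DI at year_r = 1 is invlogit(-.6959255 + .0446101), which is about 0.343. The recast() and recastci() options are just one way to draw the trend as a line with a shaded 95% band; they are a presentation choice, not part of the original code.

```stata
* Assumes di_rev and year_r from the dataex block are already in memory
quietly fracreg logit di_rev c.year_r

* Fitted DI at year_r = 1, computed by hand from the coefficients
display invlogit(_b[_cons] + _b[year_r]*1)

* Same quantity for every year, with delta-method 95% CIs
margins, at(year_r=(1(1)7))

* Draw the trend as a line with a shaded confidence band
marginsplot, recast(line) recastci(rarea)
```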


***********************************OPTION 2: WITH CATEGORICAL YEAR:

Code:
.
. fracreg logit di_rev i.year_r
note: 7.year_r omitted because of collinearity

Iteration 0:   log pseudolikelihood = -5.3011755  
Iteration 1:   log pseudolikelihood = -4.6196655  
Iteration 2:   log pseudolikelihood = -4.6194615  
Iteration 3:   log pseudolikelihood = -4.6194615  

Fractional logistic regression                  Number of obs     =          7
                                                Wald chi2(0)      =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -4.6194615               Pseudo R2         =     0.0015

------------------------------------------------------------------------------
             |               Robust
      di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      year_r |
          2  |   .0452339   1.05e-11  4.3e+09   0.000     .0452339    .0452339
          3  |   .0973966   6.01e-11  1.6e+09   0.000     .0973966    .0973966
          4  |    .136525   1.20e-10  1.1e+09   0.000      .136525     .136525
          5  |   .2144551   2.28e-10  9.4e+08   0.000     .2144551    .2144551
          6  |   .2404216   2.45e-10  9.8e+08   0.000     .2404216    .2404216
          7  |   .2479757   2.47e-10  1.0e+09   0.000     .2479757    .2479757
             |
       _cons |  -.6578177   1.96e-13 -3.4e+12   0.000    -.6578177   -.6578177
------------------------------------------------------------------------------

. quietly margins i.year_r

. marginsplot

  Variables that uniquely identify margins: year_r
[Graph: marginsplot output for the categorical-year model]
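
One way to see why the two plots differ is to list the fitted values from both specifications next to the observed DI. This is only a sketch, assuming di_rev and year_r from the dataex block are in memory; fit_c and fit_i are hypothetical variable names. With seven observations and seven parameters (a constant plus six year indicators), the i.year_r model is saturated, so fit_i should reproduce di_rev and the robust standard errors shrink toward zero, as the values of order 1e-10 in the table above suggest.

```stata
* Assumes di_rev and year_r from the dataex block are already in memory
quietly fracreg logit di_rev c.year_r
predict double fit_c          // fitted conditional mean, linear-in-year model

quietly fracreg logit di_rev i.year_r
predict double fit_i          // fitted conditional mean, one indicator per year

* Compare observed and fitted values side by side
list year_r di_rev fit_c fit_i, noobs
```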

----------------------------------------------------------------------------------------------------------

FYI, the Diversity Index equation is below:

DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²)



H is the proportion of the population who are Hispanic or Latino.

W is the proportion of the population who are White alone, not Hispanic or Latino.

B is the proportion of the population who are Black or African American alone, not Hispanic or Latino.

AIAN is the proportion of the population who are American Indian and Alaska Native alone, not Hispanic or Latino.

Asian is the proportion of the population who are Asian alone, not Hispanic or Latino.

NHPI is the proportion of the population who are Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino.

SOR is the proportion of the population who are Some Other Race alone, not Hispanic or Latino.

Multi is the proportion of the population who are Two or More Races, not Hispanic or Latino.
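
As an illustration of the formula, here is a minimal Stata sketch that computes the DI from the eight group shares. The variable names and the share values are hypothetical (made up so that they sum to 1); they are not taken from my data. For these values the sum of squared shares is 0.3624, so the DI comes out at about 0.64.

```stata
* Hypothetical shares for one population; values are made up and sum to 1
clear
input float(h w b aian asian nhpi sor multi)
.20 .55 .12 .01 .06 .01 .01 .04
end

* Sanity check: the eight shares should sum to 1 (within float tolerance)
generate double total = h + w + b + aian + asian + nhpi + sor + multi
assert abs(total - 1) < 1e-6

* DI = 1 minus the sum of squared shares
generate double di = 1 - (h^2 + w^2 + b^2 + aian^2 + asian^2 + nhpi^2 + sor^2 + multi^2)
list di
```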



Source: https://www.census.gov/library/visua...20-census.html