Hello!

My goal is to study how an online platform's introduction of a new feature affected the ratings that platform users reported for a sample of companies. The new feature became available to all platform users. Each user could self-select into (a) using the feature and (b) providing ratings. The rated companies had no control over whether users used the feature. In other words, after the feature was introduced, a company could receive ratings from both types of users: those who chose to use the feature and those who chose not to.

The original data are at the user (review) level; I aggregated them to means by company id and year:

Code:
xtset
       panel variable:  id (unbalanced)
        time variable:  year, 2007 to 2019, but with gaps
                delta:  1 unit

xtdescribe

      id:  1, 2, ..., 762                                    n =        747
    year:  2007, 2008, ..., 2019                             T =         13
           Delta(year) = 1 unit
           Span(year)  = 13 periods
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       3       7         8         9       9      11

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------
      165     22.09   22.09 |  ....111111111
      163     21.82   43.91 |  .....11111111
       84     11.24   55.15 |  ......1111111
       60      8.03   63.19 |  ....1.1111111
       35      4.69   67.87 |  .......111111
       23      3.08   70.95 |  ...1111111111
       20      2.68   73.63 |  ...........11
       20      2.68   76.31 |  .........1111
       20      2.68   78.98 |  ......1.11111
      157     21.02  100.00 | (other patterns)
 ---------------------------+---------------
      747    100.00         |  XXXXXXXXXXXXX
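The company-year aggregation described above can be sketched with `collapse`. This is a minimal illustration, not the exact code used: it assumes a review-level dataset with hypothetical variables `rating` (one user's rating) and `used_feature` (a 0/1 flag for ratings from users of the feature), and the toy values are invented purely to show the mechanics.

```stata
* Toy review-level data (hypothetical values, for illustration only)
clear
input id year rating used_feature
1 2014 4 0
1 2014 5 0
1 2015 3 1
1 2015 5 0
2 2015 2 1
end

* Aggregate to one row per company-year:
*   y        = mean rating
*   control1 = number of nonmissing ratings
*   control2 = number of ratings from users of the feature
collapse (mean) y=rating (count) control1=rating (sum) control2=used_feature, by(id year)

* Declare the company-year panel and inspect the result
xtset id year
list, sepby(id)
```

Under this setup, `used_feature` is 0 for every pre-2015 rating, so summing it gives control2 = 0 in pre-2015 years without any extra step.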

The platform introduced the new feature in January 2015, so I create a post-introduction indicator:
Code:
gen feature = (year >= 2015) & !missing(year)
I then estimate the feature's effect with the following fixed-effects model:
Code:
xtreg y feature control1 control2, fe vce(robust)
where y is the company's mean rating in a given year, control1 is the company's total number of ratings in that year, and control2 is the number of those ratings reported by users of the feature (= 0 if year < 2015).
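Written out, the `xtreg ..., fe` command above corresponds to the following company fixed-effects specification. The symbols $\alpha_i$, $\beta$, $\gamma_1$, $\gamma_2$ and $\varepsilon_{it}$ are notation introduced here for illustration, with $i$ indexing companies and $t$ indexing years:

$$
y_{it} = \alpha_i + \beta\,\mathrm{feature}_t + \gamma_1\,\mathrm{control1}_{it} + \gamma_2\,\mathrm{control2}_{it} + \varepsilon_{it},
\qquad
\mathrm{feature}_t = \mathbf{1}[\mathrm{year}_t \ge 2015]
$$

Note that $\mathrm{feature}_t$ carries no $i$ subscript: in any given year it takes the same value for every company, so $\beta$ is estimated only from within-company changes between the pre-2015 and post-2015 years.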

Does my approach seem appropriate for capturing the effect of the feature's introduction on the outcome? I would appreciate your feedback.