Hello all,

I have a panel dataset of 3 million observations, with each observation detailing the annual number of prescriptions and patients for a physician for a given year for certain classes of medications. The years of available data are 2013-2017. There is also some physician demographic information including gender, specialty, years in practice, state, and number of group practice members.
For example, an observation may say "Dr. John Smith, year 2013, M, orthopedic surgery, 100 opioid prescriptions, 200 patients". The next row may say "Dr. John Smith, year 2014, M, orthopedic surgery, 80 opioid prescriptions, 170 patients". The code I used to define the panel data is

Code:
xtset npi year
(where npi is the provider id)

I am interested in using the xtpoisson command to look at the within provider change in opioid prescribing over time, with number of patients (bene_count) seen per provider per year as the exposure. The code that I have used is:

Code:
xtpoisson opioid_claim_count i.year i.provider_sex i.provider_state i.specialty i.years_experience i.group_members, exposure(bene_count) fe vce(bootstrap, reps(200)) irr 

Code:
opioid_claim_count |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
              year |
             2014  |   .9545114   .0002171  -204.70   0.000      .954086     .954937
             2015  |   .8897576   .0002035  -510.75   0.000     .8893589    .8901565
             2016  |   .8511887   .0001957  -700.96   0.000     .8508053    .8515723
             2017  |   .7981606   .0001862  -966.50   0.000     .7977958    .7985256

    ln(bene_count) |          1  (exposure)
------------------------------------------------------------------------------------
For the purpose of brevity, I did not copy and paste the results from the categorical variables, although all were omitted from the regression (except for provider_state) as they are time invariant (i.e. gender, specialty, etc).

I just wanted to confirm a few things:

1) Is it OK to use year as the main independent variable with total patients per year seen (bene_count) as the exposure to look at within provider change in opioid prescribing (dependent variable) each year?

2) If so, is the correct interpretation to say something like "opioid prescriptions relative to total number of patients decreased by ~ 5% each year per provider"

3) Is there a good way to use the margins command to get an accurate number of opioid prescriptions per provider? I've tried using the following code after the xtpoisson:

Code:
 
margins year, atmeans

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
       2013  |   4.801408   .0134106   358.03   0.000     4.775124    4.827692
       2014  |   4.754852    .013413   354.50   0.000     4.728563    4.781141
       2015  |   4.684602    .013413   349.26   0.000     4.658313    4.710891
       2016  |   4.640287    .013413   345.95   0.000     4.613998    4.666576
       2017  |   4.575963   .0134132   341.15   0.000     4.549673    4.602252
------------------------------------------------------------------------------
But I'm not sure what to do with these margin numbers nor how to interpret them. Are they ln transformed? I've read some other threads on this forum that using margins after fixed effects to calculate a dependent count variable can be somewhat inaccurate. I'm happy to just use the IRR if this is the case.

Thank you!

Venkat