Dear Statalists,
Like many others, I’m currently struggling with setting up my data for a panel model analysis, which was suggested by a reviewer (before, all analyses were done via OLS). The main issue is the correct definition of the cross-section and time variables, which is quite ambiguous in my case. I’m looking at sales of movies on four different home video “channels” or “formats” over the first 12 weeks after the respective movie’s home entertainment release, i.e., each movie is represented by 12*4 = 48 rows in my underlying file. As the movies were released at different dates over the course of six years, they are sorted by their home video release dates. I have prepared a sketch to illustrate how the numbers actually “happened”:

[Image: sketch]

Naturally, the underlying file looks a bit different as you can see in the next sketch:

[Image: sketch]
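
In case it helps, the layout described above can be checked with a few lines like the following. This is only a sketch: I am assuming the variables are called id_film (movie), channel (format) and week (1 to 12), which may not match everyone's naming.

```stata
* Assumed variable names: id_film (movie), channel (format), week (1-12)

* Each movie-channel-week combination should identify exactly one row
isid id_film channel week

* Each movie should contribute 12 weeks x 4 channels = 48 rows
bysort id_film: assert _N == 48
```

If either command stops with an error, the file is not the balanced 12*4 layout I describe.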


The analysis uses the weekly accrued home video sales per channel as the dependent variable and includes various independent variables: time-varying continuous factors (e.g., price in week t), time-invariant continuous factors (e.g., box office revenue accrued during the theatrical run), and time-invariant dummies (e.g., PG-13 rating, or whether the movie is a sequel).

My question is essentially: how can I make Stata understand that, for instance, a sale of Movie 1 via Digital Purchase in Week 1 happens in the same week as the corresponding Blu-ray sale?

If I set, for instance, Movie ID as the “panel variable” and week as the time variable (xtset id_film week), Stata returns, as expected, the “repeated time values within panel” error. Only when I set the “observation ID” (a sequential number representing the row) as the time variable and “channel” as the panel variable does the panel regression return results comparable to my prior OLS analyses.
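
To make this concrete, here is a minimal sketch of what fails and of one alternative I am considering, namely treating each movie-channel pair as its own panel so that week is unique within the panel. The names id_film_channel, sales and price are hypothetical placeholders (only id_film and week are my actual variable names), and I am not sure this is the right way to set it up.

```stata
* This is what fails: each id_film has four rows per week, one per channel
* xtset id_film week

* Possible alternative (an assumption, not a confirmed solution):
* one panel per movie-channel pair, so week is unique within each panel
egen long id_film_channel = group(id_film channel)
xtset id_film_channel week

* Hypothetical model: sales and price are placeholder variable names.
* Clustering by movie allows the four channels of one movie to be correlated.
xtreg sales price i.week, fe vce(cluster id_film)
```

With this setup, rows for the same movie and week share the same value of week, and clustering on id_film would at least allow the errors of the four channels of one movie to be correlated. One caveat I am aware of: with the fe option, the time-invariant regressors (box office revenue, rating, sequel dummy) would drop out, so a random-effects specification might be needed to keep them.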

Thanks for any input on how to correctly set this up!