Hello,

I'm new to the anova command (ANOVA is not common in my field). A reviewer for a journal submission asked me to use ANOVA as a robustness check on my main specification. It likely will not go in the final version of the paper, but I need to run it correctly to satisfy the reviewer. I think I've found a workaround for my issue (and potentially its cause), but I want to report the error here anyway because it looks like a problem with the anova command itself. Even given my workaround, the error message seems to be the wrong one for the situation, so I think the command should be changed to report a more accurate error. I'm using the current version of Stata/SE, 15.1.

The basic problem is that I'm attempting to run ANOVA with panel data from an experiment. (I'll add specifics in the next paragraph.) When I first specify my repeated variable(s) using the repeated() option, the command is unable to automatically detect the bse ("between-subject error") term. I'm met with an error: r(421); could not determine between-subject error term; use bse() option. I then added the bse() option and tried it again. This time I was met with a different error: r(422); could not determine between-subject basic unit; use bseunit() option. So I then added the bseunit() option and tried it again. However, I was then met with the exact same error. It's telling me that it needs me to specify an option that I'm already specifying!



To give some background, my data are from an experiment. (I'll just use a subset that reproduces the error as a minimum working example, so these numbers don't represent my full experimental data.) I have 38 total subjects who played a game either 3 or 9 total times (unbalanced panel), for a total of 192 observations. Note that the dependent variable is a ratio of two other variables. For 20 observations, the denominator of the ratio is 0, so those 20 observations have a missing value for the dependent variable. Those 20 are (automatically) dropped from all of the commands below, leaving n=172.
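The counts described above can be checked with a short sketch like the following. It assumes only the variable names used in this post; if my description is right, the first count should match the 192 observations and the second the 20 missing values.

```stata
* Sketch only: sample counts for the minimum working example
count if Agent==1 & TreatAV==1
* Observations whose dependent variable is missing (zero denominator)
count if Agent==1 & TreatAV==1 & missing(LiqPerc1_2_B)
* Observations per session, to see the unbalanced structure
tabulate SessionID if Agent==1 & TreatAV==1
```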

Subjects are nested within 6 total sessions (with subjects playing 3 repetitions each during the first 4 sessions, and subjects within the last 2 sessions playing 9 repetitions each). Subjects are indexed by the variable SubjectID, sessions are indexed by the variable SessionID, and repetitions of the game are indexed by the variable Period. Thus, I use xtset SubjectID Period at the beginning of my code (both for my standard analysis and for the ANOVA analysis).

I should note that because I limited the data to a minimum working example (eliminating all but 2 treatments), the session numbering skips around (SessionID equals 4, 5, 6, 8, 10, or 11). For the same reason, repetitions do not line up exactly across sessions. For example, for SessionID==4, Period may equal 10, 11, or 12, while for SessionID==10, Period may equal any integer from 1 to 9 (inclusive). The data are listed at the bottom of the post.

There are two treatments, "N" (baseline) and "D". Subjects who played 3 total repetitions played only 1 of the treatments, so treatment is constant within subject (and within session) for them; that is, those subjects and sessions are nested within treatment. Subjects who played 9 total repetitions played both treatments (with the order in which they played them flipped across the two relevant sessions), so treatment varies within subject (and within session) for them. So overall, neither subjects nor sessions are nested within treatment. (I specify all of this because properly specifying nesting seems to matter with regard to getting these errors.)
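The structure described above can be verified with a minimal sketch like this one. It assumes TreatD is never missing in the estimation sample; n_treat and tag_subj are hypothetical helper names that are not in my dataset.

```stata
* Sketch only: assumes TreatD is never missing in the estimation sample
preserve
keep if Agent==1 & TreatAV==1
* n_treat (hypothetical name) = number of distinct treatments each subject played
bysort SubjectID (TreatD): generate byte n_treat = 1 + (TreatD[1] != TreatD[_N])
* tag_subj (hypothetical name) flags one observation per subject
egen byte tag_subj = tag(SubjectID)
* One row per session: how many subjects played 1 versus 2 treatments
tabulate SessionID n_treat if tag_subj
restore
```

Sessions whose subjects all have n_treat equal to 1 are the ones where treatment does not vary within subject.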



The main regression specification in my paper uses the following code (again, edited down to provide a minimum working example, hence the "if" statements):
Code:
xtreg LiqPerc1_2_B TreatD if Agent==1 & TreatAV==1, re cluster(SessionID)
This runs a panel regression of the dependent variable LiqPerc1_2_B on a constant and a treatment dummy TreatD that equals 1 for treatment D and 0 for treatment N. The coefficient estimate for TreatD gives the marginal impact of switching from the baseline to treatment D, which is the estimate of interest. The re option specifies that the model include random effects at the subject level (SubjectID), and the cluster() option specifies that standard errors be clustered at the session level (SessionID).


The reviewer wants me to run an ANOVA model as an alternative specification to test for within-session issues. From reading through the ANOVA help text, the examples provided in the r.pdf file, and various online forums, I had thought that my corresponding ANOVA specification should be the following:
Code:
anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID)
However, as previously mentioned, this produces the error r(421); could not determine between-subject error term; use bse() option. I then specified that option, using the following:
Code:
anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID) bse(SessionID)
This time, I get a new error: r(422); could not determine between-subject basic unit; use bseunit() option. So I added that option, using the following:
Code:
anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID) bse(SessionID) bseunit(SessionID)
This results in exactly the same error: the message tells me to specify the bseunit() option even though I am already specifying it. I now believe that this specification is incorrect, which is why the model won't run, but I think the error message should be changed to one that more accurately reflects the problem or points toward a solution.
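For anyone trying to reproduce this, a minimal sketch using capture runs the failing command without stopping a do-file and then prints the return code. It assumes the same variable names as above.

```stata
* Sketch only: run the failing specification, show its output, and keep going
capture noisily anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID) bse(SessionID) bseunit(SessionID)
* _rc holds the return code of the command that was just captured
display "return code: " _rc
```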



From there, I essentially used trial and error to arrive at what I hope is the proper model specification. My final specification is the following:
Code:
anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(Period) grouping(SessionID)
I should note, though, that the grouping() option is not necessary to make it run; this also runs without errors:
Code:
anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(Period)



Ultimately, I guess that I have the following questions:
1. What is causing this error to appear, and can it be fixed?
2. Is my final specification correctly adapting my paper's main specification to the ANOVA model?


Thank you for your help.

Best,
Matt




PS, in case it helps with diagnosis, here is a list of other commands I tried that resulted in the same (or similar) errors. The number of leading dashes shows how many options were added to the base command:
-anova LiqPerc1_2_B TreatD / SessionID if Agent==1 & TreatAV==1, repeated(SessionID) (r(422))
--anova LiqPerc1_2_B TreatD / SessionID if Agent==1 & TreatAV==1, repeated(SessionID) bseunit(SessionID) (r(422))
-anova LiqPerc1_2_B TreatD / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) (r(422))
--anova LiqPerc1_2_B TreatD / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bseunit(SessionID) (r(422))
--anova LiqPerc1_2_B TreatD / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bseunit(SubjectID) (r(422))
-anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) (r(421))
--anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SubjectID) (r(198); bse() term not found in model, though it seems to me that it is)
--anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SessionID) (r(198); bse() term not found in model, though it seems to me that it is)
--anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bseunit(SubjectID) (r(421))
---anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SessionID) bseunit(SubjectID) (r(198); bse() term not found in model, though it seems to me that it is)
---anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SubjectID) bseunit(SubjectID) (r(198); bse() term not found in model, though it seems to me that it is)
--anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bseunit(SessionID) (r(421))
---anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SessionID) bseunit(SessionID) (r(198); bse() term not found in model, though it seems to me that it is)
---anova LiqPerc1_2_B TreatD / Period SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SubjectID) bseunit(SessionID) (r(198); bse() term not found in model, though it seems to me that it is)
-anova LiqPerc1_2_B TreatD / Period SessionID / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) (r(421))
--anova LiqPerc1_2_B TreatD / Period SessionID / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SessionID) (r(422))
---anova LiqPerc1_2_B TreatD / Period SessionID / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SessionID) bseunit(SessionID) (r(422))
---anova LiqPerc1_2_B TreatD / Period SessionID / SubjectID|SessionID if Agent==1 & TreatAV==1, repeated(SubjectID SessionID) bse(SessionID) bseunit(SubjectID) (r(422))
-anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID) (r(421))
--anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID) bse(Period) (r(422))
---anova LiqPerc1_2_B TreatD / Period SessionID if Agent==1 & TreatAV==1, repeated(SessionID) bse(Period) bseunit(Period) (r(422))




PPS, here are the relevant variables for the first 120 of the 192 observations used in the minimum working example (that is, if Agent==1 & TreatAV==1). The listing covers SessionID 4, 5, 6, 8, and 10, so SessionID==11 does not appear below. I saw on the "help" tab that posting .dta files is not recommended, so I figured I was supposed to post it long-form. I hope that it's not too long.

list SessionID SubjectID Period TreatD LiqPerc1_2_B in 1/120, separator(3)

Sessio~D Subjec~D Period TreatD LiqPer~B
-
1. 4 61 10 0 15
2. 4 61 11 0 4
3. 4 61 12 0 .
4. 4 64 10 0 .
5. 4 64 11 0 37.5
6. 4 64 12 0 .
7. 4 66 10 0 50
8. 4 66 11 0 100
9. 4 66 12 0 50
10. 4 68 10 0 50
11. 4 68 11 0 50
12. 4 68 12 0 45
13. 4 71 10 0 100
14. 4 71 11 0 .
15. 4 71 12 0 37.5
16. 4 72 10 0 50
17. 4 72 11 0 .
18. 4 72 12 0 52
19. 5 82 10 1 100
20. 5 82 11 1 100
21. 5 82 12 1 100
22. 5 83 10 1 30
23. 5 83 11 1 29.23077
24. 5 83 12 1 50
25. 5 85 10 1 87.5
26. 5 85 11 1 75
27. 5 85 12 1 100
28. 5 88 10 1 36.36364
29. 5 88 11 1 42.85714
30. 5 88 12 1 50
31. 5 90 10 1 33.33333
32. 5 90 11 1 .
33. 5 90 12 1 0
34. 5 91 10 1 20
35. 5 91 11 1 16.66667
36. 5 91 12 1 14.28571
37. 5 92 10 1 100
38. 5 92 11 1 50
39. 5 92 12 1 .
40. 5 95 10 1 22.22222
41. 5 95 11 1 18.18182
42. 5 95 12 1 28.57143
43. 6 101 10 1 0
44. 6 101 11 1 100
45. 6 101 12 1 47.61905
46. 6 102 10 1 12.5
47. 6 102 11 1 82
48. 6 102 12 1 .
49. 6 104 10 1 70
50. 6 104 11 1 85.71429
51. 6 104 12 1 83.33334
52. 6 107 10 1 25
53. 6 107 11 1 0
54. 6 107 12 1 30
55. 6 109 10 1 91.66666
56. 6 109 11 1 .
57. 6 109 12 1 92.85714
58. 6 111 10 1 .
59. 6 111 11 1 12.5
60. 6 111 12 1 25
61. 8 141 4 0 40
62. 8 141 5 0 52.94118
63. 8 141 6 0 42.85714
64. 8 143 4 0 25
65. 8 143 5 0 33.33333
66. 8 143 6 0 37.5
67. 8 145 4 0 71.42857
68. 8 145 5 0 100
69. 8 145 6 0 55.55556
70. 8 148 4 0 97.5
71. 8 148 5 0 8.333333
72. 8 148 6 0 6.666667
73. 8 149 4 0 50
74. 8 149 5 0 25
75. 8 149 6 0 40
76. 10 181 1 0 33.33333
77. 10 181 2 0 34.28571
78. 10 181 3 0 20
79. 10 181 4 1 35
80. 10 181 5 1 30
81. 10 181 6 1 40
82. 10 181 7 0 25
83. 10 181 8 0 15.09434
84. 10 181 9 0 25
85. 10 182 1 0 66.66666
86. 10 182 2 0 50
87. 10 182 3 0 50
88. 10 182 4 1 50
89. 10 182 5 1 57.14286
90. 10 182 6 1 50
91. 10 182 7 0 50
92. 10 182 8 0 60
93. 10 182 9 0 50
94. 10 184 1 0 30
95. 10 184 2 0 20
96. 10 184 3 0 20
97. 10 184 4 1 .
98. 10 184 5 1 40
99. 10 184 6 1 41.66667
100. 10 184 7 0 33.33333
101. 10 184 8 0 21.42857
102. 10 184 9 0 .
103. 10 187 1 0 50
104. 10 187 2 0 75
105. 10 187 3 0 66.66666
106. 10 187 4 1 50
107. 10 187 5 1 80
108. 10 187 6 1 66.66666
109. 10 187 7 0 60
110. 10 187 8 0 75
111. 10 187 9 0 83.33334
112. 10 188 1 0 66.66666
113. 10 188 2 0 50
114. 10 188 3 0 60
115. 10 188 4 1 55
116. 10 188 5 1 40
117. 10 188 6 1 .
118. 10 188 7 0 41.66667
119. 10 188 8 0 81.25
120. 10 188 9 0 0