Hello,

I have been using R but recently started learning Stata. To briefly explain what I am trying to do: I have a dataset that I received as a homework assignment in my statistics course last year, and I am trying to rerun the course analyses in Stata. Because I already know what results I should get, it is easy for me to check whether I am heading in the right direction.


(1) I am trying to regress one dependent variable on two categorical variables and their interaction, as in "reg DV IV1##IV2".

The commands I am using are "reg Score Condition##Experience" and "anova Score Condition##Experience".

Initially, both independent variables are categorical and coded as 0 and 1.
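For reference, with 0/1 coding the model behind "reg Score Condition##Experience" has an intercept, one indicator per variable (level 0 is the base), and their product. The sketch below uses Python with NumPy rather than Stata, and the scores are made up (two observations per cell). It shows how the four coefficients map onto the four cell means: the intercept is the (0,0) cell mean, each main-effect coefficient is a difference from that cell, and the interaction is a difference of differences.

```python
import numpy as np

# Made-up toy data: 2 observations in each Condition x Experience cell.
condition  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
experience = np.array([0, 0, 1, 1, 0, 0, 1, 1])
score      = np.array([10., 12., 14., 16., 20., 22., 30., 34.])

def dummy_design(c, e):
    """Columns: intercept, c, e, c*e (the terms Condition##Experience estimates with 0/1 coding)."""
    return np.column_stack([np.ones_like(c, dtype=float), c, e, c * e])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(dummy_design(condition, experience), score, rcond=None)

# Cell means, keyed by (condition, experience).
m = {(c, e): score[(condition == c) & (experience == e)].mean()
     for c in (0, 1) for e in (0, 1)}

# The same four coefficients computed directly from the cell means.
from_means = np.array([
    m[0, 0],                                # intercept: mean of the (0,0) cell
    m[1, 0] - m[0, 0],                      # Condition effect when Experience = 0
    m[0, 1] - m[0, 0],                      # Experience effect when Condition = 0
    m[1, 1] - m[1, 0] - m[0, 1] + m[0, 0],  # interaction: difference of differences
])
print(np.allclose(beta, from_means))  # True
```

For this toy data the coefficients are approximately 11, 10, 4 and 7.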


However, I learned in my statistics class that centering categorical variables is always useful for interpretation when a model includes interactions. So I recoded 0 and 1 as -1 and 1 and reran the commands (after recoding, each variable has a mean of 0, which implies that it is correctly centered). The new commands were "reg Score ConditionC##ExperienceC" and "anova Score ConditionC##ExperienceC"; the suffix C simply marks the centered variables.

But when I ran them, I got the error "ConditionC: factor variables may not contain noninteger values." I also tried putting "i." in front of the centered variables, as in "reg Score i.ConditionC##i.ExperienceC", but that still did not work, and I got the same error message.
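Recoding 0/1 as -1/+1 is the linear map 2x - 1 (in Stata, something like "generate ConditionC = 2*Condition - 1"). The hypothetical Python helper below performs the same mapping. One caveat: the recoded variable has a mean of exactly 0 only when the two groups have the same number of observations, as the unbalanced toy example shows.

```python
import numpy as np

def effect_code(x):
    """Map a 0/1 indicator to -1/+1 (0 -> -1, 1 -> +1). Hypothetical helper for illustration."""
    x = np.asarray(x)
    if not np.isin(x, [0, 1]).all():
        raise ValueError("expected only 0/1 values")
    return 2 * x - 1

# Balanced groups: two 0s and two 1s, so the recoded mean is 0.
condition_c = effect_code([0, 0, 1, 1])
print(condition_c.tolist())  # [-1, -1, 1, 1]
print(condition_c.mean())    # 0.0

# Unbalanced groups: one 0 and three 1s, so the recoded mean is not 0.
unbalanced = effect_code([0, 1, 1, 1])
print(unbalanced.mean())     # 0.5
```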

Based on the above and on searching the forum, here is my first question, where I would like to confirm my understanding: Am I not allowed to have negative values in a factor variable? My rationale is simple: to me, both 0/1 and -1/1 can mean YES or NO. However, Stata seems not to allow negative values in a categorical variable. Am I correct?


(2) Again based on searching the forum, I changed my code by putting 'c.' in front of the variables, as in "reg Score c.ConditionC##c.ExperienceC". This worked, and I got the results I expected (as I mentioned, I already have the homework key, so I compared my Stata output with it).

This leads to my second question. If I put 'c.' in front of a categorical variable coded as -1 and 1, how does Stata interpret it? Simply as a continuous variable? And if so, how can a categorical variable be treated as continuous?
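One way to see what 'c.' does here: "c.ConditionC##c.ExperienceC" builds an intercept, the two numeric variables, and their product. A variable with only two values can only ever sit at two points, so a linear term in it (plus the product term) can reproduce any four cell means. The continuous and factor specifications therefore fit the data identically and differ only in how the coefficients are parameterized. The Python/NumPy sketch below, on made-up toy data, fits both codings and compares the fitted values. With -1/+1 coding the intercept is the unweighted average of the four cell means, each main-effect slope is half of the corresponding averaged difference (because -1 to +1 spans two units), and the interaction coefficient is one quarter of the 0/1 interaction coefficient.

```python
import numpy as np

# Made-up toy data in a 2x2 design, 2 observations per cell.
condition  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
experience = np.array([0, 0, 1, 1, 0, 0, 1, 1])
score      = np.array([10., 12., 14., 16., 20., 22., 30., 34.])

cc = 2 * condition - 1   # ConditionC coded -1/+1
ec = 2 * experience - 1  # ExperienceC coded -1/+1

def design(c, e):
    """Intercept, c, e, c*e: the columns built when both variables are treated as numeric."""
    return np.column_stack([np.ones(len(c)), c, e, c * e])

X_dummy  = design(condition, experience)  # 0/1 coding
X_effect = design(cc, ec)                 # -1/+1 coding, treated as continuous

b_dummy,  *_ = np.linalg.lstsq(X_dummy, score, rcond=None)
b_effect, *_ = np.linalg.lstsq(X_effect, score, rcond=None)

fit_dummy  = X_dummy @ b_dummy
fit_effect = X_effect @ b_effect

# Both saturated models reproduce the cell means exactly.
print(np.allclose(fit_dummy, fit_effect))                                # True
# Interaction under -1/+1 coding is one quarter of the 0/1 interaction.
print(np.isclose(b_effect[3], b_dummy[3] / 4))                           # True
```

For this toy data the -1/+1 coefficients are approximately 19.75, 6.75, 3.75 and 1.75, and the fitted values under both codings are the cell means 11, 15, 21 and 32.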


Thank you for your help in advance!