I am new here, so feel free to tell me if I have formatted anything incorrectly!
First let me explain the research a bit to give you a general understanding of what I am trying to achieve. I obtained data from participants who designed an ice cream with different ingredients. They were randomised over 3 different groups: a Control Group, a Measurement Group and a Certificate Group. There were a total of 4 categories of ingredients: chocolate, nuts, fruit and other.
- In the Control Group I just asked participants to design a delicious ice cream.
- In the Measurement Group I asked people to design a delicious ice cream and I told them I would measure the amount of chocolate/nuts/fruit ingredients (one of those was shown).
- In the Certificate Group I asked people to design a delicious ice cream, told them I would measure the amount of chocolate/nuts/fruit ingredients (one of those was shown), and told them I would give them a certificate for the strategy (delicious ice cream).
However, due to an error in the randomisation algorithm, I received mostly chocolate observations, and the nuts/fruit measurements were basically useless.
My results had one variable (treatment) that recorded which group each participant was in: a categorical variable with the values 0 - Control Group, 1 - Measurement Group, 2 - Certificate Group.
I created dummy variables based on the ingredient that was measured. Because nuts and fruit had too few observations to be useful, I will only show the chocolate dummies. I also created a separate dataset for these dummies that did not include the nuts & fruit observations.
ChocolateMeasurementGroupdummy: 1 - Chocolate Measurement Group, 0 - Control Group or Chocolate Certificate Group (remember that the nuts/fruit observations were removed)
ChocolateCertificateGroupdummy: 1 - Chocolate Certificate Group, 0 - Control Group or Chocolate Measurement Group
ChocolateControlGroupdummy: 1 - Control Group, 0 - Chocolate Certificate Group or Chocolate Measurement Group
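For reference, a minimal sketch of how dummies like these could be built from the treatment variable. It assumes the chocolate-only dataset is in memory, that treatment is coded 0/1/2 as described above, and that the dummy names are not already in use (otherwise generate will stop with an error).

```stata
* Sketch only: assumes treatment is coded 0 = Control, 1 = Measurement,
* 2 = Certificate, and that these variable names do not exist yet.
generate byte ChocolateControlGroupdummy     = (treatment == 0) if !missing(treatment)
generate byte ChocolateMeasurementGroupdummy = (treatment == 1) if !missing(treatment)
generate byte ChocolateCertificateGroupdummy = (treatment == 2) if !missing(treatment)

* Sanity check: every non-missing row belongs to exactly one group.
assert ChocolateControlGroupdummy + ChocolateMeasurementGroupdummy ///
    + ChocolateCertificateGroupdummy == 1 if !missing(treatment)
```

The `if !missing(treatment)` qualifier keeps rows with a missing treatment as missing in the dummies instead of silently coding them as 0.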
My dependent variable is Chocolate Amount, which is a continuous variable with a min of 0 and a max of 5.
Now I am testing 2 things:
H1 The Measurement Group put significantly more chocolate in their ice creams than the Control Group
H2 The Measurement Group put significantly less chocolate in their ice creams than the Certificate Group
I need to know if H1 is true before I can test H2.
I was told to use one-tailed tests with an alpha of 10% (0.1). The issue is that ANOVA, which would be the most logical choice, cannot be used here because the F-statistic is not designed for one-tailed tests. Instead, I created 2 separate datasets from the original chocolate-only dataset and ran t-tests using Stata's ttest command.
- Dataset 1. included only Chocolate Measurement Group observations & Control Group observations
- Dataset 2. included only Chocolate Measurement Group observations & Certificate Group observations
Stata code:
In dataset 1. to test H1 I used:
Code:
ttest ChocolateAmount, by(ChocolateMeasurementGroupdummy)
In dataset 2. to test H2 I used:
Code:
ttest ChocolateAmount, by(ChocolateMeasurementGroupdummy)
Now Question 1: Does this make sense? Or should I still include every treatment group in both t-tests?
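One way the two comparisons could be sketched without splitting the data into separate files is to restrict each test with an `if` qualifier in the chocolate-only dataset. This assumes treatment is coded 0/1/2 as described above. ttest reports diff as the mean of the lower-coded group minus the mean of the higher-coded group, and it stores the lower one-sided p-value in r(p_l).

```stata
* Sketch only: assumes the chocolate-only dataset with treatment coded
* 0 = Control, 1 = Measurement, 2 = Certificate.

* H1: Measurement (1) > Control (0), so diff = mean(0) - mean(1) should be < 0.
ttest ChocolateAmount if inlist(treatment, 0, 1), by(treatment)
display "H1 one-sided p (Ha: diff < 0) = " r(p_l)

* H2: Measurement (1) < Certificate (2), so diff = mean(1) - mean(2) should be < 0.
ttest ChocolateAmount if inlist(treatment, 1, 2), by(treatment)
display "H2 one-sided p (Ha: diff < 0) = " r(p_l)
```

The same one-sided p-values appear in the regular ttest output under "Ha: diff < 0" and "Ha: diff > 0". Which column applies depends on the coding of the by() variable: with by(ChocolateMeasurementGroupdummy) in dataset 2, group 0 is the Certificate Group, so H2 would correspond to "Ha: diff > 0" instead.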
After this I wanted to run regressions to test H1 and H2 further. I was thinking about using an ordinary linear regression, and I know this is possible with dummy variables, but I cannot seem to wrap my head around how to do it so that it makes sense. I was told I have a nested between-subjects design. Frankly, I have no idea how to handle this in Stata. My teacher recommended running the following Stata code in the original chocolate-only dataset, which only included the chocolate measurement observations and the control group (so no nuts/fruit):
Code:
regress ChocolateAmount ChocolateMeasurementGroupdummy ChocolateCertificateGroupdummy
So Question 2: How do I perform a regression that answers H1 and H2 and that I can interpret?
I am aware that there are very complex models and commands that do this, but I do not have enough statistical knowledge and background to use these. So the tools I have are basically simple regressions and t-tests.
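A minimal sketch of how the regression above might be read for both hypotheses, using only regress and lincom. It assumes the three-group chocolate-only dataset, so that the Control Group is the omitted base category. The scalar name t_h1 is just an illustrative choice.

```stata
* Sketch only: Control Group is the omitted base category.
regress ChocolateAmount ChocolateMeasurementGroupdummy ChocolateCertificateGroupdummy

* H1: the Measurement coefficient is mean(Measurement) - mean(Control); expected > 0.
scalar t_h1 = _b[ChocolateMeasurementGroupdummy] / _se[ChocolateMeasurementGroupdummy]
display "H1 one-sided p = " ttail(e(df_r), t_h1)

* H2: mean(Certificate) - mean(Measurement); expected > 0.
lincom ChocolateCertificateGroupdummy - ChocolateMeasurementGroupdummy
display "H2 one-sided p = " ttail(r(df), r(estimate) / r(se))
```

ttail(df, t) returns the upper-tail probability of the t distribution, which is the one-sided p-value when the hypothesised difference is positive. The results will not match the two-group t-tests exactly, because the regression pools the residual variance across all three groups.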
Thank you very much for your help in advance!