Hey Everyone,

I'm trying to analyse some data I've collected and am having trouble setting up the model and its interactions correctly, so I could really use some help! I've been given conflicting advice on how to run the model with interactions, have rerun it in a few different variations, and now don't know which set-up is correct. I've tried to include as many details as possible.

Experimental set up:
I've completed a crossover training study in rugby players. Two groups completed back-to-back 3-week vertical and horizontal plyometric training cycles in alternate order, with a 1-week washout in between, so there are 4 time points for each group: pre-mesocycle 1, (3 weeks), post-mesocycle 1, (1-week break), pre-mesocycle 2, (3 weeks), post-mesocycle 2. Each training group was therefore tested 4 times, twice for each direction (intervention). There was also a control group that didn't complete plyometric training but still attended all 4 testing sessions. A fair number of participants couldn't attend all 4 tests, so the ANOVA we tried first didn't work, and we've settled on a mixed model instead. Participants were tested on a 30 m sprint with timing lights and a radar, which gives a force-velocity profile with lots of variables. My understanding is that I will rerun the models for each variable. For the rest of this post I'm using 30 m sprint time as the outcome, so lower numbers are better.

Data set up:
My data are set up in long form, where group (the order in which they did the interventions) and direction (the specific intervention for that time point) are both coded 1 and 2:
AJ is subject 1, group 1, timepoint 1, direction 1 (horizontal pre)
subject 1, group 1, timepoint 2, direction 1 (horizontal post)
subject 1, group 1, timepoint 3, direction 2 (vertical pre)
subject 1, group 1, timepoint 4, direction 2 (vertical post)

I've since added a prepost column to identify whether a test was pre or post, so that I can collapse the direction data from the 2 groups. So now group (the order in which they did the interventions), direction (the specific intervention for that time point), and prepost (pre or post test) are all coded 1 and 2, so that:

AJ is subject 1, group 1, timepoint 1, Pre 1, direction 1 (horizontal pre)
subject 1, group 1, timepoint 2, Post 2, direction 1 (horizontal post)
subject 1, group 1, timepoint 3, Pre 1, direction 2 (vertical pre)
subject 1, group 1, timepoint 4, Post 2, direction 2 (vertical post)

Jarred is subject 13, group 2, timepoint 1, Pre 1, direction 2 (Vertical pre)
subject 13, group 2, timepoint 2, Post 2, direction 2 (Vertical post)
subject 13, group 2, timepoint 3, Pre 1, direction 1 (Horizontal pre)
subject 13, group 2, timepoint 4, Post 2, direction 1 (Horizontal post)

For the control group, rows are labelled the same way except that group and direction are 0 for all rows.
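
For reference, a column like prepost can be derived from timepoint roughly as follows. This is a minimal sketch, assuming timepoint is numeric 1 to 4 and that odd timepoints are the pre-tests, as in the rows above; the label name prepost_lbl is just a placeholder.

```stata
* Sketch only: assumes timepoint is numeric (1-4) and odd timepoints are pre-tests
generate byte prepost = cond(mod(timepoint, 2) == 1, 1, 2)

* Hypothetical value label so output reads Pre/Post instead of 1/2
label define prepost_lbl 1 "Pre" 2 "Post"
label values prepost prepost_lbl

* Check the coding: timepoints 1 and 3 should be Pre, 2 and 4 should be Post
tabulate timepoint prepost
```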

Model set up:
First, I graphed the median bands for all three groups across the 4 time points to get a visual understanding of what happened. There is clearly a time effect. For clarity, group 1 improved, group 2 didn't improve or maybe got worse, and the control group got worse, which suggests there shouldn't be an overall intervention effect because the two training groups cancelled out. I'm wondering whether there was a slight order effect from the short washout, or whether fitness played into it: group 1 may have ended up 10 kg lighter after a few dropouts from each group.

First commands:
mixed thirty i.group##c.timepoint i.direction##c.timepoint || subject:,
note: 2.direction omitted because of collinearity
note: timepoint omitted because of collinearity
note: 2.direction#c.timepoint omitted because of collinearity

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = 118.51323
Iteration 1: log likelihood = 118.51323

Computing standard errors:

Mixed-effects ML regression Number of obs = 113
Group variable: subject Number of groups = 32

Obs per group:
min = 2
avg = 3.5
max = 4

Wald chi2(7) = 12.67
Log likelihood = 118.51323 Prob > chi2 = 0.0805

---------------------------------------------------------------------------------------
thirty | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
group |
1 | .1490282 .0987514 1.51 0.131 -.0445211 .3425774
2 | .0566127 .0717661 0.79 0.430 -.0840462 .1972716
|
timepoint | .0214728 .0074621 2.88 0.004 .0068474 .0360982
|
group#c.timepoint |
1 | -.0557562 .0207406 -2.69 0.007 -.0964072 -.0151053
2 | -.0104264 .0188638 -0.55 0.580 -.0473988 .0265461
|
direction |
1 | -.0749454 .0702471 -1.07 0.286 -.2126271 .0627364
2 | 0 (omitted)
|
timepoint | 0 (omitted)
|
direction#c.timepoint |
1 | .0154717 .0251638 0.61 0.539 -.0338484 .0647918
2 | 0 (omitted)
|
_cons | 4.263815 .0484872 87.94 0.000 4.168782 4.358848
---------------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
subject: Identity |
var(_cons) | .0236984 .0061224 .0142828 .0393209
-----------------------------+------------------------------------------------
var(Residual) | .0027085 .0004255 .0019907 .0036852
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 148.16 Prob >= chibar2 = 0.0000


and then again without the control group to look at an order effect:
mixed thirty i.group##c.timepoint i.direction##c.timepoint if group!=0 || subject:,

However, in both cases this returned terms omitted because of collinearity, specifically direction 2 and direction 2 by timepoint (and similarly for all the other variables investigated). I was told to put everything in one model, but something is always dropped due to collinearity.
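
As far as I can tell, the collinearity follows from the coding described above: control rows are 0 on both group and direction, so the two direction indicators always sum to the same value as the two group indicators, and c.timepoint also enters the model twice. A quick check along these lines (a sketch, assuming the variable names used in my commands) should show the overlap:

```stata
* Sketch only: see how group and direction overlap in the long-format data
tabulate group direction

* Same table within each mesocycle, assuming timepoints 1-2 and 3-4 are the two mesocycles
tabulate group direction if inlist(timepoint, 1, 2)
tabulate group direction if inlist(timepoint, 3, 4)
```

If each group maps onto exactly one direction within a mesocycle, then direction carries no information beyond group and mesocycle, which would explain why one set of terms is always dropped.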
I was then told to run separate models to look at group and direction individually, and the statistician at my school said to run:

Treatment:

mixed thirty i.direction || subject:,

Not Significant

Treatment x time:

. mixed thirty i.direction##c.timepoint || subject :,
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = 116.85474
Iteration 1: log likelihood = 116.85474
Computing standard errors:
Mixed-effects ML regression Number of obs = 113
Group variable: subject Number of groups = 32
Obs per group:
min = 2
avg = 3.5
max = 4
Wald chi2(5) = 8.95
Log likelihood = 116.85474 Prob > chi2 = 0.1111
---------------------------------------------------------------------------------------
thirty | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
direction |
1 | .0532367 .0703589 0.76 0.449 -.0846643 .1911377
2 | .0788422 .067155 1.17 0.240 -.0527792 .2104636
|
timepoint | .0214785 .0076075 2.82 0.005 .006568 .036389
|
direction#c.timepoint |
1 | -.0183004 .0157676 -1.16 0.246 -.0492042 .0126035
2 | -.0316995 .0150589 -2.11 0.035 -.0612145 -.0021845
|
_cons | 4.263799 .0487321 87.49 0.000 4.168285 4.359312
---------------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
subject: Identity |
var(_cons) | .0238062 .0061602 .0143361 .0395321
-----------------------------+------------------------------------------------
var(Residual) | .0028154 .0004424 .0020691 .0038309
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 145.52 Prob >= chibar2 = 0.0000

So there is clearly a timepoint effect and a direction by timepoint interaction. However, herein lies my question: for crossover mixed models, is timepoint a continuous variable or an identifier (categorical) variable? I've been given both answers. As I understand it, if I look at direction x timepoint, I'm comparing each direction across 4 time points when each group was only measured twice in each direction, meaning I'm splitting up the data from each group. So I've been told to set up timepoint as an identifier instead, in which case I get this:

. mixed thirty i.direction##i.timepoint || subject :,
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = 119.07527
Iteration 1: log likelihood = 119.07527
Computing standard errors:
Mixed-effects ML regression Number of obs = 113
Group variable: subject Number of groups = 32
Obs per group:
min = 2
avg = 3.5
max = 4
Wald chi2(11) = 13.98
Log likelihood = 119.07527 Prob > chi2 = 0.2343
-------------------------------------------------------------------------------------
thirty | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
direction |
1 | .0437501 .0741319 0.59 0.555 -.1015458 .189046
2 | .0545834 .0663056 0.82 0.410 -.0753732 .18454
|
timepoint |
2 | .032873 .0224673 1.46 0.143 -.0111621 .0769081
3 | .0560257 .0225284 2.49 0.013 .0118709 .1001806
4 | .061302 .0233951 2.62 0.009 .0154484 .1071556
|
direction#timepoint |
1 2 | -.0616232 .0342432 -1.80 0.072 -.1287387 .0054923
1 3 | -.0573523 .0776528 -0.74 0.460 -.2095489 .0948444
1 4 | -.028897 .0786847 -0.37 0.713 -.1831162 .1253223
2 2 | -.0289247 .031255 -0.93 0.355 -.0901833 .0323339
2 3 | -.0845294 .0779317 -1.08 0.278 -.2372727 .068214
2 4 | -.1155199 .0781867 -1.48 0.140 -.2687629 .0377231
|
_cons | 4.28 .0468851 91.29 0.000 4.188107 4.371893
-------------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
subject: Identity |
var(_cons) | .0237073 .0061227 .0142905 .0393293
-----------------------------+------------------------------------------------
var(Residual) | .0026713 .0004197 .0019633 .0036346
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 148.82 Prob >= chibar2 = 0.0000
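
One way the continuous and identifier codings of timepoint could be compared directly is a likelihood-ratio test, since the c.timepoint model is nested in the i.timepoint model and both were fit by ML on the same 113 observations. From the log likelihoods reported above, the statistic would be 2 x (119.07527 - 116.85474) = 4.44 on 6 degrees of freedom (11 versus 5 fixed-effect terms). A sketch of the commands, with m_linear and m_factor as placeholder names:

```stata
* Sketch only: likelihood-ratio comparison of linear vs categorical timepoint.
* m_linear and m_factor are hypothetical names for the stored estimates.
quietly mixed thirty i.direction##c.timepoint || subject:
estimates store m_linear

quietly mixed thirty i.direction##i.timepoint || subject:
estimates store m_factor

* Tests whether the extra timepoint terms improve fit over a straight-line trend
lrtest m_linear m_factor
```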

Alternatively, I've thought of using the pre-post variable in place of timepoint, to collapse the data from both groups, in which case I get:
. mixed thirty i.direction##i.prepost || subject :,
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = 113.56704
Iteration 1: log likelihood = 113.56704
Computing standard errors:
Mixed-effects ML regression Number of obs = 113
Group variable: subject Number of groups = 32
Obs per group:
min = 2
avg = 3.5
max = 4
Wald chi2(5) = 1.96
Log likelihood = 113.56704 Prob > chi2 = 0.8547
-----------------------------------------------------------------------------------
thirty | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------+----------------------------------------------------------------
direction |
1 | .0196508 .059204 0.33 0.740 -.0963868 .1356885
2 | .0189386 .0592142 0.32 0.749 -.0971191 .1349963
|
2.prepost | .0215539 .0175721 1.23 0.220 -.0128868 .0559946
|
direction#prepost |
1 2 | -.0173457 .0259909 -0.67 0.505 -.0682869 .0335954
2 2 | -.0290778 .0253103 -1.15 0.251 -.0786852 .0205295
|
_cons | 4.304372 .0462857 93.00 0.000 4.213654 4.395091
-----------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
subject: Identity |
var(_cons) | .0240023 .0062222 .0144409 .0398945
-----------------------------+------------------------------------------------
var(Residual) | .0030408 .0004777 .0022349 .0041372
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 141.11 Prob >= chibar2 = 0.0000
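
To read the pre-to-post change for each direction from this model, rather than working it out from the interaction coefficients by hand, something like the following could be run straight after the mixed command above. This is a sketch under the assumption that the post-minus-pre difference within each direction is the quantity of interest.

```stata
* Sketch only: run immediately after
*   mixed thirty i.direction##i.prepost || subject:

* Predicted 30 m time (fixed portion) for each direction at pre and post
margins direction#prepost

* Post-minus-pre difference within each level of direction (0 = control)
contrast r.prepost@direction, effects
```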

But then do I run pre-post as an identifier or as a continuous variable?

Also, to look at the group interactions (which to me is the practical application: what happened in each group after 7 weeks), I want to compare with and without the control group to look at an order effect. Is timepoint continuous or an identifier here, since it spans 2 mesocycles?

1. Is timepoint a continuous or an identifier variable?
2. For direction, do I include all 4 timepoints (splitting the groups), just 2 timepoints (pre-post), or some other classification?
3. Is group x timepoint, using all 4 timepoints, the right model for the practical application?
4. Is the order effect group x timepoint without the control group?
5. Do I need to look at group x direction or any other interactions? Have I set things up appropriately?

Thank you in advance! Appreciate any advice