Hi!

I'm conducting a study with 3 controls per case, matched on age, cancer stage and cancer grade. I have a few categorical and a few continuous variables I want to compare between the groups. Most of the continuous variables are not normally distributed. As I understand it, the matching makes my data non-independent and the non-normality calls for a non-parametric test, which points to the Wilcoxon signed-rank test. You can see an example below: the indicator variable is cohort and the variable I want to compare is weight_merge. My patients are overweight, so weight_merge is not normally distributed.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(cohort weight_merge)
1  54
0  54
1 120
1  50
1 101
1 158
0  74
0  53
1  77
1  80
1  58
1  98
1 129
0 106
0 138
1  70
1  76
1  72
1  72
0  59
0  81
1  64
1  90
0  63
1  62
1  49
1  64
1  74
0  79
1  53
1  58
1  73
1  66
1 110
0 103
0  70
1  75
1  80
1  68
1  96
1 116
0  95
1 113
1  91
1  89
1  72
1  80
1 100
1  86
1  86
1  50
1  86
1  87
1  95
1  77
1 109
1  60
1  63
1  92
0  69
1  67
0  72
1  62
1  75
0  47
1  92
0  85
0 104
1  75
1  80
0  75
1  58
1  61
1  85
1  91
0  96
1 130
1 116
1  73
0  62
1 150
1  53
0  56
0  73
1  69
1 114
1  50
1  81
0  60
1  77
1 119
1  68
1  62
1 150
1  95
1  72
1  98
1  98
1  66
1  77
end
label values cohort cohort
label def cohort 0 "Study", modify
label def cohort 1 "Control", modify


Now, with other "comparison of means" tests, Stata allows the option by(groupvar), but -signrank- does not. In any case, even with -ttest-, specifying by(groupvar) assumes unpaired data, according to the help file.
It seems that Stata needs paired data to be laid out as two variables, measurement_pre and measurement_post, in order to do paired comparisons (ttest measurement_pre == measurement_post) for all these types of tests. In my case that is unattainable: I have 3 controls per case, so I cannot tie one particular control to a particular case, and I have about 5 variables that I want to compare.
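To make the syntax difference concrete, here is a minimal sketch of the two layouts as I understand them. The values in the first part are made-up toy numbers (not from my study), and measurement_pre and measurement_post are just placeholder names. The second part reuses the first eight rows of my example above.

```stata
* Paired layout: one row per pair, two variables.
* Toy values invented purely to illustrate the syntax.
clear
input float(measurement_pre measurement_post)
54 58
72 70
90 95
66 71
81 80
end
ttest measurement_pre == measurement_post      // paired t test
signrank measurement_pre = measurement_post    // Wilcoxon signed-rank test

* Unpaired layout: one row per patient plus a group indicator.
* First eight rows of my example data.
clear
input float(cohort weight_merge)
1  54
0  54
1 120
1  50
1 101
1 158
0  74
0  53
end
ranksum weight_merge, by(cohort)               // Wilcoxon rank-sum test, ignores the matching
```

My data are in the second layout, and with 3 controls per case I don't see how to get them into the first.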

Is there a user-written command for this? Or, how big an error would I introduce by instead running an unpaired comparison with ranksum weight_merge, by(cohort), even though my data are dependent due to the matching? Before matching, the groups were independent; the controls were taken from a large, completely separate set of patients.

This question was also posted on ResearchGate, and I will make sure to post any good answer there as well.

https://www.researchgate.net/post/Ho...d23584d80a347d

Best regards!

//Rasmus W Green, PhD student, Karolinska Institute, Dept. of Women's and Children's Health, Stockholm, Sweden.