Hi Stata users!

I am using Stata 14.1. I am trying to assess whether the rank order of the most frequently diagnosed diseases differs between men and women, and across seasons. In the dataex example below, there is an id variable (srno), a binary sex variable (sex2: 0 = men; 1 = women), a categorical season variable (seas: 1 = winter; 2 = pre-monsoon; 3 = southwest monsoon; 4 = post-monsoon), and a categorical diagnosis variable (diag), which records the condition diagnosed for each individual. I cherry-picked the dataex rows to include entries from all four seasons.

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input int srno byte sex2 float seas str40 diag
4734 0 1 "Gastritis"
4735 1 1 "Upper Respiratory Tract Infection (URTI)"
4736 0 1 "Cough"
4737 1 1 "Cold"
4738 1 1 "Diarrhea"
6177 0 2 "SOB"
6178 0 2 "Upper Respiratory Tract Infection (URTI)"
6179 0 2 "Upper Respiratory Tract Infection (URTI)"
6180 0 2 "Cold"
6181 0 2 "Gastritis"
8089 1 3 "Gastritis"
8090 1 3 "Tinea"
8091 0 3 "Gastritis"
8092 0 3 "Fever"
8093 0 3 "Low Back Ache"
10669 0 4 "Tinea"
10670 0 4 "Icterus"
10671 0 4 "Cough"
10672 0 4 "Tinea"
end
[/CODE]

I can quickly see which diagnoses are made most frequently, by sex and by season, using the tabsort command (part of the tab_chi package: ssc install tab_chi). For example, here are the five most frequent diagnoses for each sex, from the full dataset:

[CODE]
tabsort diag if sex2==0

     RevisedDiagnosis |      Freq.     Percent        Cum.
----------------------+-----------------------------------
                Cough |      1,477       12.05       12.05
 Musculoskeletal Pain |      1,327       10.83       22.88
Road Traffic Accident |      1,023        8.35       31.23
                Tinea |        981        8.00       39.23
                 Cold |        795        6.49       45.72

tabsort diag if sex2==1

     RevisedDiagnosis |      Freq.     Percent        Cum.
----------------------+-----------------------------------
 Musculoskeletal Pain |        530       12.10       12.10
                Cough |        490       11.19       23.29
                 Cold |        349        7.97       31.26
                Fever |        322        7.35       38.62
            Gastritis |        279        6.37       44.99
[/CODE]
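
The same listing can be produced for each season with a loop. This is only a minimal sketch: it assumes seas takes the values 1 to 4 as coded above, and it prints the full sorted table for each season rather than just the top five rows.

[CODE]
* Sketch: one sorted frequency table per season.
* Assumes seas is coded 1 (winter) to 4 (post-monsoon), as described above.
forvalues s = 1/4 {
    display as text _newline "Season `s'"
    tabsort diag if seas == `s'
}
[/CODE]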
Now, this is both a statistics question and a Stata-istics question (I apologise for the former). I believe I should use the Wilcoxon-Mann-Whitney or Kruskal-Wallis test to see whether the two rankings differ, but I don't understand what form the data should take to make these tests possible. Following other Statalist posts, I have created two new variables that hold the rank of each diagnosis for males and for females, but I don't see how a single variable could contain the information these tests need. I don't discount that I am making some error in my choice of test, dependent variable or otherwise. I would appreciate any help!
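
For concreteness, the rank variables could be built roughly as follows. This is a minimal sketch, not necessarily the exact code from those posts: the names n, rank_m and rank_f are made up for illustration, ties get Stata's default average ranks, and a diagnosis never seen in one sex is given a count of zero.

[CODE]
* Sketch only: rank diagnoses by frequency within each sex.
* n, rank_m and rank_f are illustrative names, not from my actual do-file.
preserve

* One row per diagnosis-sex pair, with its frequency in n
contract diag sex2 if !missing(sex2), freq(n)

* One row per diagnosis: n0 = count among men, n1 = count among women
reshape wide n, i(diag) j(sex2)
replace n0 = 0 if missing(n0)
replace n1 = 0 if missing(n1)

* Rank 1 = most frequent diagnosis (negate so larger counts rank first)
egen rank_m = rank(-n0)
egen rank_f = rank(-n1)

* Show the five most frequent diagnoses among men, with both ranks
gsort -n0
list diag n0 n1 rank_m rank_f in 1/5, noobs

restore
[/CODE]

With the data in this shape, each diagnosis is one observation carrying two ranks, rather than each patient being one observation.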

Kind regards,

Harry