I am trying to look at how glucose level varies with another variable (var2), to see how var2 influences my dependent variable. I have a cohort of 1000 people but only need to take 300. I was thinking of assuming glucose level follows a normal distribution, then taking 100 people from those with glucose levels above 180 (high glucose group), 100 from those with levels below 50 (low group), and 100 from the "normal" range group.

Now, I really don't want my 300 to have another strong confounder or to be biased in any way. My survey has lots of other variables I could check to see whether they're confounders, but I don't know where to start. I was thinking of running a linear regression to see which variables influence glucose levels, then trying to ensure that the 300 I choose are matched on these variables so they don't influence the dependent variable too much. For instance, so that gender is not a confounder, I'm hoping to have 50 males and 50 females in each group of 100. However, the age distributions aren't even either.
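
To make the sampling plan concrete, here is a rough sketch of what I have in mind, not a finished method. The cohort is simulated, and the field names (`glucose`, `sex`, `age`) and both function names are hypothetical placeholders; only the cut-offs of 180 and 50 and the 50/50 gender split come from the description above.

```python
import random
from collections import defaultdict

# Cut-offs taken from the plan above; field names are hypothetical.
HIGH_CUTOFF = 180
LOW_CUTOFF = 50


def glucose_group(glucose):
    """Assign a glucose value to the low, normal, or high group."""
    if glucose > HIGH_CUTOFF:
        return "high"
    if glucose < LOW_CUTOFF:
        return "low"
    return "normal"


def stratified_sample(cohort, per_cell=50, seed=0):
    """Draw up to `per_cell` people from every (glucose group, sex) cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for person in cohort:
        key = (glucose_group(person["glucose"]), person["sex"])
        cells[key].append(person)

    sample = []
    for key in sorted(cells):
        members = cells[key]
        # A cell with fewer than `per_cell` people is taken whole.
        k = min(per_cell, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Simulated cohort of 1000 people, purely for illustration.
# The values are not real data and are not physiologically realistic.
rng = random.Random(42)
cohort = [
    {
        "id": i,
        "glucose": rng.gauss(115, 60),
        "sex": rng.choice(["F", "M"]),
        "age": rng.randint(18, 90),
    }
    for i in range(1000)
]

sample = stratified_sample(cohort)

# Count how many people ended up in each (group, sex) cell.
counts = defaultdict(int)
for person in sample:
    counts[(glucose_group(person["glucose"]), person["sex"])] += 1
for key in sorted(counts):
    print(key, counts[key])
```

This only balances gender within each glucose group. Balancing age the same way would mean adding an age band to the cell key, which splits the 1000 people into more and smaller cells, so some cells may not have enough people to fill their quota.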

Any advice on how I can go about building a sample group that varies by glucose level and var2 and is adjusted for other possible confounders? I really want a sample group in which I can properly measure the influence of var2 without confounders having a big influence.