Hello,

I want to draw a sample (with a certain mean/distribution) from a larger dataset that has a different mean/distribution. For example, say my larger dataset has a mean age of 40y and 89% are men. I want to draw a smaller sample with a mean age of 53y and 49% men. How do I do that?

Sharing some background context:

I have data collected under a community-based diabetes screening program. The screening was done using a telemedicine-equipped mobile medical van. Patients who were diagnosed with diabetes or were at risk of diabetes complications were referred to a rural diabetic center for follow-up care.

However, the rural diabetic center also caters to many other patients who were not referred by the van.

Unfortunately, the patients who were referred to the center were given a new unique ID, so there is no way of identifying those screened in the van from the follow-up data recorded at the center.

That said, I want to draw a sample from the follow-up data in such a way that the baseline characteristics of the sample (i.e. the health profile of patients at their first visit to the center) match the baseline characteristics of those screened. This way, the sample drawn from the follow-up data will be representative of the screened population.
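To make the question concrete, here is a rough sketch of one possible way to frame it, not a settled method. It assumes the only targets are mean age and the proportion of men, and it uses exponential tilting (the idea behind entropy-balancing weights): each patient gets a weight so that the weighted mean age and weighted share of men hit the targets, and a sample is then drawn with those weights. The data are simulated, and the names `tilting_weights`, `age`, and `male` are hypothetical.

```python
import numpy as np

def tilting_weights(X, targets, iters=100, tol=1e-6):
    """Weights proportional to exp(Z @ lam), chosen so that the weighted
    column means of X equal `targets` (exponential tilting)."""
    Z = X - targets              # we want the weighted mean of Z to be 0
    Z = Z / Z.std(axis=0)        # rescale columns for numerical stability
    lam = np.zeros(Z.shape[1])

    def objective(l):
        # log-sum-exp of Z @ l; it is convex, and its gradient is the
        # weighted mean of Z under the tilted weights
        s = Z @ l
        return s.max() + np.log(np.exp(s - s.max()).sum())

    for _ in range(iters):
        s = Z @ lam
        w = np.exp(s - s.max())
        w /= w.sum()
        grad = w @ Z
        if np.abs(grad).max() < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)   # Newton direction
        t = 1.0
        for _ in range(30):                  # capped backtracking line search
            if objective(lam - t * step) <= objective(lam) - 1e-4 * t * (grad @ step):
                break
            t *= 0.5
        lam = lam - t * step

    s = Z @ lam
    w = np.exp(s - s.max())
    return w / w.sum()

# Simulated stand-in for the follow-up data (NOT real data):
# mean age about 40y, about 89% men.
rng = np.random.default_rng(0)
n = 5000
age = np.clip(rng.normal(40, 12, n), 18, 90)
male = (rng.random(n) < 0.89).astype(float)

# Targets taken from the screened population in the example: 53y, 49% men.
X = np.column_stack([age, male])
w = tilting_weights(X, np.array([53.0, 0.49]))

# Draw the smaller sample. Sampling is with replacement, so heavily
# weighted patients can appear more than once.
idx = rng.choice(n, size=500, replace=True, p=w)

print("weighted mean age:", w @ age, "weighted share of men:", w @ male)
print("sample mean age:", age[idx].mean(), "sample share of men:", male[idx].mean())
print("effective sample size of the weights:", 1.0 / np.sum(w ** 2))
```

The weights could also be used directly in a weighted analysis instead of drawing a sample. Matching these two margins does not make the sample match on anything else, and a small effective sample size would suggest the two populations overlap poorly.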

This will allow me to understand the long-term effect of the care provided in the diabetes center on those screened in the mobile medical van. I want to measure the added value of running screening drives using mobile medical units as compared to routine care.

I understand that this is not an ideal approach; however, due to the paucity of data on similar care delivery models, I don't have an alternative.

Thanks in advance.

Best,
Preeti