Problem with Survey data analysis, non-response, selection bias, use of paradata

Hi Statalisters, greetings from India.

I am analysing survey data and am relatively new to this commonly used study design. I am seeking help with survey weighting, selection bias and paradata.

Survey data setup

The survey asked doctors (registered under a particular program in the country) about the impact of the pandemic on health services. The sampling frame consists of 23,900 doctors covered by 3 agencies: 13,400 under Agency A, 6,000 under Agency B and 4,500 under Agency C. From these, 700, 1,000 and 1,100 doctors were randomly sampled from Agencies A, B and C respectively (total sample = 2,800). Responses were received from 400 doctors in Agency A, 800 in Agency B and 700 in Agency C (1,900 in total).
Based on the above, I have assumed that the survey used stratified random sampling with agency as the stratum. The dataset I have (Data respondents) covers these 1,900 respondents, with about 200 variables each.
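To make the design arithmetic explicit, here is a minimal sketch that rebuilds the stratum counts quoted above and derives the sampling fraction and the unweighted response rate per agency. The variable names (N_pop, n_samp, n_resp, samp_frac, resp_rate) are illustrative and not part of the survey files.

```stata
* Run in a fresh session: -clear- drops whatever data are in memory
clear

* Stratum counts as stated in the post: frame size, sampled, responded
input str1 agency long(N_pop n_samp n_resp)
"A" 13400  700 400
"B"  6000 1000 800
"C"  4500 1100 700
end

* Probability of selection within each agency
gen samp_frac = n_samp / N_pop

* Unweighted response rate within each agency
gen resp_rate = n_resp / n_samp

list, noobs clean
```

The response rates differ by agency (400/700, 800/1000 and 700/1100), which is why the non-response question matters even before other variables are considered.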

Data available on non-respondents and paradata

The central concern is non-response and how to account for the resulting bias. The challenge is that I have very little data on the 900 non-respondents. In the dataset with all 2,800 sampled doctors (Data full), the only variables available for both respondents and non-respondents are (1) agency (A, B or C), (2) qualification (3 categories: bachelors, specialization, super specialization) and (3) province (5 categories). I also have one paradata variable, (4) the 'number of attempts' made to contact each doctor (1, 2 or 3). The reason for non-response is also recorded for the 900 non-respondents (10 categories).
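One possible way to use the three frame variables is a response-propensity adjustment fitted on the full file of 2,800 sampled doctors. The sketch below is only an illustration under stated assumptions: it assumes the full file is in memory, that agency, qualification and province are string variables as in the respondent file, and that a 0/1 indicator named responded exists (that name is hypothetical). Such an adjustment can only reduce bias to the extent that response depends on these three variables.

```stata
* Assumption: Data full (2,800 sampled doctors) is in memory and contains a
* hypothetical 0/1 variable -responded- (1 = completed interview)

* Numeric versions of the string frame variables for use as factor variables
encode agency, gen(agency_n)
encode qualification, gen(qual_n)
encode province, gen(prov_n)

* Design weight: stratum frame size / stratum sample size
gen wt_design = 13400/700 if agency == "A"
replace wt_design = 6000/1000 if agency == "B"
replace wt_design = 4500/1100 if agency == "C"

* Do response rates differ across the variables known for everyone?
tabulate qual_n responded, row chi2
tabulate prov_n responded, row chi2

* Response propensity modelled on the frame variables
logit responded i.agency_n i.qual_n i.prov_n
predict p_resp, pr

* Non-response-adjusted weight, defined for respondents only
gen wt_nr = wt_design / p_resp if responded == 1
```

The adjusted weight wt_nr could then replace the stratum-only weight in -svyset-. Whether to fit the propensity model with or without the design weights is a judgment call that this sketch does not settle.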

The analysis will involve estimating frequencies and proportions, plus a few regression models giving crude odds ratios. What is a practical way to analyse these survey data while accounting for selection bias?

Below is a sample of 30 observations and a few variables, produced by -dataex-. It contains respondents only.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int dateofsurvey str1 agency str2 province str19 qualification byte numberofattemptstocontact str23 age byte opdload_ct str14 opdload_hilo str64 servicesb4c19 byte(services_tests_b4c19 services_meds_b4c20 services_ehealth_b4c21)
 1 22494 "B" "P2" "Superspecialization" 1 "30 - 45 years old"        10 "Same as before" "Testing, Providing  medication"                                   1 1 0
 2 22494 "B" "P2" "Superspecialization" 1 "30 - 45 years old"        10 "Higher"         "Testing, Providing  medication"                                   1 1 0
 3 22494 "C" "P1" "Superspecialization" 1 "Older than 61 years old"  20 "Lower"          "Testing, Providing  medication"                                   1 1 0
 4 22494 "C" "P1" "Superspecialization" 1 "30 - 45 years old"        20 "Lower"          "Testing, Providing  medication"                                   1 1 0
 5 22494 "C" "P1" "Superspecialization" 1 "30 - 45 years old"        40 "Same as before" "Testing, Providing  medication"                                   1 1 0
 6 22494 "B" "P2" "Superspecialization" 2 "46 - 60 years old"        30 "Same as before" "Testing, Providing  medication, Home consultation"                1 1 0
 7 22494 "B" "P3" "Superspecialization" 1 "46 - 60 years old"        15 "Same as before" "Testing "                                                         1 0 0
 8 22494 "B" "P3" "Specialization"      1 "30 - 45 years old"        20 "Higher"         "Testing "                                                         1 0 0
 9 22494 "B" "P3" "Superspecialization" 2 "46 - 60 years old"        25 "Lower"          "Other, please specify"                                            0 0 0
10 22494 "B" "P3" "Superspecialization" 1 "30 - 45 years old"        60 "Lower"          "Testing,  Other, please specify"                                  1 0 0
11 22494 "B" "P3" "Superspecialization" 2 "30 - 45 years old"        25 "Lower"          "Testing, Providing  medication"                                   1 1 0
12 22525 "C" "P1" "Superspecialization" 1 "30 - 45 years old"        25 "Same as before" "Providing medication, testing"                                    1 1 0
13 22525 "C" "P1" "Superspecialization" 1 "46 - 60 years old"        30 "Lower"          "Testing, Providing  medication"                                   1 1 0
14 22525 "C" "P1" "Superspecialization" 2 "30 - 45 years old"         3 "Lower"          "Providing medication, testing, Other, please specify"             1 1 0
15 22525 "C" "P1" "Superspecialization" 1 "30 - 45 years old"        10 "Lower"          "Providing medication, testing"                                    1 1 0
16 22525 "C" "P1" "Superspecialization" 1 "46 - 60 years old"        40 "Lower"          "Providing medication, testing"                                    1 1 0
17 22525 "C" "P1" "Superspecialization" 2 "46 - 60 years old"        10 "Lower"          "Providing medication "                                            0 1 0
18 22525 "C" "P1" "Superspecialization" 1 "30 - 45 years old"        10 "Lower"          "Testing, Providing  medication"                                   1 1 0
19 22525 "C" "P1" "Superspecialization" 1 "46 - 60 years old"        50 "Lower"          "Testing, Providing  medication"                                   1 1 0
20 22555 "B" "P2" "Superspecialization" 3 "30 - 45 years old"        30 "Same as before" "Testing, Providing  medication, Home consultation"                1 1 0
21 22555 "B" "P2" "Superspecialization" 1 "30 - 45 years old"        15 "Lower"          "Testing, Providing  medication"                                   1 1 0
22 22555 "B" "P2" "Superspecialization" 3 "30 - 45 years old"        20 "Lower"          "Testing, Other, please specify Providing  medication,  E-health " 1 1 1
23 22555 "B" "P2" "Superspecialization" 1 "Less than 30 years old"   20 "Lower"          "Testing, Providing  medication"                                   1 1 0
24 22555 "A" "P5" "Superspecialization" 1 "Less than 30 years old"   20 "Same as before" "Providing medication, testing, Other, please specify"             1 1 0
25 22555 "A" "P4" "Bachelors"           1 "30 - 45 years old"        20 "Higher"         "Testing "                                                         1 0 0
26 22555 "A" "P4" "Superspecialization" 3 "Less than 30 years old"   20 "Lower"          "Testing "                                                         1 0 0
27 22555 "A" "P4" "Specialization"      1 "Less than 30 years old"  100 "Lower"          "E-health "                                                        0 0 1
28 22555 "A" "P4" "Superspecialization" 3 "30 - 45 years old"         5 "Lower"          "Providing medication "                                            0 1 0
29 22494 "B" "P2" "Superspecialization" 1 "30 - 45 years old"        60 "Lower"          "Testing,  Other, please specify"                                  1 0 0
30 22555 "A" "P5" "Superspecialization" 1 "30 - 45 years old"        30 "Same as before" "Testing, Providing  medication, Home consultation"                1 1 0
end
format %tdnn/dd/CCYY dateofsurvey

These are the commands I have started with (I am using Stata/MP 13 on Windows 10):

Code:

  
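* Weight = stratum frame size / number of respondents in the stratum
* Agency A is the starting value; B and C are replaced below
gen wt_strat = 13400/400
replace wt_strat = 6000/800 if agency == "B"
replace wt_strat = 4500/700 if agency == "C"
* FPC as a rate; the name must match the fpc() option in -svyset- (names are case-sensitive)
gen fpc_strat = 1/wt_strat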

Following this I ran the survey set command:

Code:

  svyset id [pweight=wt_strat], strata(agency) fpc(fpc_strat)

Code:

      pweight: wt_strat
          VCE: linearized
  Single unit: missing
     Strata 1: agency
         SU 1: id
        FPC 1: fpc_strat

Here ‘id’ is a variable that uniquely identifies each doctor in the list.

Please correct me if any of the above steps are wrong for stratified random sampling. Should adjustments for non-response also be built into these commands?
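For what it is worth, the weight above already combines two pieces: the design weight (frame size divided by number sampled) and the inverse of the agency-level response rate. The following sketch checks this on the respondent file; wt_design and resp_rate are illustrative names, and the counts are the ones quoted earlier in this post.

```stata
* Assumes the respondent file is in memory and wt_strat has been created

* Design weight: stratum frame size / number sampled
gen wt_design = 13400/700 if agency == "A"
replace wt_design = 6000/1000 if agency == "B"
replace wt_design = 4500/1100 if agency == "C"

* Stratum response rate: number responded / number sampled
gen resp_rate = 400/700 if agency == "A"
replace resp_rate = 800/1000 if agency == "B"
replace resp_rate = 700/1100 if agency == "C"

* wt_strat should equal the design weight divided by the response rate
assert reldif(wt_strat, wt_design/resp_rate) < 1e-6

* With all 1,900 respondents, the weights should sum to the frame counts
* (13,400, 6,000 and 4,500) within each agency
tabstat wt_strat, by(agency) statistics(sum count)
```

If the -assert- passes, wt_strat amounts to a non-response adjustment that treats response as random within agency. It does not use qualification, province or the number of contact attempts.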

Accounting for non-response

To account for non-response, I read about post-stratification (in previous Statalist threads and in the literature), but only 3 variables are available for both non-respondents and respondents. I also read that paradata can be used in non-response analysis (Kreuter F, Olson K. Paradata for Nonresponse Error Investigation. 2013). I have one paradata variable, "number of attempts to contact", which records how many times (maximum 3) a doctor was contacted to obtain a successful interview. However, I do not know how to use this variable in Stata, or whether it is enough to account for the bias.
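One exploratory use of this variable, offered only as a sketch, is to check whether key estimates drift as more contact attempts were needed. This treats late respondents as a rough proxy for non-respondents, which is an assumption and not something the data can confirm. The code assumes the respondent file is in memory and that -svyset- has already been run as shown earlier. The variable names come from the -dataex- example.

```stata
* Distribution of contact attempts among respondents, by agency
tabulate numberofattemptstocontact agency, column

* Weighted proportion offering e-health, by number of attempts needed
svy: proportion services_ehealth_b4c21, over(numberofattemptstocontact)

* Weighted mean OPD load, by number of attempts needed
svy: mean opdload_ct, over(numberofattemptstocontact)
```

Estimates that stay similar across attempts 1, 2 and 3 would be mildly reassuring. A clear trend would suggest that non-respondents may differ from respondents in the same direction.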

I would appreciate your insights on the above.

Thank you!
