Choosing the right matching technique for conducting a diff in diff analysis

Hello everyone,

I am struggling with my project and hope to get some advice on the best way to proceed. I looked through older topics but couldn't find the right answer; if I missed something, please let me know.

My data: I have a very large dataset of randomly sampled points (at least 100 m apart to reduce spatial autocorrelation; ca. 30,000 treated and ca. 175,000 untreated) on forest cover and deforestation obtained from Hansen et al. (2013), with related covariates from other sources. The points are sampled from one treatment area and two control areas in proximity to each other. The dataset includes factor (control), binary (outcome) and continuous variables (see the dataex sample in the txt file).

My objective: I want to analyse the effectiveness of the program on the deforestation rate (fcover or floss) in the treatment area relative to the control areas using a difference-in-differences method. However, as the control and treatment areas differ significantly in their covariates, I want to match the treatment points to control-area points. I therefore sampled a larger number of points in the control areas to increase the probability of finding good matches. I only found a limited number of variables for the pre-treatment period before 2005; other sampled variables, such as distance to the closest town, road distance or population density, cannot be used because they were created after the baseline. Or is there another way to still make use of this information?

My approach so far: I ran a logit model on the binary treatment variable and then predicted the propensity score with predict p:
xi: logit treatment forest_cover_2000 altitude slope aspect i.geology i.soil river_dist pa_edge prec_wettest_m prec_seasonaility temp_seasonality a_prec

I also used the psmatch2 command to conduct the propensity score matching; however, this returned error code 430 (convergence not achieved). For the results, see the txt file.
xi: psmatch2 redd forest_cover_2000 altitude slope aspect i.geology i.soil river_dist pa_edge prec_wettest_m prec_seasonaility temp_seasonality a_prec, n(5) comm out( fcover_ ) trim(5) logit

When trying to use the pscore command, I could not include any factor variables.
When using kmatch, I get the error message that the propensity score estimation failed:
xi: kmatch ps treatment forest_cover_2000 altitude slope aspect i.geology i.soil river_dist pa_edge prec_wettest_m prec_seasonaility temp_seasonality a_prec (fcover_), att

After this I used the diff command to estimate the effect size; however, the output seems incorrect when I compare it to the summary statistics, in which the mean value of fcover in the baseline year is 0.950 for the treatment group and 0.964 for the control group (see txt file).
diff fcover, p(endline) t(redd) cov(forest_cover_2000 altitude slope aspect river_dist pa_edge prec_wettest_m prec_seasonaility temp_seasonality a_prec) kernel id(id_all) pscore( _pscore ) report
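
One way to cross-check the means reported by diff is to compute the raw group-by-period means and a plain, unmatched difference-in-differences regression. The following is only a minimal sketch: it assumes the data are in long format, that redd and endline are 0/1 indicators, and that id_all identifies the sampled point, none of which is confirmed beyond the diff call above.

```stata
* Assumption: long-format data, redd = 1 for treated points, endline = 1 for the post period
* Raw mean of fcover in each group-by-period cell
tabulate redd endline, summarize(fcover) means

* Unmatched 2x2 difference-in-differences; the coefficient on
* 1.redd#1.endline is the DiD estimate, with standard errors clustered by point
regress fcover i.redd##i.endline, vce(cluster id_all)
```

If the cell means from tabulate agree with the summary statistics but not with the diff output, the discrepancy probably comes from the kernel and pscore() options rather than from the data.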

Questions:
After reading the critique of PSM by King and Nielsen (2019), I am quite unsure which matching technique is the right one to select in my case.
I am only familiar with PSM from university; however, it seems that this might not be the right approach in my case, as most treatment observations received a very high score near 1 and most control observations a very low score.
Did I make a mistake when calculating the propensity score, or is propensity score matching not the right matching technique to use here?
Which matching method would perform better in my case?
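
To make the overlap problem concrete, the distribution of the estimated score can be compared across the two groups. This is a minimal sketch that assumes the variable p from the logit step above exists and that treatment is coded 0/1; the legend labels and axis title are illustrative.

```stata
* Assumption: p holds the predicted propensity score, treatment is 0/1
* Summary of the score by group, including the tails
tabstat p, by(treatment) statistics(n mean min p5 p50 p95 max)

* Overlaid densities of the score for treated and control points
twoway (kdensity p if treatment == 1) (kdensity p if treatment == 0), ///
    legend(order(1 "Treated" 2 "Control")) xtitle("Estimated propensity score")
```

If the two densities barely overlap, few control points are comparable to the treated points on the score, whichever matching command is used.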

When calculating the treatment effect with the diff command, I obtained incorrect means for the outcome variable. Did I implement the command incorrectly?
What would be the best way to estimate the ATT here? What is the best way to carry out the diff in diff?

I have a second outcome variable (floss), which is a categorical variable denoting the year in which forest loss happened (1-19). If I want to conduct the same analysis on this variable, do I need to transform the individual categories into dummy variables, or is there a way to do that through the code (something like if floss == 5)?
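
For illustration, the two options mentioned above could look as follows in Stata. This is a sketch only: the new variable names loss_y5 and floss_y are hypothetical, and how points without any forest loss are coded in floss is not stated here, so the handling of missing values is an assumption.

```stata
* Assumption: floss takes the values 1-19 for the year of loss
* Single indicator for one category (here category 5), left missing where floss is missing
generate byte loss_y5 = (floss == 5) if !missing(floss)

* Full set of indicators, one per observed category: floss_y1, floss_y2, ...
* (numbered by the sorted order of the categories, not by their values)
tabulate floss, generate(floss_y)
```

Note that Stata uses == for comparisons, so a condition on a single category is written if floss == 5. In estimation commands, factor-variable notation such as i.floss creates the indicators on the fly without generating new variables.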

Sorry for this very long post; I tried to include all the necessary information.
Any answer is appreciated.

Thanks for any useful comments.
Best,
David
