Dear Stata seniors and pros,

I am seeking your guidance on a few points that confuse me while analyzing the data for my Master's thesis in Stata. Your advice would be a significant contribution to my thesis work.

I will begin by describing the context of my problem and then outline my specific questions.

CONTEXT

I am currently working on my Master's thesis, entitled "Explaining Cross-Province Differentials in Child Nutritional Outcome in Nepal: An Application of Quantile Regression-Counterfactual Decomposition". I am using data from the Nepal Demographic and Health Survey (DHS) 2016.

My thesis seeks to answer the following research questions:

i. Are cross-province differentials in child nutritional outcomes in Nepal explained by differences in endowments? [covariate effect]
a. Which specific endowment explains the largest share of the differential?
ii. Are cross-province differentials in child nutritional outcomes in Nepal explained by differences in the returns to endowments? [coefficient effect]
a. The returns to which specific endowment(s) explain the largest share of the differential?
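To check that I understand what a detailed decomposition reports, here is a minimal sketch in Python (not Stata) of the two-fold Oaxaca-Blinder arithmetic. It uses the reference province's coefficients as the benchmark. The covariate effect for each variable is (mean in comparison province minus mean in reference province) times the reference coefficient. The coefficient effect is the comparison-province mean times the difference in coefficients. The variable names (`mother_educ`, `_cons`) and all numbers are hypothetical. As I understand Firpo et al., the RIF version applies the same arithmetic to RIF-regression coefficients and covariate means.

```python
def oaxaca_detailed(xbar_a, xbar_b, beta_a, beta_b):
    """Two-fold Oaxaca-Blinder decomposition of (xbar_a . beta_a) - (xbar_b . beta_b).

    Group b (the reference province) supplies the benchmark coefficients.
    Each argument is a dict mapping variable name -> value; the intercept is
    included as the hypothetical variable "_cons" with a mean of 1.
    """
    # Covariate (endowment) effect: difference in means valued at reference returns.
    covariate = {k: (xbar_a[k] - xbar_b[k]) * beta_b[k] for k in beta_b}
    # Coefficient (returns) effect: difference in returns valued at group a's means.
    coefficient = {k: xbar_a[k] * (beta_a[k] - beta_b[k]) for k in beta_b}
    return covariate, coefficient


# Hypothetical example: one covariate (mother's years of schooling) plus intercept.
xbar_a = {"_cons": 1.0, "mother_educ": 8.0}    # comparison province
xbar_b = {"_cons": 1.0, "mother_educ": 10.0}   # reference province
beta_a = {"_cons": -2.0, "mother_educ": 0.10}
beta_b = {"_cons": -1.5, "mother_educ": 0.12}

covariate, coefficient = oaxaca_detailed(xbar_a, xbar_b, beta_a, beta_b)
gap = sum(covariate.values()) + sum(coefficient.values())
print(round(gap, 6))
```

The components add up exactly to the gap in predicted outcomes. One caveat I have seen discussed: how the coefficient effect is split between the intercept and categorical dummies depends on which category is omitted.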

I will conduct pairwise comparisons of Nepal's provinces. There are seven provinces in total. The province with the lowest prevalence of child stunting will serve as the reference group, and each of the other six will be compared with it. This pairwise approach follows the analytical method of Cavatorta, Shankar and Flores-Martinez (2015) in their study "Explaining Cross-State Disparities in Child Nutrition in Rural India". However, I depart from the decomposition method they used (Machado and Mata, 2005), for the reason given below.

After carefully considering various decomposition methods, I found that two approaches have mostly been used in the previous literature to answer similar research questions. The first is Machado and Mata's (2005) method of simulating a counterfactual distribution and then decomposing the difference. The second is the unconditional quantile (RIF) regression method of Firpo et al. (2018).

The Machado and Mata (2005) decomposition does not allow a detailed decomposition, and I am interested in the detailed decomposition of both the covariate and coefficient effects. I therefore concluded that the Firpo et al. (2018) method best suits my research questions. Srinivasan, Zanello and Shankar (2013) also used this method to explain rural-urban disparities in child nutrition in Bangladesh and Nepal, with research questions similar to mine.
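Here is my understanding of the first step, sketched in Python rather than Stata so that the arithmetic is explicit. For the tau-th quantile q_tau, the recentered influence function is RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau), where f_Y(q_tau) is the outcome density at that quantile, estimated with a kernel. Regressing this RIF on the covariates by OLS gives the unconditional quantile regression coefficients of Firpo et al. (2009). The Gaussian kernel and the Silverman rule-of-thumb bandwidth below are my own assumptions, not necessarily what rifreg uses by default.

```python
import math
import statistics


def kernel_density_at(point, data, bandwidth=None):
    """Gaussian kernel density estimate of `data` evaluated at `point`."""
    n = len(data)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; other bandwidths are possible).
        q1, _, q3 = statistics.quantiles(data, n=4)
        spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
        bandwidth = 0.9 * spread * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((point - x) / bandwidth) ** 2) for x in data)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))


def sample_quantile(data, tau):
    """Empirical tau-quantile: the smallest y with F_n(y) >= tau."""
    s = sorted(data)
    return s[max(math.ceil(tau * len(s)) - 1, 0)]


def rif_quantile(data, tau):
    """RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau) for each y."""
    q = sample_quantile(data, tau)
    f_q = kernel_density_at(q, data)
    return [q + (tau - (1.0 if y <= q else 0.0)) / f_q for y in data]


# Hypothetical outcome values; in practice this would be the child outcome variable.
y = [float(v) for v in range(1, 101)]
rif = rif_quantile(y, 0.5)
# The RIF averages to the sample quantile; it is the dependent variable of RIF-OLS.
print(sample_quantile(y, 0.5), statistics.mean(rif))
```

The RIF takes only two values, one above and one below the quantile. Its spread depends on the estimated density, which is where the kernel enters in this step.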


PROBLEM

I have been teaching myself how to run the first and second estimation stages of RIF regression. I have also been trying to understand several Stata commands for this purpose (rifreg, oaxaca8, oaxaca_rif, dfl).

However, several points still confuse me, so I seek your guidance on the following questions. I would be extremely grateful if you could guide me through them.

1. First and Second Stage Estimation

In the earlier study by Srinivasan, Zanello and Shankar (2013), the authors used kernel smoothing and kernel density estimation to construct a counterfactual distribution. However, I find it very confusing to work out how I should do this with the rifreg command.

- Should I first estimate the counterfactual distribution using kernel estimation and then use that estimate as 'rifvar' in the rifreg command? If so, how would you recommend I carry out the kernel estimation?

- I also tried the "dfl" command developed by Joao Pedro Azevedo (2005), which estimates DiNardo, Fortin and Lemieux (DFL) counterfactual kernel densities.
Firpo et al. (2009) indicate that the counterfactual kernel density can be estimated following the DFL method. However, I am unsure how to get "dfl" to compute the kernel density estimates rather than the logit estimates. I am also unsure whether its "adaptive kernel estimate" is the usual kernel estimate I am looking for.
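To check my understanding of what the DFL counterfactual density is, here is a minimal Python sketch (not the dfl command) with a single discrete covariate. The reweighting factor psi(x) = P(x | reference province) / P(x | comparison province) makes the comparison province's covariate distribution match the reference province's. A weighted kernel density of the comparison province's outcomes then estimates the counterfactual density. With one discrete covariate, psi can be computed directly from cell shares. With several or continuous covariates, the probabilities are usually obtained from a logit or probit of group membership on the covariates (DiNardo, Fortin and Lemieux used a probit), which may be why dfl reports a logit. The variable names and data are hypothetical.

```python
import math
from collections import Counter


def dfl_weights(x_comp, x_ref):
    """psi(x) = P(x | reference) / P(x | comparison), estimated from cell shares.

    Cells absent from the reference group get weight 0; cells present only in
    the reference group cannot be reweighted (a common-support problem).
    """
    c_comp, c_ref = Counter(x_comp), Counter(x_ref)
    n_comp, n_ref = len(x_comp), len(x_ref)
    return [(c_ref[x] / n_ref) / (c_comp[x] / n_comp) for x in x_comp]


def weighted_kde(grid, y, w, bandwidth):
    """Weighted Gaussian kernel density of y, evaluated at each point of grid."""
    norm = sum(w) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(wi * math.exp(-0.5 * ((g - yi) / bandwidth) ** 2) for yi, wi in zip(y, w)) / norm
        for g in grid
    ]


# Hypothetical data: residence of each child in the comparison and reference province.
x_comp = ["rural", "rural", "rural", "urban"]
x_ref = ["rural", "urban", "urban", "urban"]
y_comp = [-2.1, -1.8, -1.5, -0.9]  # hypothetical outcomes in the comparison province

w = dfl_weights(x_comp, x_ref)
grid = [-4.0 + 0.5 * k for k in range(13)]
counterfactual_density = weighted_kde(grid, y_comp, w, bandwidth=0.5)
```

After reweighting, 75% of the comparison province's weight falls on urban children, matching the reference province's urban share.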


2. Model selection

Both Firpo et al. (2009) and Srinivasan, Zanello and Shankar (2013) suggest that model selection involves minimizing the differences between the counterfactual distribution and the empirical distribution of the group whose covariate distribution was used to estimate the counterfactual.
I am unsure how best to test for these differences between the counterfactual and empirical distributions in Stata.
I suppose there are statistical tests based on kernel density estimation, but I am unfamiliar with the relevant Stata commands.
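One simple way I can think of to quantify the gap between the counterfactual and the empirical distribution is the largest vertical distance between their weighted empirical CDFs, a Kolmogorov-Smirnov-type statistic. The minimal Python sketch below reports only this distance, not a p-value. I assume a formal test that accounts for the survey design would need something like the bootstrap. Using this distance to compare candidate reweighting models is my own assumption about how the model selection could be put into practice.

```python
def weighted_ecdf(values, weights):
    """Return F(t): the weighted share of observations with value <= t."""
    total = sum(weights)
    pairs = list(zip(values, weights))
    return lambda t: sum(w for v, w in pairs if v <= t) / total


def ks_distance(y_a, w_a, y_b, w_b):
    """Largest gap between two weighted empirical CDFs.

    Both CDFs are right-continuous step functions, so the supremum of their
    absolute difference is attained at one of the observed values.
    """
    F_a, F_b = weighted_ecdf(y_a, w_a), weighted_ecdf(y_b, w_b)
    return max(abs(F_a(t) - F_b(t)) for t in set(y_a) | set(y_b))


# Hypothetical usage: reweighted comparison province vs. empirical reference province.
d = ks_distance([1.0, 2.0], [3.0, 1.0], [1.0, 2.0], [1.0, 1.0])
print(d)  # 0.25
```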


3. Demographic Health Survey Design

I searched earlier posts on how to account for a two-level complex survey design when using rifreg, and bootstrapping was the most common suggestion. However, I am unsure how to apply it:
- Should I bootstrap both the first and the second stage of the RIF regression?
- How should I choose the number of bootstrap replications (reps)?
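This minimal Python sketch shows what I understand a stratified cluster bootstrap to involve. Primary sampling units (PSUs) are resampled with replacement within each stratum, and the whole estimation is rerun on each resample. The `statistic` function is a placeholder. For a RIF decomposition, it would presumably redo every step (reweighting, RIF, regressions) using the survey weights. The keys `stratum`, `psu` and `y` are hypothetical. The default of 200 replications is purely illustrative.

```python
import random
from collections import defaultdict


def stratified_cluster_bootstrap(rows, statistic, reps=200, seed=12345):
    """Bootstrap standard error that resamples PSUs within strata.

    rows: list of dicts with hypothetical keys "stratum" and "psu" plus the
    analysis variables. statistic: function of a list of rows returning a number.
    """
    rng = random.Random(seed)
    # Group observations by stratum, then by PSU (cluster).
    strata = defaultdict(lambda: defaultdict(list))
    for r in rows:
        strata[r["stratum"]][r["psu"]].append(r)
    estimates = []
    for _rep in range(reps):
        sample = []
        for clusters in strata.values():
            ids = list(clusters)
            # Draw as many PSUs as the stratum contains, with replacement.
            for _draw in ids:
                sample.extend(clusters[rng.choice(ids)])
        estimates.append(statistic(sample))
    mean = sum(estimates) / reps
    se = (sum((e - mean) ** 2 for e in estimates) / (reps - 1)) ** 0.5
    return se, estimates


# Hypothetical data: two strata with two PSUs each; the statistic is a simple mean.
rows = [
    {"stratum": "A", "psu": 1, "y": 1.0},
    {"stratum": "A", "psu": 2, "y": 3.0},
    {"stratum": "B", "psu": 3, "y": 2.0},
    {"stratum": "B", "psu": 4, "y": 10.0},
]
se, draws = stratified_cluster_bootstrap(rows, lambda s: sum(r["y"] for r in s) / len(s))
```

Drawing all n_h PSUs per stratum tends to understate the variance when strata have few PSUs. As far as I know, the Rao-Wu rescaling bootstrap draws n_h - 1 PSUs and rescales the weights instead. For the number of replications, one check I assume is reasonable is whether the standard errors stay stable as reps is increased.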

4. Percentage contributions of the covariate and coefficient effects

I intend to present my results in a table showing the relative percentage contribution of each covariate and coefficient effect, as Srinivasan, Zanello and Shankar (2013) do on page 11 of their paper. I would be grateful for guidance on how to achieve this.
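As I understand it, the percentage contribution of a component is simply 100 times that component divided by the total gap at the quantile in question. Below is a minimal Python sketch with hypothetical component names and values.

```python
def percentage_contributions(total_gap, components):
    """Each component as a percentage of the total gap (100 * component / gap)."""
    if total_gap == 0:
        raise ValueError("the total gap is zero, so percentages are undefined")
    return {name: 100.0 * value / total_gap for name, value in components.items()}


# Hypothetical detailed decomposition of a gap at one quantile.
components = {
    "covariate: mother_educ": -0.24,
    "coefficient: mother_educ": -0.16,
    "coefficient: _cons": -0.50,
}
shares = percentage_contributions(-0.90, components)
for name, pct in shares.items():
    print(f"{name:28s} {pct:6.1f}%")
```

When components have opposite signs, individual percentages can be negative or exceed 100. If I understand the reweighted decomposition of Firpo et al. (2018) correctly, its specification error and reweighting error would also need rows of their own so that the shares add up to 100.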


I would be grateful for your valuable guidance.

Thank you.

Gopal Trital
Erasmus Mundus Master's Scholar in International Development Studies

References

Azevedo, Joao Pedro, ‘DFL: Stata module to estimate DiNardo, Fortin and Lemieux Counterfactual Kernel Density’, Statistical Software Components S449001, Boston College Department of Economics (2005; revised 21 December 2010).

Cavatorta, Elisa, Shankar, Bhavani, and Flores-Martinez, Artemisa, ‘Explaining Cross-State Disparities in Child Nutrition in Rural India’, World Development, 76 (2015), 216–37.

Firpo, Sergio, Fortin, Nicole M., and Lemieux, Thomas, ‘Unconditional Quantile Regressions’, Econometrica, 77/3 (2009), 953–73.

Firpo, Sergio, Fortin, Nicole M., and Lemieux, Thomas, ‘Decomposing Wage Distributions Using Recentered Influence Function Regressions’, Econometrics, 6/2 (2018), pp. 12–13.

Machado, José A. F., and Mata, José, ‘Counterfactual Decomposition of Changes in Wage Distributions Using Quantile Regression’, Journal of Applied Econometrics, 20/4 (2005), 445–65.

Ministry of Health, Nepal, New ERA, and ICF, Nepal Demographic and Health Survey 2016 (Kathmandu, Nepal: Ministry of Health, Nepal, 2017).

Srinivasan, Chittur S., Zanello, Giacomo, and Shankar, Bhavani, ‘Rural-Urban Disparities in Child Nutrition in Bangladesh and Nepal’, BMC Public Health, 13 (2013), 581.