Dear Statalist members,

I hope you are well.

I am seeking guidance on the first investigation I will conduct using Stata. I have a few weeks off for Christmas and decided to teach myself some econometrics/statistics, mainly using Dr Jeff Wooldridge's book. Interestingly, statistics/econometrics were not taught during my Economics undergrad!

I aim to investigate the relationship between the labour force participation of married women and their educational attainment, using the mroz dataset.

As participation in the labour force is binary, I am going to use the linear probability model (LPM). However, because of its limitations, I also plan to fit a probit model and compare the results.

Essentially, beyond interpreting the coefficients from each model, I am unsure what else to include in my little project. Which tests would be appropriate or necessary?
- I know that for a simple linear regression I would test for heteroscedasticity, collinearity and the like, but given the differing assumptions of the LPM and probit, are these tests still applicable?
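
To make the question concrete, here is a rough sketch of the checks I have in mind for the linear probability model (LPM). It assumes mroz.dta sits in my working directory, and phat is just a name I made up for the fitted values. My understanding is that the error variance in an LPM depends on the regressors by construction, which is why I use robust standard errors rather than testing for heteroscedasticity.

```stata
* Load the data (assumes mroz.dta is in the current working directory)
use mroz, clear

* LPM with heteroscedasticity-robust standard errors
regress inlf educ exper expersq nwifeinc age kidslt6 kidsge6, vce(robust)

* Fitted probabilities (phat is a hypothetical variable name)
predict phat, xb

* Count fitted values that fall outside the unit interval
count if (phat < 0 | phat > 1) & !missing(phat)

* Variance inflation factors as a rough collinearity check
* (exper and expersq will be highly collinear by construction)
estat vif
```

If many fitted values fall outside [0, 1], I would take that as a reason to lean more on the probit results.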

The following is what I have completed so far. If anyone could comment briefly on any major flaws or suggest tests to run, it would be much appreciated. I had hoped to share my completed project here too, but if anyone would like to view the full project rather than just this outline, I can share it.

Introduction:
  • Focussed largely on the work of Becker and Mincer, plus a general discussion of why increasing the educational attainment of women is important
Data description:
  • Pretty standard approach here: I have included a description of, and summary statistics for, the variables inlf, educ, exper, expersq, nwifeinc, age, kidslt6, kidsge6
  • I have also included a catplot depicting labour force participation by educational attainment.
Model 1:
  • reg inlf educ, robust
  • Scatterplot to show that the outcome is binary (should I discuss how this shows that the linear regression assumptions are violated?)
Model 2:
  • reg inlf educ exper expersq, robust
  • I am not so sure about this model; I have seen it used in other papers working with the mroz dataset. However, I fail to see why I should include expersq rather than the square of educ if I am looking for diminishing returns. When I generated educsq and added it, my coefficients became statistically insignificant, so I dropped it.
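
For reference, this is roughly how I created and tried educsq. It is a variable I generated myself and is not part of the original dataset. I gather that educ and educsq are strongly correlated, so a joint test of the two may be more informative than their individual t-statistics, but I am not sure.

```stata
* Assumes mroz.dta has already been loaded
* educsq is my own generated variable, not part of mroz
generate educsq = educ^2

* Model 2 with the squared education term added
regress inlf educ educsq exper expersq, vce(robust)

* Joint test that the coefficients on educ and educsq are both zero
test educ educsq

* How strongly the level and its square move together
correlate educ educsq
```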
Model 3:
  • reg inlf educ exper expersq nwifeinc age kidslt6 kidsge6, robust
  • Pairwise correlation between age and experience
  • Can I run the 'test' command on kidsge6 alone and then drop it, given that it is statistically insignificant both in the regression output and according to 'test'?
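
The commands behind these two bullets are sketched below. As far as I understand, running 'test' on a single variable simply reproduces its t-test (the reported F statistic is the square of the t statistic), so a genuinely joint test needs two or more variables. Testing kidslt6 and kidsge6 together is purely my own illustrative choice.

```stata
* Assumes mroz.dta has already been loaded
regress inlf educ exper expersq nwifeinc age kidslt6 kidsge6, vce(robust)

* Single-variable Wald test: equivalent to the t-test in the output above
test kidsge6

* A joint test of two coefficients (illustrative choice of variables)
test kidslt6 kidsge6

* Pairwise correlation of age and experience, with significance level
pwcorr age exper, sig
```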
Model 4:
  • probit inlf educ exper expersq nwifeinc age kidslt6 kidsge6, robust
  • mfx, at(mean)
  • Then compare probit result to LPM result.
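
For the comparison, a sketch of what I plan to run is below. I understand that mfx has been superseded by margins in newer versions of Stata (11 onwards), so I show margins as an alternative. Whether to report effects at the means or averaged over the sample is a choice I have not settled on.

```stata
* Assumes mroz.dta has already been loaded
probit inlf educ exper expersq nwifeinc age kidslt6 kidsge6, vce(robust)

* Marginal effects evaluated at the sample means
* (note: exper and expersq are treated as separate variables here)
margins, dydx(*) atmeans

* Average marginal effects over the sample
margins, dydx(*)

* Classification table, as a rough summary of in-sample fit
estat classification
```

The marginal effects, rather than the raw probit coefficients, are what I would set beside the LPM coefficients.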
Discussion of issues:
  • Mainly the problem of omitted variable bias (OVB)
  • Discussion of external validity.
I appreciate any replies immensely!