Hello!

I am having some trouble with a particular regression. I have data from the Labour Force Survey covering four quarters (January to December 2019), and the end goal is to perform a Blinder-Oaxaca decomposition of the wage gap between males and females.

However, before doing that, I wanted to run a simple wage regression. The variables I have included are: log hourly wage (LGHOURPAY, the dependent variable), number of years with current employer (YRSWEMPLOYER), number of dependent children under the age of 16 (FDPCH16), occupation of respondent (OCCUPATION), industry the respondent works in (INDUSTRY), ethnicity (ETHNIC), region of the UK (REGION), the quarter the data corresponds to (QRTR), gender (SEX), education level (EDUC), the subject of the degree if the respondent has one (DEGSUB), marital status (MARITALST), degree classification (DEGCLS), age (AGEBAND), whether the respondent works full time (FTPT), whether they work in the public or private sector (pubpriv) and the number of employees in their current firm (NOEMPLOY).

I have created a time dummy to capture the natural change in wages over time, and I have also created dummy variables for all the categorical variables. The problem I am having is that the variables listed above aren't significant in my regression, which seems strange as they are clearly factors affecting wages, especially gender. In addition, Stata reports a few collinearity problems, mainly within INDUSTRY, which also confuses me as I made sure to drop one dummy variable per categorical variable.

Reference levels for my regression: AGEBAND (16-19), DEGCLS (Pass), ETHNIC (White), EDUC (GCSE or equivalent), FTPT (part time), REGION (London), MARITALST (Not Married), pubpriv (public), NOEMPLOY (Under 50), OCCUPATION (Elementary Occupations), DEGSUB (Mathematics, Computing and Technology), INDUSTRY (Education), SEX (Male) and QRTR (Jan-March 2019).

Regression: (sorry I realise this looks horrible - is there a better way to format code in Statalist?)

reg LGHOURPAY ///
    AGEBANDdummy2 AGEBANDdummy3 AGEBANDdummy4 AGEBANDdummy5 AGEBANDdummy6 AGEBANDdummy7 AGEBANDdummy8 AGEBANDdummy9 AGEBANDdummy10 ///
    YRSWEMPLOYER FDPCH16 ///
    DEGCLSdummy1 DEGCLSdummy2 DEGCLSdummy3 DEGCLSdummy4 ///
    ETHNICdummy2 ETHNICdummy3 ETHNICdummy4 ETHNICdummy5 ///
    EDUCdummy1 EDUCdummy2 ///
    FTPTdummy1 ///
    REGIONdummy1 REGIONdummy2 REGIONdummy3 REGIONdummy5 REGIONdummy6 REGIONdummy7 REGIONdummy8 REGIONdummy9 ///
    MARITALSTdummy2 pubprivdummy1 ///
    NOEMPLOYdummy2 NOEMPLOYdummy3 ///
    OCCUPATIONdummy1 OCCUPATIONdummy2 OCCUPATIONdummy3 OCCUPATIONdummy4 OCCUPATIONdummy5 OCCUPATIONdummy6 OCCUPATIONdummy7 OCCUPATIONdummy8 ///
    DEGSUBdummy1 DEGSUBdummy3 DEGSUBdummy4 DEGSUBdummy5 DEGSUBdummy6 DEGSUBdummy7 DEGSUBdummy8 ///
    INDUSTRYdummy1 INDUSTRYdummy2 INDUSTRYdummy3 INDUSTRYdummy4 INDUSTRYdummy5 INDUSTRYdummy6 INDUSTRYdummy7 INDUSTRYdummy8 INDUSTRYdummy9 INDUSTRYdummy10 INDUSTRYdummy11 INDUSTRYdummy12 INDUSTRYdummy14 INDUSTRYdummy15 INDUSTRYdummy16 INDUSTRYdummy17 ///
    SEXdummy2 ///
    QRTRdummy2 QRTRdummy3 QRTRdummy4
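
For comparison, the same kind of specification can probably be written more compactly with Stata's factor-variable notation. This is only a sketch. It assumes the original categorical variables are stored as numeric variables under the names listed above, which may not be the case, and it uses Stata's default base level (the lowest value) rather than the reference levels I chose. Matching those would need the ib#. prefix with the right codes.

```stata
* Sketch only: assumes AGEBAND, DEGCLS, etc. exist as numeric categorical variables.
* The i. prefix builds the indicators and omits one base level per variable.
regress LGHOURPAY i.AGEBAND YRSWEMPLOYER FDPCH16 i.DEGCLS i.ETHNIC i.EDUC ///
    i.FTPT i.REGION i.MARITALST i.pubpriv i.NOEMPLOY i.OCCUPATION ///
    i.DEGSUB i.INDUSTRY i.SEX i.QRTR

* Show how many observations each industry has in the estimation sample.
tabulate INDUSTRY if e(sample)
```

The tabulate line is there because the collinearity messages mainly concern INDUSTRY, so the number of observations per industry in the sample the regression actually used seems worth checking.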

Does anyone have any idea where I might be going wrong? To me, these variables should be highly significant.

Thanks!