Hello all!

I am currently working on a research project with a dataset of bankrupt and non-bankrupt European companies.
Before I can conduct the research, I need to settle on a methodology for testing my hypotheses.
My knowledge of Stata is still limited, so any help would be highly appreciated.

I need to construct a default prediction model using a number of variables.
I use R&D as the independent variable of interest, plus some control variables (age, size, leverage, liquidity, Z-score, industry, and acquisitions).
The model I use is a logit model.

My first (basic) hypothesis is: R&D spending has a positive impact on the prediction of bankruptcy because it improves the financial performance of a firm.

The regression specification that I use is: logit(Pr(failure = 1)) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7
  1. X1 = R&D
  2. X2 = Leverage
  3. X3 = Size
  4. X4 = Liquidity
  5. X5 = Z-score
  6. X6 = age
  7. X7 = industry
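
To make this concrete, here is a minimal sketch of how I think this specification would look in Stata. The variable names (failure, rd, leverage, size, liquidity, zscore, age, industry) are placeholders I made up for illustration, and I am assuming industry is stored as a numeric category code, so it enters as a set of dummies rather than a single regressor.

```stata
* Hypothetical variable names; failure is coded 0/1.
* i.industry expands the (assumed) numeric industry code into dummies.
logit failure rd leverage size liquidity zscore age i.industry, vce(robust)

* Average marginal effect of R&D on the predicted probability of failure,
* which is easier to read than the raw log-odds coefficient.
margins, dydx(rd)
```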

The second hypothesis goes as follows: Age and firm size will reinforce the basic hypothesis. Together, the determinants will improve the prediction of bankruptcies.

The regression specification that I use is: logit(Pr(failure = 1)) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7
  1. X1 = R&D
  2. X2 = Size
  3. X3 = Age
  4. X4 = Liquidity
  5. X5 = Z-score
  6. X6 = leverage
  7. X7 = industry

How do I best evaluate this hypothesis? Should I stop treating age and firm size as controls and treat them as independent variables of interest instead? What do you all recommend? What I have done here is move age and size to the front of the regression, so that they are treated as independent variables rather than as control variables.
By no longer treating size and age as controls, I would judge whether the prediction model improves by looking at the significance of these variables and the adjusted R-squared of the model. I'm not sure about this yet; can someone confirm, please?
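
One way I could imagine checking this in Stata is sketched below, again with made-up variable names. As far as I can tell, logit reports a pseudo R-squared rather than an adjusted R-squared, so this sketch instead compares a model with and without age and size using a likelihood-ratio test and the area under the ROC curve. Whether that is the right way to test the hypothesis is exactly what I am unsure about.

```stata
* Full model, including age and size (hypothetical variable names).
logit failure rd leverage liquidity zscore age size i.industry
estimates store full
lroc, nograph            // area under the ROC curve, full model

* Restricted model without age and size, on the same estimation sample.
logit failure rd leverage liquidity zscore i.industry if e(sample)
estimates store restricted
lroc, nograph            // area under the ROC curve, restricted model

* Likelihood-ratio test: do age and size jointly improve the fit?
lrtest restricted full
```

If "reinforce" is instead read as moderation, I assume the alternative would be interaction terms such as c.rd#c.age and c.rd#c.size rather than a comparison of two models.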


The final hypothesis: There is a non-linear U-shaped relationship between R&D spending and the probability of bankruptcy.

Here I include a quadratic term for R&D and check whether it is significant. Is this a correct method? For this hypothesis, age and size are control variables again, as in the original model.

The regression specification that I use is: logit(Pr(failure = 1)) = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8
  1. X1 = R&D
  2. X2 = R&D² (R&D squared)
  3. X3 = Leverage
  4. X4 = Size
  5. X5 = Liquidity
  6. X6 = Z-score
  7. X7 = age
  8. X8 = industry
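
A minimal sketch of how I think the quadratic specification could be written in Stata, using factor-variable notation so that margins knows the squared term belongs to R&D. The variable names and the R&D range in the at() option (0 to 0.5 in steps of 0.05) are assumptions for illustration only and would need to match the actual data.

```stata
* c.rd##c.rd adds both rd and rd squared (hypothetical variable names).
logit failure c.rd##c.rd leverage size liquidity zscore age i.industry

* Test whether the squared term is significant.
test c.rd#c.rd

* Implied turning point of the curve: -b1 / (2 * b2).
nlcom -_b[rd] / (2 * _b[c.rd#c.rd])

* Predicted failure probability over an assumed range of R&D values,
* to check that the turning point lies inside the observed data.
margins, at(rd=(0(0.05)0.5))
marginsplot
```

My understanding is that a significant squared term alone may not be enough to claim a U-shape, since the turning point should also fall within the observed range of R&D, which is why the sketch includes the nlcom and margins steps.
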
Thanks in advance.

Kind regards,
Chun H