Hello.

What keywords should I use to find literature related to the following?

"The basic theory of inference from linear regression is based on the assumption that the residuals are normally distributed. But in fact there is a vast literature establishing that the inferences are pretty robust to violations of that assumption in a wide variety of circumstances." (from this post: https://www.statalist.org/forums/for...-residual-term)

If you have any recommended literature, please let me know.

The reason I ask:

My data is a sample of 1,007,530 card payments. However, the data is unstructured and does not assign a separate identifier to each user (for example, if a 20-year-old man bought 1 apple and 3 pieces of furniture, he appears as 2 separate rows of the payment variable). The residuals are also not normally distributed. Given this, can I still run a multiple regression with the payment amount as the dependent variable and the population as the control variable?
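
To make the question about non-normal residuals concrete, here is a minimal simulation sketch of the kind of check I have in mind. It uses simulated data only, not the card data described above. It assumes a single predictor, independent observations, and skewed (shifted exponential) errors with constant variance. The function names `ols_slope_ci` and `coverage` and all parameter values are hypothetical choices for illustration. It does not address the issue of several rows belonging to the same unidentified user.

```python
import math
import random


def ols_slope_ci(x, y, z=1.96):
    """Fit y = a + b*x by ordinary least squares.

    Returns (b, lower, upper), where lower/upper form a
    normal-approximation confidence interval for the slope b.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # classical OLS standard error
    return b, b - z * se, b + z * se


def coverage(n_obs=500, n_sims=400, true_slope=2.0, seed=0):
    """Share of simulated samples whose interval contains the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        # Assumption: skewed errors, exponential with mean 1 shifted to mean 0.
        y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        _, lo, hi = ols_slope_ci(x, y)
        hits += lo <= true_slope <= hi
    return hits / n_sims


if __name__ == "__main__":
    # With a nominal 95% interval, a value near 0.95 would suggest the
    # interval behaves as intended despite the non-normal errors.
    print(coverage())
```

Under these assumptions, the printed share can be compared with the nominal 95% level. Whether the same holds for data with repeated rows per user is exactly what I am unsure about.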



Thank you.