Hi everyone,

I hope I'm posting this in the right place.

Here is my question:

My teachers always told me that when you use a qualitative (categorical) explanatory variable, whether binary or with more levels, each modality (category) of this variable must represent at least 5% of the total population. But what happens if one doesn't? What if one of the modalities represents less than 5% of the total sample?

I remember something like "the standard errors are larger, hence the estimated coefficient is less reliable".

But is it really that bad? What if my modality has A LOT of observations in absolute terms (like 1,000, 10,000 or 100,000) but still accounts for less than 5% of the sample?
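
To make the question concrete, here is a minimal sketch in Python. It assumes the simplest possible case, which may not carry over to richer models: an OLS regression with an intercept and a single binary dummy, homoskedastic errors, and a known error standard deviation. Under those assumptions the standard error of the dummy's coefficient is sigma * sqrt(1/n_rare + 1/n_rest). The function name `dummy_coef_se` and the sample sizes are made up for illustration.

```python
import math


def dummy_coef_se(n_rare, n_total, sigma=1.0):
    """Standard error of b in y = a + b*d + e, where d is a 0/1 dummy.

    Assumes homoskedastic errors with known standard deviation sigma
    (an illustrative simplification, not a general result).
    """
    n_rest = n_total - n_rare
    return sigma * math.sqrt(1.0 / n_rare + 1.0 / n_rest)


# Hypothetical sample sizes: one case at the 5% threshold, two cases below it
# but with many more observations in the rare modality.
for n_rare, n_total in [(50, 1_000), (1_000, 100_000), (100_000, 10_000_000)]:
    share = n_rare / n_total
    se = dummy_coef_se(n_rare, n_total)
    print(f"share={share:.1%}  n_rare={n_rare:>7}  SE={se:.4f}")
```

If this formula is the right way to think about it, the standard error seems to be driven by the absolute number of observations in the rare modality rather than by its share, but I'm not sure whether that reasoning holds once other explanatory variables are in the model.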

Thank you very much for your help and guidance.

Jordan.