Hi,
I have a dataset of nearly 500 observations (with more than 100 variables for each observation).
For each observation I have a dichotomous variable ("diagnosis": definite or not definite), and to reach a definite diagnosis I explored four other features. I have assessed empirically (yes, case by case) the contribution of each of these variables and noted a remarkable difference among them. I'd like to translate my observation into "statistical language". I probably need to perform a ROC analysis and, for each variable, count the number of cases in which it is sufficient to make a definite diagnosis.
Is that correct?

Nevertheless I have two problems:
- first of all, not all variables have been tested in every patient (in some patients only two, in others only three). Must I select only the patients in whom all four variables were tested? Or can I compare the variables based on a proportion (number of diagnoses made using that variable / number of patients in whom that variable was tested) and then compare these proportions? How can I translate this into "statistical language"?
- secondly, when I try to import my dataset into Stata, it fails with: Unable to load excel data "Error: Unexpected attribute"
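
To make the first point concrete, here is a minimal sketch in Python of the proportion I have in mind. Everything in it is an assumption for illustration only: the records are made-up toy values, not my real data, and the names `feature_a` to `feature_d` and `diagnostic_yield` are hypothetical. `None` means the variable was not tested in that patient, `True` means it was tested and was sufficient for a definite diagnosis, and `False` means it was tested but was not sufficient.

```python
# Toy records (invented, NOT real data): one dict per patient.
# None = not tested, True = tested and sufficient, False = tested but not sufficient.
patients = [
    {"feature_a": True,  "feature_b": False, "feature_c": None,  "feature_d": True},
    {"feature_a": True,  "feature_b": None,  "feature_c": False, "feature_d": None},
    {"feature_a": False, "feature_b": True,  "feature_c": None,  "feature_d": None},
    {"feature_a": True,  "feature_b": False, "feature_c": True,  "feature_d": False},
]


def diagnostic_yield(records, feature):
    """Return (n_sufficient, n_tested, proportion) for one feature.

    Only patients in whom the feature was actually tested enter the
    denominator (an available-case proportion).
    """
    tested = [r[feature] for r in records if r.get(feature) is not None]
    if not tested:
        return 0, 0, None  # the feature was never tested
    n_sufficient = sum(1 for value in tested if value)
    return n_sufficient, len(tested), n_sufficient / len(tested)


features = ["feature_a", "feature_b", "feature_c", "feature_d"]
yields = {f: diagnostic_yield(patients, f) for f in features}

for name, (k, n, p) in yields.items():
    label = "n/a" if p is None else f"{p:.2f}"
    print(f"{name}: {k}/{n} = {label}")
```

Each variable here gets its own denominator, so patients with only two or three variables tested are not discarded. Whether proportions computed this way can be fairly compared is exactly what I am unsure about, since the choice of which variables to test in a given patient was probably not random.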

I hope I was clear.
Thank you in advance for help.