Dear all,

First and foremost, a happy new year to all of you!

In my current research project, I am analyzing the use of language on Twitter and its effect on a financing event.

My data is structured as follows:
1. For all observations, I have Twitter language values for a certain period.
2. This period starts with the first financing event and ends either with the second financing event or, if no second event occurred (i.e., the observation is right-censored), with the last recorded tweet.
3. What do I want to find out? Does the language used on Twitter affect the probability of a second financing event, or shorten the time until it occurs?
4. I set up my data using the following code:

stset TimeToSecond, failure(SecondFunding)

TimeToSecond is the number of days between the first and the second event (or the last tweet if no second event happened). SecondFunding is my financing event, coded 0/1 (1 = happened, 0 = did not happen in the considered period).

5. I then estimate the effects with a Cox model:

stcox languageVariables* controlVariables*

What's my problem?

The results I get are meaningful. Nevertheless, I suspect that the proportional hazards assumption does not hold for my data. To test it, I re-estimated my models and interacted the independent variables with my time variable (as suggested in a textbook).

stcox languageVariables* controlVariables*, tvc(languageVariables* controlVariables*) texp(TimeToSecond)

The result of this estimation is that some of the interactions with my control variables are significant, which (according to the book) is a sign of non-proportional hazards.
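
For comparison, the proportional hazards assumption can also be checked with a test based on Schoenfeld residuals after the plain Cox fit. This is only a minimal sketch, assuming the data have already been stset as in step 4; the variable lists are the same placeholders used above, not actual variable names.

```stata
* Fit the Cox model without any time-varying coefficients
stcox languageVariables* controlVariables*

* Test of proportional hazards based on Schoenfeld residuals:
* reports a global test and, with detail, one test per covariate
estat phtest, detail
```

A small p-value for a covariate in the detailed output would point in the same direction as a significant tvc() interaction for that covariate.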

My question would now be:
Is it actually a problem if only some of the control variable interactions are significant, while the interactions of the explanatory variables (languageVariables*) are not? And what alternatives are there to model this correctly (e.g., some kind of non-proportional hazards model)?


Best regards and stay healthy