Dear all,
I am looking for advice from the Stata experts here on what is happening with my mixed-effects model.

I compared a mixed-effects model with a random intercept (Model B) against one without a random intercept (Model A); as far as I understand, Model A should be equivalent to OLS.
The data are longitudinal, with 1,272 respondents, 35 waves, and 22,777 records in total.
The data are unbalanced: some respondents have 35 spells, while others participated in only two waves (2 spells).
I reshaped the data into person-age form. Age ranges from 20 to 60 years.
The dependent variable is the physical strenuousness of work (PSW).
The data look like this:
Code:
        +---------------------------+
        |       id   age        PSW |
        |---------------------------|
    43. |    10037    23      73.25 |
    44. |    10037    24      73.25 |
    45. |    10037    25      73.25 |
    46. |    10037    26      73.25 |
    47. |    10037    27     75.375 |
        |---------------------------|
    48. |    10037    28     75.375 |
    49. |    10037    29     75.375 |
    50. |    10037    30     75.375 |
    51. |    10037    31     75.375 |
    52. |    10037    32   22.43056 |
        |---------------------------|
    53. |    10037    33   22.43056 |
    54. |    10037    34   22.43056 |
    55. |    10037    35   22.43056 |
    56. |    10037    36     75.375 |
    57. |    10037    37     75.375 |
        |---------------------------|
    58. |    10037    38     75.375 |
    59. |    10037    39   22.43056 |
    60. |    10037    40   22.43056 |
    64. |    10037    50      37.25 |
    65. |    10037    52      37.25 |
        |---------------------------|
    76. |    10038    48   31.89583 |
    77. |    10038    49   31.89583 |
    78. |    10038    50     20.075 |
    79. |    10038    51     20.075 |
    80. |    10038    53     20.075 |
        |---------------------------|
    81. |    10038    55     20.075 |
My two models are:

Model A) mixed psw age c.age#c.age c.age#c.age#c.age, mle
Model B) mixed psw age c.age#c.age c.age#c.age#c.age || id: , mle

The results for each model are:
Code:
Model A
Mixed-effects ML regression                     Number of obs     =     22,777

                                                Wald chi2(3)      =     580.52
Log likelihood =  -102576.8                     Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------
              psw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
              age |  -3.906402   .6288462    -6.21   0.000    -5.138918   -2.673887
                  |
      c.age#c.age |   .0742086   .0162178     4.58   0.000     .0424222     .105995
                  |
c.age#c.age#c.age |  -.0004612   .0001344    -3.43   0.001    -.0007245   -.0001978
                  |
            _cons |   108.8749    7.81452    13.93   0.000     93.55875    124.1911
-----------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
               var(Residual) |   477.7911   4.477182       469.096    486.6473
------------------------------------------------------------------------------

Model B

Mixed-effects ML regression                     Number of obs     =     22,777
Group variable: idintnum68                      Number of groups  =      1,272

                                                Obs per group:
                                                              min =          1
                                                              avg =       17.9
                                                              max =         35

                                                Wald chi2(3)      =     589.43
Log likelihood = -92923.381                     Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------
              psw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
              age |  -2.768387   .3943091    -7.02   0.000    -3.541218   -1.995555
                  |
      c.age#c.age |    .050902   .0101317     5.02   0.000     .0310442    .0707597
                  |
c.age#c.age#c.age |  -.0002928   .0000837    -3.50   0.000    -.0004568   -.0001288
                  |
            _cons |   91.63064     4.9491    18.51   0.000     81.93058    101.3307
-----------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
idintnum68: Identity         |
                  var(_cons) |   317.9105   13.30189      292.8796    345.0806
-----------------------------+------------------------------------------------
               var(Residual) |   170.1462   1.640699      166.9607    173.3925
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 19306.83      Prob >= chibar2 = 0.0000


I plotted the predicted values from both models, together with the observed means of the dependent variable for comparison.
In the attached graph, the red dots are the observed means, the blue line is from Model B, and the gray line is from Model A.
As the graph shows, the predicted values differ depending on whether the random intercept is included.
It makes sense that adding a random intercept makes some difference.
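
For reference, a minimal sketch of how a comparison plot like this can be produced. It assumes the plotted values are the fixed-portion predictions (predict with the xb option) and that the variables are named psw, age, and id as in the model commands above. The names xb_a, xb_b, psw_mean, and tag_age are made up for illustration.

```stata
* Model A: no random intercept; keep the fixed-portion prediction
mixed psw age c.age#c.age c.age#c.age#c.age, mle
predict double xb_a, xb

* Model B: random intercept for each respondent
mixed psw age c.age#c.age c.age#c.age#c.age || id: , mle
predict double xb_b, xb

* Observed mean of psw at each age, and a flag for one row per age
egen double psw_mean = mean(psw), by(age)
egen byte tag_age = tag(age)

* Observed means as points, the two fitted age curves as lines
twoway (scatter psw_mean age if tag_age)   ///
       (line xb_a age if tag_age, sort)    ///
       (line xb_b age if tag_age, sort),   ///
       legend(order(1 "Observed mean" 2 "Model A" 3 "Model B")) ///
       ytitle("PSW") xtitle("Age")
```

Because the fixed-portion prediction depends only on age, one row per age is enough to draw each curve.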

My questions are:
Why are Model B's predicted values lower than Model A's at age 20?
Why do Model B's predicted values become higher than Model A's at around age 30, with the gap widening at older ages?
Could these differences be due to something in my raw data, such as the unbalanced structure?
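
One rough way to look at the unbalanced structure is to check whether respondents who contribute more person-years differ in their average PSW, and how many records are observed at each age. This is only a sketch under the same naming assumptions (psw, age, id); n_obs, psw_pmean, and tag_id are hypothetical names introduced here.

```stata
* Number of person-age records contributed by each respondent
bysort id: gen long n_obs = _N

* Each respondent's own mean PSW, and a flag for one row per respondent
egen double psw_pmean = mean(psw), by(id)
egen byte tag_id = tag(id)

* Is a respondent's average PSW related to how many records they have?
correlate psw_pmean n_obs if tag_id

* How many records are observed at each age?
tabulate age
```

If the average PSW is clearly related to the number of records, the pooled curve and the within-person curve would not be expected to coincide.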

My concern is that the model with the random intercept might be biased, because the increasing pattern of PSW after the 40s (the gray line) does not seem to make sense and does not fit the observed pattern.

Any comments would be greatly appreciated.