I am writing a program for the ml command. I found that even a slight difference in my code can make a huge difference in the number of iterations needed for convergence. I want to know why this is the case.

Take a simple example. I wrote the code for OLS in the following three ways:

Code:
clear all
set obs 3000
gen  i=_n

gen u=rnormal(0,0.5)
forval i=1 (1) 3{
    gen x`i'=rnormal(`i'-1,`i')
}
gen y=2*x1+4*x2-0.5*x3+u

capture program drop myols
program define myols
    args lnf xb sigma
    tempvar sp1
   quietly{
    gen `sp1'=normalden($ML_y,`xb',`sigma')
    replace `lnf'=ln(`sp1')
   }
end

ml model lf myols (y: y = x1 x2 x3) (sigma:)
ml maximize

capture program drop myols2
program define myols2
    args lnf xb lnsigma
    tempvar sp1
   quietly{
    gen `sp1'=normalden($ML_y,`xb',exp(`lnsigma'))
    replace `lnf'=ln(`sp1')
   }
end

ml model lf myols2 (y: y = x1 x2 x3) (lnsigma:)
ml maximize

capture program drop myols3
program define myols3
  args lnf xb lnsigma
  quietly replace `lnf' = ln(normalden($ML_y, `xb',exp(`lnsigma')))
end


ml model lf myols3 (y: y = x1 x2 x3) (lnsigma:)
ml maximize
These three programs should be equivalent; the only exception is that the first reports sigma while the other two report ln(sigma). The second and third differ only in that the second stores the density in a temporary variable before taking its log. However, the first program takes more than 200 iterations to converge, the second runs for 300 iterations without converging, and the third converges very quickly. Can anyone explain why these three programs behave so differently? Thanks!