Hello

This is something that has puzzled me for a while, but I haven't pursued it because I use rstan for Bayesian modelling. However, Stata appears to have a lot of capability in its Bayesian commands, so I am keen to use it. I always seem to get wildly different answers from Stata that don't make sense, so I'm obviously doing something wrong. I've had to reproduce a whole dataset here to illustrate the problem:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int time byte discoveries float exp_time
1860  5 1
1861  3 1
1862  0 1
1863  2 1
1864  0 1
1865  3 1
1866  2 1
1867  3 1
1868  6 1
1869  1 1
1870  2 1
1871  1 1
1872  2 1
1873  1 1
1874  3 1
1875  3 1
1876  3 1
1877  5 1
1878  2 1
1879  4 1
1880  4 1
1881  0 1
1882  2 1
1883  3 1
1884  7 1
1885 12 1
1886  3 1
1887 10 1
1888  9 1
1889  2 1
1890  3 1
1891  7 1
1892  7 1
1893  2 1
1894  3 1
1895  3 1
1896  6 1
1897  2 1
1898  4 1
1899  3 1
1900  5 1
1901  2 1
1902  2 1
1903  4 1
1904  0 1
1905  4 1
1906  2 1
1907  5 1
1908  2 1
1909  3 1
1910  3 1
1911  6 1
1912  5 1
1913  8 1
1914  3 1
1915  6 1
1916  6 1
1917  0 1
1918  5 1
1919  2 1
1920  2 1
1921  2 1
1922  6 1
1923  3 1
1924  4 1
1925  4 1
1926  2 1
1927  2 1
1928  4 1
1929  7 1
1930  5 1
1931  3 1
1932  3 1
1933  0 1
1934  2 1
1935  2 1
1936  2 1
1937  1 1
1938  3 1
1939  4 1
1940  2 1
1941  2 1
1942  1 1
1943  1 1
1944  1 1
1945  2 1
1946  1 1
1947  4 1
1948  4 1
1949  3 1
1950  2 1
1951  1 1
1952  4 1
1953  1 1
1954  1 1
1955  1 1
1956  0 1
1957  0 1
1958  2 1
1959  0 1
end
This is the number of important scientific discoveries made each year from 1860 to 1959. It's from problem 10_3_1 in Ben Lambert's book "A Student's Guide to Bayesian Statistics". I'm simply trying to model the counts with a Poisson likelihood and estimate its mean, so that I can compare Stata and rstan.

rstan with a gamma prior on the mean of the Poisson likelihood gives a sensible posterior mean of 3.11 or 3.12. I've reproduced the code and the results here for those unfamiliar with rstan:

Code:
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)


model_string <- "
data {
int n1;
int y1[n1];
}

parameters {
real<lower=0> lambda1;
}

model {  
lambda1 ~ gamma(3,0.5);
y1 ~ poisson(lambda1);
}

generated quantities {
}
"

data_list <- list(y1 = problem10_3_1$discoveries,n1 = length(problem10_3_1$discoveries))

# Compiling and producing posterior samples from the model.
stan_samples <- stan(model_code = model_string, data = data_list)
stan_samples
And the results:
Code:
Inference for Stan model: 67a64cf3a1044b4ab88f343ac21d4848.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
lambda1  3.11    0.00 0.17  2.78  3.00  3.11  3.22  3.46  1457    1
lp__    42.11    0.02 0.69 40.08 41.96 42.38 42.54 42.58  1967    1

Samples were drawn using NUTS(diag_e) at Wed Apr 28 15:33:48 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
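As a sanity check, the gamma prior is conjugate to the Poisson likelihood here, so the posterior can be worked out exactly: with a Gamma(3, 0.5) prior (shape and rate, as Stan parameterises it) and 310 discoveries over the 100 years, the posterior is Gamma(3 + 310, 0.5 + 100) = Gamma(313, 100.5), whose mean is 313/100.5, or about 3.11, so the rstan answer looks right. The same calculation in Stata, with the data above loaded:

Code:
* analytic gamma-Poisson posterior mean: (shape + sum of counts)/(rate + number of years)
quietly summarize discoveries
display "analytic posterior mean = " (3 + r(sum)) / (0.5 + r(N))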
Doing a simple Poisson regression in Stata gives the sample mean (310 discoveries over 100 years, i.e. 3.1) as the incidence rate, as expected:

Code:
poisson discoveries, exposure(exp_time) irr
However, using

Code:
bayes, prior({discoveries:_cons}, gamma(3,1)) : poisson discoveries, exposure(exp_time)
or

Code:
bayes : poisson discoveries, exposure(exp_time)

or

Code:
bayesmh discoveries, likelihood(poisson) prior({discoveries: _cons}, flat)
or

Code:
bayesmh discoveries, likelihood(poisson) prior({discoveries: _cons}, gamma(9,3))
keeps giving the same posterior mean of about 1.13 for {discoveries:_cons}, which is nowhere near the rstan answer.

Sorry for the long post, but any advice about getting started with Bayes in Stata would be appreciated. I think there is a StataCorp course; I probably should take that!

Regards

Chris