We need to implement our own parameter estimation for the 3PL IRT model in a small in-house tool we are developing. I'm using Stata to compare our estimates against those of professional software. It's not surprising, I guess, that I've run into discrepancies, given that the model is not particularly easy to estimate. If you have in-depth knowledge of the model and of the techniques used to estimate its parameters, I'd be grateful if you shared your insights. I'm pretty sure my questions are those of a newbie, so an expert shouldn't have a hard time answering them.

First, the overall scenario is that we give tests of similar questions to a sample of students. By similar I mean topically similar: for example, they can be math questions or chemistry questions, but never math and chemistry questions combined. Would it be OK, then, to use the proportion of correctly answered questions as the measure of a student's ability (the latent trait)? I believe it makes sense, but above all I'd very much prefer to avoid having to estimate both the model's parameters and the students' abilities from the likelihood function. I'm struggling with the model as it is, even with the students' abilities treated as known.

Second, if that's OK, can I estimate both the students' abilities and the parameters of the model from the same test, or will I be over-fitting the data? I'll explain... Say our test contains 20 math questions. We give it to a sample of 100 students. We record who answered what, and then we calculate the proportion of correctly answered questions for every student. These are the students' "known" abilities. Can I now take the first question of the test and estimate its 3PL parameters from the proportions I have just recorded? It looks like I used the first question (together with the other 19, but still) to estimate the students' abilities, and now I use those estimates to estimate the 3PL parameters of the first question. This back-and-forth sounds like over-fitting to me... What's your opinion?
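To make the procedure concrete, here is a minimal Python sketch of how I compute the abilities. Python is only a stand-in for the language we actually use, the function names are mine, the response matrix is synthetic random data, and the standardization step is just one possible way to normalize the scores.

```python
import numpy as np

def proportion_correct(responses):
    """Share of correctly answered questions per student.

    responses: 2-D array of 0/1, one row per student, one column per question.
    """
    responses = np.asarray(responses, dtype=float)
    return responses.mean(axis=1)

def standardize(scores):
    """Center and scale scores to mean 0 and standard deviation 1 (assumed normalization)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Synthetic stand-in for the real data: 100 students, 20 questions.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(100, 20))

# These standardized proportions are what I then treat as "known" abilities.
theta = standardize(proportion_correct(responses))
```

Note that every column of `responses`, including the first question, contributes to `theta`, which is exactly the circularity I'm asking about.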

Third, to estimate the 3PL parameters I use the classic method described in F. Baker and S.-H. Kim's "Item Response Theory: Parameter Estimation Techniques...", which relies on the Newton-Raphson iterative procedure. We've tried to run it, but we can't make it deliver consistent results. Instead, we are thinking of outsourcing this part to the math libraries available for the programming language we use. To be precise, we want to construct the likelihood function ourselves and then feed it to a generic maximum-finding algorithm. It sounds straightforward, but since I'm not getting consistent results, I'm doubting my whole understanding of how it's supposed to work. Could you confirm that yes, indeed, the ultimate goal is simply to find the maximum of the likelihood function that we build from the ability data for our sample of students?
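Here is a minimal sketch of what I mean by building the likelihood and handing it to a generic optimizer, written in Python with SciPy as a stand-in for our actual language and library. The function names, the starting values, the parameter bounds, and the synthetic data are all my own assumptions for illustration; this is not the Baker and Kim procedure, just a bounded quasi-Newton search (`L-BFGS-B`) on the negative log-likelihood of one item with abilities treated as known.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def p_3pl(theta, a, b, c):
    """3PL probability of a correct answer: c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) * expit(a * (theta - b))

def neg_log_likelihood(params, theta, y):
    """Negative Bernoulli log-likelihood of one item, abilities theta taken as known."""
    a, b, c = params
    p = np.clip(p_3pl(theta, a, b, c), 1e-9, 1.0 - 1e-9)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_item(theta, y, start=(1.0, 0.0, 0.2)):
    """Minimize the negative log-likelihood over (a, b, c) within assumed bounds."""
    bounds = [(0.05, 10.0), (-6.0, 6.0), (0.0, 0.5)]  # illustrative, not canonical
    result = minimize(neg_log_likelihood, start, args=(theta, y),
                      method="L-BFGS-B", bounds=bounds)
    return result.x, result.fun

# Synthetic check: simulate responses from known parameters, then refit.
rng = np.random.default_rng(1)
theta = rng.normal(size=2000)
true_params = (1.5, 0.3, 0.2)
y = (rng.random(theta.size) < p_3pl(theta, *true_params)).astype(float)

estimate, nll = fit_item(theta, y)
```

If this is the right idea, then maximizing the likelihood and minimizing `neg_log_likelihood` are the same problem, and the inconsistency I see would have to come from the data or the starting values rather than from the setup itself.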

Finally, and this is where Stata comes in, I don't understand how Stata manages to fit a 3PL model to a test consisting of a single question. With a single question, the students' abilities (as proportions of correctly answered questions) can only be 0 or 1. When I construct the likelihood function and search for its maximum, the 3PL parameters I obtain give me a very degenerate logistic function, looking like this (the red points on the plot correspond to the students' abilities, but since I normalize them, they are not actually at 0 and 1):

[Plot: fitted logistic function, with the students' abilities shown as red points]

Stata, on the other hand, manages to produce a very normal-looking, smooth logistic function. When the test consists of dozens of questions, so that the estimated abilities fill a certain range, I get smooth functions too. But I can't comprehend how Stata fits the data even in degenerate cases like this one, which makes me doubt whether I even understand what I'm supposed to do and what the end result should be.
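To show the degenerate case I'm describing, here is a small Python sketch (again a stand-in for our real code, with synthetic data and names of my own choosing). With one question, each student's score is just their response, so after normalization there are only two distinct ability values and every "high" student answered correctly. Under those assumptions the negative log-likelihood keeps shrinking as the discrimination `a` grows, which I take to be why my fitted curve collapses into a step.

```python
import numpy as np
from scipy.special import expit

def neg_log_likelihood(a, b, c, theta, y):
    """Negative 3PL log-likelihood of one item, abilities theta taken as known."""
    p = np.clip(c + (1.0 - c) * expit(a * (theta - b)), 1e-12, 1.0 - 1e-12)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Synthetic single-question test: 100 students, 60 of whom answered correctly.
y = np.array([1.0] * 60 + [0.0] * 40)

# The proportion-correct score equals the response itself; normalize it.
theta = (y - y.mean()) / y.std()

# Put the difficulty halfway between the two ability values, fix c = 0,
# and watch the fit "improve" without limit as a increases.
b = 0.5 * (theta.min() + theta.max())
a_values = (1, 2, 5, 10, 20)
nlls = [neg_log_likelihood(a, b, 0.0, theta, y) for a in a_values]

for a, nll in zip(a_values, nlls):
    print(f"a = {a:>2}: negative log-likelihood = {nll:.6f}")
```

The printed values decrease toward zero, so a generic optimizer has no finite maximum to settle on in this setup.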

Any thoughts on any of the questions are most welcome. And thanks.