I have written a paper on how one can use the Monty Hall problem (with the 3 doors and the goats and the one car behind the doors) as an introduction to permutations inference.
If there is a fellow among us who is well versed both in Stata and R, and can translate my Stata code to R code, I would like to include also R code doing the same in my paper. I will give proper (in my view) attribution to the potential author of the R code by thanking him/her in the thanks note, and by clearly stating that the R code is due to him.
The Stata code below is very long, but this is because I put in ample comments of what I am doing and why. At the end of the day what I am doing in Stata is fairly simple. I am also attaching my whole paper if you are interested in the context. Array
He code that needs to be translate to R code follow below. Stata do file implementing the permutations solution to the generalised Monty Hall problem:
Code:
*** This file implements the permutations solution 2 to the slightly generalised Monty Hall problem *** There are N doors, only one car behind, and the game host opens n doors revealing goats behind *** In Stata what is between /* and */, after //, and after *, are comments not interpreted by Stata *** Stata language is very natural, and commands most of the time do just what they say set seed 666 // We set the random seed for reproducibility matrix J = J(6, 3, 0) // We create a 6 by 3 matrix with 0 in each entry, there we will save our calculations. local row = 1 /* local is something like a scalar, but it can hold strings too. We retrieve local's current content by dereferencing it, like this `row'. I have created the local row and I will be incrementing it to move down the rows of matrix J.*/ local N = 10 /*This is the number of doors, I will solve 10 doors problem, if other doors are needed change accordingly.*/ local n = 3 /*The host will be opening 3 doors revealing goats, if other doors are to be opened change accordingly.*/ /* What follows below is a loop. Note that the loop is in terms of the local b, which stands for the number of replications, and I am dereferencing b just as I would dereference any other local.*/ foreach b of numlist 10 100 1000 10000 100000 1000000 { // loop starts here clear // This clears the data, but the matrix J is still there. set obs `=`b'*`N'' /* set obs sets the observations to a number. Usually it looks simpler than here, say set obs 70; `=calculation' forces evaluation of the calculation, so with the inner `' I am dereferencing `b' and `N' to retrieve their content, and with the outer `=' I am forcing Stata to evaluate the expression. */ matrix J[`row',1] = `b' // I will be saving in the first column of the matrix J the number of replications b. egen id = seq(), from(1) to(`N') /*Stata has two commands that generate a new variable, id here. generate and egen (extended generate). This generates a sequence as you go down the rows of the variable id from 1 to N, then it repeats. The interpretation of id is that this is the identification number of our doors, from 1st to Nth door.*/ egen replication = seq(), block(`N') /*The new variable replication will be 1 for the first block of N rows, 2 for the second block of N, etc.The interpretation of the replication variable is that these are different possible realisations of the arrangement of cars and goats behind the N doors.*/ generate doorone = (id==1) /* I generate a new dummy variable doorone. The meaning is twofold: 1. without loss of generality the player chooses initially door 1, there is no loss of generality because in a moment I will start reshuffling the goats and cars behind the doors. 2. before reshuffling the car was behind this door 1. The first equality equates the new variable doorone to whatever expression is on the right of the equality sign. The second double equality == is a logical evaluation. It evaluates to 1 when variable id is equal to 1, and to 0 when the variable id is not equal to 1*/. generate double uniform = runiform() /* This generates a uniform random variable. I will use it to put the data in random order.*/ sort replication uniform /* This sorts the data by the variable replication first, and then within replication, it sorts the data by the random uniform variable, i.e., it puts the data in random order within replication.*/ by replication: generate permdoorone = doorone[id] /* I generate a new variable permdoorone, which is the permuted/reshuffled version of doorone. In Stata variable is a vector, a column of values. We can refer explicitly to say the 7th value/observation of variable doorone by writing doorone[7]. One operation above I put the data within replication in random order, so the variable id is also in random order. Therefore the current operation creates a new variable permdoorone, whose each value within replication, is equal to the value of doorone at position id. But id is reshufled.*/ sort replication id // I put the data in the original order. summarize permdoorone if id==1 /* if id==1, restricts the sample over which the calculation is done to the set of values for which the logical expression id==1 evaluates to 1 (true). Then the command summarize summarises the variable permdoors, which contains the reshuffled version of doorone, which might, or might not hide a car behind. The command summarize leaves behind the calculated mean in r(mean), and other things which we do not need here such as minimum in r(min), standard deviation in r(sd), etc.*/ * What we have done above is to calculate 1/N, which is the probability of finding the car from the No Switch strategy. matrix J[`row',2] = r(mean) /* We save our calculation in the second column of matrix J, in the row corresponding to the number of replications b.At first round of the loop row is equal to 1, below I will increments it to move down the rows of the matrix for each next round of the loop.*/ drop if id==1 /* I am done with the calculation for the No Switch, and now I will be calculating the Switch strategy probabilities. So I drop the data corresponding to first door. The first door might, or might not be hiding a car. */ sort replication permdoorone /* By the variable replication, I am sending the door with the car to the last position, if there is a door with a car left. If the car was behind the first door, the car is gone by our previous dropping of the first door, so this sorting does not do anything. */ by replication: drop if _n<=`n' /* The host will open n doors to reveal goats, so these doors are out of the calculation. Stata refers to the current number of observation by the system variable _n, and number of observation is counted with respect to the by group. So I am telling Stata to drop all doors by each replication value, which doors are smaller or equal to the number of doors the host will open.*/ summarize permdoorone /* I have put the car, if there is a car at all after switching, in the last position. But the player does not know that. So the chance of the player from Switching is the average: (number of cars remaining)/(total doors unopened remaining).*/ matrix J[`row',3] = r(mean) // We save the probability of finding the car by Switching. local ++row // I increment the local row by one so on the next round of the loop I move to the next row of the matrix J. drop id replication doorone uniform permdoorone /* I am dropping all the variables to start the next round of the loop afresh. */ } // loop ends here. matlist J // I am displaying the matrix J with my results. display "The prob of finding the car by No Switch = " 1/`N' /* display in Stata can serve as a hand calculator. The string in " " is just displayed.*/ display "The prob of finding the car by Switch = " (`N'-1)/(`N'*(`N'-`n'-1))
0 Response to The Monty Hall problem through permutations: can somebody translate the Stata code below to R code?
Post a Comment