Hello,
I am trying to run the Stata code below. Everything runs except that, at the very end, I get the error "command i is unrecognized", r(199). How can I avoid this error? I am new to Stata and not sure what is causing it. I have attached the pharmacy_small.dta file to this post so that you can run the code on your computer.
Stata code:
clear
//import the pharmacy_small Stata dataset
use pharmacy_small
// change the variables store_type, area, and compliance into binary categorical variables coded 0 and 1
generate chain = store_type == "CHAIN"
generate north = area == "North"
// convert the string categorical variables to numeric variables while retaining the same labels
encode county, generate(county_num)
python:
# make sure the scikit-learn, numpy, and pandas packages are installed first
# (the sfi module ships with Stata and does not need a separate install)
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn import metrics # import scikit-learn metrics module for accuracy calculation
from sfi import Data
import numpy as np
import pandas as pd
# Use the sfi Data class to pull data from Stata variables
X = pd.DataFrame(Data.get("educate north county_num chain"),
columns = ['educate', 'north', 'county_num', 'chain'])
Y = pd.DataFrame(Data.get("compliance"), columns = ['compliance'])
# split the pharmacy_small dataset into a training and a test set using the python commands
# splitting data into a test and training set is much easier in Python than in Stata (takes 1 line)
# 'test_size = 0.25' tells Python that we want to reserve 25% of our data for the test set
# train_test_split() will automatically shuffle the data before the split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25)
end
clear
gen Alpha = .
gen AUC = .
local i = 0
range alphas 0.0 1.0 20
foreach a in alphas {
i++
python: a = Data.get("a")
// predict using the best value for alpha
python: mnb = MultinomialNB(alpha = a, class_prior = None, fit_prior = True)
// calculate probability of each class on the test set
// '[:, 1]' at the end extracts the probability for each pharmacy to be under compliance
python: Y_mnb_score = mnb.fit(X_train, np.ravel(Y_train)).predict_proba(X_test)[:, 1]
// make test_compliance python variable
python: test_compliance = Y_test['compliance']
// transfer the Python variables Y_mnb_score and test_compliance to Stata
python: Data.setObsTotal(len(Y_mnb_score))
python: Data.addVarFloat('mnbScore')
python: Data.store(var = 'mnbScore', obs = None, val = Y_mnb_score)
python: Data.setObsTotal(len(test_compliance))
python: Data.addVarFloat('testCompliance')
python: Data.store(var = 'testCompliance', obs = None, val = test_compliance)
roctab testCompliance mnbScore
replace AUC = r(area) in `i' // at this point I am getting an error, I think
replace Alpha = `a'
}
Thank you for your help!
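A possible explanation, offered as a sketch rather than a tested fix for this dataset: Stata has no `i++` statement, so the line `i++` is read as a command named `i`, which would produce "command i is unrecognized", r(199). A local macro is incremented with `local ++i`. Separately, `foreach a in alphas` loops once over the literal word `alphas` rather than over the values of that variable, so the sketch below computes each alpha from a loop counter instead. The counter name `k` and the 20-point grid from 0 to 1 are assumptions chosen to mirror the `range` call in the post.

```stata
* Sketch only: shows the counter and loop syntax, not the full model fit
local i = 0
forvalues k = 1/20 {
    local ++i                    // increment a local macro; a bare i++ is not Stata syntax
    local a = (`k' - 1) / 19     // 20 evenly spaced values from 0 to 1 (assumed grid)
    display "iteration `i': alpha = `a'"
}
```

Inside the loop, the value of the local `a` would still need to be passed to Python. One option is `Macro.getLocal("a")` from the `sfi` module, which returns the macro's contents as a string, so it would need converting with `float()` before being used as `alpha`.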