Dear Statalist,
I hope this finds you well and rested after the weekend.
I've recently been trying to generate a Stata dataset from SPSS files (.DAT and .SPS). The files originate from a third-party database package called OpenClinica, and this is the code I run (please note that it also needs ingap.ado to be installed).
* Enable the pause commands used below (they are ignored while pause is off)
pause on
* Generate Variable labels:
local SpsFile : dir . files "*.sps", respect
insheet using `SpsFile', delimiter(" ") clear
gen Keep = 1 if v1=="VARIABLE" & v2=="LABELS"
replace Keep=0 if v1=="VALUE" & v2=="LABELS"
replace Keep=Keep[_n-1] if Keep==.
keep if Keep==1
keep if v3=="/" | v4=="/"
drop if v1=="VARIABLE" & v2=="LABELS"
capture noisily assert _N==0
if _rc==0 {
    set obs 1
    gen CodeVar = "* No Variable Labels defined, or error in do file, please check"
    di as err "No Variable Labels defined, or error in do file, please check"
    pause
}
capture gen CodeVar=""
replace v3=v2 if v3=="/"
replace CodeVar= "lab var " + v1 +`" ""' + v3 + `"""'
list v1 v2 v3 v4 CodeVar
keep CodeVar
ingap
replace CodeVar=`"* Generate Variable labels from `SpsFile' "' in 1
save VariableLabels.dta, replace
* Generate value labels:
insheet using `SpsFile', delimiter(" ") clear
gen Keep = 1 if v1=="VARIABLE" & v2=="LABELS"
replace Keep=0 if v1=="VALUE" & v2=="LABELS"
replace Keep=Keep[_n-1] if Keep==.
keep if Keep==0
drop if v1=="VALUE" & v2=="LABELS"
drop if v1=="."
drop if v1=="EXECUTE."
capture noisily assert _N==0
if _rc==0 {
    set obs 1
    gen CodeVar = "* No Value Labels defined, or error in do file, please check"
    di as err "No Value Labels defined, or error in do file, please check"
    pause
}
capture gen CodeVar=""
gen Var=v1 if v2=="" & v1~="/"
replace Var=Var[_n-1] if Var==""
replace CodeVar="label define " + v1 if v2=="" & v1~="/" & CodeVar==""
drop if CodeVar=="label define "
gen Quot=`"""'
replace CodeVar= " " + v1 + " " + Quot + v2 + Quot if CodeVar==""
replace CodeVar= "; " + "label values " + Var + " " + Var + " " + ";" if v1=="/"
ingap
replace CodeVar="#delimit ;" in 1
ingap -1, after
replace CodeVar="#delimit cr ;" in l
list v1 v2 CodeVar, sepby(Var)
keep CodeVar
ingap
replace CodeVar=`"* Generate Value Labels from `SpsFile' "' in 1
save ValueLabels.dta, replace
clear
use VariableLabels.dta
gen VarLab=1
append using ValueLabels.dta
gen ValLabNum=_n if strmatch(CodeVar, "*label define *")
replace ValLabNum=ValLabNum[_n-1] if ValLabNum==.
capture erase VariableLabels.dta
capture erase ValueLabels.dta
* Get rid of prefixed $ symbol in varnames
replace CodeVar=subinstr(CodeVar, "v$", "v", .)
* Get rid of ' quotation mark around numerical codes
* (below needs changing if you have coded variables <-1000 or >1000)
forvalues Num = -1000/1000 {
    replace CodeVar=subinstr(CodeVar, "'`Num''", "`Num'", .)
}
* Implement Dataset specific label changes below if required:
replace CodeVar=subinstr(CodeVar, "InterviewDateE", "InterviewDate_E", .)
format CodeVar %-20s
* Visual inspection:
list CodeVar if VarLab==1
pause Please check whether variable labelling commands look ok!
list CodeVar if VarLab==., sepby(ValLabNum)
pause Please check whether value labelling commands look ok!
drop VarLab ValLabNum
outfile using "LabVarsAndValues.do", noquote replace
clear
local DatFile : dir . files "*.dat", respect
insheet using `DatFile', clear case
* Implement Dataset specific Variable name changes below if required:
do "LabVarsAndValues.do"
local StataFile=subinstr(`DatFile', ".dat", ".dta",1)
save "`StataFile'", replace
Between generating the variable and value labels and importing the data, I'm unsure how best to deal with non-evaluable entries, i.e. an "UNKNOWN" string in a date or binary field. Would it be better to handle this while generating the variable and value labels (i.e. include a code such as 5 "UNK"), or to replace all "UNK" fields?
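To make the second option (replacing the fields) concrete, this is roughly what I have in mind. It is only a sketch: Smoker and BirthDate are made-up variable names, I'm assuming both were read in as strings because of the "UNKNOWN" entries, and I'm assuming dates are written like 2012-03-15.

```stata
* Sketch only: Smoker and BirthDate are hypothetical variable names.
* Assumes both were imported as strings because of the "UNKNOWN" entries,
* and that dates are written like 2012-03-15 (hence the "YMD" mask).

* Binary field: keep 0/1, store "UNKNOWN" as the extended missing value .u
gen byte SmokerNum = real(Smoker) if Smoker != "UNKNOWN"
replace SmokerNum = .u if Smoker == "UNKNOWN"
label define YesNoUnk 0 "No" 1 "Yes" .u "Unknown"
label values SmokerNum YesNoUnk

* Date field: convert valid dates, flag "UNKNOWN" with the same code
gen long BirthDateNum = date(BirthDate, "YMD") if BirthDate != "UNKNOWN"
replace BirthDateNum = .u if BirthDate == "UNKNOWN"
format BirthDateNum %td
```

The idea would be that .u still counts as missing in analyses, but stays distinguishable from entries that were simply left blank.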
Any advice or code on handling this would be welcome.
Kind regards,
Marcus