Hi everyone,
I have a dataset with 14 million observations and 22 variables. The variables of interest for this question are as follows:
presonid: unique identifier for a person (not unique in the dataset, but uniquely identifies a person)
konisert: (0/1) a treatment has been carried out true or false.
screeningprove: (0/1): a sample can be classified as either a screening sample or not (in which case it is a follow-up sample), true or false
proveDato: sample date.
Each observation is a sample and these can be grouped by a unique person id. Each person has one to many samples(=observations), so the number of observations per person vary. I would now like to set the variable "screeningprove" to either 0 or 1 based on the occurrence of another variable being set to true "konisert ==1" in a time window of 10 years before the sample in question. This is to be done by personid.
I have tried the following:
*generate a sampleid within personid
bysort personid (proveDato): egen provenr = seq()
*generate the maximum number of samples per person
egen maxprovenr = max(provenr), by(personid)
*create an inner loop; for testing purposes keep just one personid
keep if personid == 100000493
local maxprovenr = maxprovenr[_n]
forval f = 1/ ‘maxprovenr’ {
replace screeningprove = 0 if konisert[_n-‘f’] == 1 & proveDato -proveDato[_n-‘f’] <3650
}
This seems to behave as expected.
I thought i could now nest this loop within another loop that would carry out this inner lopp for each personid. But this is where I can't wrap my head around how to do it. Is this possibly not at all the right approach to this problem?
cheers, Linn
0 Response to nested loop for cycling over observations by group
Post a Comment