Disclaimer: the dataset contains PHI, so I have created a replica with substituted variable names and values that is functionally equivalent to mine.
- I have a dataset with many variables (>1,000), of which more than 100 have redacted values.
- The variables with redacted values follow different patterns. Some have the entire column listed as "redacted"; others have scattered values that are blank, missing, or "redacted".
- The value that signifies redaction also differs: sometimes it is "redacted" and other times it is "[REDACTED]".
- ID variable is unique in the dataset (i.e. primary key to the patient)
- Variables NAME, MRN, ADDRESS, and CLINIC_ID are fictitious but serve the purpose of this question.
ID | NAME | MRN | ADDRESS | CLINIC_ID |
1 | "[REDACTED]" | "redacted" | "redacted" | 6 |
2 | "[REDACTED]" | "redacted" | 10 | |
3 | "[REDACTED]" | "redacted" | 5 | |
4 | "[REDACTED]" | "redacted" | "redacted" | 33 |
5 | "[REDACTED]" | "redacted" | "redacted" | 2 |
6 | "[REDACTED]" | "redacted" | 3 | |
7 | "[REDACTED]" | "redacted" | 1 | |
8 | "[REDACTED]" | "redacted" | "redacted" | 6 |
9 | "[REDACTED]" | "redacted" | 4 | |
GOAL 1: I would like to create 3 variables that each contain a list of certain variables:
- LIST 1: Includes all the variables whose entire column (e.g. NAME and ADDRESS) == "redacted" or "[REDACTED]"
- LIST 2: Includes all the variables that have any values (e.g. NAME, MRN, and ADDRESS) == "redacted" or "[REDACTED]"
- LIST 3: Includes all the variables that have some, but not all, values (e.g. only columns like MRN above) == "redacted" or "[REDACTED]"
- Drop all of the variables within the variable list created
- Drop only specific variables within the variable list created.
- For example, I might want to drop only the variables in LIST 3 if CLINIC_ID==6.
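The lists and drops described above can be sketched roughly as follows. This is only one possible approach, and it rests on several assumptions: redaction markers occur only in string variables and are exactly "redacted" or "[REDACTED]"; "entire column" means every observation, blanks included; and CLINIC_ID is numeric. The local macro names (strvars, list1, list2, list3, nred) are illustrative. Because Stata drops a variable for all observations at once, the conditional "drop" is interpreted here as blanking the values for the matching observations, which may or may not be what is intended.

```stata
* Sketch only: classify string variables by redaction pattern.
* Assumption: markers are exactly "redacted" or "[REDACTED]".
local list1
local list2
local list3

* Collect the string variables; numeric variables cannot hold the markers.
ds, has(type string)
local strvars `r(varlist)'

foreach v of local strvars {
    * Count the observations carrying either redaction marker.
    quietly count if inlist(`v', "redacted", "[REDACTED]")
    local nred = r(N)
    if `nred' > 0 {
        * LIST 2: any redacted value at all.
        local list2 `list2' `v'
        if `nred' == _N {
            * LIST 1: every observation is redacted.
            local list1 `list1' `v'
        }
        else {
            * LIST 3: some, but not all, observations are redacted.
            local list3 `list3' `v'
        }
    }
}

display as text "LIST 1 (entire column redacted): " as result "`list1'"
display as text "LIST 2 (any value redacted):     " as result "`list2'"
display as text "LIST 3 (partially redacted):     " as result "`list3'"

* Drop every variable in a list; the guard avoids an error on an empty list.
if "`list1'" != "" {
    drop `list1'
}

* Assumed reading of "drop LIST 3 if CLINIC_ID==6": blank those values.
foreach v of local list3 {
    quietly replace `v' = "" if CLINIC_ID == 6
}
```

Local macros exist only for the duration of the do-file or interactive session in which they are defined, so the classification loop and the drop commands need to be run together.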
I am lost on how to accomplish this in Stata (I am still quite a novice). The beginning of my attempt to even display these variables in just one list is below:
Code:
foreach var of varlist * {
    capture assert `var' if `var'=="redacted"
    if _rc {
        display in smcl as text "variable {result}`var' " _continue
        display in smcl as text "contains redacted"
    }
}
Code:
var of varlist *
However, the output of the foreach code just displays all of the variables in the dataset. Additionally, I can obtain the list without any extra information by simply using the line
Code:
display in smcl "{result}`var' "
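A possible explanation for that output, offered as a sketch rather than a definitive diagnosis: assert expects a numeric expression, so asserting a string variable by name raises a type mismatch, and comparing a numeric variable with "redacted" raises one too. Under capture, either error leaves _rc nonzero, so the if _rc branch would fire for every variable. A version that checks the variable type first and counts matches instead might look like this (the handling of both markers is an assumption based on the description above):

```stata
* Sketch only: report variables that contain at least one redaction marker.
foreach var of varlist * {
    * Skip numeric variables, which cannot hold "redacted".
    capture confirm string variable `var'
    if _rc continue

    * Count matches rather than asserting, so no error code is involved.
    quietly count if inlist(`var', "redacted", "[REDACTED]")
    if r(N) > 0 {
        display as text "variable " as result "`var'" as text " contains redacted"
    }
}
```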
Thanks in advance,
LH