Hello,

Disclaimer: the dataset contains PHI, so I have recreated a replica with substituted variable names and values that are functionally equivalent to mine.
  1. I have a dataset with many variables (>1,000), of which more than 100 have redacted values.
  2. The variables with redacted values follow different patterns. Some have the entire column listed as "redacted"; others have scattered values that are blank, missing, or "redacted".
  3. The value that signifies the redaction also differs: sometimes it is "redacted" and other times it is "[REDACTED]".
  4. ID variable is unique in the dataset (i.e. primary key to the patient)
  5. The variables NAME, MRN, ADDRESS, and CLINIC_ID are fictitious but serve the purpose of this question.
Example (have):
ID  NAME          MRN         ADDRESS     CLINIC_ID
1   "[REDACTED]"  "redacted"  "redacted"  6
2   "[REDACTED]"  ""          "redacted"  10
3   "[REDACTED]"  ""          "redacted"  5
4   "[REDACTED]"  "redacted"  "redacted"  33
5   "[REDACTED]"  "redacted"  "redacted"  2
6   "[REDACTED]"  ""          "redacted"  3
7   "[REDACTED]"  ""          "redacted"  1
8   "[REDACTED]"  "redacted"  "redacted"  6
9   "[REDACTED]"  ""          "redacted"  4

GOAL 1: I would like to create 3 lists of variables:
  1. LIST 1: Includes all the variables whose entire column (e.g. NAME and ADDRESS) =="redacted" or "[REDACTED]"
  2. LIST 2: Includes all the variables that have any values (e.g. NAME, MRN, and ADDRESS) =="redacted" or "[REDACTED]"
  3. LIST 3: Includes all the variables that have some, but not all, values (e.g. only columns like MRN above) =="redacted" or "[REDACTED]"
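
For reference, GOAL 1 might be approached along the lines below. This is only a sketch under stated assumptions: the redaction markers are stored exactly as "redacted" or "[REDACTED]" (no stray spaces or other capitalisation), only string variables can contain them, and the lists are held in local macros whose names (list1, list2, list3, nred) are made up for illustration. The toy data simply reproduce the example above, with a blank MRN entered as "".

Code:
    * Toy data reproducing the example (note: -clear- wipes data in memory)
    clear
    input byte ID str12 NAME str12 MRN str12 ADDRESS byte CLINIC_ID
    1 "[REDACTED]" "redacted" "redacted" 6
    2 "[REDACTED]" ""         "redacted" 10
    3 "[REDACTED]" ""         "redacted" 5
    4 "[REDACTED]" "redacted" "redacted" 33
    5 "[REDACTED]" "redacted" "redacted" 2
    6 "[REDACTED]" ""         "redacted" 3
    7 "[REDACTED]" ""         "redacted" 1
    8 "[REDACTED]" "redacted" "redacted" 6
    9 "[REDACTED]" ""         "redacted" 4
    end

    * Start with three empty lists, stored as local macros
    local list1
    local list2
    local list3

    * Only string variables can hold the markers, so loop over those
    quietly ds, has(type string)
    foreach v in `r(varlist)' {
        * Count the observations holding either marker
        quietly count if inlist(`v', "redacted", "[REDACTED]")
        local nred = r(N)
        if `nred' > 0 {
            * At least one redacted value: LIST 2
            local list2 `list2' `v'
            if `nred' == _N {
                * Every observation redacted: LIST 1
                local list1 `list1' `v'
            }
            else {
                * Some but not all observations redacted: LIST 3
                local list3 `list3' `v'
            }
        }
    }

    display "LIST 1 (entirely redacted): `list1'"
    display "LIST 2 (any redacted):      `list2'"
    display "LIST 3 (partly redacted):   `list3'"

With the example data this should put NAME and ADDRESS in LIST 1, all three string variables in LIST 2, and only MRN in LIST 3. Counting the matches per variable sidesteps -assert-, which only reports whether a condition holds for every observation.
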
GOAL 2: Lastly, I am looking for two ways to drop variables from these lists:
  1. Drop all of the variables within the variable list created
  2. Drop only specific variables within the variable list created.
    1. For example, if I want to drop only the variables in LIST 3 if CLINIC_ID==6
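
For reference, GOAL 2 might look something like the sketch below. Several things here are assumptions rather than facts: the local macros list1 and list3 are filled in by hand so the snippet runs on its own, keepme and todrop are made-up names, CLINIC_ID is taken to be numeric, and "drop only ... if CLINIC_ID==6" is read as blanking those values in the matching rows, since Stata drops a variable for all observations or none.

Code:
    * Two-row toy dataset (note: -clear- wipes data in memory)
    clear
    input byte ID str12 NAME str12 MRN str12 ADDRESS byte CLINIC_ID
    1 "[REDACTED]" "redacted" "redacted" 6
    2 "[REDACTED]" ""         "redacted" 10
    end

    * Hypothetical list contents, typed in by hand for this sketch
    local list1 NAME ADDRESS
    local list3 MRN

    * Conditional case: blank the LIST 3 variables where CLINIC_ID is 6
    * (assumed interpretation; the variables themselves stay in the data)
    foreach v of local list3 {
        replace `v' = "" if CLINIC_ID == 6
    }

    * Drop only specific variables from a list:
    * remove the ones to keep, then drop what is left
    local keepme ADDRESS
    local todrop : list list1 - keepme
    drop `todrop'

    * Drop every variable in a list
    drop `list3'

The macro list operator - returns the elements of the first list that are not in the second, so the lists themselves never need to be edited by hand.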

I am lost on how to accomplish this in Stata (I am still quite a novice). The beginning of my attempt to even display these variables in just one list is below:

Code:
    foreach var of varlist * {
      capture assert `var' if `var'=="redacted"
      if _rc {
        display in smcl as text "variable {result}`var' "_continue
        display in smcl as text "contains redacted"
      }
    }
I believe that, to loop over all the variables, it is sufficient to use

Code:
var of varlist *
and that I don't need to use something like -unab-.

However, the foreach loop just displays all of the variables in the dataset. Additionally, I can obtain the list without any extra information by reducing the display line to simply

Code:
display in smcl "{result}`var' "
But again, the list of variables I am creating/displaying is not limited to the specific variables that I desire and I am unaware of how to accomplish both GOAL 1 and GOAL 2.

Thanks in advance,

LH