A few of my colleagues reviewed police investigative case files and measured the victims' demographic characteristics for each case using a single string variable, victim_demo. They were instructed to identify the victim's race (B, W, O, or U), gender (M, F, O, or U), and DOB using commas to separate each attribute. For example, a white, male victim with a DOB of 1/1/1980 should have had a value of "W,M,1/1/1980". My hope was to split the variable based on the comma to create the 3 demographic variables. As you can see below, they did not follow those instructions. My solution (I'm open to better ones) is to 1) create a new variable with just the numbers from the victim_demo variable (e.g., row 1 below would have a value of "08071996"), 2) separate the DOB values from the age in years values into distinct variables, 3) convert the DOB values to age in years values, and then 4) combine the age variables to have a single measure of age in years. I think I can figure out the converting to age in years part, but I can't figure out how to pull any/all numbers from the victim_demo into a new variable, let's say victimage_temp. I found this website - https://stats.oarc.ucla.edu/stata/fa...r-expressions/ - and I think I need to use regex but I'm not familiar with it. If someone could help me out with the code I'd really appreciate it. If you also have the time to show an efficient way of turning that new variable into a single measure of age in years I would also really appreciate it but I can probably figure it out if not. Thank you for any help!
Code:
input str30 victim_demo "B/M/08/07/1996" "B/M unknown ages" "B/M/06/27/2000" "B/M Teen" "B/M 4-27-92 27y/o" "B/M 16 y/o. no further" "W/M DOB is not listed" "B/M" "B/M 12-24-1995" "B/M 09-27-83 36y/o" "W, F, 27 years" end
0 Response to Extracting numbers from a string variable to put into a new variable
Post a Comment