Hello All,
I would like to extract certain specified components of a string variable so that I can use the command tab var, gen(x1) to create a dummy variable for each component.
The -dataex- output is as follows:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 county str64 contest
"LENOIR" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"LENOIR" "REGISTER OF DEEDS"
"LENOIR" "SECRETARY OF STATE"
"LENOIR" "SOIL AND WATER CONSERVATION DISTRICT"
"LENOIR" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"LENOIR" "SUPERIOR COURT JUDGE DISTRICT"
"LENOIR" "SUPREME COURT ASSOCIATE JUSTICE"
"LENOIR" "TREASURER"
"LENOIR" "US HOUSE OF REPRESENTATIVES DISTRICT"
"LENOIR" "US SENATE"
"LINCOLN" "ATTORNEY GENERAL"
"LINCOLN" "AUDITOR"
"LINCOLN" "SCHOOL BOARD"
"LINCOLN" "SCHOOL BOARD IRONTON DISTRICT"
"LINCOLN" "SCHOOL BOARD LINCOLNTON DISTRICT"
"LINCOLN" "SCHOOL BOARD NORTH BROOK DISTRICT"
"LINCOLN" "COMMISSIONER OF AGRICULTURE"
"LINCOLN" "COMMISSIONER OF INSURANCE"
"LINCOLN" "COMMISSIONER OF LABOR"
"LINCOLN" "COUNTY COMMISSIONER"
"LINCOLN" "COURT OF APPEALS JUDGE"
"LINCOLN" "DISTRICT COURT JUDGE DISTRICT"
"LINCOLN" "GOVERNOR"
"LINCOLN" "LIEUTENANT GOVERNOR"
"LINCOLN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"LINCOLN" "NC STATE SENATE DISTRICT"
"LINCOLN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"LINCOLN" "SECRETARY OF STATE"
"LINCOLN" "SOIL AND WATER CONSERVATION DISTRICT"
"LINCOLN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"LINCOLN" "SUPREME COURT ASSOCIATE JUSTICE"
"LINCOLN" "TREASURER"
"LINCOLN" "US HOUSE OF REPRESENTATIVES DISTRICT"
"LINCOLN" "US SENATE"
"MACON" "ATTORNEY GENERAL"
"MACON" "AUDITOR"
"MACON" "COMMISSIONER OF AGRICULTURE"
"MACON" "COMMISSIONER OF INSURANCE"
"MACON" "COMMISSIONER OF LABOR"
"MACON" "COUNTY COMMISSIONER II"
"MACON" "COUNTY COMMISSIONER III"
"MACON" "COURT OF APPEALS JUDGE"
"MACON" "DISTRICT COURT JUDGE DISTRICT"
"MACON" "GOVERNOR"
"MACON" "LIEUTENANT GOVERNOR"
"MACON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MACON" "NC STATE SENATE DISTRICT"
"MACON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MACON" "SCHOOL BOARD DISTRICT I"
"MACON" "SCHOOL BOARDI"
"MACON" "SCHOOL BOARD DISTRICT IV"
"MACON" "SECRETARY OF STATE"
"MACON" "SOIL AND WATER CONSERVATION DISTRICT"
"MACON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MACON" "SUPREME COURT ASSOCIATE JUSTICE"
"MACON" "TREASURER"
"MACON" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MACON" "US SENATE"
"MADISON" "ATTORNEY GENERAL"
"MADISON" "AUDITOR"
"MADISON" "COMMISSIONER OF AGRICULTURE"
"MADISON" "COMMISSIONER OF INSURANCE"
"MADISON" "COMMISSIONER OF LABOR"
"MADISON" "COUNTY COMMISSIONER"
"MADISON" "COURT OF APPEALS JUDGE"
"MADISON" "DISTRICT COURT JUDGE DISTRICT"
"MADISON" "GOVERNOR"
"MADISON" "LIEUTENANT GOVERNOR"
"MADISON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MADISON" "NC STATE SENATE DISTRICT"
"MADISON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MADISON" "REGISTER OF DEEDS"
"MADISON" "SECRETARY OF STATE"
"MADISON" "SOIL AND WATER CONSERVATION DISTRICT"
"MADISON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MADISON" "SUPREME COURT ASSOCIATE JUSTICE"
"MADISON" "TREASURER"
"MADISON" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MADISON" "US SENATE"
"MARTIN" "ATTORNEY GENERAL"
"MARTIN" "AUDITOR"
"MARTIN" "BOARD OF COMMISSIONERS EASTERN DISTRICT"
"MARTIN" "SCHOOL BOARD"
"MARTIN" "COMMISSIONER OF AGRICULTURE"
"MARTIN" "COMMISSIONER OF INSURANCE"
"MARTIN" "COMMISSIONER OF LABOR"
"MARTIN" "COURT OF APPEALS JUDGE"
"MARTIN" "DISTRICT COURT JUDGE DISTRICT"
"MARTIN" "GOVERNOR"
"MARTIN" "LIEUTENANT GOVERNOR"
"MARTIN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MARTIN" "NC STATE SENATE DISTRICT"
"MARTIN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MARTIN" "REGISTER OF DEEDS"
"MARTIN" "SECRETARY OF STATE"
"MARTIN" "SOIL AND WATER CONSERVATION DISTRICT"
"MARTIN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MARTIN" "SUPREME COURT ASSOCIATE JUSTICE"
"MARTIN" "TREASURER"
"MARTIN" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MARTIN" "US SENATE"
"MCDOWELL" "ATTORNEY GENERAL"
"MCDOWELL" "AUDITOR"
"MCDOWELL" "SCHOOL BOARD MARION DISTRICT"
"MCDOWELL" "SCHOOL BOARD NORTH COVE DISTRICT"
"MCDOWELL" "SCHOOL BOARD OLD FORT DISTRICT"
"MCDOWELL" "COMMISSIONER OF AGRICULTURE"
"MCDOWELL" "COMMISSIONER OF INSURANCE"
"MCDOWELL" "COMMISSIONER OF LABOR"
"MCDOWELL" "COUNTY COMMISSIONER"
"MCDOWELL" "COURT OF APPEALS JUDGE"
"MCDOWELL" "GOVERNOR"
"MCDOWELL" "LIEUTENANT GOVERNOR"
"MCDOWELL" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MCDOWELL" "NC STATE SENATE DISTRICT"
"MCDOWELL" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MCDOWELL" "REGISTER OF DEEDS"
end
The issue is that I would like to remove the extraneous parts of the contest variable that make contests incomparable across counties (e.g. "IRONTON DISTRICT", "MARION DISTRICT", " I", " II", " III"). I will eventually use this variable with tab contest, gen(contest_) to construct dummies that indicate whether a particular type of election took place in each county.
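A minimal sketch of that final step, assuming contest has already been cleaned so that equivalent contests have identical text. tabulate with generate() creates one 0/1 variable per distinct contest, and collapse (max) by county turns those row-level dummies into county-level indicators of whether each contest took place. The data are a four-row subset of the example above.

```stata
* Sketch only: assumes contest is already cleaned so that equivalent
* contests share identical text. Data: a four-row subset of the example.
clear
input str12 county str64 contest
"LENOIR"  "REGISTER OF DEEDS"
"LENOIR"  "US SENATE"
"LINCOLN" "SCHOOL BOARD"
"LINCOLN" "US SENATE"
end

* One 0/1 variable per distinct contest (contest_1, contest_2, ...),
* numbered in sorted order of contest; tabulate labels each with its contest
tabulate contest, generate(contest_)

* A county had a contest if any of its rows is that contest
collapse (max) contest_*, by(county)
list
```

Here contest_1, contest_2 and contest_3 correspond to REGISTER OF DEEDS, SCHOOL BOARD and US SENATE, so LENOIR gets 1, 0, 1 and LINCOLN gets 0, 1, 1.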
What I have been doing so far is using the subinstr() function, which I read about in a different help file, to remove each extraneous element of the contest name individually, e.g. replace contest = subinstr(contest, " IRONTON DISTRICT", "", .), over and over. However, there are thousands of lines of code that look like this, across multiple years, and there is no consistency in the extra components that appear before or after the part of the variable I want to keep. (And yes, sometimes the part I am trying to get rid of comes before rather than after, even though no such examples occur in this data sample.) I am realizing it will take a very long time to clean the data this way. In this example, most of the problems occur in the "SCHOOL BOARD" and "COUNTY COMMISSIONER" type observations, but that is not always the case either.
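A minimal sketch of the same subinstr() approach with less repetition, assuming the unwanted fragments can be listed once: a foreach loop replaces the repeated replace lines, and ustrregexra() (Stata 14 or later) strips a trailing Roman numeral and a trailing " DISTRICT". The fragment list and the name contest_clean are illustrative, not part of the original workflow. Note that the regular expression would also strip any genuine final word made only of the letters I, V and X, and that it shortens generic names such as "US HOUSE OF REPRESENTATIVES DISTRICT" too (consistently across counties, but worth checking).

```stata
* Sketch only: fragment list is illustrative, not exhaustive.
clear
input str64 contest
"SCHOOL BOARD IRONTON DISTRICT"
"SCHOOL BOARD MARION DISTRICT"
"COUNTY COMMISSIONER III"
"SCHOOL BOARD DISTRICT IV"
"US SENATE"
end

generate str64 contest_clean = contest

* 1. Remove named fragments listed once, instead of one line per fragment
foreach frag in "IRONTON DISTRICT" "MARION DISTRICT" {
    replace contest_clean = subinstr(contest_clean, " `frag'", "", .)
}

* 2. Remove a trailing Roman numeral such as " I", " III", " IV"
replace contest_clean = ustrregexra(contest_clean, " [IVX]+$", "")

* 3. Remove a trailing " DISTRICT" left after step 2
replace contest_clean = ustrregexra(contest_clean, " DISTRICT$", "")

* Tidy any leftover leading/trailing blanks
replace contest_clean = strtrim(contest_clean)
list
```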
Is there any way to tell Stata to keep certain specified elements of a string and discard the rest, instead of going through one by one and eliminating the extraneous components? I have also tried the strkeep function, but I'm not sure how to apply it to this situation, or whether it applies at all.
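One way to "keep the specified elements and scrap the rest" is sketched below, assuming a hand-written list of canonical contest types (the list and the name contest_type are hypothetical). Each observation is assigned the first listed type that appears anywhere in contest, found with strpos(), so extraneous text before or after the type no longer matters.

```stata
* Sketch only: the list of contest types to keep is hypothetical.
clear
input str64 contest
"SCHOOL BOARD LINCOLNTON DISTRICT"
"SCHOOL BOARDI"
"COUNTY COMMISSIONER II"
"LIEUTENANT GOVERNOR"
"GOVERNOR"
"REGISTER OF DEEDS"
end

generate str64 contest_type = ""

* Assign the first listed type contained anywhere in contest.
* Order matters: list names before shorter names they contain,
* e.g. "LIEUTENANT GOVERNOR" before "GOVERNOR".
foreach kw in "SCHOOL BOARD" "COUNTY COMMISSIONER" "LIEUTENANT GOVERNOR" "GOVERNOR" {
    replace contest_type = "`kw'" if contest_type == "" & strpos(contest, "`kw'") > 0
}

* Contests that matched no listed type (candidates for a new entry)
list contest if contest_type == ""
```

Because matching is by substring, the order of the list does the disambiguation, and the final list command shows which contests (here REGISTER OF DEEDS) still need an entry.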
Thanks very much!
(This is my first time posting, and I have tried to follow all the rules, but apologies if I have left something out -- happy to edit or provide further clarification as would be helpful.)