Hello All,
I would like to extract certain specified components of a string variable so that I can then use tab var, gen(x1) to create a set of dummy variables, one for each component.
The -dataex- output is as follows:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 county str64 contest
"LENOIR" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"LENOIR" "REGISTER OF DEEDS"
"LENOIR" "SECRETARY OF STATE"
"LENOIR" "SOIL AND WATER CONSERVATION DISTRICT"
"LENOIR" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"LENOIR" "SUPERIOR COURT JUDGE DISTRICT"
"LENOIR" "SUPREME COURT ASSOCIATE JUSTICE"
"LENOIR" "TREASURER"
"LENOIR" "US HOUSE OF REPRESENTATIVES DISTRICT"
"LENOIR" "US SENATE"
"LINCOLN" "ATTORNEY GENERAL"
"LINCOLN" "AUDITOR"
"LINCOLN" "SCHOOL BOARD"
"LINCOLN" "SCHOOL BOARD IRONTON DISTRICT"
"LINCOLN" "SCHOOL BOARD LINCOLNTON DISTRICT"
"LINCOLN" "SCHOOL BOARD NORTH BROOK DISTRICT"
"LINCOLN" "COMMISSIONER OF AGRICULTURE"
"LINCOLN" "COMMISSIONER OF INSURANCE"
"LINCOLN" "COMMISSIONER OF LABOR"
"LINCOLN" "COUNTY COMMISSIONER"
"LINCOLN" "COURT OF APPEALS JUDGE"
"LINCOLN" "DISTRICT COURT JUDGE DISTRICT"
"LINCOLN" "GOVERNOR"
"LINCOLN" "LIEUTENANT GOVERNOR"
"LINCOLN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"LINCOLN" "NC STATE SENATE DISTRICT"
"LINCOLN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"LINCOLN" "SECRETARY OF STATE"
"LINCOLN" "SOIL AND WATER CONSERVATION DISTRICT"
"LINCOLN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"LINCOLN" "SUPREME COURT ASSOCIATE JUSTICE"
"LINCOLN" "TREASURER"
"LINCOLN" "US HOUSE OF REPRESENTATIVES DISTRICT"
"LINCOLN" "US SENATE"
"MACON" "ATTORNEY GENERAL"
"MACON" "AUDITOR"
"MACON" "COMMISSIONER OF AGRICULTURE"
"MACON" "COMMISSIONER OF INSURANCE"
"MACON" "COMMISSIONER OF LABOR"
"MACON" "COUNTY COMMISSIONER II"
"MACON" "COUNTY COMMISSIONER III"
"MACON" "COURT OF APPEALS JUDGE"
"MACON" "DISTRICT COURT JUDGE DISTRICT"
"MACON" "GOVERNOR"
"MACON" "LIEUTENANT GOVERNOR"
"MACON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MACON" "NC STATE SENATE DISTRICT"
"MACON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MACON" "SCHOOL BOARD DISTRICT I"
"MACON" "SCHOOL BOARDI"
"MACON" "SCHOOL BOARD DISTRICT IV"
"MACON" "SECRETARY OF STATE"
"MACON" "SOIL AND WATER CONSERVATION DISTRICT"
"MACON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MACON" "SUPREME COURT ASSOCIATE JUSTICE"
"MACON" "TREASURER"
"MACON" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MACON" "US SENATE"
"MADISON" "ATTORNEY GENERAL"
"MADISON" "AUDITOR"
"MADISON" "COMMISSIONER OF AGRICULTURE"
"MADISON" "COMMISSIONER OF INSURANCE"
"MADISON" "COMMISSIONER OF LABOR"
"MADISON" "COUNTY COMMISSIONER"
"MADISON" "COURT OF APPEALS JUDGE"
"MADISON" "DISTRICT COURT JUDGE DISTRICT"
"MADISON" "GOVERNOR"
"MADISON" "LIEUTENANT GOVERNOR"
"MADISON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MADISON" "NC STATE SENATE DISTRICT"
"MADISON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MADISON" "REGISTER OF DEEDS"
"MADISON" "SECRETARY OF STATE"
"MADISON" "SOIL AND WATER CONSERVATION DISTRICT"
"MADISON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MADISON" "SUPREME COURT ASSOCIATE JUSTICE"
"MADISON" "TREASURER"
"MADISON" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MADISON" "US SENATE"
"MARTIN" "ATTORNEY GENERAL"
"MARTIN" "AUDITOR"
"MARTIN" "BOARD OF COMMISSIONERS EASTERN DISTRICT"
"MARTIN" "SCHOOL BOARD"
"MARTIN" "COMMISSIONER OF AGRICULTURE"
"MARTIN" "COMMISSIONER OF INSURANCE"
"MARTIN" "COMMISSIONER OF LABOR"
"MARTIN" "COURT OF APPEALS JUDGE"
"MARTIN" "DISTRICT COURT JUDGE DISTRICT"
"MARTIN" "GOVERNOR"
"MARTIN" "LIEUTENANT GOVERNOR"
"MARTIN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MARTIN" "NC STATE SENATE DISTRICT"
"MARTIN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MARTIN" "REGISTER OF DEEDS"
"MARTIN" "SECRETARY OF STATE"
"MARTIN" "SOIL AND WATER CONSERVATION DISTRICT"
"MARTIN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MARTIN" "SUPREME COURT ASSOCIATE JUSTICE"
"MARTIN" "TREASURER"
"MARTIN" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MARTIN" "US SENATE"
"MCDOWELL" "ATTORNEY GENERAL"
"MCDOWELL" "AUDITOR"
"MCDOWELL" "SCHOOL BOARD MARION DISTRICT"
"MCDOWELL" "SCHOOL BOARD NORTH COVE DISTRICT"
"MCDOWELL" "SCHOOL BOARD OLD FORT DISTRICT"
"MCDOWELL" "COMMISSIONER OF AGRICULTURE"
"MCDOWELL" "COMMISSIONER OF INSURANCE"
"MCDOWELL" "COMMISSIONER OF LABOR"
"MCDOWELL" "COUNTY COMMISSIONER"
"MCDOWELL" "COURT OF APPEALS JUDGE"
"MCDOWELL" "GOVERNOR"
"MCDOWELL" "LIEUTENANT GOVERNOR"
"MCDOWELL" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MCDOWELL" "NC STATE SENATE DISTRICT"
"MCDOWELL" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MCDOWELL" "REGISTER OF DEEDS"
end
The issue is that I would like to remove the extraneous parts of the contest variable that make its values incomparable across counties (i.e. things like "IRONTON DISTRICT", "MARION DISTRICT" and " I", " II", " III", etc.). I will eventually be using this variable with: tab contest, gen(contest_) in order to construct dummies that indicate whether a particular type of election took place in each county.
What I have been doing so far is using the subinstr() function, which I read about in a different help file, to remove each extraneous element of the contest name individually, i.e. "replace contest = subinstr(contest, " IRONTON DISTRICT", "", .)" repeated for every case. But there are thousands of lines of data that look like this, across multiple years, and there is no consistency to the extra components that appear before or after the part of the variable I want to keep. (And yes, sometimes the part I am trying to get rid of comes before, rather than after, even though no examples of this exist within this data sample.) I am realizing it will take me a very, very long time to clean the data in this way. In this example, most of the issues occur after "SCHOOL BOARD" and "COUNTY COMMISSIONER" type observations, but this is not always the case, either.
Is there any way to tell Stata that I would like to keep certain specified elements of a string and scrap the rest, instead of going through one by one and eliminating the extraneous components? I have also tried to use the strkeep command, but I'm not sure how to apply it to this particular situation, if it applies at all.
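To make the "keep the specified element, scrap the rest" idea concrete, here is a minimal sketch, not a definitive solution. It assumes the core contest names can be listed by hand; the list in the local macro keep and the variable name contest_clean are hypothetical and only for illustration. It loops over the list and, wherever a core name appears anywhere in contest, overwrites the whole string with just that name, so it does not matter whether the extra text comes before or after.

```stata
* Small illustrative subset of the data above
clear
input str12 county str64 contest
"LINCOLN" "SCHOOL BOARD IRONTON DISTRICT"
"MACON" "COUNTY COMMISSIONER II"
"MACON" "SCHOOL BOARDI"
"MARTIN" "BOARD OF COMMISSIONERS EASTERN DISTRICT"
"MARTIN" "SCHOOL BOARD"
"MACON" "GOVERNOR"
end

* Hypothetical list of core contest names to keep; extend it as needed.
* If one name contains another, list the shorter one first so that the
* longer, more specific name is applied last and wins.
local keep `" "SCHOOL BOARD" "COUNTY COMMISSIONER" "BOARD OF COMMISSIONERS" "'

* contest_clean is an assumed new variable; the original contest is left intact.
generate contest_clean = contest
foreach k of local keep {
    * Replace the whole string with the core name wherever that name occurs,
    * regardless of what precedes or follows it.
    replace contest_clean = "`k'" if strpos(contest, "`k'") > 0
}

* One indicator variable per cleaned contest type
tabulate contest_clean, generate(contest_)
```

Values that match none of the listed names (such as "GOVERNOR" here) are left unchanged, so tabulating contest_clean afterwards shows which variants still need a core name added to the list. If county-level indicators are wanted, something like collapse (max) contest_*, by(county) could follow.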
Thanks very much!
(This is my first time posting, and I have tried to follow all the rules, but apologies if I have left something out -- happy to edit or provide further clarification as would be helpful.)