Hello All,
I would like to extract certain specified components of a string variable so that I can then use the tab var, gen(x1) command to create dummy variables for those components.
The -dataex- is as follows:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 county str64 contest
"LENOIR" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"LENOIR" "REGISTER OF DEEDS"
"LENOIR" "SECRETARY OF STATE"
"LENOIR" "SOIL AND WATER CONSERVATION DISTRICT"
"LENOIR" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"LENOIR" "SUPERIOR COURT JUDGE DISTRICT"
"LENOIR" "SUPREME COURT ASSOCIATE JUSTICE"
"LENOIR" "TREASURER"
"LENOIR" "US HOUSE OF REPRESENTATIVES DISTRICT"
"LENOIR" "US SENATE"
"LINCOLN" "ATTORNEY GENERAL"
"LINCOLN" "AUDITOR"
"LINCOLN" "SCHOOL BOARD"
"LINCOLN" "SCHOOL BOARD IRONTON DISTRICT"
"LINCOLN" "SCHOOL BOARD LINCOLNTON DISTRICT"
"LINCOLN" "SCHOOL BOARD NORTH BROOK DISTRICT"
"LINCOLN" "COMMISSIONER OF AGRICULTURE"
"LINCOLN" "COMMISSIONER OF INSURANCE"
"LINCOLN" "COMMISSIONER OF LABOR"
"LINCOLN" "COUNTY COMMISSIONER"
"LINCOLN" "COURT OF APPEALS JUDGE"
"LINCOLN" "DISTRICT COURT JUDGE DISTRICT"
"LINCOLN" "GOVERNOR"
"LINCOLN" "LIEUTENANT GOVERNOR"
"LINCOLN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"LINCOLN" "NC STATE SENATE DISTRICT"
"LINCOLN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"LINCOLN" "SECRETARY OF STATE"
"LINCOLN" "SOIL AND WATER CONSERVATION DISTRICT"
"LINCOLN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"LINCOLN" "SUPREME COURT ASSOCIATE JUSTICE"
"LINCOLN" "TREASURER"
"LINCOLN" "US HOUSE OF REPRESENTATIVES DISTRICT"
"LINCOLN" "US SENATE"
"MACON" "ATTORNEY GENERAL"
"MACON" "AUDITOR"
"MACON" "COMMISSIONER OF AGRICULTURE"
"MACON" "COMMISSIONER OF INSURANCE"
"MACON" "COMMISSIONER OF LABOR"
"MACON" "COUNTY COMMISSIONER II"
"MACON" "COUNTY COMMISSIONER III"
"MACON" "COURT OF APPEALS JUDGE"
"MACON" "DISTRICT COURT JUDGE DISTRICT"
"MACON" "GOVERNOR"
"MACON" "LIEUTENANT GOVERNOR"
"MACON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MACON" "NC STATE SENATE DISTRICT"
"MACON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MACON" "SCHOOL BOARD DISTRICT I"
"MACON" "SCHOOL BOARDI"
"MACON" "SCHOOL BOARD DISTRICT IV"
"MACON" "SECRETARY OF STATE"
"MACON" "SOIL AND WATER CONSERVATION DISTRICT"
"MACON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MACON" "SUPREME COURT ASSOCIATE JUSTICE"
"MACON" "TREASURER"
"MACON" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MACON" "US SENATE"
"MADISON" "ATTORNEY GENERAL"
"MADISON" "AUDITOR"
"MADISON" "COMMISSIONER OF AGRICULTURE"
"MADISON" "COMMISSIONER OF INSURANCE"
"MADISON" "COMMISSIONER OF LABOR"
"MADISON" "COUNTY COMMISSIONER"
"MADISON" "COURT OF APPEALS JUDGE"
"MADISON" "DISTRICT COURT JUDGE DISTRICT"
"MADISON" "GOVERNOR"
"MADISON" "LIEUTENANT GOVERNOR"
"MADISON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MADISON" "NC STATE SENATE DISTRICT"
"MADISON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MADISON" "REGISTER OF DEEDS"
"MADISON" "SECRETARY OF STATE"
"MADISON" "SOIL AND WATER CONSERVATION DISTRICT"
"MADISON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MADISON" "SUPREME COURT ASSOCIATE JUSTICE"
"MADISON" "TREASURER"
"MADISON" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MADISON" "US SENATE"
"MARTIN" "ATTORNEY GENERAL"
"MARTIN" "AUDITOR"
"MARTIN" "BOARD OF COMMISSIONERS EASTERN DISTRICT"
"MARTIN" "SCHOOL BOARD"
"MARTIN" "COMMISSIONER OF AGRICULTURE"
"MARTIN" "COMMISSIONER OF INSURANCE"
"MARTIN" "COMMISSIONER OF LABOR"
"MARTIN" "COURT OF APPEALS JUDGE"
"MARTIN" "DISTRICT COURT JUDGE DISTRICT"
"MARTIN" "GOVERNOR"
"MARTIN" "LIEUTENANT GOVERNOR"
"MARTIN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MARTIN" "NC STATE SENATE DISTRICT"
"MARTIN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MARTIN" "REGISTER OF DEEDS"
"MARTIN" "SECRETARY OF STATE"
"MARTIN" "SOIL AND WATER CONSERVATION DISTRICT"
"MARTIN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
"MARTIN" "SUPREME COURT ASSOCIATE JUSTICE"
"MARTIN" "TREASURER"
"MARTIN" "US HOUSE OF REPRESENTATIVES DISTRICT"
"MARTIN" "US SENATE"
"MCDOWELL" "ATTORNEY GENERAL"
"MCDOWELL" "AUDITOR"
"MCDOWELL" "SCHOOL BOARD MARION DISTRICT"
"MCDOWELL" "SCHOOL BOARD NORTH COVE DISTRICT"
"MCDOWELL" "SCHOOL BOARD OLD FORT DISTRICT"
"MCDOWELL" "COMMISSIONER OF AGRICULTURE"
"MCDOWELL" "COMMISSIONER OF INSURANCE"
"MCDOWELL" "COMMISSIONER OF LABOR"
"MCDOWELL" "COUNTY COMMISSIONER"
"MCDOWELL" "COURT OF APPEALS JUDGE"
"MCDOWELL" "GOVERNOR"
"MCDOWELL" "LIEUTENANT GOVERNOR"
"MCDOWELL" "NC HOUSE OF REPRESENTATIVES DISTRICT"
"MCDOWELL" "NC STATE SENATE DISTRICT"
"MCDOWELL" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
"MCDOWELL" "REGISTER OF DEEDS"
end
The issue is that I would like to remove the extraneous parts of the contest variable that make its values incomparable across counties (i.e. things like " IRONTON DISTRICT", " MARION DISTRICT" and " I", " II", " III", etc.). I will eventually be using this variable with: tab contest, gen(contest_) in order to construct a set of dummies that indicate whether a particular type of election took place in each county.
What I have been doing so far is using the subinstr() function, which I read about in a different help file, to remove each of the extraneous elements of the contest name individually, i.e. "replace contest = subinstr(contest, " IRONTON DISTRICT", "", .)" repeatedly. However, there are thousands of lines that look like this, across multiple years, that I have to clean, and there is no consistency to the extra components found before or after the desired part of the variable. (Yes, sometimes the part I am trying to get rid of comes before, rather than after, the part I want, even though no examples of this exist in this data sample.) I am realizing it will take me a very, very long time to sort through the data in this way. In this example, most of the issues occur after "SCHOOL BOARD" and "COUNTY COMMISSIONER" type observations, but this is not always the case, either.
Is there any way to tell Stata to keep certain specified elements of a string and scrap the rest, instead of going through one by one and eliminating the extraneous components? I have also tried to use the strkeep command, but I'm not sure how to apply it to this particular situation, if it applies at all.
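To make the question concrete, here is a rough sketch of the kind of keep-list logic I have in mind. It is only an illustration under assumptions: the local macro keep and the new variable contest_clean are hypothetical names, the three phrases in the list are just examples taken from the data sample above, and matching with strpos() would need to be checked against the full data (a phrase that also appears inside an unrelated contest name would collapse that contest too).

```stata
* Sketch only: -keep- and -contest_clean- are hypothetical names,
* and the list of phrases is an assumption based on the sample above.
local keep `""SCHOOL BOARD" "COUNTY COMMISSIONER" "BOARD OF COMMISSIONERS""'

* Start from the original value so unmatched contests are left unchanged
generate contest_clean = contest

* If a contest contains one of the core phrases, keep only that phrase
foreach k of local keep {
    replace contest_clean = "`k'" if strpos(contest, "`k'") > 0
}

* Dummies are then built from the cleaned variable
tabulate contest_clean, generate(contest_)
```

With the sample above, this would presumably turn "SCHOOL BOARD IRONTON DISTRICT" and "SCHOOL BOARD DISTRICT IV" into "SCHOOL BOARD", and "COUNTY COMMISSIONER II" into "COUNTY COMMISSIONER", while leaving contests such as "GOVERNOR" untouched.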
Thanks very much!
(This is my first time posting, and I have tried to follow all the rules, but apologies if I have left something out -- happy to edit or provide further clarification as would be helpful.)