Dear all,
I would like your help with the following. I am working on a patent project, and below I am pasting a short example of one column of my data:
Patent Abstract
Method of analyzing environmental friendly polymer to determine the amount of organic acid.
Method of analyzing environmentally friendly polymer to determine the amount of organic acid.
Method of analyzing environmental friendly to determine the amount of organic acid.
Method of analyzing environmentally efficient chemical to determine the amount of organic acid.
Method of analyzing green polymer to determine the amount of organic acid.
Method of analyzing green- efficiently catalysts to determine the amount of organic acid.
Method of analyzing environmental and friendly polymer to determine the amount of organic acid.
Method of analyzing green friend polymer to determine the amount of organic acid.
Method of analyzing environmental friendly to determine the amount of organic acid.
Method of analyzing green but not environmental efficient catalyst to determine the amount of organic acid.
Method of analyzing environmental friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from environmentally friendly polymer.
Method of analyzing environmentally friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from green polymer.
Specifically, the column above, 'Patent Abstract', provides a short summary of each patent (I am pasting an example of this column above just to make sure I get the correct results with the command I will be using). For each row I would like to generate a new column/variable that counts how many times the abstract contains the following consecutive words: (green OR environmental) AND (efficient OR friendly) AND (chemi OR polymer OR catalyst)
and returns that count in a new column named 'Environmental Patents'. The words need to be consecutive (following one another directly, separated by spaces only) and must be allowed to take extensions at the end of each word; for example, I want the command to match 'environmentally' as well as 'environmental'. The results I want in the new column are therefore as follows:
Environmental Patents
1
1
0
1
0
1
0
0
0
0
2
1
Could you therefore advise on how I could amend the following Stata command, which I was given in my previous post (credit to Justin Niakamal and William Lisowski), or suggest a new command?
gen EP1 = 0
gen EP2 = 0
foreach v in "B01D 53/34" "C02F 1/54" "E03F 5/20" {
    replace EP1 = EP1 + 1 if regexm(upper(IP), "`v'")
    replace EP2 = EP2 + 1 if ustrregexm(upper(IP), "`v'\b")
}
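For reference, here is a rough sketch of one way the loop above might be adapted to count consecutive three-word phrases. It is only an illustration, not a tested solution. It assumes Stata 14 or later (for the Unicode regex functions), and the variable names `abstract`, `work` and `env_patents` are made up for the example. Each stem is allowed a suffix of any non-space characters, which is what lets "green-" in the sixth abstract qualify. The idea is to replace every match with a single marker character and then count the markers.

```stata
* Toy data: five of the abstracts listed above (rows 1, 6, 10, 11 and 12).
clear
input str244 abstract
"Method of analyzing environmental friendly polymer to determine the amount of organic acid."
"Method of analyzing green- efficiently catalysts to determine the amount of organic acid."
"Method of analyzing green but not environmental efficient catalyst to determine the amount of organic acid."
"Method of analyzing environmental friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from environmentally friendly polymer."
"Method of analyzing environmentally friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from green polymer."
end

* Work on a lowercased copy; blank out any existing "@" so that it can be
* used as a marker for matches.
generate strL work = subinstr(lower(abstract), "@", " ", .)

* Three stems in a row, each with an optional suffix of non-space characters,
* separated by exactly one space. The leading \b keeps words such as
* "evergreen" from matching "green". Every match is replaced by one "@".
replace work = ustrregexra(work, "\b(green|environmental)[^ ]* (efficient|friendly)[^ ]* (chemi|polymer|catalyst)[^ ]*", "@")

* The number of markers is the number of matches in the abstract.
generate env_patents = length(work) - length(subinstr(work, "@", "", .))
label variable env_patents "Environmental Patents"
drop work

list env_patents
```

One caveat: as far as I can tell, a pattern like this would count the "green but not environmental efficient catalyst" abstract as 1 rather than the 0 listed above, because the regular expression cannot see the preceding "not". Excluding negated phrases would need an additional rule.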
Thanks in advance
Regards,
C.