Dear all,
I would like your help with the following. I am working on a patent project, and below I am pasting a short example of one column of my data:
Patent Abstract
Method of analyzing environmental friendly polymer to determine the amount of organic acid.
Method of analyzing environmentally friendly polymer to determine the amount of organic acid.
Method of analyzing environmental friendly to determine the amount of organic acid.
Method of analyzing environmentally efficient chemical to determine the amount of organic acid.
Method of analyzing green polymer to determine the amount of organic acid.
Method of analyzing green- efficiently catalysts to determine the amount of organic acid.
Method of analyzing environmental and friendly polymer to determine the amount of organic acid.
Method of analyzing green friend polymer to determine the amount of organic acid.
Method of analyzing environmental friendly to determine the amount of organic acid.
Method of analyzing green but not environmental efficient catalyst to determine the amount of organic acid.
Method of analyzing environmental friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from environmentally friendly polymer.
Method of analyzing environmentally friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from green polymer.
Specifically, the column above, 'Patent Abstract', provides a short summary of each patent (I am pasting only an example of this column so that I can check that the command I end up using returns the correct results). For each row I would like to generate a new variable that counts how many times the abstract contains the following consecutive words: (green OR environmental) AND (efficient OR friendly) AND (chemi OR polymer OR catalyst)
and returns that count in a new column named 'Environmental Patents'. These words need to be consecutive (following one another directly, separated by spaces only), and each word should be allowed to take an extension at its end: for example, I want the command to count 'environmentally' as well as 'environmental'. The results I want in the new column are therefore as follows:
Environmental Patents
1
1
0
1
0
1
0
0
0
0
2
1
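To make the rule concrete, here is a minimal sketch in Python (not Stata) of how I read it. The pattern itself is my assumption: three consecutive words, each allowed a suffix, separated by whitespace, with an optional hyphen before the whitespace so that 'green- efficiently catalysts' counts. The names PATTERN and count_environmental are made up for illustration.

```python
import re

# Assumed reading of the rule: word 1, word 2, word 3 in a row, each may carry
# a suffix (\w*), separated by whitespace; a trailing hyphen is tolerated.
PATTERN = re.compile(
    r"\b(?:green|environmental)\w*-?\s+"
    r"(?:efficient|friendly)\w*-?\s+"
    r"(?:chemi|polymer|catalyst)\w*",
    re.IGNORECASE,
)


def count_environmental(abstract: str) -> int:
    """Count non-overlapping occurrences of the three-word pattern."""
    return len(PATTERN.findall(abstract))


tail = " to determine the amount of organic acid"
abstracts = [
    "Method of analyzing environmental friendly polymer" + tail + ".",
    "Method of analyzing environmentally friendly polymer" + tail + ".",
    "Method of analyzing environmental friendly" + tail + ".",
    "Method of analyzing environmentally efficient chemical" + tail + ".",
    "Method of analyzing green polymer" + tail + ".",
    "Method of analyzing green- efficiently catalysts" + tail + ".",
    "Method of analyzing environmental and friendly polymer" + tail + ".",
    "Method of analyzing green friend polymer" + tail + ".",
    "Method of analyzing environmental friendly" + tail + ".",
    "Method of analyzing green but not environmental efficient catalyst" + tail + ".",
    "Method of analyzing environmental friendly polymer" + tail
    + " involves extracting organic acid compounds from environmentally friendly polymer.",
    "Method of analyzing environmentally friendly polymer" + tail
    + " involves extracting organic acid compounds from green polymer.",
]

counts = [count_environmental(a) for a in abstracts]
print(counts)
```

With this pattern, every row agrees with the column above except the tenth: 'environmental efficient catalyst' is itself three consecutive matching words, so the sketch returns 1 there, and getting 0 would need an extra rule for the preceding 'but not'. Stata's ustrregexm() also understands \b, \w and \s, so the same pattern may carry over with little change.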
Could you therefore advise how I could amend the following Stata command, which I was given in my previous post (credit to Justin Niakamal and William Lisowski), or suggest a new command?
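* EP1: number of listed codes found anywhere in IP (plain substring match)
* EP2: number of listed codes found in IP and followed by a word boundary
gen EP1 = 0
gen EP2 = 0
foreach v in "B01D 53/34" "C02F 1/54" "E03F 5/20" {
    replace EP1 = EP1 + 1 if regexm(upper(IP), "`v'")
    replace EP2 = EP2 + 1 if ustrregexm(upper(IP), "`v'\b")
}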
Thanks in advance
Regards,
C.