How to count how many times each string observation contains specific consecutive words?

Dear all,

I would like your help with the following. I am working on a patent project, and below I am pasting a short example of one column of my data:

Patent Abstract
Method of analyzing environmental friendly polymer to determine the amount of organic acid.
Method of analyzing environmentally friendly polymer to determine the amount of organic acid.
Method of analyzing environmental friendly to determine the amount of organic acid.
Method of analyzing environmentally efficient chemical to determine the amount of organic acid.
Method of analyzing green polymer to determine the amount of organic acid.
Method of analyzing green- efficiently catalysts to determine the amount of organic acid.
Method of analyzing environmental and friendly polymer to determine the amount of organic acid.
Method of analyzing green friend polymer to determine the amount of organic acid.
Method of analyzing environmental friendly to determine the amount of organic acid.
Method of analyzing green but not environmental efficient catalyst to determine the amount of organic acid.
Method of analyzing environmental friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from environmentally friendly polymer.
Method of analyzing environmentally friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from green polymer.

Specifically, the column above, 'Patent Abstract', gives a short summary of each patent (I am pasting only an example of this column, to make sure the command I use gives the correct results). For each row I would like to generate a new variable that checks whether the patent contains the following consecutive words: (green OR environmental) AND (efficient OR friendly) AND (chemi OR polymer OR catalyst)
and returns the number of matches in a new column named 'Environmental Patents'. The words need to be consecutive (one directly following the other, separated by spaces only), and each word should be allowed to take an extension at the end; for example, the command should also count 'environmentally', not only 'environmental'. The results I want in the new column are therefore:

Environmental Patents
1
1
0
1
0
1
0
0
0
0
2
1
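As a minimal sketch of the matching rule described above (assuming Stata 14 or later, where `ustrregexm()` uses ICU regular expressions), the three word groups can be combined into one pattern: `\b` makes the phrase start at the beginning of a word, `\S*` lets each key word take any suffix (such as 'environmentally', 'chemical' or 'green-'), and `\s+` allows only whitespace between the words. The local macro name `pat` is a hypothetical choice for illustration.

```stata
* Sketch: one regular expression for
* (green|environmental) + (efficient|friendly) + (chemi|polymer|catalyst)
*   \b   the phrase must start at a word boundary
*   \S*  any suffix of non-space characters after each key word
*   \s+  the words may be separated by whitespace only
local pat "\b(green|environmental)\S*\s+(efficient|friendly)\S*\s+(chemi|polymer|catalyst)\S*"

display ustrregexm("environmentally friendly polymer", "`pat'")     // 1
display ustrregexm("green- efficiently catalysts", "`pat'")         // 1
display ustrregexm("environmental and friendly polymer", "`pat'")   // 0
display ustrregexm("green friend polymer", "`pat'")                 // 0
```

'friend' does not match because the pattern only allows extensions after the full key word 'friendly', and 'and' breaks the consecutive sequence.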

Could you advise how I could amend the following Stata command, which I was given in my previous post (credits to Justin Niakamal and William Lisowski), or suggest a new command?
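* EP1: number of listed codes found anywhere in IP (regexm, plain match)
* EP2: number of listed codes followed by a word boundary (ustrregexm, \b)
gen EP1 = 0
gen EP2 = 0
foreach v in "B01D 53/34" "C02F 1/54" "E03F 5/20" {
    replace EP1 = EP1 + 1 if regexm(upper(IP), "`v'")
    replace EP2 = EP2 + 1 if ustrregexm(upper(IP), "`v'\b")
}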
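One possible amendment, sketched under these assumptions: Stata 14 or later; the abstract text stored in a variable called `abstract` (hypothetical, because 'Patent Abstract' contains a space and is not a valid Stata variable name); and the count stored in a hypothetical variable `EnvPatents`, labeled 'Environmental Patents'. Because `ustrregexm()` only reports whether a pattern matches, the loop counts occurrences by repeatedly removing the first match with `ustrregexrf()` and adding 1 for every observation that still matched, in the same `replace ... + 1` style as the command above. The example data are typed in with `input` only so the block runs on its own.

```stata
* Sketch: count consecutive-word phrases per abstract (hypothetical names).
clear
input str244 abstract
"Method of analyzing environmental friendly polymer to determine the amount of organic acid."
"Method of analyzing environmentally friendly polymer to determine the amount of organic acid."
"Method of analyzing environmental friendly to determine the amount of organic acid."
"Method of analyzing environmentally efficient chemical to determine the amount of organic acid."
"Method of analyzing green polymer to determine the amount of organic acid."
"Method of analyzing green- efficiently catalysts to determine the amount of organic acid."
"Method of analyzing environmental and friendly polymer to determine the amount of organic acid."
"Method of analyzing green friend polymer to determine the amount of organic acid."
"Method of analyzing environmental friendly to determine the amount of organic acid."
"Method of analyzing green but not environmental efficient catalyst to determine the amount of organic acid."
"Method of analyzing environmental friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from environmentally friendly polymer."
"Method of analyzing environmentally friendly polymer to determine the amount of organic acid involves extracting organic acid compounds from green polymer."
end

* Key word, any suffix, whitespace, key word, any suffix, whitespace, key word
local pat "\b(green|environmental)\S*\s+(efficient|friendly)\S*\s+(chemi|polymer|catalyst)\S*"

generate EnvPatents = 0
label variable EnvPatents "Environmental Patents"
generate tmp = ustrlower(abstract)

* Each pass adds 1 where a match remains, then replaces the first match
* with "#", so the neighbouring words cannot join into a new match.
quietly count if ustrregexm(tmp, "`pat'")
while r(N) > 0 {
    quietly replace EnvPatents = EnvPatents + 1 if ustrregexm(tmp, "`pat'")
    quietly replace tmp = ustrregexrf(tmp, "`pat'", "#")
    quietly count if ustrregexm(tmp, "`pat'")
}
drop tmp
list EnvPatents, noobs
```

On the example data this gives the expected counts for every row except row 10: 'green but not environmental efficient catalyst' contains 'environmental efficient catalyst' consecutively, so the literal rule counts it as 1. Getting 0 there would need an extra rule (for example, ignoring phrases preceded by 'not'), which the post does not define. Row 6 is counted because `\S*` accepts the hyphen in 'green-' as a suffix.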

Thanks in advance

Regards,
C.

BJ Data Tech Solution
