I’d greatly appreciate help working with string data. Below is a snapshot of my data that is in long format. There are two things that I want to do with it:
1) For the variable KeyEventDetails, I only want to keep the first entry of a trial stage if there are multiple entries for the same stage for the same drug. For example, the drug "(+)-discodermolide" has the observation "Phase I Clinical Trial", which corresponds to "Change in Global Status" in the variable KeyEvent, followed by the observation "Phase I Clinical Trial, Unspecified", which corresponds to "Discontinued Products" in KeyEvent. In this case, I want to keep only the first observation and delete the second.
2) I want to write a condition so that, for the same drug, if an observation with "New Product" in the variable KeyEvent is followed by an observation with "No Development Reported", only the first observation (the one with "New Product") is kept for that drug.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str64 Drug str26 KeyEvent str60 KeyEventDetails
"(+)-calanolide A" "CHANGE IN GLOBAL STATUS" "PHASE I CLINICAL TRIAL"
"(+)-calanolide A" "CHANGE IN GLOBAL STATUS" "PHASE I/II CLINICAL TRIAL"
"(+)-discodermolide" "CHANGE IN GLOBAL STATUS" "PHASE I CLINICAL TRIAL"
"(+)-discodermolide" "DISCONTINUED PRODUCTS" "PHASE I CLINICAL TRIAL, UNSPECIFIED"
"(+)-mefloquine, Vernalis" "DISCONTINUED PRODUCTS" "PRECLINICAL, UNSPECIFIED"
"(+)-phenserine" "CHANGE IN GLOBAL STATUS" "PHASE I CLINICAL TRIAL"
"(+)-phenserine" "CHANGE IN GLOBAL STATUS" "PHASE II CLINICAL TRIAL"
"(-)-didesmethylsibutramine" "CHANGE IN GLOBAL STATUS" "PHASE I CLINICAL TRIAL"
"(-)-didesmethylsibutramine" "CHANGE IN GLOBAL STATUS" "PHASE II CLINICAL TRIAL"
"(E1)-3s" "NEW PRODUCT" "PRECLINICAL"
"(R)-etodolac" "CHANGE IN GLOBAL STATUS" "PHASE I/II CLINICAL TRIAL"
"(R)-fluoxetine" "DISCONTINUED PRODUCTS" "PHASE II CLINICAL TRIAL, UNSPECIFIED"
"(R)-ketorolac, Sepracor" "DISCONTINUED PRODUCTS" "PRECLINICAL"
"(R)-salbutamol, Contramid" "DISCONTINUED PRODUCTS" "PHASE II CLINICAL TRIAL, UNSPECIFIED"
"(R)-salbutamol, Sepracor" "CHANGE IN GLOBAL STATUS" "PHASE II CLINICAL TRIAL"
"(R)-salbutamol, Sepracor" "CHANGE IN GLOBAL STATUS" "PHASE III CLINICAL TRIAL"
"(R)-sibutramine, metabolite" "DISCONTINUED PRODUCTS" "PHASE II CLINICAL TRIAL, STRATEGIC"
"(R)-zacopride" "CHANGE IN GLOBAL STATUS" "PHASE II CLINICAL TRIAL"
"(R)-zacopride" "DISCONTINUED PRODUCTS" "PHASE II CLINICAL TRIAL"
end
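To make the two steps concrete, here is a rough sketch of the kind of logic I have in mind. I am not sure it is right, and it rests on several assumptions: the observations are already in chronological order within each drug; the trial stage is whatever precedes the first comma in KeyEventDetails; "followed by" means the very next observation for that drug; and the KeyEvent values are spelled exactly "NEW PRODUCT" and "NO DEVELOPMENT REPORTED" (the latter does not appear in the example above). The variables obsno and stage are helper names I made up for illustration.

```stata
* Record the current order so it can be restored after sorting
generate long obsno = _n

* Step 1: derive the trial stage by dropping any ", UNSPECIFIED"-style suffix
* (assumption: the stage is the text before the first comma)
generate stage = KeyEventDetails
replace stage = trim(substr(stage, 1, strpos(stage, ",") - 1)) if strpos(stage, ",") > 0

* Keep only the first observation of each stage within each drug
bysort Drug stage (obsno): keep if _n == 1

* Step 2: within each drug, drop a "NO DEVELOPMENT REPORTED" observation
* that immediately follows a "NEW PRODUCT" observation
bysort Drug (obsno): drop if KeyEvent == "NO DEVELOPMENT REPORTED" & KeyEvent[_n-1] == "NEW PRODUCT"

* Restore the original order and remove the helper variables
sort obsno
drop obsno stage
```

On the example data, step 1 as written would drop the second "(+)-discodermolide" observation and the second "(R)-zacopride" observation, while step 2 would change nothing because no drug in the example has a "NO DEVELOPMENT REPORTED" entry.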
Karishma