I am trying to search pathology reports, stored as strings, to identify only those with a positive result for the bacterium H. pylori. The difficulty is that results are not reported in a uniform way, there are misspellings, and the term "H. pylori" often appears whenever it is tested for, not only when the result is positive.

What I have tried so far: removing spaces, converting everything to lowercase, and then using regexm() in a series of replace statements to flag negative reports: replace hp_present = 0 if regexm(`path_report', "h.pylori is not seen"), replace hp_present = 0 if regexm(`path_report', "no h.pylori"), replace hp_present = 0 if regexm(`path_report', "negative for h.pylori"), and so on.

Then I flag positive reports: replace hp_present = 1 if regexm(`path_report', "h.pylori is seen"), replace hp_present = 1 if regexm(`path_report', "h.pylori positive"), and so on.
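
Written out in full, the approach above looks roughly like the sketch below. It is a minimal illustration rather than the exact code that was run. It assumes path_report is a string variable in memory, that "removing spaces" means collapsing repeated blanks, and that hp_present starts out missing. The name report_clean is a hypothetical working variable.

```stata
* Minimal sketch; assumes path_report is a string variable in memory.
* report_clean is a hypothetical working copy so the original text is kept.
generate report_clean = lower(path_report)

* Assumption: "removing spaces" means collapsing runs of blanks and trimming
* the ends (the patterns below still rely on single spaces between words).
replace report_clean = itrim(trim(report_clean))

* Assumption: start from missing so that unmatched reports stay unclassified.
generate byte hp_present = .

* Negative phrasings. In a regular expression an unescaped "." matches any
* single character, so "h.pylori" matches "h pylori" as well as "h.pylori",
* but not "h. pylori" (a period followed by a space).
replace hp_present = 0 if regexm(report_clean, "h.pylori is not seen")
replace hp_present = 0 if regexm(report_clean, "no h.pylori")
replace hp_present = 0 if regexm(report_clean, "negative for h.pylori")

* Positive phrasings, applied afterwards, so they overwrite a 0 whenever a
* report matches both a negative and a positive pattern.
replace hp_present = 1 if regexm(report_clean, "h.pylori is seen")
replace hp_present = 1 if regexm(report_clean, "h.pylori positive")

* Reports that matched none of the patterns remain missing.
tabulate hp_present, missing
```

The final tabulate shows how many reports matched none of the listed phrasings and therefore stay unclassified.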


The problem is that, on internal validation, sensitivity was only 50% (I missed many of the reports that were positive for H. pylori). Does anyone have advice on how to approach this when spelling errors, non-uniform wording, and the surrounding context all need to be taken into account?

Examples of strings of the path_report are given below.

Any help is very much appreciated.


Code:

nal diagnosis                        1. "Duodenum biopsies:        histologically unremarkable duodenal mucsa with slight vascular congestion.
 
2. Stomach biopsies: Achtive chronic H. pylori gastritis.
 
3. Tranverse colon polyp" polypectomy: TUbular adenoma fragmented.
 
4. Sessile cecum polyp polpyectomy: Hyperplastic poylpfragments.
 
es PATHOLOGIST,MD
Date Jun 08 2009
 
 
 
 
 
 
 
BRIEF CLINICAL HISTORY: GERD, hx of H pylori chronic gastritis OPERATIVE FINDINGS: POSTOPERATIVE DIAGNOSIS: Surgeon: Surgeon MD
GROSS DESCRIPTIO: Specimen is submitted in formalin and labeled biopsies gastric antrum. The one fragment shows mucosal lymphoid aggregate with herminal center. Giemsa             stains show organisms the morphology of which is consistent with H.       pylori.
 
Gastric antrum biopsies:            Active chronic gastritis associated with H. pylori.
 
 
 
 
 
 
BRIEF CLINICAL HISTORY: GERD, hx of H pylori chronic gastritis OPERATIVE FINDINGS: POSTOPERATIVE DIAGNOSIS: Surgeon: Surgeon MD
GROSS DESCRIPTIO: Specimen is submitted in formalin and labeled biopsies gastric antrum. The one fragment shows mucosal lymphoid aggregate with herminal center. Warthin-Starry             stains pending.
 
Gastric antrum biopsies:            Active chronic gastritis not seen, no H. pylori.
 
 
 
BRIEF CLINICAL HISTORY: GERD, hx of H pylori chronic gastritis OPERATIVE FINDINGS: POSTOPERATIVE DIAGNOSIS: Surgeon: Surgeon MD
GROSS DESCRIPTIO: Specimen is submitted in formalin and labeled biopsies gastric antrum. The one fragment shows mucosal lymphoid aggregate with herminal center. Warthin-Starry             stains pending.
 
Gastric antrum biopsies:            Active chronic gastritis not seen, no H. pylori.
 ADDENDUM: POSITIVE HPYLORI

 
 
MEDICAL RECORD: 78907689
SURGEONPHYSICIAN: Surgeon MD
PREOPERATIVE DX: r/o Helicobacter pylori
 
Final diagnosis: rare comma shaped organisms seen consistent with H pylori
 
 
 
 
 
 
 
MEDICAL RECORD: 78907689
SURGEONPHYSICIAN: Surgeon MD
PREOPERATIVE DX: r/o Helicobacter pylori
 
Final diagnosis: no comma shaped organisms seen consistent with H pylori