Manipulate complex long strings - Drop everything after word "hectares" and keep number of hectares

Dear all,
I have searched on the Forum for quite some time and tried different approaches to manipulating a quite messy and long string. I would appreciate any help in answering my question.

I have a variable labelled "infringement" that contains a lot of text (see two examples below):

infringement
Destruir (danificar, desmatar) florestas ou demais formas de vegetações consideradas de preservação permanente (áreas do art. 2º da Lei 4.771/65)
Ficam embargadas todas e quaisquer atividades em uma área 26,823 hectares, delimitada pelas coordenadas geográficas constantes no processo administrativo correspondente.

My question: how can I extract only the number of hectares (26,823 in the second example) using Stata 17?
My idea was to drop everything from the word "hectares" onward and then keep the numeric characters at the end of the remaining string, back to the preceding whitespace. Note that the length of the number can vary and that it may contain a comma or a dot. I want the full number saved as a string, because I intend to destring the variable in a separate step: although the comma should be the decimal separator in this dataset, the data are messy, and commas and dots appear to be used interchangeably.

I hope someone can help!

Thanks a lot.
Sandra
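
One possible way to do this, offered as a minimal sketch rather than a definitive solution, is to skip the "drop everything after" step and capture the number directly with Stata's Unicode regular-expression functions ustrregexm() and ustrregexs(). The variable name hectares_str is hypothetical, and the pattern assumes the number sits immediately before the word "hectare" (optionally separated by spaces) and consists only of digits, commas, and dots. The two observations are the examples from the post.

```stata
clear
set obs 2
gen strL infringement = ""
replace infringement = "Destruir (danificar, desmatar) florestas ou demais formas de vegetações consideradas de preservação permanente (áreas do art. 2º da Lei 4.771/65)" in 1
replace infringement = "Ficam embargadas todas e quaisquer atividades em uma área 26,823 hectares, delimitada pelas coordenadas geográficas constantes no processo administrativo correspondente." in 2

* Capture a run of digits, commas, and dots that directly precedes "hectare".
* Assumption: the number starts with a digit and is followed by zero or more spaces.
gen hectares_str = ustrregexs(1) if ustrregexm(infringement, "([0-9][0-9.,]*) *hectare")

list hectares_str
```

Under these assumptions, the second observation should yield the string 26,823 and the first should be left empty, since it never mentions hectares. The result stays a string, so the separator cleanup and destring can still be handled in a separate step. Texts that write the unit differently (for example "ha" or "Hectares") would need a broader pattern.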
