I have data that resembles the following:
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int var1 str6 var2 str7 var3
1 "" "D01T99"
2 "D01T99" "D01T03"
3 "D01T03" "D01T02"
4 "D01T02" "D01"
5 "D01T02" "D02"
6 "D01T03" "D03"
7 "D01T99" "D05T09"
8 "D05T09" "D05T06"
9 "D05T09" "D07T08"
10 "D05T09" "D09"
11 "D01T99" "D10T33"
12 "D10T33" "D10T12"
13 "D10T12" "D10T11"
14 "D10T11" "D10"
15 "D10T11" "D11"
16 "D10T12" "D12"
17 "D10T33" "D13T15"
18 "D13T15" "D13T14"
19 "D13T14" "D13"
20 "D13T14" "D14"
21 "D13T15" "D15"
22 "D10T33" "D16T18"
23 "D16T18" "D16"
24 "D16T18" "D17"
25 "D16T18" "D18"
26 "D10T33" "D19T23"
27 "D19T23" "D19"
28 "D19T23" "D20T21"
29 "D20T21" "D20"
30 "D20T21" "D21"
31 "D19T23" "D22T23"
32 "D22T23" "D22"
33 "D22T23" "D23"
34 "D10T33" "D24T25"
35 "D24T25" "D24"
36 "D24" "D241T31"
37 "D24" "D242T32"
38 "D24T25" "D25"
39 "D25" "D252"
40 "D25" "D25X"
41 "D10T33" "D26T28"
42 "D26T28" "D26T27"
43 "D26T27" "D26"
44 "D26T27" "D27"
45 "D26T28" "D28"
46 "D10T33" "D29T30"
47 "D29T30" "D29"
48 "D29T30" "D30"
49 "D30" "D301"
50 "D30" "D303"
51 "D30" "D304"
52 "D30" "D302A9"
53 "D10T33" "D31T33"
54 "D31T33" "D31T32"
55 "D31T32" "D325"
56 "D31T33" "D33"
57 "D01T99" "D35T39"
58 "D35T39" "D35"
59 "D35T39" "D36T39"
60 "D36T39" "D36"
61 "D36T39" "D37T39"
62 "D01T99" "D41T43"
63 "D01T99" "D45T56"
64 "D45T56" "D45T47"
65 "D45T47" "D45"
66 "D45T47" "D46"
67 "D45T47" "D47"
68 "D45T56" "D49T53"
69 "D49T53" "D49"
70 "D49T53" "D50"
71 "D49T53" "D51"
72 "D49T53" "D52"
73 "D49T53" "D53"
74 "D45T56" "D55T56"
75 "D01T99" "D58T63"
76 "D58T63" "D58T60"
77 "D58T60" "D58"
78 "D58" "D581"
79 "D58" "D582"
80 "D58T60" "D59T60"
81 "D58T63" "D61"
82 "D58T63" "D62T63"
83 "D62T63" "D62"
84 "D62T63" "D63"
85 "D01T99" "D64T66"
86 "D64T66" "D64"
87 "D64T66" "D65"
88 "D64T66" "D66"
89 "D01T99" "D68T82"
90 "D68T82" "D68"
91 "D68T82" "D69T82"
92 "D69T82" "D69T75"
93 "D69T75" "D69T71"
94 "D69T71" "D69T70"
95 "D69T70" "D69"
96 "D69T70" "D70"
97 "D69T71" "D71"
98 "D69T75" "D72"
99 "D69T75" "D73T75"
100 "D73T75" "D73"
end
In the above, the third column, "var3", is the one I am interested in. It contains the industry codes that I need. Its values are of two types: those that are exactly three characters long and those that are longer. The three-character values are the ones I am after, because they identify a specific industry; for instance, D69 corresponds to industry 69. Values longer than three characters, such as D01T03, are aggregates; in this case, D01T03 covers industries 1 to 3. I want to:
1. Drop the values of var3 that are longer than three characters.
2. Thereafter, remove the "D" character at the beginning.
Step 2 is a straightforward application of the substr() function, but I cannot apply it without first removing the values that are longer than three characters. Any help on this is much appreciated!
Thanks,
CS
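One possible way to carry out the two steps, offered as a minimal sketch rather than a definitive answer. It assumes the example data above are in memory and that every three-character value of var3 is "D" followed by two digits, as is the case in the example. The variable name industry is hypothetical and not part of the original data.

```stata
* Step 1: keep only the observations whose code is exactly three characters long
keep if strlen(var3) == 3

* Step 2: take everything from the second character onward (dropping the "D")
* and convert it to a number; "industry" is a made-up variable name
generate industry = real(substr(var3, 2, .))

list var3 industry in 1/5
```

Note that strlen(var3) == 3 also drops codes such as "D252" or "D25X", which are longer than three characters without being "T" ranges. If the other observations need to stay in the dataset, the keep line could be skipped and the condition moved onto the generate line instead (generate industry = real(substr(var3, 2, .)) if strlen(var3) == 3), leaving industry missing for the aggregates.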