Hello

I noticed something which puzzled me when trying to convert a variable stored partly as negative numbers to a factor variable.

The variable is cohort90, which is recorded as years from 1990, so -6 equals 1984, -4 equals 1986, and so on.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float caseid byte(score cohort90)
  339 49 -6
  340 18 -6
  345 46 -6
  346 43 -6
  352 17 -6
  353 29 -6
  354 15 -6
  361 19 -6
  362 45 -6
  363 12 -6
 6824  0 -4
 6826  0 -4
 6827 20 -4
 6828 32 -4
 6829  0 -4
 6834 24 -4
 6836 23 -4
13206  7 -2
13209 38 -2
13215 46 -2
13217 28 -2
13218 32 -2
18681 36  0
18682 21  0
18685 26  0
18686 34  0
26586 25  6
26591 38  6
26594 27  6
26595 28  6
31001 40  8
31005 36  8
31009 39  8
31011 44  8
end
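To make the coding concrete, here is a minimal sketch of recovering the calendar year from cohort90. The variable name year is my own and is not part of the data set.

```stata
* Hypothetical variable name: year is not in the original data
* cohort90 is years from 1990, so -6 becomes 1984 and 8 becomes 1998
generate int year = 1990 + cohort90
tabulate cohort90 year
```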
I've tried "manual" ways of generating a factor variable from cohort90, which do work, such as

Code:
generate cohort90yr84 = cohort90==-6
generate cohort90yr86 = cohort90==-4
generate cohort90yr88 = cohort90==-2
generate cohort90yr90 = cohort90==0
generate cohort90yr96 = cohort90==6
generate cohort90yr98 = cohort90==8
or
Code:
generate cohort=1 if cohort90==-6
replace cohort=2 if cohort90==-4
replace cohort=3 if cohort90==-2
replace cohort=4 if cohort90==0
replace cohort=5 if cohort90==6
replace cohort=6 if cohort90==8
However, I was looking for a quicker way with less typing.

Code:
quietly tabulate cohort90, generate(new_cohort)
works, but it creates 6 separate dummy variables rather than a single categorical variable. Using

Code:
tostring cohort90, generate(another)
encode another, generate(another_1)
gives me what I want, but the part I don't understand is that the categories of another_1 are in a different order:

Code:
  tab another_1

     Cohort |      Freq.     Percent        Cum.
------------+-----------------------------------
         -2 |      5,245       15.43       15.43
         -4 |      6,325       18.61       34.04
         -6 |      6,478       19.06       53.10
          0 |      4,371       12.86       65.96
          6 |      4,244       12.49       78.45
          8 |      7,325       21.55      100.00
------------+-----------------------------------
      Total |     33,988      100.00
So now 1988 (-2) is the base year instead of 1984, followed by 1986 (-4) and then 1984 (-6). You have to be careful now when selecting your reference category when doing

Code:
regress score i.another_1
I'm not sure what I have done wrong here, but it's probably how Stata interprets the negative numbers.
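For comparison, the second manual recoding above could presumably be condensed into one line. This is only a sketch, assuming egen's group() function numbers the distinct values of cohort90 in ascending numeric order; cohort_num is a hypothetical name.

```stata
* cohort_num is a hypothetical name, not in the original data
* group() assigns 1, 2, ... to the distinct values of cohort90 in sorted
* numeric order, so -6 (1984) should become category 1
* the label option attaches the original values as value labels
egen cohort_num = group(cohort90), label
tabulate cohort_num
regress score i.cohort_num
```

If that assumption holds, 1984 would be the base category in the regression without going through a string variable.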

Regards

Chris