
I noticed something which puzzled me when trying to convert a variable stored partly as negative numbers to a factor variable.

The variable is cohort90, which is recorded as years from 1990, so -6 equals 1984, -4 equals 1986, and so on.

* Example generated by -dataex-. To install: ssc install dataex
input float caseid byte(score cohort90)
  339 49 -6
  340 18 -6
  345 46 -6
  346 43 -6
  352 17 -6
  353 29 -6
  354 15 -6
  361 19 -6
  362 45 -6
  363 12 -6
 6824  0 -4
 6826  0 -4
 6827 20 -4
 6828 32 -4
 6829  0 -4
 6834 24 -4
 6836 23 -4
13206  7 -2
13209 38 -2
13215 46 -2
13217 28 -2
13218 32 -2
18681 36  0
18682 21  0
18685 26  0
18686 34  0
26586 25  6
26591 38  6
26594 27  6
26595 28  6
31001 40  8
31005 36  8
31009 39  8
31011 44  8
end

I've tried "manual" ways of generating a factor variable from cohort90, which do work, such as

generate cohort90yr84 = cohort90==-6
generate cohort90yr86 = cohort90==-4
generate cohort90yr88 = cohort90==-2
generate cohort90yr90 = cohort90==0
generate cohort90yr96 = cohort90==6
generate cohort90yr98 = cohort90==8

or

generate cohort=1 if cohort90==-6
replace cohort=2 if cohort90==-4
replace cohort=3 if cohort90==-2
replace cohort=4 if cohort90==0
replace cohort=5 if cohort90==6
replace cohort=6 if cohort90==8

However, I was looking for a quicker way with less typing.
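For comparison, here is a minimal sketch of two lower-typing routes. It is not from the original post: the six-row dataset is made up for illustration, and the variable names cohort and year are assumptions. It relies on egen's group() function, which numbers the distinct values of a variable 1, 2, ... in ascending numeric order, and on simply shifting the coding to calendar years so that no value is negative.

```stata
* Sketch only: a tiny made-up dataset using the same cohort90 coding
clear
input byte cohort90
-6
-4
-2
0
6
8
end

* group() assigns 1, 2, ... in ascending numeric order of cohort90,
* so -6 becomes 1 and 8 becomes 6
egen cohort = group(cohort90)

* Alternative: convert to calendar years, which are non-negative
generate int year = 1990 + cohort90

list cohort90 cohort year
```

Either cohort or year could then be used with the i. prefix, since both hold non-negative integers.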

quietly tabulate cohort90, generate(new_cohort)

works, but it creates six dummy variables. Using

tostring cohort90, generate(another)
encode another, generate(another_1)

gives me what I want, but the curious part, which I don't understand, is that the order of the categories in another_1 has changed:

tab another_1

     Cohort |      Freq.     Percent        Cum.
------------+-----------------------------------
         -2 |      5,245       15.43       15.43
         -4 |      6,325       18.61       34.04
         -6 |      6,478       19.06       53.10
          0 |      4,371       12.86       65.96
          6 |      4,244       12.49       78.45
          8 |      7,325       21.55      100.00
------------+-----------------------------------
      Total |     33,988      100.00

So now 1988 is the base year instead of 1984, and the negative cohorts run 1988, then 1986, then 1984. You have to be careful when selecting your reference category when doing

regress score i.another_1

I'm not sure what I have done wrong here, but it probably has to do with how Stata interprets the negative numbers.
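A possible way to check this, sketched on a made-up six-row dataset rather than the real data: encode numbers the distinct string values in alphabetical order, and as strings "-2" sorts before "-4", which sorts before "-6", with "0", "6" and "8" following. Listing the codes without their labels shows which integer sits behind each category. The ib3. line is left as a comment because the sketch data has no score variable.

```stata
* Sketch only: reproduce the tostring/encode steps on a tiny made-up dataset
clear
input byte cohort90
-6
-4
-2
0
6
8
end

tostring cohort90, generate(another)
encode another, generate(another_1)

* Show the underlying numeric codes: "-2" is coded 1, "-4" 2, "-6" 3,
* then "0" 4, "6" 5, "8" 6 (string sort order, not numeric order)
list cohort90 another_1, nolabel

* With the real data, the base level could be set explicitly, e.g.
* regress score ib3.another_1    // code 3 is "-6", i.e. 1984
```

The ib#. operator names the reference category by its numeric code, so the codes from the nolabel listing are the ones to use there.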

