Saturday, November 28, 2020

Cumulative frequency of unique values within groups

Hello statalist,

My dataset is an unbalanced time series (with repeated years). Each company can have several classifications per year, ranging from 1 to * (i.e., there can be hundreds). Below is a snapshot of the dataset for 1 company with classifications A, B, C

company year classification
1 1985 A
1 1991 A
1 1993 B
1 1993 B
1 1993 B
1 1994 A
1 1994 A
1 1994 B
1 1994 B
1 1994 B
1 1994 C
1 1994 C
1 1994 C
1 1994 C
1 1994 C

I would like to create cumulative frequency for the classifications by company and year, however, I would like the count to only sum the same values of classification.

I started by creating a frequency var:
. bysort company year classification: gen freq = _N
Also created a tag:
. egen tag = tag(company year classification)

company Year classification freq tag
1 1985 A 1 1
1 1991 A 1 1
1 1993 B 3 1
1 1993 B 3 0
1 1993 B 3 0
1 1994 A 2 1
1 1994 A 2 0
1 1994 B 3 1
1 1994 B 3 0
1 1994 B 3 0
1 1994 C 5 1
1 1994 C 5 0
1 1994 C 5 0
1 1994 C 5 0
1 1994 C 5 0

Now, summing the freq var by(company year classification) with egen does not produce the cumulative frequency I'm looking for. I would like the cumulative count to refer to the same type of class only. If I call this var 'cum', then this is what I am looking for:

company Year classification freq tag cum
1 1985 A 1 1 1
1 1991 A 1 1 2
1 1993 B 3 1 3
1 1993 B 3 0 3
1 1993 B 3 0 3
1 1994 A 2 1 4
1 1994 A 2 0 4
1 1994 B 3 1 6
1 1994 B 3 0 6
1 1994 B 3 0 6
1 1994 C 5 1 5
1 1994 C 5 0 5
1 1994 C 5 0 5
1 1994 C 5 0 5
1 1994 C 5 0 5

How can I create such cumulative var?

Thank you.

No comments:

Post a Comment