Dear Stata community,

I am new to both the forum and Stata. I am trying to calculate the HHI (Herfindahl-Hirschman Index) for my panel data set in Stata 16.1. To show how my data are structured, here is an example with the relevant variables:

The firm identifier is gvkey, sale is the firm's net sales, sic is the 4-digit SIC code, and fyear is the fiscal year.


[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey double fyear str16 sic double sale
"001001" 1984 "5812" 32.007
"001001" 1985 "5812" 53.798
"001003" 1983 "5712" 13.793
"001003" 1984 "5712" 13.829
"001003" 1986 "5712" 36.308
"001003" 1987 "5712" 37.356
"001003" 1988 "5712" 32.808
"001004" 1980 "5080" 132.482
"001004" 1981 "5080" 175.924
"001004" 1982 "5080" 155.006
"001004" 1983 "5080" 177.762
"001004" 1984 "5080" 218.946
"001004" 1985 "5080" 248.012
"001004" 1986 "5080" 298.192
"001004" 1987 "5080" 347.64
"001004" 1988 "5080" 406.36
"001004" 1989 "5080" 444.875
"001004" 1990 "5080" 466.542
"001004" 1991 "5080" 422.657
"001004" 1992 "5080" 382.78
"001004" 1993 "5080" 407.754
"001005" 1980 "3724" 23.382
"001005" 1981 "3724" 35.921
"001007" 1980 "3652" 9.262
"001007" 1981 "3652" 7.261
"001007" 1982 "3652" 4.993
"001007" 1983 "3652" 3.839
"001008" 1985 "3577" .705
"001009" 1982 "3460" 36.01
"001009" 1983 "3460" 18.753
"001009" 1984 "3460" 21.019
"001009" 1985 "3460" 20.507
"001009" 1986 "3460" 19.266
"001009" 1987 "3460" 19.55
"001009" 1988 "3460" 28.419
end
[/CODE]

I started with the following code:


ssc install hhi


hhi sale, by (sic fyear)

sort sic fyear




However, I got the error "Negative values in varlist". Therefore, I changed the code to:


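[CODE]
ssc install hhi

* set negative sales to missing, then drop every observation with missing sales
* (this also drops observations whose sales were already missing)
replace sale = . if sale < 0
drop if sale == .

* compute the HHI of sale within each industry-year group
hhi sale, by(sic fyear)
sort sic fyear
[/CODE]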


The code runs and I get the following result:


[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gvkey double fyear str16 sic double sale float hhi_sale
"008596" 1980 "0100" 405.884 .440715
"009391" 1980 "0100" 5.073 .440715
"010390" 1980 "0100" 25.022 .440715
"010884" 1980 "0100" 3762.579 .440715
"002099" 1980 "0100" 87.565 .440715
"001266" 1980 "0100" 12.517 .440715
"010971" 1980 "0100" 165.766 .440715
"002812" 1980 "0100" 1733.501 .440715
"010802" 1980 "0100" 79.928 .440715
"010884" 1981 "0100" 4058.385 .6711329
"010971" 1981 "0100" 247.284 .6711329
"009391" 1981 "0100" 5.478 .6711329
"010802" 1981 "0100" 76.542 .6711329
"001266" 1981 "0100" 12.346 .6711329
"002099" 1981 "0100" 100.759 .6711329
"010390" 1981 "0100" 20.977 .6711329
"008596" 1981 "0100" 477.995 .6711329
"001266" 1982 "0100" 9.955 .4613737
"002099" 1982 "0100" 110.442 .4613737
"010802" 1982 "0100" 78.755 .4613737
"002812" 1982 "0100" 1823.232 .4613737
"010390" 1982 "0100" 20.854 .4613737
"008596" 1982 "0100" 557.398 .4613737
"009391" 1982 "0100" 5.596 .4613737
"009062" 1982 "0100" 4.527 .4613737
"010971" 1982 "0100" 222.384 .4613737
"010390" 1983 "0100" 23.553 .4112865
"010884" 1983 "0100" 3360.441 .4112865
"006275" 1983 "0100" .424 .4112865
"008596" 1983 "0100" 505.434 .4112865
"001266" 1983 "0100" 8.877 .4112865
"009062" 1983 "0100" 4.496 .4112865
"002812" 1983 "0100" 1551.725 .4112865
"010971" 1983 "0100" 274.672 .4112865
"009391" 1983 "0100" 2.908 .4112865
"002099" 1983 "0100" 111.037 .4112865
"003702" 1984 "0100" 1.731 .40012285
"010390" 1984 "0100" 22.14 .40012285
"010971" 1984 "0100" 258.357 .40012285
"002812" 1984 "0100" 1520.088 .40012285
end
[/CODE]

I am unsure whether my code is correct and whether the result is really what I am looking for. Could someone take a look and check whether the procedure is right?
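One cross-check I could think of, assuming hhi computes the sum of squared market shares within each group, is to build the index by hand and compare it with hhi_sale. This is only a sketch: total_sale, share, share_sq, hhi_manual, and hhi_diff are names I made up, and I have not confirmed that this is what hhi does internally.

[CODE]
* total sales of each industry (sic) in each fiscal year
bysort sic fyear: egen double total_sale = total(sale)

* market share of each firm within its industry-year
generate double share = sale / total_sale

* HHI = sum of squared shares within the industry-year
generate double share_sq = share^2
bysort sic fyear: egen double hhi_manual = total(share_sq)

* compare with the variable created by hhi
generate double hhi_diff = abs(hhi_manual - hhi_sale)
summarize hhi_diff
[/CODE]

If the maximum of hhi_diff is close to zero, the two approaches agree. Tiny differences would be expected because hhi_sale is stored as float.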

Why do I write hhi sale, by(sic fyear) and not just hhi sale, by(sic)? In other words, why is fyear included in the by() grouping, and what does the result with fyear tell me?
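To illustrate what I mean, here is a sketch of the variant without fyear, computed by hand with made-up variable names (total_sale_pooled, share_pooled, share_pooled_sq, hhi_pooled). If I understand it correctly, grouping by sic alone pools all years, so every firm-year observation is treated as a separate competitor and there is a single index value per industry rather than one per industry and year. I am not sure whether hhi with by(sic) behaves the same way.

[CODE]
* pooled version: shares relative to the industry's sales summed over ALL years
bysort sic: egen double total_sale_pooled = total(sale)
generate double share_pooled = sale / total_sale_pooled

* sum of squared shares within the industry, ignoring the year
generate double share_pooled_sq = share_pooled^2
bysort sic: egen double hhi_pooled = total(share_pooled_sq)

* min and max coincide within each sic because the value is constant per industry
tabstat hhi_pooled, by(sic) statistics(min max)
[/CODE]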

I would really appreciate your help as I am quite lost.

Thank you in advance.

Shakira