Dear Statalist,

The following shows a snippet of the dataset used:

HTML Code:
compid    compname    industry_classification    industryid    compid_in_industry    sum_count_compid    abnormal_msreturn    abnormal_annual_volatility    abnormal_volume    msreturn    cv_ptb    cv_lev
ym    compid    compname    industry_classification    industryid    compid_in_industry    sum_count_compid    abnormal_msreturn    abnormal_annual_volatility    abnormal_volume    msreturn    cv_ptb    cv_lev
2022m11    18    Autoneum Holding Ag    Automobiles & Auto Parts    2    1    4    .2524526    -.0041824    -3495.858    .2121214    .2945096    1.783716
2022m12    18    Autoneum Holding Ag    Automobiles & Auto Parts    2    1    4    -.0972508    .0076708    -5521.412    -.1206896    .3644247    1.783716
2023m1    18    Autoneum Holding Ag    Automobiles & Auto Parts    2    1    4    .2284532    -.0018746    1843.499    .1960784    .3242081    1.783716
2023m2    18    Autoneum Holding Ag    Automobiles & Auto Parts    2    1    4    .1438912    -.0053739    -1763.732    .1180329    .3761804    1.783716
2017m5    63    Daetwyler Holding Ag    Automobiles & Auto Parts    2    2    4    -.0426654    .0043835    1811.052    -.0225652    2.558055    .2189781
2017m6    63    Daetwyler Holding Ag    Automobiles & Auto Parts    2    2    4    -.0323213    .0015906    -470.6808    -.0115432    2.496325    .2189781
2017m7    63    Daetwyler Holding Ag    Automobiles & Auto Parts    2    2    4    -.0442455    -.0005095    -8111.853    -.0282728    2.314436    .2013656
2017m8    63    Daetwyler Holding Ag    Automobiles & Auto Parts    2    2    4    -.061213    .0040422    -2662.163    -.0493361    2.239011    .2013656
The intention is to do a regression analysis where:
- the dependent variable(s) is(are) abnormal_msreutrn, or abnormal_annual_volatility, or abnormal_volume
- the independent variables are msreturn, cv_ptb, cv_lev
- additionally include firm- and year-fixed-effects based on industry.

In order to apply the analysis, the dataset needs to be extended by all possible firmpair combinations for each specific industry.
Example given: If industryid == 2, the following firmpairs need to be generated: (1,2), (1,3), (1,4), (2,1), (3,1), (4,1). Firmpair (1,1) should be neglected. Please note, that the numbers mentioned in brackets address the variable compid_in_industry.

Explanation of the variables prevalent in the dataset:
  • compid: unique company identifier over the entire dataset
  • industryid: unique industry identifier over the entire dataset
  • compid_in_industry: unique company identifier in each industry
  • sum_count_compid: the sum of distinct companies prevalent in the respecting industry
By already reading several posts about generating all possible pairs based on group and id, I could not manage to handle it for my dataset.
How can the expected dataset be achieved?

Best,
Rob