Hi Statalisters,

I am a novice Stata user and this is my first post. I'm working with Stata 14 on Windows 7.

I'm working on a panel data set of all commercial banks (ID variable) in the U.S. for the period Q4-1995 to Q4-2018 (time variable). I only use the Q4 data of each year, so my data are at the bank-year level. My goals are

1) to calculate four bank risk proxies
2) to show the correlations between all four risk proxies
3) and finally to run two regressions with the risk proxies (a binary probability model).

I have converted the string variables name and date to the numeric variables name2 and date2. I have replaced missing values with 0 and have checked for other missing values.
There are duplicates among my bank names (because banks with the same name have several branches), which I fixed by generating the ID variable id.
I tried to use the dataex command to provide a data sample, but I'm not sure whether I used the command correctly, so here is a data example:
id  name2                                 date2    asset  lnatres  ore
1   1st American State Bank of Minnesota  Q4 1995  16050  16050    16050
2   1st Bank                              Q4 1995  15908  16050    16050
3   1st Bank & Trust                      Q4 1995  12888  16050    16050
4   1st Bank of Troy                      Q4 1995  16050  16050    16050
5   1st Business Bank                     Q4 1995  16050  16050    16050
6   1st Constitution Bank                 Q4 1995  16050  16050    16050
7   1st Financial Bank South Dakota       Q4 1995  16050  16050    16050
8   1st Floyd Bank                        Q4 1995  16050  16050    16050
9   1st National Bank                     Q4 1995  16050  16050    16050

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id long name2 float date2 double asset long(lnatres ore)
 1  6 143   16050   103   37
 2  7 143   99230  1419    0
 3  8 143   54413   450    0
 4 11 143   14627   132    0
 5 13 143  922734  8995  692
 6 18 143  130484  1077    0
 7 23 143   77569   393    0
 8 28 143   40343   921    5
 9 30 143   26353   245    0
10 34 143   62315   207    0
11 37 143   38636   266  111
12 42 143   20637   114    0
13 43 143   20313   184    0
14 44 143 1736391 27470 1452
15 47 143   54356   257  270
end
format %tq!Qq-CCYY date2
label values name2 name2
label def name2 6 "1st American State Bank of Minnesota", modify
label def name2 7 "1st Bank", modify
label def name2 8 "1st Bank & Trust", modify
label def name2 11 "1st Bank of Troy", modify
label def name2 13 "1st Business Bank", modify
label def name2 18 "1st Choice Bank", modify
label def name2 23 "1st Constitution Bank", modify
label def name2 28 "1st Financial Bank South Dakota", modify
label def name2 30 "1st Floyd Bank", modify
label def name2 34 "1st National Bank", modify
label def name2 37 "1st National Community Bank", modify
label def name2 42 "1st Security Bank of Laurel", modify
label def name2 43 "1st Security Bank of West Yellowstone, Montana", modify
label def name2 44 "1st Source Bank", modify
label def name2 47 "1st State Bank and Trust Company of Palos Hills", modify
Code:
replace p3asset = 0 if (p3asset >= .)
Code:
mdesc
Code:
gen id = _n
Code:
sort id date2
Code:
xtset id date2
       panel variable:  id (weakly balanced)
        time variable:  date2, Q4-1995 to Q4-2018
                delta:  1 quarter
Code:
xtdescribe

      id:  1, 2, ..., 172431                                 n =     172431
   date2:  Q4-1995, Q4-1996, ..., Q4-2018                    T =         24
           Delta(date2) = 1 quarter
           Span(date2)  = 93 periods
           (id*date2 uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         1         1       1       1

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+--------------------------
     9940      5.76    5.76 |  1.......................
     9527      5.53   11.29 |  .1......................
     9142      5.30   16.59 |  ..1.....................
     8773      5.09   21.68 |  ...1....................
     8579      4.98   26.65 |  ....1...................
     8314      4.82   31.48 |  .....1..................
     8079      4.69   36.16 |  ......1.................
     7887      4.57   40.74 |  .......1................
     7769      4.51   45.24 |  ........1...............
    94421     54.76  100.00 | (other patterns)
 ---------------------------+--------------------------
   172431    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXX
 ------------------------------------------------------
 *Each column represents 4 periods.
As I understand it, a dataset is weakly balanced if each panel contains the same number of observations but NOT the same time points. This is the case in my data, because I have many more observations in one time period.
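A minimal sketch of how the panel structure could be inspected, assuming the id and date2 variables created above (n_obs is a hypothetical helper variable name, not something already in my data):
Code:
* Count how many observations each panel (id) contains
bysort id: generate n_obs = _N
* If every id shows the same n_obs, each panel has the same number of observations
tabulate n_obs
* Count how many observations fall into each time period
tabulate date2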
Can you please help me get a balanced panel? Does a weakly balanced panel influence my correlation and regression results?

Thank you very much!

Katharina