Hi all,

I am quite stuck trying to draw a proper box plot in Stata. The following is the plot resulting from
Code:
graph box avsales_new_no_outliers lagged_tot_sales
[box plot image not preserved]

Of course it is not informative. The summary statistics of the two variables look as follows:

Code:
. sum avsales_new_no_outliers lagged_tot_sales

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
avsales_ne~s |      2890     9118740    3.55e+07   11.22149   7.46e+08
lagged_tot~s |      9680    4.19e+08    2.40e+09    1.99628   5.03e+10
and the detailed summary statistics:
Code:
                   avsales_new_no_outliers
-------------------------------------------------------------
      Percentiles      Smallest
 1%     44.37548       11.22149
 5%     292.2828       12.29292
10%     1510.966       13.19586       Obs                2890
25%        23579       16.25492       Sum of Wgt.        2890

50%     451135.3                      Mean            9118740
                        Largest       Std. Dev.      3.55e+07
75%      3227006       4.25e+08
90%     1.88e+07       4.34e+08       Variance       1.26e+15
95%     4.50e+07       5.36e+08       Skewness       9.474786
99%     1.49e+08       7.46e+08       Kurtosis       128.6959

                      lagged_tot_sales
-------------------------------------------------------------
      Percentiles      Smallest
 1%      41.2445        1.99628
 5%     639.6671       2.220227
10%     4347.424       2.519224       Obs                9680
25%     72393.32       3.017936       Sum of Wgt.        9680

50%      1134182                      Mean           4.19e+08
                        Largest       Std. Dev.      2.40e+09
75%     1.33e+07       3.88e+10
90%     1.89e+08       4.30e+10       Variance       5.76e+18
95%     1.08e+09       4.50e+10       Skewness       8.670407
99%     1.48e+10       5.03e+10       Kurtosis       98.53456
I am now trying logs, but I do not know whether a box plot of the logs is the right thing to do. The problem seems to be the high variability of lagged_tot_sales and the larger number of observations for that variable relative to avsales_new_no_outliers (9680 versus 2890).
Do you have any idea what is happening and what I should do?
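
For concreteness, this is roughly what I mean by trying the logs. It is only a minimal sketch, assuming both variables are strictly positive (the minimums reported above are 11.22149 and 1.99628, so that holds here). The names log_avsales and log_lagged are made up for illustration, and base 10 is just one possible choice.
Code:
* hypothetical new variables holding base-10 logs of the two sales measures
generate log_avsales = log10(avsales_new_no_outliers)
generate log_lagged  = log10(lagged_tot_sales)

* box plot of the logged variables on a common axis
graph box log_avsales log_lagged, ytitle("log10 of sales")
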

Many thanks,

Federico