Hello all,

This is my first post, but I've read the FAQ, so I hope it is acceptable. For context, I'm working with a dataset of power outages. Each observation is one sensor's record of an outage: when it began and when power was restored. Here is an example dataset:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int(outage_id site_id) long(outage_time restore_time) str24 sensor_id
1 14 1528913151 1528919452 "530039001351363038393739"
1 14 1528913153 1528919542 "200031000951343334363138"
1 19 1528913151 1528919423 "3b0045000151353432393339"
1 36 1528913152 1528935236 "2b004b001251363038393739"
1 36 1528913151 1528935235 "380025001451343334363036"
2 14 1529042683 1529047119 "530039001351363038393739"
2 16 1529042684 1529047117 "43005d000951343334363138"
2 17 1529042684 1529047119 "280021001251363038393739"
2 30 1529042675 1529061132 "48003c001151363038393739"
2 39 1529042682 1529061134 "560044000151353432393339"
2 44 1529042682 1529061134 "500030001951353339373130"
2 46 1529042683 1529061132 "2e001f001251363038393739"
2 46 1529042684 1529061134 "1e0036000951343334363138"
end
Each outage is recorded by a sensor ('sensor_id'), and each sensor is located at a site ('site_id'). Sensor reports that begin around the same time (within 90 seconds of another sensor reporting an outage) are grouped under a shared 'outage_id'. The 'outage_time' and 'restore_time' variables record when the outage begins and ends, respectively. Both are currently stored as long integers and will be converted to date-time variables at a later point.
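
For the later date-time conversion mentioned above, here is a minimal sketch. It assumes the values are Unix epoch seconds (seconds since 01jan1970 00:00:00 UTC), which the post does not state. Stata %tc values count milliseconds since 01jan1960 and should be stored as double. The names outage_dt and restore_dt are hypothetical, used only for illustration.

Code:
    * assumption: outage_time and restore_time are Unix epoch seconds
    * convert seconds to milliseconds and shift the origin from 1970 to 1960
    gen double outage_dt  = outage_time*1000  + tc(01jan1970 00:00:00)
    gen double restore_dt = restore_time*1000 + tc(01jan1970 00:00:00)
    format outage_dt restore_dt %tc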

My goal: create a new variable 'med_restore_time' that holds the median of 'restore_time' within each 'outage_id'. I'm using the egen command with its median() function in Stata 17.0. Here is what I have tried:

Code:
    * begin by looking at what the median should be (r(p50) from summarize, detail)
    desc restore_time
    quietly sum restore_time if outage_id==1, d
    di %12.0g `r(p50)'
    quietly sum restore_time if outage_id==2, d
    di %12.0g `r(p50)'
    
    * try median using egen (the new variable gets the default storage type)
    bysort outage_id: egen med_restore1 = median(restore_time)
    format %12.0g med_restore1
    desc med_restore1
    
    * now try again with double storage for both the source and the result
    recast double restore_time
    bysort outage_id: egen double med_restore2 = median(restore_time) // specify type
    format %12.0g med_restore2
    desc med_restore2
    
    * show the egen results next to the raw values
    list outage_id restore_time med_restore1 med_restore2, sepby(outage_id)
In both attempts, the median that egen returns does not match the r(p50) value from summarize. I thought it might have something to do with storage types, but changing them didn't seem to make a difference. Why does egen behave this way, and how can I get it to return the actual median within each 'outage_id'?
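
One check that might help narrow this down, offered as a sketch rather than a diagnosis: a float carries only about 7 significant digits, so 10-digit values like these cannot all be stored exactly in one. The snippet below shows the rounding for one value from the example data, then builds a reference median per 'outage_id' from summarize, detail in a double variable. The names med_check and ids are hypothetical, introduced only for illustration.

Code:
    * compare a 10-digit value with the same value rounded to float precision
    di %12.0f 1528919542
    di %12.0f float(1528919542)
    
    * reference median per group, taken from r(p50) and stored as double
    gen double med_check = .
    levelsof outage_id, local(ids)
    foreach i of local ids {
        quietly summarize restore_time if outage_id == `i', detail
        quietly replace med_check = r(p50) if outage_id == `i'
    }
    format med_check %12.0f
    list outage_id restore_time med_check, sepby(outage_id)

If med_check agrees with the summarize output but an egen result does not, comparing the storage types of the two variables with describe would be a natural next step.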

Best,
Adam