The objective of my analysis is to calculate non-compliance with minimum wages at the industry-state-year (jst) level for a country over a period of time. For this purpose, I first estimate the kernel density curve of actual wages for each jst using unit (factory)-level data on actual wages, denoted by w{ijst}. Next, I measure non-compliance as the area under the density curve to the left of the minimum wage for the same jst, denoted by wm{jst}. Note that the panel is unbalanced: the set of states, industries, and factory units within industries varies across years. For this reason, I would need a loop (or a by-group command) to calculate the violation for every jst in one go.

There are two ways to measure this area. (i) One way is to estimate the CDF. However, the command for calculating the CDF gives the cumulative probability only at each observed value of w{ijst}; it does not give it at wm{jst}, because wm{jst} is generally not one of the observed values of w{ijst}. This approach would work if there is a way to evaluate the CDF at wm{jst}. (ii) The second approach is to measure the violation by integrating the kernel density function for each jst from 0 to wm{jst}. For this, I would need the exact functional form of the kernel density curve for each jst.
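To make the two approaches concrete, here is a minimal sketch in Python (standard library only), written for illustration rather than as a Stata solution. It assumes a Gaussian kernel and a normal-reference rule-of-thumb bandwidth, neither of which is stated in the question; the function names are hypothetical. Under those assumptions the integral of the kernel density estimate up to wm has a closed form, the average of Phi((wm - w_i)/h) over the factories in the cell, so neither numerical integration nor an explicit functional form of the curve is needed. Note that a Gaussian kernel integrates from minus infinity rather than from 0, so a little mass can fall below zero.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rule_of_thumb_bandwidth(wages):
    """Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(wages)
    return 1.06 * statistics.stdev(wages) * n ** (-0.2)


def empirical_cdf(wages, mw):
    """Approach (i): share of factories paying strictly less than mw.
    The step function is defined at any mw, observed or not."""
    return sum(w < mw for w in wages) / len(wages)


def kernel_cdf(wages, mw, h=None):
    """Approach (ii): area under a Gaussian kernel density to the left of mw."""
    if h is None:
        h = rule_of_thumb_bandwidth(wages)
    return sum(normal_cdf((mw - w) / h) for w in wages) / len(wages)


# 2010, Delhi, NIC 11042 from the sample table
wages = [56, 58, 49, 62]
mw = 57
print(empirical_cdf(wages, mw))  # 0.5 (two of four factories pay less than 57)
print(kernel_cdf(wages, mw))     # smoothed counterpart of the share above
```

The same closed form should carry over to Stata, where `normal()` is the standard normal CDF, so approach (ii) would not require extracting the fitted curve from the density estimate.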

I want to figure out which approach is more feasible in Stata, along with the relevant code and packages. Any input would be greatly appreciated.

Year  State  NIC-05_Code  Factory_ID  Actual_Wage  MW   Violation
2010  Delhi  11042        xyz1        56           57   0.25
2010  Delhi  11042        xyz2        58           57   0.25
2010  Delhi  11042        xyz3        49           57   0.25
2010  Delhi  11042        xyz4        62           57   0.25
2010  Delhi  65093        pqr1        65           40   0.33
2010  Delhi  65093        pqr2        37           40   0.33
2010  Delhi  65093        pqr3        43           40   0.33
2010  Bihar  01611        abc1        71           70   0.40
2010  Bihar  01611        abc2        72           70   0.40
2010  Bihar  01611        abc3        68           70   0.40
2010  Bihar  01611        abc4        55           70   0.40
2010  Bihar  01611        abc5        76           70   0.40
2011  Assam  22201        lmn1        101          100  0.50
2011  Assam  22201        lmn2        95           100  0.50
2011  Assam  22201        lmn3        91           100  0.50
2011  Assam  22201        lmn4        105          100  0.50
2011  Delhi  11042        xyz1        55           60   0.40
2011  Delhi  11042        xyz2        62           60   0.40
2011  Delhi  11042        xyz4        65           60   0.40
2011  Delhi  11042        xyz5        71           60   0.40
2011  Delhi  11042        xyz6        58           60   0.40
2011  Bihar  01611        abc2        67           68   1.00
2011  Bihar  01611        abc3        66           68   1.00
2011  Bihar  01611        abc5        66           68   1.00
2011  Bihar  24105        def1        45           35   0.00
2011  Bihar  24105        def2        51           35   0.00
2011  Bihar  24105        def3        77           35   0.00
2011  Bihar  24105        def4        49           35   0.00
2011  Bihar  24105        def5        57           35   0.00

The table above is a sample created to demonstrate the panel structure and the lack of uniformity in the data. Note that the variable 'Violation' has yet to be calculated using the methodology stated above (the area under the kernel density curve to the left of MW); the values shown only illustrate what the data will look like after this calculation. Collapsing the data by jst will then give me a single value of Violation for each jst, which is what I require.
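The per-jst calculation and the collapse can be sketched as follows, again in Python for illustration only. The sketch assumes a Gaussian kernel and a normal-reference bandwidth (my assumptions, not part of the question), uses a hypothetical helper name `kernel_cdf`, and includes only four of the cells from the sample table. Because each cell is handled independently, the unbalanced panel causes no difficulty.

```python
import math
import statistics
from collections import defaultdict

# (year, state, NIC-05 code, actual wage, minimum wage): four cells of the sample
rows = [
    (2010, "Delhi", "11042", 56, 57), (2010, "Delhi", "11042", 58, 57),
    (2010, "Delhi", "11042", 49, 57), (2010, "Delhi", "11042", 62, 57),
    (2010, "Delhi", "65093", 65, 40), (2010, "Delhi", "65093", 37, 40),
    (2010, "Delhi", "65093", 43, 40),
    (2011, "Bihar", "01611", 67, 68), (2011, "Bihar", "01611", 66, 68),
    (2011, "Bihar", "01611", 66, 68),
    (2011, "Bihar", "24105", 45, 35), (2011, "Bihar", "24105", 51, 35),
    (2011, "Bihar", "24105", 77, 35), (2011, "Bihar", "24105", 49, 35),
    (2011, "Bihar", "24105", 57, 35),
]


def kernel_cdf(wages, mw):
    """Area under a Gaussian kernel density to the left of mw (assumed kernel).
    Falls back to the raw share below mw when no bandwidth can be computed."""
    n = len(wages)
    sd = statistics.stdev(wages) if n > 1 else 0.0
    if sd == 0.0:
        return sum(w < mw for w in wages) / n
    h = 1.06 * sd * n ** (-0.2)  # normal-reference rule of thumb (assumed)
    return sum(0.5 * (1.0 + math.erf((mw - w) / (h * math.sqrt(2.0))))
               for w in wages) / n


# Group wages by jst cell; the minimum wage is constant within a cell
wages_by_cell = defaultdict(list)
mw_by_cell = {}
for year, state, nic, wage, mw in rows:
    key = (year, state, nic)
    wages_by_cell[key].append(wage)
    mw_by_cell[key] = mw

# One violation value per jst, the equivalent of collapsing by jst
violation = {key: kernel_cdf(w, mw_by_cell[key]) for key, w in wages_by_cell.items()}
for key in sorted(violation):
    print(key, round(violation[key], 3))
```

One design point worth noting: the smoothed measure is not the same as the raw share of factories below the minimum wage. For the 2011 Bihar 24105 cell, where every wage exceeds the minimum wage, it is small but positive, because the kernel spreads some mass below 35. In Stata, the same per-cell mean could presumably be obtained without an explicit loop by generating `normal((mw - wage)/h)` and averaging it within jst using `egen` with `by()`, followed by `collapse`.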