Hi Statalist community,

Gig workers can have multiple employers at once; for example, one person may work for Uber, Lyft, and DoorDash at the same time. I am trying to determine each gig worker's primary employer using the following rules, applied in order.
  1. The primary employer is the one where the employee worked the most hours.
  2. If two or more employers tie for the most hours, the primary employer is the one among them where the employee earned the most wages.
  3. If those employers also tie on wages, the primary employer is the one among them where the employee received the most benefits.
  4. If those employers also tie on benefits, the primary employer is the one among them where the employee was employed in the earliest quarter.
Below is a sample of my unbalanced panel dataset. I have the following variables:
  • worker_id - this is an employee id
  • employer_id - this is the employer's id
  • quarter - time is represented in quarters from 1-4
  • hours_worked - this is the number of hours worked in a quarter
  • wages - this is the wages earned in a quarter
  • benefits - this is the benefits received in a quarter
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(worker_id employer_id quarter hours_worked) int(wages benefits) byte primary_employer
1 10 1 20 1000 200 10
1 20 2 19 1200 100 10
2 40 1  8  500 200 40
2 20 1  8  400 350 40
2 30 2  5  300 200 40
3 10 2 10  200 100 20
3 20 2 10  200 400 20
3 30 2 10  200 300 20
3 40 2 10  200 100 20
4 40 2  2   50  25 30
4 10 3  5   90  50 30
4 20 4  5  100  50 30
4 30 1  5  100  50 30
end

I wrote the following code, but I get an error saying that I cannot combine if with by.


Code:
gen primary_employer=.

by worker_id: if max(hours_worked) {
        replace primary_employer=employer_id
    }
    else if max(wages) {
        replace primary_employer=employer_id
    }
    else if max(benefits) {
        replace primary_employer=employer_id
    }
    else if min(quarter) {
        replace primary_employer=employer_id
    }

In the example dataset, I manually created a variable called primary_employer, which shows the result I am trying to produce. Does anyone know how to solve my problem? Thank you.
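For reference, here is a minimal sketch of one way the four rules could be applied. It is not necessarily the best approach. The if command is evaluated once rather than per observation and cannot be used under by:, so the sketch sorts within worker instead. It assumes each worker-employer pair appears in only one row, as in the example data; if a pair spanned several quarters, hours, wages, and benefits would presumably need to be totaled by worker_id and employer_id first. The names wanted, neg_hours, neg_wages, and neg_benefits are hypothetical helpers, not part of the dataset.

```stata
* Sketch only: run after the -dataex- example above, from a do-file
* (the /// line continuation does not work interactively).

* Negate the "larger is better" variables so that an ascending sort
* places the largest hours, then wages, then benefits first.
gen double neg_hours    = -hours_worked
gen double neg_wages    = -wages
gen double neg_benefits = -benefits

* Within each worker, sort by the four rules in order (quarter stays
* ascending so the earliest quarter wins the final tie-break), then
* copy the employer in the first row to every row for that worker.
bysort worker_id (neg_hours neg_wages neg_benefits quarter): ///
    gen wanted = employer_id[1]

drop neg_hours neg_wages neg_benefits

* Check against the manually created variable in the example data.
assert wanted == primary_employer
```

If two employers were still tied after all four rules, the sort order among them would be arbitrary, so a further tie-break would be needed in that case.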