Hello

I am new to this forum, so sorry if this question has been asked before (tried searching for it but could not find anything similar to my issue). I am currently writing my master thesis using insurance data and I'm having some difficulties using the relevant variables in the dataset because of how it is organized. What i would like to do is merge all the "insurance_policy_ID" observations that are duplicates into just 1 observation while at the same time summing the "insurance_premium_amount". I don't need the "coverage_ID" variable to be organized so that the final merged observations will display the "highest" coverage within the policy ID. If I just had a couple thousand observations, it would take time to merge them manually but it would be doable, but i have over 6 million observations (The dataset is from one of the biggest insurance companies in my country and its over a year of insurance data).


Simple illustration of how the data is displayed:
insurance_policy_ID product_ID coverage_ID insurance_premium_amount
12345 Car Sub coverage 1 (f.ex required) 100
12345 Car Sub coverage 2 (f.ex partial coverage) 400
12345 Car Sub coverage 3 (f.ex comprehensive coverage) 700
12345 Car Sub coverage 4 (F.ex full comprehensive coverage ("top" insurance)) 1000

How i would want it to look:
insurance_policy_ID product_ID coverage_ID insurace_premium_amount
12345 Car Sub coverage x (not important) 2200

Thanks in advance for any help!

Tor