Question: Merging observations in one variable while simultaneously summing the numbers of the related observations in another variable

Hello

I am new to this forum, so sorry if this question has been asked before (tried searching for it but could not find anything similar to my issue). I am currently writing my master thesis using insurance data and I'm having some difficulties using the relevant variables in the dataset because of how it is organized. What i would like to do is merge all the "insurance_policy_ID" observations that are duplicates into just 1 observation while at the same time summing the "insurance_premium_amount". I don't need the "coverage_ID" variable to be organized so that the final merged observations will display the "highest" coverage within the policy ID. If I just had a couple thousand observations, it would take time to merge them manually but it would be doable, but i have over 6 million observations (The dataset is from one of the biggest insurance companies in my country and its over a year of insurance data).

Simple illustration of how the data is displayed:

insurance_policy_ID	product_ID	coverage_ID	insurance_premium_amount
12345	Car	Sub coverage 1 (f.ex required)	100
12345	Car	Sub coverage 2 (f.ex partial coverage)	400
12345	Car	Sub coverage 3 (f.ex comprehensive coverage)	700
12345	Car	Sub coverage 4 (F.ex full comprehensive coverage ("top" insurance))	1000

How i would want it to look:

insurance_policy_ID	product_ID	coverage_ID	insurace_premium_amount
12345	Car	Sub coverage x (not important)	2200

Thanks in advance for any help!

Tor

BJ Data Tech Solution

0 Response to Question: Merging observations in one variable while simultaneously summing the numbers of the related observations in another variable

Post a Comment

Related Posts with Question: Merging observations in one variable while simultaneously summing the numbers of the related observations in another variable

0 Response to Question: Merging observations in one variable while simultaneously summing the numbers of the related observations in another variable

Post a Comment