advice on using duplicates drop command

Hello,

I would like to use the "duplicates drop" command to drop certain rows.
For example, my original dataset is as follows:

Disno statea namea stateb nameb strtyr
173 365 RUS 2 USA 1958
173 710 CHN 2 USA 1958
608 365 RUS 2 USA 1958
608 365 RUS 2 USA 1958
1124 90 GUA 2 USA 1958
2187 731 PRK 2 USA 1958
2187 731 PRK 2 USA 1958
2187 731 PRK 2 USA 1958
2187 731 PRK 2 USA 1958
2215 365 RUS 2 USA 1958
2216 365 RUS 2 USA 1958
2857 339 ALB 2 USA 1958

I would like to keep only the rows below:

Disno statea namea stateb nameb strtyr
173 365 RUS 2 USA 1958
173 710 CHN 2 USA 1958
608 365 RUS 2 USA 1958
1124 90 GUA 2 USA 1958
2187 731 PRK 2 USA 1958
2857 339 ALB 2 USA 1958

Although Disno 173 appears twice, the two rows have different combinations (statea 365 and statea 710), so I would like to keep both of them.
My goal is to keep one row per Disno for each specific combination of statea and stateb.

If I use "duplicates drop Disno, force", one of the two Disno 173 rows is dropped.
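
One way to express the rule described above is to give duplicates all of the key variables as a varlist, so that a row is dropped only when Disno, statea, and stateb all match an earlier row. The following is a minimal sketch, not a tested solution for the full dataset: it re-enters the example data by hand, and the storage types in the input line are assumptions. Note that it keeps Disno 2215 and 2216, which the desired listing above omits, so the intended rule may involve more than this key.

```stata
clear
* Example data from the post; storage types are assumed for illustration
input int Disno int statea str3 namea byte stateb str3 nameb int strtyr
173  365 "RUS" 2 "USA" 1958
173  710 "CHN" 2 "USA" 1958
608  365 "RUS" 2 "USA" 1958
608  365 "RUS" 2 "USA" 1958
1124 90  "GUA" 2 "USA" 1958
2187 731 "PRK" 2 "USA" 1958
2187 731 "PRK" 2 "USA" 1958
2187 731 "PRK" 2 "USA" 1958
2187 731 "PRK" 2 "USA" 1958
2215 365 "RUS" 2 "USA" 1958
2216 365 "RUS" 2 "USA" 1958
2857 339 "ALB" 2 "USA" 1958
end

* Inspect how many surplus copies exist for this key before dropping anything
duplicates report Disno statea stateb

* Keep the first row of each Disno-statea-stateb combination.
* force is required because the varlist is not the full set of variables.
duplicates drop Disno statea stateb, force

list, noobs
```

With this key, both Disno 173 rows survive because their statea values differ, while the repeated 608 and 2187 rows collapse to one each, leaving 8 observations.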

Instead, I tried to create a distinct row identifier with "gen dyad_year=statea*1000000000+stateb*10000+ strtyr" and then ran
"duplicates drop dyad_year, force", but this dropped many rows that I wanted to keep.

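Two things may contribute to the unwanted drops; this is offered as a guess rather than a diagnosis. First, generate stores new variables as float by default, and a float is accurate to only about 7 digits, while a value such as 365000021958 needs 12, so distinct dyad-years can round to the same number. Second, dyad_year does not include Disno, so rows from different disputes that share a dyad and year are treated as duplicates. A minimal sketch of two alternatives, assuming statea, stateb, and strtyr are all numeric (the name dyad_id is hypothetical):

```stata
* Option 1: store the composite key as double, which holds integers of this size exactly
gen double dyad_year = statea*1000000000 + stateb*10000 + strtyr
format dyad_year %15.0f

* Option 2: let egen assign a compact integer id to each distinct combination
egen long dyad_id = group(statea stateb strtyr)

* Include Disno in the key so that different disputes are not merged
duplicates report Disno dyad_id
duplicates drop Disno dyad_id, force
```

Either identifier is optional: listing the variables directly, as in "duplicates drop Disno statea stateb strtyr, force", avoids constructing a key at all.
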
Are there any tips for using duplicates more effectively?

Thank you

BJ Data Tech Solution
