I’m using generate _n and getting a result I know is wrong, so I know I’m misunderstanding something. For example, I tried
sort ID v1
by ID v1: generate v2 = _n

v2 ends up with a count of the distinct number of observations in v1 for each ID, which is not what I’m trying to achieve.



I have 41,353 observations and 1000 variables, but here is enough information to show the problem. My data looks like this where ID is the person, v1 is the name of the firm, v2 is the number of different firms each person visits, and v3 is the name of the firm each person has visited (in which case persons who visit more than one firm are given a value labeled "Multiple Firms")
ID v1
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3

I would like to create a variable called v2 that counts the number of different values each ID has for v1. For example
ID v1 v2
1 1 1
2 1 1
2 1 1
3 2 1
3 2 1
4 2 2
4 3 2
5 1 3
5 2 3
5 3 3

Anyone have ideas for how I might achieve this?

Ultimately, I will then use additional steps to create a third variable with the information from v2. My goal for the third variable is to be part of my wide shaped dataset where I only have one row for each ID. With this particular information, I want a table that shows the number of unique IDs that each v1 has. For example,

Narrow Dataset
ID v1 v2 v3
1 1 1 1
2 1 1 1
2 1 1 1
3 2 1 2
3 2 1 2
4 2 2 99999
4 3 2 99999
5 1 3 99999
5 2 3 99999
5 3 3 99999

Wide Dataset
ID v3
1 1
2 1
3 2
4 99999
5 99999


The table I would like to ultimately create
v3 frequency %
Firm 1 2 40%
Firm 2 1 20%
Multiple Firms 2 40%