Hi, I have panel data (large N, small T) with -id- as the panel variable and -year- as the time variable. -year- runs from 2000 to 2003, and -id- identifies the firms.

The id variable is coded as follows:

--id-- --year--
1232 2000
1232 2001
1232 2002
1232 2003

1234 2000
1234 2001
1234 2002
1234 2003

When I summarise the dataset, the minimum of -id- is 1 and the maximum is 20,236. However, the id values are not consecutive (there are gaps between them), so I cannot take 20,236 as the total number of firms in the data.
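The point about gaps can be illustrated with a minimal Python sketch on hypothetical toy data (not the actual dataset): in long-format panel data the number of firms is the number of distinct -id- values, which can differ from both the maximum id and the number of rows. The names `rows` and `count_firms` are made up for illustration.

```python
# Hypothetical long-format panel: one (id, year) row per firm-year.
# The ids have gaps (1, 1232, 1234, 20236), like the real -id- variable.
rows = [
    (1, 2000), (1, 2001), (1, 2002), (1, 2003),
    (1232, 2000), (1232, 2001), (1232, 2002), (1232, 2003),
    (1234, 2000), (1234, 2001), (1234, 2002), (1234, 2003),
    (20236, 2000), (20236, 2001),  # an unbalanced firm observed for two years only
]


def count_firms(panel):
    """Return the number of distinct panel ids (firms), ignoring year."""
    return len({firm_id for firm_id, _year in panel})


max_id = max(firm_id for firm_id, _year in rows)
n_firms = count_firms(rows)

print("max id:", max_id)               # largest id value, not a count
print("observations:", len(rows))      # firm-years, not firms
print("distinct firms:", n_firms)      # the number actually wanted
```

Here the maximum id is 20236 and there are 14 rows, but only 4 firms. In Stata, the corresponding count of distinct values should appear in the "unique values" line of `codebook id`.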

I need to find the number of distinct firms in the dataset based on the id variable. How should I proceed?