Hello,
I'm working with a panel dataset that has a variable identifying the companies. I used encode to convert the string variable into a categorical (labeled numeric) variable, but I'm now struggling to handle the data because I can't quickly see which value is assigned to each company. I know I can use br to check manually which value corresponds to each category, but I have 75 different companies and need a more efficient way to check this. The 'tab' command displays all the categories but not the values assigned to them.
For example, I need to drop the observations from 11 companies. Is there a way to drop them using the label instead of the numerical value assigned to these companies?
My dofile is the following:
clear
import excel "/Users/nicolasmorales/Downloads/Base Actualizada Entidades.xlsx", sheet("Base Consolidada") cellrange(A1:G19893) firstrow case(lower)
encode company, gen(company_temp)
drop company
rename company_temp company
sort company date
duplicates report company date
duplicates tag company date, gen(tag)
tab tag
drop if tag>0
* Now I need to drop the observations for 11 companies that are outliers. Is there an efficient way to do so?
tab company
* This displays the companies that I have, but not the values assigned to them in the categorical variable.
Thanks for the assistance
The dataset looks like this:
company -- v1 -- v2 -- v3 -- ... -- vk -- date
Company_a -- 1 -- 2 -- 1 -- ... -- 1 -- 2000
Company_a -- 0 -- 2 -- 1 -- ... -- 1 -- 2001
Company B -- 0 -- 4 -- 2 -- ... -- 5 -- 2000
....
Company Z -- ................................1 2019
There are 70 categories in the variable company and the dataset has 10,000 observations and information from the year 2000 to 2019.
Thanks for your assistance
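For reference, here is a minimal, untested sketch of the kind of thing I am hoping for. It assumes the value label created by encode is still named company_temp (as far as I understand, encode names the label after the new variable, and rename changes the variable name but not the label name). The company names in the loop are placeholders taken from the example rows above, not the actual outliers.

```stata
* Sketch only: assumes the value label is named company_temp
* (created by "encode company, gen(company_temp)" in the dofile above).

* List every numeric code next to its company name
label list company_temp

* Alternative: tabulate codes and labels together (up to 100 categories)
codebook company, tabulate(100)

* Drop by label text instead of numeric code.
* "text":lblname evaluates to the numeric value that the label maps to text.
* The names below are placeholders; replace them with the 11 outliers.
foreach c in "Company_a" "Company B" {
    drop if company == "`c'":company_temp
}
```

Another option would be to drop the companies by name while company is still a string, before the encode step, so that the numeric codes never have to be looked up.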