Saturday, August 31, 2019

How to use CPI or inflation indices to convert nominal prices with DAILY frequency?

I'm working with some raw financial data from a local business covering about 4 years, 2015-2019. I haven't worked with daily data very often, but I noticed that the CPI and all other related indices are not published at a daily frequency. What's the best way of going about this without converting everything to a lower frequency? Is it possible? It's not a terribly long range, so it may be that I can run the analysis without deflating anything. I'll be running a regression afterwards.
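One hedged possibility, sketched below: keep the data at daily frequency, attach to each observation the CPI of its month, and deflate to a base month. The file and variable names (monthly_cpi.dta with variables mdate and cpi, a daily %td variable called date, a nominal variable called price, and the base month) are assumptions for illustration, not from the post.

Code:
gen mdate = mofd(date)                           // monthly date derived from the daily date
format mdate %tm
merge m:1 mdate using monthly_cpi, keepusing(cpi) nogenerate
summarize cpi if mdate == tm(2019m6), meanonly   // choose a base month
gen real_price = price * r(mean) / cpi           // nominal price expressed in base-month prices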

I appreciate any help.

LD calculation by logit regression

Hello,

I would like to know how I can calculate the LD50, LD90 and LD95 (lethal dose) for bioassays using logit regression.

I found commands for probit regression; however, I would like to implement the formula in Stata code myself. Could someone help me?

1) For LD50 in probit and logit regression the formula is the same:

local LD50 LD50: exp(-_b[_cons] / _b[logdose])

2) However, for the other LD there is a change:

I found the links described at the end for the probit model as follows:

Probit: LDp = (invnormal(p) - constant) / coef

STATA command:

local ld90 LD90: exp((invnorm(0.9) - _b[_cons]) / _b[logdose])

However, for the logit model the recommended formula is:

Logit: LDp = (log(p / (1 - p)) - constant) / coef

How do I program the above command in Stata for logit regression?
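A minimal sketch of the logit analogue, following the probit pattern above. It assumes the model was fit with -logit- on logdose and that the outcome variable (here called dead) is the binary mortality indicator; logit() is Stata's built-in log-odds function, and nlcom can be used to get a delta-method standard error.

Code:
logit dead logdose
* LD50: at p = 0.5, log(p/(1-p)) = 0, so the formula reduces to the one used for probit
local LD50 = exp(-_b[_cons] / _b[logdose])
* general LDp for the logit model, back-transformed because dose enters in logs
local LD90 = exp((logit(0.90) - _b[_cons]) / _b[logdose])
local LD95 = exp((logit(0.95) - _b[_cons]) / _b[logdose])
display "LD50 = " `LD50' "   LD90 = " `LD90' "   LD95 = " `LD95'
* with a delta-method standard error:
nlcom LD90: exp((logit(0.90) - _b[_cons]) / _b[logdose])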

Another question is about normality: if the distribution is not normal, should I calculate the LD from the logit regression? I have read in some places that if the data do not meet the parametric (normality) assumption, logit regression should be used for the LD.
Would you have another model suggestion for calculating the LD with a non-normal distribution?

Thanks

LINKS:

https://www.stata.com/statalist/arch.../msg00184.html
https://www.statalist.org/forums/for...r-logit-probit
https://stats.idre.ucla.edu/other/mu...git-or-probit/

How to solve the problem of endogeneity of a policy (gravity model)?

Hi everyone,
I am analyzing the impact of an agreement concluded between 4 countries on the mobility of people in these 4 countries. The agreement came into force at the same time for the 4 countries, in 1995. In addition to these 4 countries, I added another 15 countries as a control group. I use a matrix of 19 countries of origin and 19 countries of destination (a 19 * 19 matrix).
The database covers the periods from 1990 to 2000.

lnN(ijt): number of people from country i who go to country j at time t
lngdp(it) : log of gdp (constant 2000 US$) in the country of origin
lngdp(jt): log of gdp (constant 2000 US$) in the country of destination
lndist(ij): Log of distance between origin and destination countries
contig(ij): common border between origin and destination countries
lang(ij): common language in origin and destination countries
lnstock1990(ij): stock of migrants from country i in country j in 1990
bothcountry1995 = 1 if i and j are both members of agreement at time t and 0 otherwise
onecountry1995 = 1 if the country (i) belongs to agreement but the country (j) does not
l(it): time-varying origin-country dummies which control for multilateral resistances
l(jt): time-varying destination-country dummies that account for multilateral resistances
l(ij): captures the country-pair fixed effects

The augmented gravity equation is:
LnN(ijt) = b0 + b1 lngdp(it) + b2 lngdp(jt) + b3 lndist(ij) + b4 contig(ij) + b5 lang(ij) + b6 lnstock1990(ij) + b7 bothcountry1995(ijt) + b8 onecountry1995(ijt) + e(ijt)

The variables created in stata are the following:
gen logN_ij=log(N_ij)
gen loggdp_i=log(gdp_i)
gen loggdp_j=log(gdp_j)
gen logdist_ij=log(dist_ij)
gen logstock1990_ij=log(stock1990_ij)
egen orig_time= group(origin year)
tabulate orig_time, generate(ORIGIN_TIME_FE)
egen dest_time = group(destination year)
tabulate dest_time, generate(DESTINATION_TIME_FE)
egen pair_id = group(origin destination)
tabulate pair_id, generate(PAIR_FE)
egen id = group(origin destination)
tsset id year

1) To solve the problem of endogeneity of the agreement, should I include time-varying origin-country dummies which control for multilateral resistances, time-varying destination-country dummies that account for multilateral resistances and the country-pair fixed effects ? or , should I include country-pair fixed effects and year effects ?
2) Should I use vce(robust) or vce(cluster pair_id)?

xi: xtreg logN_ij loggdp_i loggdp_j logstock1990_ij logdist_ij lang_ij contig_ij bothcountry1995 onecountry1995 ORIGIN_TIME_FE* DESTINATION_TIME_FE*, fe vce(robust)

reg logN_ij bothcountry1995 onecountry1995 PAIR_FE* ORIGIN_TIME_FE* DESTINATION_TIME_FE*, cluster(pair_id)

xi: xtreg logN_ij loggdp_i loggdp_j logstock1990_ij logdist_ij lang_ij contig_ij bothcountry1995 onecountry1995 i.year, fe vce(robust) (country-pair fixed effects and year effects)

reg logN_ij bothcountry1995 onecountry1995 PAIR_FE* i.year, cluster(pair_id) (country-pair fixed effects and year effects)
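For what it is worth, here is a sketch of the same two specifications using the community-contributed reghdfe (ssc install reghdfe), which absorbs the high-dimensional fixed effects instead of generating hundreds of dummies. This is offered only as an alternative way of writing the models above, not as an answer to the endogeneity question; note that with pair fixed effects the time-invariant pair variables (distance, language, contiguity, the 1990 stock) drop out.

Code:
* (1) pair FE plus origin-year and destination-year FE, clustered by pair
reghdfe logN_ij bothcountry1995 onecountry1995, absorb(pair_id orig_time dest_time) vce(cluster pair_id)

* (2) pair FE plus common year effects
reghdfe logN_ij bothcountry1995 onecountry1995, absorb(pair_id year) vce(cluster pair_id)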

Thank you so much for your help in advance

Matching variables in CEM.

Hi all - I am using CEM in Stata and have a quick question. I am using four variables in my dataset for matching and using the CEM weights in the regression model as controls. When I include the four matching variables in the regression as well, I find that two of the variables are significant. Is it odd that these variables are significant even though they were used for matching/balancing? Let me know what could be going on or if you need more details. Thanks, Rajat.

Checking Residuals after running Mswitch

I am working on a Markov switching model (using the mswitch command) and I have calculated standardized residuals. How can I calculate p-values for normality, ARCH(1) and autocorrelation of the standardized residuals in Stata so that I can compare different models?
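A sketch of one way to get the three p-values, assuming the standardized residuals are already stored in a variable (here called stdres) and the data are tsset; the lag choices are assumptions.

Code:
sktest stdres                    // skewness/kurtosis test of normality
wntestq stdres, lags(4)          // portmanteau (Q) test for autocorrelation
quietly regress stdres           // constant-only regression so that -estat archlm- is available
estat archlm, lags(1)            // Engle's LM test for ARCH(1) effects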

Construct matrix counting pairwise agreements

I'm interested in a set of problems in which the data from each individual is a vector of integer-valued responses, and I want to know the number of times that each distinct pair of individuals gives responses that agree with each other. Thus, I want an agreement matrix A, where A[i,j] contains the number of responses on which subject i agrees with (is identical to) subject j. It's like a similarity matrix, where similarity is the number of agreements across the response vector. I could not figure out a way to do this with - matrix dissimilarity-, which seems to implement a matching option only for binary data. I found no reasonable way to create this agreement matrix in Stata per se, and I need to work with it in Mata anyway.

The Mata code below is the best I came up with for speed. I'm interested in any suggestions or critique. Avoiding looping over measurements certainly helps, and it would be nice to avoid explicitly looping over observations as well.

Code:
// Simulate data
set seed 1243
mata mata clear
local nmeas = 15   // number of measurements
local top = 4      // measurements go from 1, ... top
local nobs = 500
local nreps = 50   // for timing only
mata: X = ceil(`top' * runiform(`nobs', `nmeas'))
//
// Count up agreements
mata:
N = rows(X)
A = J(N, N, .)
timer_clear(1)
timer_on(1)
for (reps = 1; reps <= `nreps' ; reps++) { // outer loop just for timing
   for (i = 1; i <= N - 1; i++) {
       for (j = i+1; j <= N; j++) {
          A[i,j] = rowsum(X[i,.] :== X[j,.])  // avoid loop over measurements
       }
   }
}    
timer_off(1)
timer()
end
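One possible refinement, sketched below under the assumption that an N-by-N matrix comfortably fits in memory: loop over the (few) measurement columns instead of the observation pairs, letting the colon operator broadcast each column against its transpose. The result A2 is the full symmetric agreement matrix, including the diagonal.

Code:
mata:
A2 = J(rows(X), rows(X), 0)
for (k = 1; k <= cols(X); k++) {
    A2 = A2 + (X[., k] :== X[., k]')   // N x N indicator of agreement on measurement k
}
end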

Rolling standard dev

I have an unbalanced panel dataset with many (15000+) firms and their daily stock returns from 1960-2018.
From these daily stock returns I want to calculate the one-year "rolling standard deviation" (according to the paper I'm following).
I have managed to come up with the command using -rangestat- and I'm just wondering if I'm doing it correctly as there is no other way for me to check whether the resulting standard deviations from rangestat are correct.

I have also noticed a couple of things while using -rangestat-:

I use the count option to see how many observations it uses, and while sometimes it uses 1 observation (this is fine), at other times it uses 2 observations, which I'm not sure about, since I want to use one-day returns (the previous day).

Also, the standard deviation is sometimes 0 and at other times "." (missing). I wonder why this is and how I should interpret the difference between these two in the rangestat output.

I will post below an example of the data I'm using, but since there are so many observations I cannot post a completely representative sample here, so I will just go with the returns of one company (Apple):



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long date str8 ticker str36 comnam double permco float Returns
8417 "AAPL" "APPLE COMPUTER INC" 7    .03319269
8418 "AAPL" "APPLE COMPUTER INC" 7  -.022553137
8419 "AAPL" "APPLE COMPUTER INC" 7   .007132721
8420 "AAPL" "APPLE COMPUTER INC" 7     .1054022
8421 "AAPL" "APPLE COMPUTER INC" 7      -.00033
8424 "AAPL" "APPLE COMPUTER INC" 7    -.0588665
8425 "AAPL" "APPLE COMPUTER INC" 7    .03793554
8426 "AAPL" "APPLE COMPUTER INC" 7    .03980917
8427 "AAPL" "APPLE COMPUTER INC" 7    .06625561
8428 "AAPL" "APPLE COMPUTER INC" 7   .005786227
8431 "AAPL" "APPLE COMPUTER INC" 7 -.0033834374
8432 "AAPL" "APPLE COMPUTER INC" 7    .02086082
8433 "AAPL" "APPLE COMPUTER INC" 7   .026269455
8434 "AAPL" "APPLE COMPUTER INC" 7    .03968534
8435 "AAPL" "APPLE COMPUTER INC" 7  -.014424606
8438 "AAPL" "APPLE COMPUTER INC" 7   -.04090528
8439 "AAPL" "APPLE COMPUTER INC" 7  -.009235363
end
format %d date


note: permco is simply a unique identifier that each company in the dataset has, a unique number per company.

I then use the following command:

rangestat (sd) Returns (count) Returns, interval(date -1 0) by(permco)
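For the one-year window described above, a sketch along these lines might be closer to the goal (it assumes date is a daily %td variable and uses calendar days; the minimum-observation threshold is an arbitrary assumption):

Code:
rangestat (sd) sd1yr = Returns (count) n1yr = Returns, interval(date -365 -1) by(permco)
replace sd1yr = . if n1yr < 200     // optionally require a minimum number of trading days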

Thanks in advance

Dropping all panel observations if any one of them have missing value for any one of the variables

Hi there,

I have a firm-level data for two years. I want to drop a firm entirely (both years' observations) if there is missing value in any one year for any one of the three variables, var1, var2, and var3.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id year var1 var2 var3)
1 1 17 1 3
1 2 19 1 3
2 1 20 0 3
2 2 35 1 .
3 1  . 0 1
3 2 18 0 5
4 1 19 1 4
4 2 14 0 2
end
In this case, firm 2 and 3 will be removed entirely (all four lines of data), and only the data for firm 1 and 4 will remain.

I was going for something like:

Code:
foreach var in var1 var2 var3 {
   drop _n _n+1 if missing(`var') & year==1
   drop _n _n-1 if missing(`var') & year==2
}
Obviously, this is flawed. Any suggestions would be highly appreciated, thanks!
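One hedged way of doing it, sketched below: flag any firm with a missing value on any of the three variables in any year, then drop all of that firm's observations in one pass.

Code:
bysort id (year): egen byte anymiss = max(missing(var1, var2, var3))
drop if anymiss
drop anymiss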

Creating Mills Ratios

Hi all,

This is my first post here so I apologise in advance if I've broken any rules, and I also apologise if this is a very simple question, as I am new to Stata.

To give some background, I am attempting to see whether or not alcohol consumption affects the level of income one earns. Observations have been split into 3 groups within one variable: heavy drinkers, moderate drinkers and non-drinkers.

My plan is to first run a multinomial logit with the drinking variable as the dependent variable and moderate drinkers as the base category. I am doing this in order to obtain the Mills ratios for each type of drinker, which will then be used in 3 OLS regressions with log income as the dependent variable in order to correct for selection bias.

My question is how do I obtain these Mills ratios?
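A sketch of one common route, Lee's (1983) transformation of the multinomial logit probabilities; the selection covariates (age, educ, married) and the drinking variable name are placeholders, not from the post.

Code:
mlogit drinker age educ married, baseoutcome(2)   // moderate drinkers as the base
predict p1 p2 p3, pr
gen mills1 = normalden(invnormal(p1)) / p1        // correction term for group 1
gen mills2 = normalden(invnormal(p2)) / p2
gen mills3 = normalden(invnormal(p3)) / p3
* each mills term then enters the log-income OLS estimated on the corresponding group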

How to create variable of relative change?

Dear experts,

I have a dataset of monthly observations from June 2017 - June 2019, and I would like to calculate the 12-month relative change in price for a specific product (i.e. p_it / p_i,t-12). The first possible number to get will then be June 2018.

Variable y_m is year and month such as 1804 corresponds to April 2018.

Any good advice on how to do this in STATA?

This is what the data look like (attached as a screenshot in the original post).
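A sketch of one way to do it, assuming y_m is a numeric YYMM value (e.g. 1804 = April 2018) and the price variable is called price; the panel identifier product_id is a placeholder, needed only if there are several products.

Code:
gen mdate = ym(2000 + floor(y_m/100), mod(y_m, 100))
format mdate %tm
tsset mdate                        // or: xtset product_id mdate
gen rel_change = price / L12.price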


Thank you,
Sofie

Annualising coefficients in a loop

Hi all,
I am currently working on project about volatility timing.
The data I am working with are daily and I am not entirely sure how to annualise the coefficients from the loop. The coefficients I need to annualise are alpha (the constant) and beta, and if possible the t-stat of alpha.

foreach var of varlist United_States-Global_Ex_US {
    foreach var2 of varlist vUnited_States-vGlobal_Ex_US {
        if "v`var'" == "`var2'" {
            asdoc reg `var2' `var', robust nest save(volatilityI) dec(2)
        }
    }
}
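A sketch of one common convention: multiply the daily alpha by 252 trading days (the 252 figure is an assumption about the convention being followed), while beta, as a slope of returns on returns at the same frequency, is usually left unscaled; the t-statistic of alpha is unchanged by the rescaling.

Code:
foreach var of varlist United_States-Global_Ex_US {
    quietly reg v`var' `var', robust
    display "`var':  annualised alpha = " %9.4f 252*_b[_cons] ///
        "   t(alpha) = " %6.2f _b[_cons]/_se[_cons]
}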

Any help would be appreciated.

Thanks,
Marc

extract table from multiple pages of the web by Stata ?

Hi ,
I'm trying to extract data from the web page "https://ift.tt/2HE5woy". There are a lot of pages at that web address; how can I import the tables of data into Stata, given that there are 30,956 observations in table format on this site?
Currently I am using the following, but it is not effective. Please suggest more efficient ways:

readhtmltable https://cdd.drdmyanmar.org/en/vdp-files
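A hedged sketch of one way to loop over the pages and accumulate the tables; whether the site accepts a ?page= query parameter, the number of pages, and whether readhtmltable needs the data cleared first are all assumptions to verify.

Code:
tempfile all
clear
save `all', emptyok
forvalues p = 1/700 {        // page count is an assumption
    clear
    readhtmltable "https://cdd.drdmyanmar.org/en/vdp-files?page=`p'"
    append using `all'
    save `all', replace
}
use `all', clear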

Binary Variable Graph

Hello all,

I simply want to plot a graph with the number of times a binary variable (civil war incidence) = 1 for each year on the y-axis and the year on the x-axis, with five plots indicating five different regions.
Help would be much appreciated.
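A minimal sketch, assuming the variables are called civilwar (0/1), year and region:

Code:
preserve
collapse (sum) incidence = civilwar, by(region year)
twoway line incidence year, by(region)
restore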

Thanks,
Rishi

Inclusion of interaction terms and interpretation of margins output in RE xtprobit

Hi all,

I am conducting a study into the transmission of time preferences in spouses through their smoking behaviours. I have a balanced dataset where N=1464 and T=3. I am using a random effects probit model with errors clustered at their personal ID numbers.

The model is as follows:

xtprobit smokes_f c.sctimeimpulse_f i.smokes_m_1##c.sctimeimpulse_m c.age_f c.age_m i.employment_f i.employment_m i.good_ghealth_f i.reduc_f i.reduc_m c.rellength i.child_under15 c.loghhincome i.rural i.deprivation_scale i.state##i.year, re vce(cluster pid)

sctimeimpulse_f = Female time preference measure
sctimeimpulse_m = Male time preference measure
smokes_f = dummy variable of female's smoking status at t
smokes_m_1 = dummy variable of male's smoking status t-1


I have 2 problems which I would appreciate advice upon. I have attached the STATA outputs within a notepad file since the line size was too large.

Model 1 double interaction.txt

1) I know that there is collinearity between smokes_m and sctimeimpulse_m, yet what I am interested in is whether the likelihood of a female smoking, given that the partner smokes, is increased by his time preference rate. Would I be making a mistake by including sctimeimpulse_m as an independent variable as well as in the interaction, or is it more correct not to do so because of the collinearity? As an independent variable, sctimeimpulse_m is insignificant.
I perform the margins command:


margins r.smokes_m_1, dydx(sctimeimpulse_m) atmeans

The result is also insignificant, even though the interaction term is significant in the xtprobit output. Why is this?

2) I performed margins, dydx(*) out of general interest for the other variables in the model. The p values change a fair bit from the xtprobit output. Why is this? Which output should I refer to for the general significance of the variables: xtprobit or margins?


How to compare two datasets

Hi,
I want to verify/confirm that two datasets are identical. I reproduced my dataset from my colleague's work and want to compare it to her dataset. What is one of the ways I could do that? Thanks.
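A sketch of one built-in option (file names are placeholders): -cf- compares the two files value by value, provided they contain the same variables in the same sort order; running -datasignature- on each file is another quick check.

Code:
use my_reproduction, clear
cf _all using colleague_dataset, verbose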

Is there a way to get ey/ex marginal effects after xtreg command with fixed effects?

Hello all,

I would like to see the proportional marginal effects (i.e.: ey/ex) after a regression with fixed effects. There is always an error message which says:

"could not calculate numerical derivatives -- discontinuous region with missing values
encountered r(459);"

I am able to obtain dy/ex and dy/dx, but because of how the variables are measured I need ey/ex to be consistent. ey/ex is possible with random effects; however, random effects are not appropriate in my model.

Is there a way to calculate ey/ex?

Thank you,
Karen.

Testing for equivalence of marginal effects of the same variable at different '_at'

Hello all,

I am trying to test for the equivalence of the marginal effects of the same variable at two different values of an indicator variable, after a regression with an interaction term. Here is the gist of my commands:

reg Y c.X1 i.dummy c.X1#i.dummy
margins, dydx(X1) at(dummy=(0 1)) post

Then I would like to test for whether the marginal effect of X1 at dummy=0 is different than the marginal effect of X1 at dummy=1. I am having trouble with referring to the same variable within the same test command. i.e.: test X1=X1. Thank you in advance for what I am sure is a very simple solution.
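A sketch of two equivalent routes in this linear model: the contrast operator r. reports the difference between the two marginal effects with a test, and that difference is exactly the interaction coefficient, which can be tested directly.

Code:
reg Y c.X1##i.dummy
margins r.dummy, dydx(X1)    // difference in dy/dx of X1 across dummy, with a test
test 1.dummy#c.X1            // equivalently, test the interaction coefficient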

Karen.

Instrument variable TSLS

Dear members,

I am just confused about one thing. I want to use the lagged dependent and lagged independent variables as instruments in an IV regression (two-stage least squares). Can I use these as instruments, or do I need to find instruments other than the lagged variables? Kindly give your valuable comments.

Regards,
Rabiya

"By" and "Margins" command error; alternative approach?

Hi Everyone,

I am writing a thesis for my Honours year (advanced undergraduate degree) on adolescent drinking levels, with an excellent panel dataset. However, the panel data are split into two "cohorts"; for example, it is set up as follows:

ID Wave Cohort X Y Z
33 1 K 3 5 7
55 1 B 4.3 6 1
33 2 K 23 123 213
55 2 B 312 32 1231
33 3 K 231 44 34
55 3 B 213 23 43

I want to know the effects on the different cohorts because they are different ages and have different effect levels. My (simplified) code is as follows:

sort cohort
by cohort: ologit alckid i.alcfreqp1 i.alcfreqp2 i.friendsalc, ro
by cohort: margins, dydx(alcfreqp1 alcfreqp2 friendsalc) post

However, I receive an error that the by command and the margins command may not be combined:

margins may not be combined with by

Does anyone have any suggestions for how I can run the margins command to find the margins by each separate cohort? I am struggling to find an alternative at this point.
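One workaround, sketched under the assumption that cohort is a string variable (as the listing suggests): loop over its levels and run the model and margins within each cohort.

Code:
levelsof cohort, local(cohorts)
foreach c of local cohorts {
    display as text _newline "Cohort `c'"
    ologit alckid i.alcfreqp1 i.alcfreqp2 i.friendsalc if cohort == `"`c'"', vce(robust)
    margins, dydx(alcfreqp1 alcfreqp2 friendsalc)
}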

Thanks in advance,

Jason

Can I use a zero inflated negative binomial regression?

Hello everyone,

this is my first post, so please be kind and understanding if I don't meet the forum norms.

So my regression equation was: reg amount12 ib1.lng_origins c.pca_generaltrst1 ib6.religion controls
The outcome variable is amount12 = amount remitted in the past 12 months; lng_origins = language origins in SA (Sotho, Venda, Tsonga, etc.); and religion = atheist, Christian, etc.

So the very first problem I have is that if I only look at the amounts remitted by those who remit, I may have selection bias. I cannot use a Heckman model because my selection equation does not have a variable that is excluded from the second stage, so the exclusion restriction is violated.

Then I talked to a professor and he said that I should simply recode the missing values in the amount remitted to zeros, because those people are not remitting any amount. So I did that, and I also recoded two other variables with missings to zeros that I want to include as controls, because I figured that otherwise Stata only takes the non-missing values into account, but to account for selection bias it has to take all the observations into account, right? These are the controls that I recoded: (1) relationship to remittance receiver, (2) frequency of remittances.

Now I can't use OLS because the error terms are not normally distributed and I have a lot of zeros, which is why I thought I might be able to use a zero-inflated negative binomial regression. In inflate() I would then plug in my logit specification (all variables and controls without the outcome variable) that estimates whether a person remits or not:

zinb new_amount12 ib1.pop_lngorigins c.pca_generaltrst1 ib6.religion controls, inflate(ib1.pop_lngorigins c.pca_generaltrst1 ib6.religion other controls)

Unfortunately, the inflate equation does not give me the same or similar results as the logit regression that I already ran. Why is that? Can I still use the coefficients that I get for the amount remitted?

Please note that this is a master's thesis and that it does not have to be perfect (I would like it to be, but I am pretty much new to these models, so I think it is normal that it will not be perfect right away). Thank you so much for your help in advance!

STATA seems to add spaces into path where there are none although using double quotes

Dear statalist,

I am using Stata/SE 15.1 for Mac. I am using a global to specify my path at the beginning of my do file. I write it like this:
Code:
global path "/Users/felix/Dropbox⁩/Felix⁩/F&Vs PHD⁩/EINITE⁩/Aggregation⁩/Paper⁩/Aggregation_21082019⁩/"
However, in the command window Stata returns it like this, with additional spaces, e.g. after "felix":
Code:
global path "/Users/felix ⁩/Dropbox ⁩/Felix⁩/F&Vs PHD ⁩⁩/ ⁨EINITE⁩/ ⁨Aggregation⁩/Paper⁩/Aggregation_21082019"
This later causes problems when I later refer to my path using other commands, such as import excel, as STATA tells me that it cannot find the specified file:
I specify my command like this:
Code:
import excel "$path/TOTAL WITH PROPERTYLESS/Germany_total_1400.xlsx", firstrow
and then STATA returns the command like this, with additional spaces in the command window with an error message "not found":
Code:
file /Users/felix ⁩/Dropbox ⁩/Felix⁩/F&Vs PHD ⁩⁩/ ⁨EINITE⁩/ ⁨Aggregation⁩/Paper⁩/Aggregation_21082019/TOTAL WITH PROPERTYLESS/Germany_total_1400.xlsx
To sum up, the problem seems to be that Stata adds spaces to my command that are not included in the folder names, and this seems to break the path when I try to use it. I checked the spelling and everything is in double quotes.

Many thanks in advance for your help.

event-study (event-study2)

Hi all,

I am using Stata 16.

My study: I have two periods that I will evaluate; 2008 - 2013 and then 2014 - 2018. I have a total of 63 baseline events - 23 in the 2008 - 2013 period and 40 in the 2014 - 2018 period - and I am using open/close data. Due to data access, I was unable to obtain intraday data (for less than £1500) but I feel using the open/close data will enable me to provide a broad view and still comment on any pre-announcement drift and lasting effects of FOMC news (thus commenting on efficient market theory). To that end, I would like to evaluate the way in which the Dow anticipates FOMC news 3 days prior to the announcement and the way in which news is incorporated, 3 days after the event.

The majority of event studies I have seen appear to evaluate a stock against an index (eg. apple against the S&P 500 is very common) to thus estimate normal performance, abnormal and cumulative abnormal returns, as well as being able to test for significance. Is it possible to conduct an event study with just the Dow Jones?

I am unsure whether anyone has any direction regarding the use of eventstudy2, but I also had the idea to evaluate the effects of unscheduled meetings - effectively, another set of events. I considered using dummies / differences (d.var, d2.var, etc.) for x days - leading and lagging (1, 2, 3, 4) - to see if there is any reaction and whether it is more dramatic than for the scheduled meetings. I suppose they would be reactions to shocks and as such would not work in the same way as the scheduled ones. I have done something similar on a project that was not finance-related, so excuse the novice approach to the event study application.

I have read 'help eventstudy2' and also used the Princeton 'Event Studies with Stata' guide. However, I am generating missing values when setting up my data and I cannot figure out what is wrong with the way I have prepared my data in Excel.

Attached is a copy of my data prepared in Excel. I have manually calculated cumulative returns for the whole index for both periods, and I have further calculated cumulative returns for the index over the whole period with the event days removed - this will allow me to compare event days versus all other trading days. Any direction at this stage would be much appreciated.

Thanks in advance.

Friday, August 30, 2019

Linear interpolation

Hi!

I have a variable called year in my dataset, but it contains only the years 2006, 2009, 2011, 2013, 2015 and 2017. I want to add the intermediate years, but I do not know how; the other variables should take interpolated (averaged) values in those years.

Example: I have 2006 and 2009 in years, and I have population values for those years. I want to add the intermediate years and fill in the population for them by interpolating between the surrounding values.
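A sketch of one approach, assuming there is a panel identifier (here called id) and a variable pop to interpolate: expand the panel to include the in-between years, then interpolate linearly within each panel.

Code:
xtset id year
tsfill                                             // insert the missing in-between years
bysort id (year): ipolate pop year, gen(pop_ipol)  // linear interpolation within each panel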

I would be grateful if you can help me.
Thanks!

Taking Results from Tab2 and Using them to Generate/Recode a new variable

I have a big dataset (1.7million obs) and I need to create contour plots by a member_id within the dataset.

I have the member_id, and each member has about 45,000 obs. After filtering and looking at a subset, I end up with about 35,000 obs for each member.

I tried to use histogram/contour plot to create a contour plot/heatmap and with all of the data it took a really long time to render.

To increase the processing speed and distill the data I did the following....

For the 35,000 obs, I have two continuous variables that are the X and Y variable... and I want their incidence/intensity displayed in a contourplot or a heat map.

I created two categorical variables, a 15-category variable and an 11-category variable: one was created based on substantive concerns (known categories with substantive import) and the other based on distributional properties (reasonably sized groups in each interval).

I cross-tabbed the two categorical variables and then created a counter/incidence variable using the crosstab results...

I did it using brute force...

generate counter=0

recode counter 0=(number in crosstab) if v1==1 & v2==1

recode counter 0=(number in crosstab) if v1==1 & v2==2

I did this for all of the cells in an 11x15 crosstab....

I then created a contour plot with the x variable, the y variable, and used the 'counter' as the z-variable.

The plots work and, with some options, convey the appropriate information.

BUT, this is hugely clumsy and labor intensive when I have 40 'members' and need to do this for 3-4 files per week.

What I would like to do is run the crosstab -

tab v1 v2 - generate the results, and use a mat command to automate the recode of 'counter_group1', placing each of the cell values into the code that creates my counter.

I hope this is clear and I have data to use for an example...

I have set up matrices and run tabs and then used the mat commands to export data into tables, I assume I can do the same and use the mat command to place 'results' into a recode statement that prevents me from manually taking crosstab results and typing them into recode statements.

Any thoughts?
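For what it is worth, a sketch that attaches the cell frequency to every observation directly, which would replace the tab-plus-recode step entirely (member_id, v1 and v2 are the names used above):

Code:
bysort member_id v1 v2: gen counter = _N
* then, as before: twoway contour counter yvar xvar   (axis variable names are placeholders)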

Quick questions about the interaction methods

Dear Researchers,
Greetings,

I have quick questions about interaction variables, please.

At first, I am going to examine the effect of specific standards on financial reporting quality in the same country for industrial firms over the period 2006-2018. The standards were issued, let's say, in 2006 and firms started adopting them voluntarily. In other words, firms that did not adopt in 2006 adopted in 2007, those that did not adopt in 2007 adopted in 2008, and so on. I've read a lot about the difference-in-differences (DID) method, but it seems that this method will not work in my case, as the DID approach requires a control sample (non-adopters), and in my case all non-adopters become adopters during the period. Now I am confused about which method I can use.
To illustrate:
Let us assume the following model:
This is the basic model before studying the effect of the standards.
  1. Financial Reporting quality = α0 + β1 size + β2 Audit + β3 Growth + ε.
Where:
Financial Reporting quality, size, and growth are scales (ratio) variables while Audit is a binary variable.
Now, I want to examine the effect of the standards on the financial reporting quality as below:
Financial Reporting quality = α0 + β1 size + β2 Audit + β3 Growth + β4 GG + β5 GG*size + β6 GG*Audit + β7 GG*Growth + ε.
Where:
GG = are the standards adopted
Q(1) Can I code GG as (1) for a firm (X) for the observations in the adoption period and (0) for the same firm for the observations in the old-standard period?
Q(2) Can I consider the observations coded (0) under the old standards as a control?
Q(3) Are the following correct:
β4 captures the effect of the standards on financial reporting quality, and if it is positive it means that the average value of financial reporting quality is higher under the new standards than under the old standards, keeping other factors constant.
β1 captures the effect of size before the adoption of the new standards.
β5 captures the effect of size after the adoption of the new standards.
β1 + β5 capture the overall effect (before and after).

Thank you very much in advance for your cooperation.
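A sketch of how the coding in Q(1) and the interactions might be written with factor-variable notation; frq (the reporting-quality measure), adoption_year and firm_id are placeholder names, and this is only one way to set it up, not a verdict on the identification questions.

Code:
gen byte GG = (year >= adoption_year)   // 1 in firm-years reported under the new standards
reg frq c.size i.Audit c.Growth i.GG i.GG#c.size i.GG#i.Audit i.GG#c.Growth, vce(cluster firm_id)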

Creating first-order difference variables in repeated cross-sectional (i.e. pseudo panel) data

Greetings,

I'm running Stata 15.1 on OSX. I've created a repeated cross-sectional dataset to assess variations in attitudes towards crime (the dependent variable). My time variable essentially stores the month and year in which each survey (31 in total) was conducted. For my independent variable, I created and merged (using the time variable) a newspaper issue-salience index that stores the percent of monthly New York Times articles that refer to crime-related issues. My expectation is that in months in which crime is salient in the media, we will see an increase in the percent of respondents saying crime 'is a serious issue'. To prepare the dataset for analysis, I created (using 'collapse') a variable that stores the mean percent of respondents that gave the 'serious issue' response in each survey (i.e. by year/month). I did the same with the salience index variable (separate dataset) and merged it into the collapsed survey dataset. I ran a simple Pearson correlation between the index and the survey response variable and uncovered a strong relationship (r=0.87). However, a colleague of mine who saw the resulting graph warned me that I 'shouldn't correlate time series with strong autocorrelation' and that, instead, I should 'create first-order difference sequences and correlate those'. I'm not quite sure how to go about doing this. The dataset has no panel ID, so I tried creating one:

Code:
gen id=_n
(note that the id is then simply a number assigned to each survey--31 in total)

I then entered the following:

Code:
xtset id year
To create the 'first difference' variable I tried the following:

Code:
gen indexdiff=D.index
However, Stata subsequently gave me the "(31 missing values generated)" message.

What am I doing wrong here and how do I get it right? Thanks for your time!
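For what it is worth: with id generated as _n, every "panel" contains a single observation, so D.index is missing for all 31 surveys. A sketch of one alternative, treating the surveys as a single ordered series:

Code:
sort yearmonth
gen survey = _n
tsset survey
gen indexdiff   = D.index
gen seriousdiff = D.crime_serious
correlate indexdiff seriousdiff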

Sample data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(index crime_serious yearmonth year id)
1.0898919 37.634632 441 1996  1
 1.141105  41.15658 449 1997  2
 .9819449 31.580814 453 1997  3
1.1344688  35.43958 475 1999  4
1.2987403   39.7779 487 2000  5
1.1022217  39.37875 488 2000  6
 1.045117 32.872364 521 2003  7
 .7872596 35.538055 522 2003  8
 .8885489  38.24273 523 2003  9
 .9927688  35.79262 524 2003 10
 .7067459  39.30157 539 2004 11
1.0929303 36.767914 548 2005 12
1.0707874  25.04893 572 2007 13
1.0773966  34.76981 573 2007 14
1.0685753  29.70381 576 2008 15
 1.118886   27.0324 580 2008 16
 .9239349  31.63132 584 2008 17
 .7300239 23.623867 597 2009 18
 .7975035  28.98842 598 2009 19
1.1477937 34.304623 613 2011 20
1.0149189  38.20615 614 2011 21
1.1804827   34.5046 624 2012 22
1.3056893  39.55238 648 2014 23
1.2751036  41.03848 649 2014 24
 1.369863  42.47158 650 2014 25
1.8246716  52.22675 662 2015 26
 2.096708  48.12559 667 2015 27
1.6774454  47.23487 668 2015 28
1.5856438  42.08379 669 2015 29
 2.575059  57.32762 686 2017 30
2.7088645   64.2695 689 2017 31
end
format %tm yearmonth


No observations r(2000) with Regress and SEM

Hello all. I am working with a dataset that contains many categorical variables whose values are still in written-word format, which I am trying to convert into numbers. Initially, when I typed "summarize", no observations registered for any of the variables, even though I could see the values when I manually scrolled through the dataset. When I tried to run PCA and regression on some of the variables, these commands would obviously not work because the data were still in string format and needed to be converted to numbers. I addressed that by using generate newvar=real(oldvar) for all of the variables I was planning to include in my analyses. After running the real() function, I was able to obtain summary statistics and observations for all of the variables I needed for my regression, so I thought my "no observations" problem would be fixed when I ran a regression using the new numeric variables. This was not the case, however. I am able to run PCA on several of the variables successfully (Vo1-Vo10), but the regression model still returned the "no observations" error when I included the now-numeric Race, Age, Education, Ethnicity and Income variables. The same error arises when I try to run an SEM with some of the variables. Has anyone found a fix for this "no observations" error arising even after string variables have been converted to numeric? I'll paste my code below:

*Create race dummy variable with 1 as white, 0 as non white
gen race=""
replace race=Q101
replace race="1" if Q101=="White"
replace race="0" if Q101=="Black or African American"
replace race="0" if Q101=="American Indian or Alaska Native"
replace race="0" if Q101=="Asian"
replace race="0" if Q101=="Native Hawaiian or Pacific Islander"
replace race="0" if Q101=="Other"

*Create ethnicity categorical variable
gen ethnicity="."
replace ethnicity=Q102
replace ethnicity="1" if Q102=="Yes"
replace ethnicity="0" if Q102=="No"

*create income categorical variable
gen income="."
replace income=Q100
replace income="1" if Q100=="Less than $19,999"
replace income="2" if Q100=="$20,000 - $39,999"
replace income="3" if Q100=="$40,000 - $59,999"
replace income="4" if Q100=="$60,000 - $79,999"
replace income="5" if Q100=="$80,000 - $99,999"
replace income="6" if Q100=="$100,000 - $149,999"
replace income="7" if Q100=="More than $150,000"

*Create education categorical variable
gen education="."
replace education=Q99
replace education="1" if Q99=="Less than high school"
replace education="2" if Q99=="High school graduate"
replace education="3" if Q99=="Some college"
replace education="4" if Q99=="2 year degree"
replace education="5" if Q99=="4 year degree"
replace education="6" if Q99=="Professional degree"
replace education="7" if Q99=="Doctorate"

*Create age categorical variable
gen age="."
replace age=Q98
replace age="1" if Q98=="16 - 24"
replace age="2" if Q98=="25 - 34"
replace age="3" if Q98=="35 - 44"
replace age="4" if Q98=="45 - 54"
replace age="5" if Q98=="55 - 64"
replace age="6" if Q98=="65 - or older"

*Code value oriententation questions to prep for PCA
gen vo1="."
replace vo1=DT
replace vo1="1" if DT=="Strongly disagree"
replace vo1="2" if DT=="Somewhat disagree"
replace vo1="3" if DT=="Neither agree nor disagree"
replace vo1="4" if DT=="Somewhat agree"
replace vo1="5" if DT=="Strongly agree"

gen vo2="."
replace vo2=DU
replace vo2="1" if DU=="Strongly disagree"
replace vo2="2" if DU=="Somewhat disagree"
replace vo2="3" if DU=="Neither agree nor disagree"
replace vo2="4" if DU=="Somewhat agree"
replace vo2="5" if DU=="Strongly agree"

gen vo3="."
replace vo3=DV
replace vo3="1" if DV=="Strongly disagree"
replace vo3="2" if DV=="Somewhat disagree"
replace vo3="3" if DV=="Neither agree nor disagree"
replace vo3="4" if DV=="Somewhat agree"
replace vo3="5" if DV=="Strongly agree"

gen vo4="."
replace vo4=DW
replace vo4="1" if DW=="Strongly disagree"
replace vo4="2" if DW=="Somewhat disagree"
replace vo4="3" if DW=="Neither agree nor disagree"
replace vo4="4" if DW=="Somewhat agree"
replace vo4="5" if DW=="Strongly agree"

gen vo5="."
replace vo5=DX
replace vo5="1" if DX=="Strongly disagree"
replace vo5="2" if DX=="Somewhat disagree"
replace vo5="3" if DX=="Neither agree nor disagree"
replace vo5="4" if DX=="Somewhat agree"
replace vo5="5" if DX=="Strongly agree"

gen vo6="."
replace vo6=DY
replace vo6="1" if DY=="Strongly disagree"
replace vo6="2" if DY=="Somewhat disagree"
replace vo6="3" if DY=="Neither agree nor disagree"
replace vo6="4" if DY=="Somewhat agree"
replace vo6="5" if DY=="Strongly agree"

gen vo7="."
replace vo7=DZ
replace vo7="1" if DZ=="Strongly disagree"
replace vo7="2" if DZ=="Somewhat disagree"
replace vo7="3" if DZ=="Neither agree nor disagree"
replace vo7="4" if DZ=="Somewhat agree"
replace vo7="5" if DZ=="Strongly agree"

gen vo8="."
replace vo8=EA
replace vo8="1" if EA=="Strongly disagree"
replace vo8="2" if EA=="Somewhat disagree"
replace vo8="3" if EA=="Neither agree nor disagree"
replace vo8="4" if EA=="Somewhat agree"
replace vo8="5" if EA=="Strongly agree"

gen vo9="."
replace vo9=EB
replace vo9="1" if EB=="Strongly disagree"
replace vo9="2" if EB=="Somewhat disagree"
replace vo9="3" if EB=="Neither agree nor disagree"
replace vo9="4" if EB=="Somewhat agree"
replace vo9="5" if EB=="Strongly agree"

gen vo10="."
replace vo10=EC
replace vo10="1" if EC=="Strongly disagree"
replace vo10="2" if EC=="Somewhat disagree"
replace vo10="3" if EC=="Neither agree nor disagree"
replace vo10="4" if EC=="Somewhat agree"
replace vo10="5" if EC=="Strongly agree"

*Convert model variables from string to numeric
generate Race=real(race)
generate Age=real(age)
generate Education=real(education)
generate Ethnicity=real(ethnicity)
generate Income=real(income)
generate Vo1=real(vo1)
generate Vo2=real(vo2)
generate Vo3=real(vo3)
generate Vo4=real(vo4)
generate Vo5=real(vo5)
generate Vo6=real(vo6)
generate Vo7=real(vo7)
generate Vo8=real(vo8)
generate Vo9=real(vo9)
generate Vo10=real(vo10)
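Two hedged sketches that may help here. The first replaces the block of Likert recodes with -encode- and a loop, assuming the items DT-EC share the same answer categories (the new variable names avoid clashing with the existing Vo* variables); the second is a quick diagnostic for the r(2000) error, which usually means that no observation is non-missing on all of the model variables at once.

Code:
* compact recode of the Likert items
label define agree 1 "Strongly disagree" 2 "Somewhat disagree" ///
    3 "Neither agree nor disagree" 4 "Somewhat agree" 5 "Strongly agree"
local i = 0
foreach v of varlist DT DU DV DW DX DY DZ EA EB EC {
    local ++i
    encode `v', gen(vo`i'_n) label(agree)
}

* diagnostic: how much overlap is there across the model variables?
misstable summarize Race Age Education Ethnicity Income Vo1-Vo10
count if !missing(Race, Age, Education, Ethnicity, Income)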

One-sided test of coefficients obtained by mixed

Dear all,
I ran mixed 'outcome' 'treatment' 'binary_covariates' 'metric_covariates' [fweight=w] || block: . The coefficient for the treatment is non-significant, but I would like to be more certain that it is not negative. Thus I would like to perform a one-sided test. How can I best do that?
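A sketch of how the one-sided p-values can be derived from the reported z statistic, run directly after -mixed- (SRI is the treatment variable named below):

Code:
local z = _b[SRI] / _se[SRI]
display "p for Ha: coefficient > 0 = " 1 - normal(`z')
display "p for Ha: coefficient < 0 = " normal(`z')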



The mixed output was attached as an image in the original post.
The treatment variable is called SRI. All variables except area_aux_rice are binary. The sample contains cross-sectional data on farming households that were matched into groups using Mahalanobis matching.

regards,
Sarah

Looping in Stata to extract values from one file to another

Hi all!
So I am a bit stuck on something I have to do.

1. I have a dataset which is multi level, and includes several individual observations for different countries (total 40), for different years/waves. (So for each year, there are 40 countries each having several individual observations of its own). So this is an individual level file.
2. In another dataset, I have the country level variables for all these countries and many more, which has variables like gdp of the country in that particular year
3. I have created an id variable - such that (country-wave) which is the same in both the files.
SO if the code is 47, it represents for country 4 in wave 7.

I want to import the gdp values for each country-wave from the dataset in 2. such that they get filled in for all the individual-level observations corresponding to that particular country-wave.

All I need is a loop (or similar) that takes the gdp value corresponding to the country-wave number (e.g. 47) and puts it into the individual file for the observations with the same country-wave number.

These are two different stata files.
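For what it is worth, a many-to-one merge does this without any loop; the file names and the name of the country-wave identifier (country_wave) are placeholders.

Code:
use individual_level, clear
merge m:1 country_wave using country_level, keepusing(gdp) nogenerate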

Please help if you can! Will be very well appreciated.

rho value in heckman model

Hello,
could you please help me with one small issue? I have panel data with n=7 and t=8. I use the bootstrap method with a Heckman selection model, but I find it difficult to interpret the results, especially the validity of the rho value, as it equals 1. More detail is in the attachment.
I appreciate any help.

Problem with estimation of coefficient in panel regression

My aim is to estimate the association between a country's real per capita growth (realgthpc) and the export-weighted real per capita growth of its trading partners (tpg_aenia for advanced economies, tpg_em for emerging markets, tpg_chn for China). I am using a panel of 26 countries with data at annual frequency. I control for the usual variables that are associated with growth, such as inflation, physical and human capital, global growth, etc. The moderation, crisis and recovery dummies capture the periods 2003-07, 2008-09 and 2010-11 respectively.

I first run the regression for the period 2001 to 2018. Guided by the Hausman test, I have used country fixed effects. Also, standard errors are clustered at the country-level. I find that a 1% increase in growth of EM trade partners is associated with a 0.5% increase in the growth of the reporting country.
Code:
xtreg realgthpc l.tpg_aenia l.tpg_em l.tpg_chn realpcgdp_5yago_bn gfcf_nomgdp prim_enrol cpi_ep_inflatn depratio open_gdp globalgthpc oiltrdbal_gdp oiltrdbal_gdp_5y gth_commexp_prind crisis_dummy moderation_dummy recovery_dummy, fe vce(cluster CountryNo)
Code:
Fixed-effects (within) regression               Number of obs     =        343
Group variable: CountryNo                       Number of groups  =         26

R-sq:                                           Obs per group:
     within  = 0.3832                                         min =          4
     between = 0.0080                                         avg =       13.2
     overall = 0.1749                                         max =         17

                                                F(16,25)          =      15.10
corr(u_i, Xb)  = -0.5907                        Prob > F          =     0.0000

                                   (Std. Err. adjusted for 26 clusters in CountryNo)
------------------------------------------------------------------------------------
                   |               Robust
         realgthpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
         tpg_aenia |
               L1. |  -.0420772   .1956026    -0.22   0.831    -.4449283    .3607739
                   |
            tpg_em |
               L1. |   .5208979   .2481788     2.10   0.046      .009764    1.032032
                   |
           tpg_chn |
               L1. |   .1076752   .5621433     0.19   0.850    -1.050081    1.265431
                   |
realpcgdp_5yago_bn |   .0309591   .0211014     1.47   0.155    -.0125001    .0744183
       gfcf_nomgdp |   .2013728   .0805251     2.50   0.019     .0355283    .3672173
        prim_enrol |   .0969751   .0289507     3.35   0.003     .0373501    .1566001
    cpi_ep_inflatn |  -.1490856   .0699263    -2.13   0.043    -.2931015   -.0050697
          depratio |  -.0002485   .0571743    -0.00   0.997    -.1180011    .1175041
          open_gdp |   .0272978    .023562     1.16   0.258    -.0212292    .0758247
       globalgthpc |   .7846087   .2894573     2.71   0.012     .1884603    1.380757
     oiltrdbal_gdp |    .061986    .079659     0.78   0.444    -.1020748    .2260467
  oiltrdbal_gdp_5y |   .0130573   .1709793     0.08   0.940     -.339081    .3651957
 gth_commexp_prind |   .0252099   .0140975     1.79   0.086    -.0038245    .0542443
      crisis_dummy |    .134008   .9356604     0.14   0.887    -1.793021    2.061037
  moderation_dummy |   .8694931   .6094907     1.43   0.166    -.3857765    2.124763
    recovery_dummy |   .5936562   .6663803     0.89   0.381    -.7787797    1.966092
             _cons |  -14.92877   6.315646    -2.36   0.026    -27.93609   -1.921454
-------------------+----------------------------------------------------------------
           sigma_u |  2.7801935
           sigma_e |  2.6945358
               rho |  .51564218   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

Redistributing a Group's percentiles in panel data

Hi, this is my first post in Statalist and I'm hoping you can help me.

I am using a panel dataset with really good coverage for 2001-2018; however, the DTI for 1999 and 2000 is shaky at best for a specific group (DTI drops to 50 obs where there should be 10,000). We used multiple imputation as a fix, but when compared with a benchmark the imputed values fell 7 points lower than the expected DTIs.

I have a benchmark for the 10th, 25th, 50th, 75th, and 90th percentiles from a reliable source for 1999 and 2000 with a total count for the year.

I have never created weights before/ have barely used them.
But should I make sample weights for my specific group for 2000, or create buckets?

I know this is vague, but I'm not sure what information would help for clarity.

Best and Thanks,
Sian

Geonear producing unstable results

I'm trying to use geonear to identify the nearest neighbors within a dataset at the household-day level.

I've got longitude with a level of precision ranging between the 4-6 decimal place. I've got latitude to the 6th decimal place always.

I ran a do-file, which produces the six nearest neighbors, four times. The first two times I ran it, they produced the same results. The second two times I ran it they varied in the final neighbor identification.

Is this a situation where I should expect this type of variation in the output? Is it less likely if I specify within()?

For what it is worth, here is my code.

Code:
preserve

rename hhd_id nbor_index
sort nbor_index, stable
tempfile nbor
save `nbor'

restore

sort hhd_id, stable
geonear hhd_id lat lon using `nbor', n(nbor_index lat lon) ign wide near(6)

Generating mean returns, based on xtile returns

Hi community,

I seem to have problems calculating mean returns (ret_sup in my data). My dataset has weekly observations for returns (customer and supplier). The lagged customer returns are split into quintiles with xtile. Based on the quintile rank of the lagged customer returns, I want to calculate the weekly mean return of the associated supplier returns per quintile.

I used the following code, but when I manually calculate the returns in Excel to double-check, it doesn't give me the same results:
Code:
by stic ctic: gen L_ret= ret_cus[_n-1]
egen retQ = xtile(L_ret), by( week_id ) nq(5)
by week_id : egen retQ1=mean(ret_cus) if  retQ ==1
by week_id : egen retQ2=mean(ret_cus) if  retQ ==2
by week_id : egen retQ3=mean(ret_cus) if  retQ ==3
by week_id : egen retQ4=mean(ret_cus) if  retQ ==4
by week_id : egen retQ5=mean(ret_cus) if  retQ ==5
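For what it is worth, a sketch of the calculation if the target is the mean supplier return per quintile of lagged customer returns; note that it averages ret_sup rather than ret_cus (whether that matches the intended definition is something to confirm) and that it sorts explicitly before taking the lag.

Code:
bysort stic ctic (week_id): gen L_ret2 = ret_cus[_n-1]
egen retQ2 = xtile(L_ret2), by(week_id) nq(5)
egen mean_ret_sup = mean(ret_sup), by(week_id retQ2)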
My dataex example consists of exactly one week of panel data.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8(stic ctic) float(week_id ret_sup ret_cus L_ret retQ retQ1 retQ2 retQ3 retQ4 retQ5)
"PCZ"   "CWG"   1926  .023622047  .069767445   .01775148 3         .         . .0150178 .           .
"CPCI"  "SNE"   1926  .067307696   .02152642   .02152642 3         .         . .0150178 .           .
"CSDS"  "GND"   1926  -.11594203  -.06666667      -.0625 1 .05393586         .        . .           .
"PGEX"  "FTR"   1926   .05633803         .03 .0033444816 2         . .03446858        . .           .
"HMTT"  "IOM"   1926   .04918033    .1119403  -.02189781 1 .05393586         .        . .           .
"SRI"   "GM"    1926           .  .016393442  .029535865 3         .         . .0150178 .           .
"VICL"  "RPR"   1926 -.007352941  .011666667           0 2         . .03446858        . .           .
"KOPN"  "ROK"   1926  -.04807692  .030674847 -.012121212 1 .05393586         .        . .           .
"MXC"   "AQP"   1926  -.06451613  .008130081 -.008064516 2         . .03446858        . .           .
"TSGI"  "NYN"   1926  -.06666667  .012562814  .015306123 3         .         . .0150178 .           .
"AKS"   "GM"    1926   .04153354  .016393442  .029535865 3         .         . .0150178 .           .
"LEA"   "GM"    1926 -.006493506  .016393442  .029535865 3         .         . .0150178 .           .
"JCI"   "GM"    1926   -.0056899  .016393442  .029535865 3         .         . .0150178 .           .
"VCOM"  "DSCP"  1926 -.014084507 -.010526316   .11111111 5         .         .        . . .0038411536
"SHLO"  "GM"    1926  .028169014  .016393442  .029535865 3         .         . .0150178 .           .
"IVAC"  "HMTT"  1926   .04635762   .04918033   .18217054 5         .         .        . . .0038411536
"CSPLF" "CWG"   1926  -.01724138  .069767445   .01775148 3         .         . .0150178 .           .
"BJICA" "DRYR"  1926   -.0825688 -.016666668 -.024390243 1 .05393586         .        . .           .
"MNR"   "AJL"   1926  -.02777778  .032786883  -.07575758 1 .05393586         .        . .           .
"ITT"   "GM"    1926  .005780347  .016393442  .029535865 3         .         . .0150178 .           .
"ALET"  "ALET"  1926   .04166667   .04166667   .04166667 5         .         .        . . .0038411536
"MCCL"  "WMX"   1926  -.08571429 -.011235955  .011363637 3         .         . .0150178 .           .
"SPTR"  "OCC"   1926   .02352941   .06498195 -.017730497 1 .05393586         .        . .           .
"LRCI"  "BKCO"  1926    .0925926   .03448276  .035714287 5         .         .        . . .0038411536
"JPMX"  "DBD"   1926           0  -.07212476           . .         .         .        . .           .
"DSPT"  "GM"    1926           0  .016393442  .029535865 3         .         . .0150178 .           .
"PRCP"  "GM"    1926 -.028368793  .016393442  .029535865 3         .         . .0150178 .           .
"CPCI"  "SNE"   1926  .067307696   .02152642  -.04307116 1 .05393586         .        . .           .
"CVTX"  "BGEN"  1926    .0877193   .15337424 -.006097561 2         . .03446858        . .           .
"PCZ"   "TRP"   1926  .023622047   .02142857  .007194245 2         . .03446858        . .           .
"CEPH"  "CHIR"  1926   .11229946    -.040625  .012658228 3         .         . .0150178 .           .
"CFCI"  "REY"   1926   .02479339   .03809524  .024390243 3         .         . .0150178 .           .
"ROG"   "AFLX"  1926 -.009090909  -.10784314   .05154639 5         .         .        . . .0038411536
"AFLX"  "SFLX"  1926  -.10784314 -.015151516  .023255814 3         .         . .0150178 .           .
"KLOC"  "CBS"   1926  -.06666667           0  .018072288 3         .         . .0150178 .           .
"HIW"   "HIW"   1926 -.003610108 -.003610108  .029739777 5         .         .        . . .0038411536
"DEFI"  "GM"    1926  -.01923077  .016393442  .029535865 3         .         . .0150178 .           .
"RGLD"  "PDG"   1926         -.1   .03030303  .012269938 3         .         . .0150178 .           .
"SIND"  "SHX"   1926   .11764706   .07692308  -.02803738 1 .05393586         .        . .           .
"STCL"  "RCII"  1926    .1935484   .02255639   .11764706 5         .         .        . . .0038411536
"STRB"  "CBS"   1926           0           0  .018072288 3         .         . .0150178 .           .
"WN"    "GM"    1926   -.3415638  .016393442  .029535865 3         .         . .0150178 .           .
"MCCL"  "MCCL"  1926  -.08571429  -.08571429   .14130434 5         .         .        . . .0038411536
"DECO"  "RBD"   1926           0   .03910615  .017045455 3         .         . .0150178 .           .
"CSYI"  "SGMA"  1926  -.01369863   .05755396  .007246377 3         .         . .0150178 .           .
"SMPS"  "GM"    1926 -.011494253  .016393442  .029535865 3         .         . .0150178 .           .
"BGG"   "ELUXY" 1926  .013812155  -.02484472   .07572383 5         .         .        . . .0038411536
"HILI"  "GM"    1926   .15789473  .016393442  .029535865 3         .         . .0150178 .           .
"ALET"  "ALET"  1926   .04166667   .04166667           0 2         . .03446858        . .           .
"NFG"   "UGI"   1926  -.02285714  .005586592           0 2         . .03446858        . .           .
"ZRAN"  "IOM"   1926   .22826087    .1119403  -.02189781 1 .05393586         .        . .           .
"SUP"   "GM"    1926 -.010869565  .016393442  .029535865 3         .         . .0150178 .           .
"GLFD"  "RPR"   1926        -.02  .011666667           0 2         . .03446858        . .           .
"TIII"  "NYN"   1926    .1122449  .012562814  .015306123 3         .         . .0150178 .           .
"CHK"   "AQP"   1926   -.1440678  .008130081 -.008064516 2         . .03446858        . .           .
"PTSI"  "GM"    1926 -.020833334  .016393442  .029535865 3         .         . .0150178 .           .
"NAII"  "AJL"   1926  -.01369863  .032786883  -.07575758 1 .05393586         .        . .           .
"SPZE"  "EIRE"  1926       .1875   .13414635  .031446543 5         .         .        . . .0038411536
"HICKA" "GM"    1926      -.0125  .016393442  .029535865 3         .         . .0150178 .           .
"ORCI"  "GM"    1926  -.03448276  .016393442  .029535865 3         .         . .0150178 .           .
"AMSC"  "CWP"   1926 -.032967035  .037037037 -.010471204 2         . .03446858        . .           .
"HAVA"  "GM"    1926         .35  .016393442  .029535865 3         .         . .0150178 .           .
"GNTX"  "GM"    1926  -.07317073  .016393442  .029535865 3         .         . .0150178 .           .
"ARRS"  "TCOMA" 1926   .07476635   .04424779  -.02586207 1 .05393586         .        . .           .
"GGNS"  "IIII"  1926  -.11764706           0 -.006993007 2         . .03446858        . .           .
"NSCC"  "OHM"   1926   .04347826 -.014492754           0 2         . .03446858        . .           .
"RGR"   "ELY"   1926  .006535948   .12820514 -.012658228 1 .05393586         .        . .           .
"IMPX"  "ROK"   1926  .071428575  .030674847 -.012121212 1 .05393586         .        . .           .
"AWRE"  "USRX"  1926   .25714287  .008561644   .06181818 5         .         .        . . .0038411536
"STRT"  "GM"    1926           0  .016393442  .029535865 3         .         . .0150178 .           .
"IMPX"  "IOM"   1926  .071428575    .1119403  -.02189781 1 .05393586         .        . .           .
"MAYS"  "FULT"  1926           0   .10843374           0 2         . .03446858        . .           .
"SWTX"  "SNE"   1926  -.05357143   .02152642  -.04307116 1 .05393586         .        . .           .
"AOS"   "YRK"   1926  -.06072874  .009433962 -.023041476 1 .05393586         .        . .           .
"LCBM"  "CHIR"  1926 -.006849315    -.040625  .012658228 3         .         . .0150178 .           .
"FSCR"  "GM"    1926  .022900764  .016393442  .029535865 3         .         . .0150178 .           .
"INVI"  "GM"    1926           0  .016393442  .029535865 3         .         . .0150178 .           .
"ORBT"  "GM"    1926    .0882353  .016393442  .029535865 3         .         . .0150178 .           .
"SPA"   "WAT"   1926  .008064516   .06521739 -.033613447 1 .05393586         .        . .           .
"NAII"  "NUS"   1926  -.01369863    -.041841  .012711864 3         .         . .0150178 .           .
"NFG"   "NUI"   1926  -.02285714           0           0 2         . .03446858        . .           .
"CTS"   "GM"    1926  .028744327  .016393442  .029535865 3         .         . .0150178 .           .
"KVHI"  "GM"    1926           0  .016393442  .029535865 3         .         . .0150178 .           .
"WALB"  "GM"    1926           0  .016393442  .029535865 3         .         . .0150178 .           .
"SEC"   "MCD"   1926  .037037037   .04519774           0 2         . .03446858        . .           .
"DASTY" "DASTY" 1926 -.011961723 -.011961723   .14520548 5         .         .        . . .0038411536
"DRTE"  "RPR"   1926      -.0625  .011666667           0 2         . .03446858        . .           .
"PMD"   "GM"    1926   .02654867  .016393442  .029535865 3         .         . .0150178 .           .
"KRT"   "HCDV"  1926  .014492754           0 -.032967035 1 .05393586         .        . .           .
"ASI"   "ASI"   1926   .29801324   .29801324  -.01948052 1 .05393586         .        . .           .
"PAR"   "MCD"   1926   .14049587   .04519774           0 2         . .03446858        . .           .
"PDKL"  "COGE"  1926 -.023809524   .09574468 -.005291005 2         . .03446858        . .           .
"PSDS"  "DRI"   1926   .10204082  -.03076923 -.029850746 1 .05393586         .        . .           .
"WCSTF" "GM"    1926  -.01599016  .016393442  .029535865 3         .         . .0150178 .           .
"BWA"   "GM"    1926   .04140127  .016393442  .029535865 3         .         . .0150178 .           .
end
format %tw week_id
Thanks in advance; any help would be appreciated.

Misstable

Hello,

Looking to do misstable summarize. I have one variable that splits observations into cases and controls, and another variable whose missing values I want to count. However, I want to know how many values are missing separately for the cases and for the controls, not just the overall total. Is there a way to get the missing-value count broken down by another variable?
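
For example, a minimal sketch of two ways this could be done (case_control and outcome are placeholder names, not variables from the original post):

Code:
* restrict misstable summarize to each group in turn
misstable summarize outcome if case_control == 1
misstable summarize outcome if case_control == 0

* or tabulate a missingness indicator against the group variable
generate byte miss_outcome = missing(outcome)
tabulate case_control miss_outcome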

Thanks

Age and Gender

Hello,

First and foremost, I have to warn you that I'm quite a beginner in Stata and in statistics overall...
But for my master's thesis on fashion window displays and consumer behaviour, I had to run statistical analyses!
Anyway, I am now facing the issue of including gender and age as covariates/moderator variables in the model.

gender: male (116), female (97), non-binary (27)
age: 18-24 (195), 25-34 (27), 35-44 (7), 45-60+ (11)
In total there are 240 participants, but the group sizes are very different, which I think may be a problem.

I tried to code both as dummy variables, but received the message "no observations".

I want to study whether gender identity influences Entry Intention (the Y outcome variable) across the 4 different fashion window displays (independent variables: X1Mannequins X2PunchyWarmColors X3InteractiveTechnology X4CreativeDesign).
And I want to know whether age has an impact on the attraction towards the technology-based window.

- maybe: dummy variables are not the solution?
- should other models be used, such as a binary response model?
- maybe ANOVA? by separating the population using a median split?

I already ran the regression separately by gender, and the three models are significant (p-value < 0.05, R² < 1).
Also, F-Test is OK.
VIF: checked as well.
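
As an illustration only (a sketch, not a definitive specification; the names EntryIntention, gender, and age_group are assumptions about how the data might be coded), factor-variable notation lets Stata build the dummies itself instead of coding them by hand:

Code:
* sketch: let factor variables create the gender and age-group dummies
regress EntryIntention X1Mannequins X2PunchyWarmColors X3InteractiveTechnology X4CreativeDesign i.gender i.age_group

* sketch: does age group moderate the technology-based display?
regress EntryIntention c.X3InteractiveTechnology##i.age_group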

I hope that I have expressed myself correctly.
I include here my Excel file.

Brgrds,
Isabelle Deschamps

event study - eventstudy2

I am a postgraduate student writing my thesis. I am somewhat of a novice in the field I have chosen to study, but it has undoubtedly broadened my horizons. I am attempting to evaluate the impact of FOMC news on the Dow Jones Industrial Average. Please excuse me for resorting to the forum for help, but I am getting quite confused by all the material I have seen on event studies and would really appreciate a specific direction at this stage.

My study: I have two periods to evaluate, 2008-2013 and 2013-2018. I have a total of 63 events - 23 in the 2008-2013 period and 40 in the 2013-2018 period - and I am using open/close data. Due to data access, I was unable to obtain intraday data (for less than £1500), but I feel the open/close data will still let me provide a broad view and comment on any pre-announcement drift and lasting effects of FOMC news (and thus on efficient market theory). To that end, I would like to evaluate how the Dow anticipates FOMC news 3 days prior to the announcement and how the news is incorporated 3 days after the event.

I have used Stata for past projects unrelated to finance and feel it will be quicker than attempting to evaluate each event in Excel; I would also like to make use of estout or outreg2 for output tables. I also had the idea of evaluating the effects of unscheduled meetings - effectively, another set of events. I considered using dummies/changes (d.var d2.var etc.) for x days - leading and lagging (1, 2, 3, 4) - to see if there is any reaction and whether it is more dramatic than for the scheduled meetings. I suppose they would be reactions to shocks and as such would not work in the same way as the scheduled ones. I have done something similar on a project that was not finance related, so excuse the novice approach, but is this something that could still be done using, for example, eventstudy2?

The majority of event studies I have seen evaluate a stock against an index (e.g. Apple against the S&P 500 is very common) to estimate normal performance, abnormal and cumulative abnormal returns, and to test for significance. Is it possible to conduct an event study with just the Dow Jones?

If all this seems feasible, does anyone have direction on how I could go about starting my study using eventstudy2? There is usually an abundance of information on Stata packages, but there seems to be little on eventstudy2.

Thanks in advance.

Dummy Variable??

Hi All,

I have tried to find this answer in the forum but can't seem to.
I have a set of panel data for 15 countries over 10 years and want to create a dummy based on per capita income levels, per year, but split into 3 groups. Hope this makes sense.
Usually with a cross-section I would first create the median income per year (as I want equal numbers of observations in each half, 75 each):

egen MEDINCOME=median(income), by (year)

then gen dummy = income>MEDINCOME

However, I would like to have 3 splits instead of just two, e.g. 50 observations in each, to get the equivalent of high-income, middle-income and lower-income groups.

Is this possible?
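
One possible sketch using only official commands (income is assumed to be the per capita income variable):

Code:
* year-specific tertile cut-offs, groups coded 1 (low) to 3 (high)
generate byte income_group = .
levelsof year, local(years)
foreach y of local years {
    _pctile income if year == `y', percentiles(33.33 66.67)
    local p1 = r(r1)
    local p2 = r(r2)
    replace income_group = 1 if year == `y' & income <= `p1'
    replace income_group = 2 if year == `y' & income > `p1' & income <= `p2'
    replace income_group = 3 if year == `y' & income > `p2' & !missing(income)
}
* egenmore (ssc install egenmore) reportedly offers a one-liner along the lines of
* egen income_group = xtile(income), by(year) nq(3)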

Thank you

Safia

Suest - Can I cluster just 1 model?

Dear all,

I'm trying to compare the coefficients of an OLS regression with a multilevel regression. I tried to use the Hausman command but got an error saying that
Code:
 
 "model fitted on these data fails to meet the asymptotic assumptions of the Hausman test; see suest for a generalized test"
I'm trying to use the suest command as a workaround, but this command doesn't work after the mixed command. I understand that there's an option to use
Code:
vce(cluster clustervar)
in order to cluster 1-level models and with that use this command for mixed models.

However, in my case there's only one model with clustering (the other is just a normal linear regression), so if I try this syntax -

Code:
regress sqrt_Relative_time_days White Hispanic Other Age Sex i.Seriousness_of_crime_7 Death_Penalty i.year_of_conviction_groups

estimates store fixed


mixed sqrt_Relative_time_days White Hispanic Other Age Sex i.Seriousness_of_crime_7 Death_Penalty i.year_of_conviction_groups  || states_combined:

estimates store random

suest fixed random, vce(cluster clustervar)
the command creates two identically clustered regressions, whereas I want only the second one (the model I call "random") to be clustered.

Is there a way to cluster just 1 model in the suest command?

Thanks!
Eran

Thursday, August 29, 2019

Adjusted variable and deviation

Hi,

I want to generate a new variable (RC) equal to the sum of -INV- and -REC- as a percentage of -TA-, adjusted by industry-year (-risky_ind- and -year-), i.e. expressed as the deviation from the corresponding industry-year mean.

I would be grateful if you could help me with generating the new variable.

Note: IS is the company identifier.

example:

. dataex IS INV REC TA year risky_ind in 1/50

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 IS double(INV REC TA) int year float risky_ind
"IQ1001135"     2.20403  2.52538   42.87435 2013 0
"IQ1001135"     2.83681   3.1106   24.37639 2014 0
"IQ10038253"   10.03707  9.87853   67.32937 2017 0
"IQ100475337" 140.42434 64.60702 1320.80513 2013 0
"IQ100475337"  95.74748 65.44843  857.81385 2014 0
"IQ100475337"  53.54069  52.1743  574.70545 2015 0
"IQ100475337"  58.86901  38.4869  501.58682 2016 0
"IQ100475337"  46.88747 37.78913  427.04402 2017 0
"IQ100475337"  17.20336 47.58633  450.97572 2018 0
"IQ10059716"   13.97032  2.20655   64.91229 2014 0
"IQ10059716"   12.58804  1.39677   79.95269 2015 0
"IQ10059716"   13.46922  1.33011   71.59447 2016 0
"IQ10059716"   15.93507  2.52412   76.18798 2017 0
"IQ10059716"   16.80082  1.94413   83.10289 2018 0
"IQ100601859"  59.32227 15.65512  200.16184 2013 0
"IQ100601859"  69.46581   9.0277  250.01989 2014 0
"IQ100601859"   76.5801 13.21964  289.17414 2015 0
"IQ100601859"  57.17937  9.38205  265.69254 2016 0
"IQ100601859"  45.53861  6.08402  329.99212 2017 0
"IQ100601859"  41.12032  5.39557  338.10262 2018 0
"IQ10063218"     .10101   .55322   16.63223 2016 0
"IQ10063218"     .13055  1.21617   18.80271 2017 0
"IQ10063218"     .09417  1.42922   15.64643 2018 0
"IQ1007880"        44.6    175.1     3016.5 2014 0
"IQ1007880"        39.5    168.1     2765.1 2015 0
"IQ102377327"    16.528    7.584     63.839 2013 0
"IQ102377327"    12.146    3.755     40.429 2014 0
"IQ102377327"    10.037    3.182     29.465 2015 0
"IQ102377327"    12.977    3.546     35.056 2016 0
"IQ102377327"    19.175    3.427     66.001 2018 0
"IQ1027520"      .13204 39.70413  174.29746 2013 1
"IQ1027520"      .10263 40.96519  164.38773 2014 1
"IQ1027520"       .1358  47.1315  206.26628 2015 1
"IQ1027520"       .1389 44.97404   207.2047 2016 1
"IQ1027520"      .10174 54.73586  267.06623 2017 1
"IQ1027520"      .09828 58.96796  302.21079 2018 1
"IQ10332217"    3.54973    3.542   10.19086 2013 0
"IQ10332217"     4.0283   3.5752   11.44259 2014 0
"IQ10332217"    4.58241  3.68841   12.71978 2015 0
"IQ10332217"    5.49894   4.2628   14.14477 2016 0
"IQ10332217"    4.99721  4.58772   15.03865 2017 0
"IQ10332217"    4.97193  4.85585   16.30446 2018 0
"IQ1034090"           .    348.4     1094.6 2013 0
"IQ1034090"           .    326.1     1031.5 2014 0
"IQ1034090"           .    300.8     1086.3 2015 0
"IQ1034090"           .    339.8     1074.5 2016 0
"IQ1034090"           .    364.9     1149.9 2017 0
"IQ1034090"           .    335.4     1197.6 2018 0
"IQ1034269"       2.198    1.781     18.498 2013 0
"IQ1034269"        .067     .465       7.56 2014 0
end
------------------ copy up to and including the previous line ------------------

Listed 50 out of 4887 observations

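For what it's worth, a minimal sketch of one reading of this request (multiply by 100 at the end if a percentage rather than a ratio is wanted):

Code:
* (INV + REC) as a share of TA, demeaned within each risky_ind-by-year cell
generate rc_raw = (INV + REC) / TA
bysort risky_ind year: egen rc_mean = mean(rc_raw)
generate RC = rc_raw - rc_mean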



Best,
Mahmoud

Error in converting time (hourly) variable

Hi. I have hourly-level data in the format "2010123123", representing "23:00 Dec 31, 2010". I can convert it into a time variable by using a double, but there is a problem for some observations. The data show "00:00 Jan 1, 2011" as 2010123124, and Stata does not automatically roll this over to midnight of the next day; instead it generates a missing value, as shown in the attachment below. I would really appreciate it if you could help me convert these values into the correct next-day format. Thank you very much!

[screenshot of the affected observations attached to the original post]
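
A minimal sketch, assuming the raw value is numeric and named raw_hour (an assumed name). Hour "24" then simply adds a full day of milliseconds, so 2010123124 lands on 01jan2011 00:00:00:

Code:
* split YYYYMMDDHH into the date part and the hour part, then add the hours
generate str10 s = string(raw_hour, "%10.0f")   // skip this step if raw_hour is already a string
generate double event_time = clock(substr(s, 1, 8), "YMD") + real(substr(s, 9, 2)) * 3600000
format event_time %tc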

taking a number from a returned matrix?

Dear All, I run the following xthreg command (xthreg is community-contributed; please search for xthreg and install it):
Code:
xthreg i q1 q2 q3 d1 qd1, rx(cf1) qx(d1) thnum(1) grid(400) trim(0.01) bs(100)
matrix f = e(Fstat)
matrix list f  
/*
f[1,7]
              RSS        MSE      Fstat       Prob     Crit10      Crit5      Crit1
Single  17.781836  .00225201  35.196397          0  12.908204  15.651497  21.814463
*/
outreg2 using "thrpd", word dec(4) adds(F, 35.196397) adec(3) ctitle(linear) replace
Note that you have to (ssc) install outreg2.

My question is: how can I use something like f[1,3] (that exact syntax is not correct, but what is the correct one?) instead of the hard-coded 35.196397 in the `adds(F, 35.196397)' option?
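
One approach that should work is to copy the matrix element into a local macro first (a sketch based on the code above):

Code:
matrix f = e(Fstat)
local Fval = f[1, 3]   // column 3 of the single row holds the F statistic
outreg2 using "thrpd", word dec(4) adds(F, `Fval') adec(3) ctitle(linear) replace
* el(f, 1, 3) is an equivalent way of extracting the element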

Sub-group meta-analysis in Stata

I have two questions:
1. Does Stata allow sub-group meta-analysis using mixed effects? If code is available, please share it.
2. How can I get the Q-value and its corresponding p-value for the difference between subgroups in a meta-analysis? What is the code for that? Many thanks for your support. I am doing a meta-analysis of prevalence studies, but if you have an answer/code for any other kind of studies, that is fine too.
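
For the sub-group part, a sketch using the community-contributed metan command; es, se, and subgrp are assumed names for effect sizes, standard errors, and the subgroup variable, which would need to be computed first. metan's subgroup output should also report between-subgroup heterogeneity statistics:

Code:
* ssc install metan
* random-effects meta-analysis with sub-groups
metan es se, random by(subgrp)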

Reshaping Datastream Data (Currency) for Stata

I have been trying to transpose Datastream data by date. I know how to do this in SAS, but I am having trouble replicating it in Stata. I have provided a dataex sample below.


* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date uk australia brazil bulgaria)
-1065 2.7982 . . .
-1037 2.7975 . . .
-1008 2.7875 . . .
-976 2.7912 . . .
-945 2.7918 . . .
-917 2.7912 . . .
-884 2.785 . . .
-854 2.7832 . . .
-823 2.7956 . . .
-792 2.7968 . . .
-763 2.8062 . . .
-731 2.8594 . . .
-700 2.8162 . . .
-672 2.8162 . . .
-641 2.8162 . . .
-611 2.8182 . . .
-581 2.8168 . . .
-550 2.8032 . . .
-519 2.805 . . .
-490 2.8018 . . .
-458 2.8068 . . .
-427 2.8062 . . .
-399 2.8044 . . .
-366 2.8025 . . .
-336 2.8094 . . .
-308 2.8055 . . .
-276 2.8162 . . .
-246 2.8162 . . .
-217 2.8118 . . .
-185 2.8138 . . .
-154 2.8112 . . .
-123 2.8056 . . .
-93 2.8044 . . .
-63 2.8056 . . .
-32 2.8018 . . .
-1 2.8 . . .
28 2.8025 . . .
59 2.8038 . . .
90 2.8075 . . .
119 2.8088 . . .
151 2.8025 . . .
181 2.8069 . . .
210 2.8089 . . .
243 2.8113 . . .
273 2.8106 . . .
304 2.8156 . . .
334 2.8106 . . .
364 2.8038 . . .
396 2.8044 . . .
424 2.7956 . . .
455 2.7975 . . .
483 2.7962 . . .
516 2.7919 . . .
546 2.7881 . . .
577 2.795 . . .
608 2.8056 . . .
637 2.8144 . . .
669 2.815 . . .
699 2.8125 . . .
728 2.8081 . . .
761 2.8119 . . .
789 2.815 . . .
819 2.8138 . . .
850 2.8125 . . .
881 2.81 . . .
910 2.8088 . . .
942 2.805 . . .
973 2.8019 . . .
1001 2.8012 . . .
1034 2.8012 . . .
1064 2.8031 . . .
1095 2.8025 . . .
1126 2.8044 . . .
1154 2.8025 . . .
1183 2.7995 . . .
1215 2.8006 . . .
1246 2.8002 . . .
1274 2.8009 . . .
1307 2.8007 . . .
1337 2.7979 . . .
1368 2.7977 . . .
1399 2.7981 . . .
1428 2.7973 . . .
1460 2.7966 . . .
1491 2.7978 . . .
1519 2.7982 . . .
1551 2.7984 . . .
1581 2.7999 . . .
1610 2.7979 . . .
1642 2.7917 . . .
1673 2.7882 . . .
1704 2.7839 . . .
1734 2.7833 . . .
1764 2.785 . . .
1795 2.7912 . . .
1826 2.7901 1.1118151 . .
1855 2.792 1.1120847 . .
1883 2.7941 1.1135865 . .
1916 2.7905 1.1125064 . .
1946 2.7991 1.1139809 . .
end

I am trying to reshape the data so that it is sorted by date and there is an identifier called 'country' in place of the column variables that are currently named after countries. This is how the Datastream currency data appear in wide form. I tried doing this using the reshape long command, but nothing seems to work. I've also uploaded the raw dataset in csv form if more information about the dataset is needed.
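
A sketch on the four example columns (in the full dataset every country column would need the same rate_ prefix before reshaping):

Code:
rename (uk australia brazil bulgaria) (rate_uk rate_australia rate_brazil rate_bulgaria)
reshape long rate_, i(date) j(country) string
rename rate_ rate
sort country date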

Thank you in advance.

Productivity effects

Dear Statalist users,

I am trying to examine the impact of a firm's export participation on its productivity. To this end, I want to examine how the firm's productivity changes one year after its entry into the export market, two years after entry, and so on, in order to understand whether the effect of export-market participation on productivity dissipates or accelerates over time. I have an unbalanced panel for the period 2001-2013. TFP is my dependent variable, and export status (t1), a dummy, is my independent variable. I think I need to proceed with PSM-DID, but I am not able to work out how to implement it for my dataset.

Any input on how to achieve this in Stata would be much appreciated.
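
For the dynamics part only (not the matching step), a rough sketch using the variable names from the data example below; in a PSM-DID setup a matching command such as psmatch2 (available from SSC) would be used beforehand to build the control group:

Code:
* event-time dummies measuring TFP 1, 2, and 3+ years after first export entry
xtset id year
bysort id: egen entry_year = min(cond(t1 == 1, year, .))
generate years_since = year - entry_year
generate byte post1 = years_since == 1
generate byte post2 = years_since == 2
generate byte post3plus = years_since >= 3 & !missing(years_since)
xtreg ltfp t1 post1 post2 post3plus i.year, fe vce(cluster id)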

A small sample of my data

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double id float(year t1 ltfp)
 365 2001 0 .39686665
 365 2002 1  .3967393
 365 2003 1  .6374343
 365 2004 0  .6560543
 365 2005 1  .6640503
 365 2006 1  .7698439
 365 2007 0  .8161478
 365 2008 0    .87566
 365 2009 0 1.1369226
 365 2010 0 1.2058983
 365 2011 0 1.0512627
 365 2012 0 1.0222095
 365 2013 0  .9116122
 381 2001 1 .28473413
 381 2002 1  .2790297
 381 2003 1  .4228754
 381 2004 1  .4589705
 381 2005 1   .443521
 381 2006 1  .6932002
 381 2007 1 .51895165
 381 2008 1  .4412919
 381 2009 1  .4278684
 381 2010 1  .4917918
 569 2001 0  .4084495
 569 2002 0 .32361555
 569 2003 0   .312629
 569 2004 0  .3549544
 569 2005 0  .4214703
 569 2006 0  .5034326
 569 2007 0 .52623117
 569 2008 0 .58975697
 569 2009 0 .55094194
 569 2010 0  .3773664
 595 2006 0  .2977381
 595 2007 0  .3644855
 595 2008 0  .3293638
 595 2009 0  .3146657
 595 2010 0  .3185207
 595 2011 0  .4280073
 595 2012 0  .5122329
 595 2013 0  .4275252
 600 2009 0  .7721639
 600 2010 0   .787946
 600 2011 0  .6881346
 600 2012 0  .6645668
 783 2001 0   .729115
 783 2002 0  .7449946
 783 2003 0  .7614059
 783 2004 0   .870962
 783 2005 0  .8693857
 783 2006 0  .8836148
 783 2007 0 1.0283844
 783 2008 0 1.0673563
 783 2009 0 1.2047342
 783 2010 0 1.2074325
 783 2011 0 1.2030094
 783 2012 0  1.209522
 783 2013 0  1.193356
 870 2010 0  .4383901
 934 2004 0  .6618422
 934 2005 0   .677335
 934 2006 0   .698532
 934 2007 0  .6574489
 934 2008 0   .602454
 934 2009 0  .9306454
1120 2001 1 1.0948371
1120 2002 1 1.1048775
1120 2003 1 1.2104284
1120 2004 1 1.1987357
1120 2005 1 1.3653517
1120 2006 1  1.479699
1120 2007 1  1.614259
1120 2008 1 1.8814026
1120 2009 1 2.1619618
1120 2010 1  1.977049
1120 2011 1  2.043859
1120 2012 1  2.134581
1120 2013 1 2.2729049
1621 2007 0  .3721843
1621 2008 0  .4280191
1621 2009 0   .435424
1621 2010 0  .4076315
1621 2011 0  .3898215
1621 2012 0  .4241333
1954 2009 0  .4050439
1954 2010 0 .41772455
2015 2004 0 .17348923
2015 2005 0 .25310284
2015 2006 0  .3803225
2015 2007 0  .4936106
2015 2008 0  .5817891
2015 2009 0  .4436422
2015 2010 0 .42046455
2015 2011 0 .51623964
2015 2012 0  .5415064
2015 2013 0  .4652313
2216 2001 1  .3285122
2216 2002 0  .3717326
2216 2003 0    .40993
2216 2004 0  .3402351
end

Univariate analysis

Hello,

So I want to apply a univariate test, subdividing my sample into high and low groups by the mean values of the independent variables, in order to report differences in the dependent variable.

My question is how to subdivide the sample on such a condition, and what the Stata commands are to do so and to apply the test.

My literature reference is Thanatawee, Y. (2019). ‘Foreign Institutional Ownership and Liquidity: Evidence from Thailand’. ABAC Journal, Vol.39, No.4, Forthcoming. pp. 1-21.
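
A minimal sketch with hypothetical variable names (inst_own standing in for an independent variable, liquidity for the dependent one):

Code:
* split at the sample mean of the independent variable, then compare groups
summarize inst_own, meanonly
generate byte high_inst = inst_own > r(mean) if !missing(inst_own)
ttest liquidity, by(high_inst)
* ranksum liquidity, by(high_inst) is the nonparametric counterpart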

Thank you!

Is it okay to use multiple imputation to create imputed values in a second dataset?

Say I have two datasets A & B. They have the same variables in each except that set A has one additional variable.

Would it be valid to impute the variable for dataset B, and then to only conduct analysis on dataset B?

For example:

Code:
webuse lbw, clear
set seed 12345
generate wave = round(uniform(),1)
replace bwt = . if wave==1 //artificially creating missing data in half of dataset

mi set mlong
mi register imputed bwt
mi impute chained (pmm, knn(5)) bwt = age lwt i.race, add(5) rseed(12345)
mi estimate : glm age lwt i.race bwt if wave==1 //command runs, but is only conducted on part of sample where all data was missing originally

Using outreg2 to export Heckman results to LaTeX

Hi, I'm having problems using outreg2 to export the results from a Heckman model to LaTeX (the same problem occurs when exporting to a .doc file). The following code fits a hypothetical Heckman model and exports the results to LaTeX. For some reason, the 3rd column replicates the 2nd column; I would like to keep only columns 1 and 2.

A similar problem, but no solution, was posted here: https://www.statalist.org/forums/for...ach-regression

clear
input yvar xvar1 xvar2
3 2 1
4 0 3
. 3 1
. 1 2
2 7 0
6 0 1
. 3 7
0 1 1
. 2 2
1 1 5
end

heckman yvar xvar1 xvar2, select (xvar1 xvar2) vce(robust) iterate(10)
outreg2 using "test_outreg2.tex", replace tex(pretty) keep(xvar1 xvar2)

Creating Lead variables with respect to month (1, 3, 6) when I have daily data

Hello.
I'm working on the U.S. stock market.
I have data for the entire stock market (thousands of stocks). It is panel data, sorted by firm and date.
My question is how I can create lead variables defined at the monthly level when I have daily data. I should not collapse the data, since I need the daily returns later.
In detail, I have the past 1-month return for each month of each stock, and I need to create a future-return variable that is pulled back from that past return.
For example, if the 1-month past return in February for firm A is 0.1, I need a new variable (future return) equal to 0.1 in January for firm A.
The 1-month past return is constant within a month: past_ret_1 on Jan 01 is the same as on Jan 02, Jan 03, Jan 04, and so on.
So I need to put past_ret_1 of Feb 01, Feb 02, Feb 03 (which are all the same) into a future-return variable on Jan 01, Jan 02, Jan 03, and so on.

permno is the firm code
date is date
past_ret_1 is 1 month past return
mdate is month identifier.
My data look like the sketch below, and I add a dataex example further down. I have to create the B' values from the B values (B and B' are the same).
permno date mdate past_ret_1 future_ret1
10001 Jan 01 2019 2019m1 A B'
10001 Jan 02 2019 2019m1 A B'
10001 Jan 03 2019 2019m1 A B'
... ...
10001 Feb 01 2019 2019m2 B
10001 Feb 02 2019 2019m2 B
... ... ...
10002 2019m1 0.1
10002 2019m1 0.1
My dataex is

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double permno long date double prc int mdate float(past_ret_1 past_ret_3 past_ret_6)
10001 20457 7.519999980926514 672           .          .          .
10001 20458 7.420000076293945 672           .          .          .
10001 20459  7.53000020980835 672           .          .          .
10001 20460 7.639999866485596 672           .          .          .
10001 20461 7.789999961853027 672           .          .          .
10001 20464   8.0600004196167 672           .          .          .
10001 20465              8.25 672           .          .          .
10001 20466              8.25 672           .          .          .
10001 20467 8.130000114440918 672           .          .          .
10001 20468 7.989999771118164 672           .          .          .
10001 20472 8.119999885559082 672           .          .          .
10001 20473 8.079999923706055 672           .          .          .
10001 20474 8.100000381469727 672           .          .          .
10001 20475 8.220000267028809 672           .          .          .
10001 20478                 8 672           .          .          .
10001 20479 8.100000381469727 672           .          .          .
10001 20480 8.199999809265137 672           .          .          .
10001 20481 8.199999809265137 672           .          .          .
10001 20482 8.319999694824219 672           .          .          .
10001 20485 8.420000076293945 673   .11968087  .11968087  .11968087
10001 20486 8.489899635314941 673   .11968087  .11968087  .11968087
10001 20487 8.260000228881836 673   .11968087  .11968087  .11968087
10001 20488                 8 673   .11968087  .11968087  .11968087
10001 20489  8.34000015258789 673   .11968087  .11968087  .11968087
10001 20492 8.630000114440918 673   .11968087  .11968087  .11968087
10001 20493 8.210000038146973 673   .11968087  .11968087  .11968087
10001 20494 8.300000190734863 673   .11968087  .11968087  .11968087
10001 20495 8.050000190734863 673   .11968087  .11968087  .11968087
10001 20496 8.300000190734863 673   .11968087  .11968087  .11968087
10001 20500              8.25 673   .11968087  .11968087  .11968087
10001 20501 8.199999809265137 673   .11968087  .11968087  .11968087
10001 20502 8.100000381469727 673   .11968087  .11968087  .11968087
10001 20503 7.900000095367432 673   .11968087  .11968087  .11968087
10001 20506 7.860000133514404 673   .11968087  .11968087  .11968087
10001 20507 7.940000057220459 673   .11968087  .11968087  .11968087
10001 20508 7.880000114440918 673   .11968087  .11968087  .11968087
10001 20509 7.940000057220459 673   .11968087  .11968087  .11968087
10001 20510 7.840000152587891 673   .11968087  .11968087  .11968087
10001 20513 7.860000133514404 673   .11968087  .11968087  .11968087
10001 20514 8.300000190734863 674 -.014251768  .10372343  .10372343
10001 20515  8.15999984741211 674 -.014251768  .10372343  .10372343
10001 20516 8.199999809265137 674 -.014251768  .10372343  .10372343
10001 20517 8.199999809265137 674 -.014251768  .10372343  .10372343
10001 20520   8.1899995803833 674 -.014251768  .10372343  .10372343
10001 20521 8.079999923706055 674 -.014251768  .10372343  .10372343
10001 20522 8.029999732971191 674 -.014251768  .10372343  .10372343
10001 20523 8.119999885559082 674 -.014251768  .10372343  .10372343
10001 20524                 8 674 -.014251768  .10372343  .10372343
10001 20527 7.980000019073486 674 -.014251768  .10372343  .10372343
10001 20528 7.900000095367432 674 -.014251768  .10372343  .10372343
10001 20529 8.069999694824219 674 -.014251768  .10372343  .10372343
10001 20530 8.119999885559082 674 -.014251768  .10372343  .10372343
10001 20531 8.199999809265137 674 -.014251768  .10372343  .10372343
10001 20534 8.199999809265137 674 -.014251768  .10372343  .10372343
10001 20535 8.199999809265137 674 -.014251768  .10372343  .10372343
10001 20536 8.239999771118164 674 -.014251768  .10372343  .10372343
10001 20537 8.079999923706055 674 -.014251768  .10372343  .10372343
10001 20541 7.900000095367432 674 -.014251768  .10372343  .10372343
10001 20542 7.889999866485596 674 -.014251768  .10372343  .10372343
10001 20543 7.889999866485596 674 -.014251768  .10372343  .10372343
10001 20544 7.809999942779541 674 -.014251768  .10372343  .10372343
10001 20545 7.800000190734863 675  -.06024096  .03723407  .03723407
10001 20548 7.769999980926514 675  -.06024096  .03723407  .03723407
10001 20549 7.230000019073486 675  -.06024096  .03723407  .03723407
10001 20550 7.079999923706055 675  -.06024096  .03723407  .03723407
10001 20551 7.369999885559082 675  -.06024096  .03723407  .03723407
10001 20552 7.579999923706055 675  -.06024096  .03723407  .03723407
10001 20555 7.769999980926514 675  -.06024096  .03723407  .03723407
10001 20556 7.869999885559082 675  -.06024096  .03723407  .03723407
10001 20557 7.800000190734863 675  -.06024096  .03723407  .03723407
10001 20558 7.599999904632568 675  -.06024096  .03723407  .03723407
10001 20559 7.449999809265137 675  -.06024096  .03723407  .03723407
10001 20562  7.21999979019165 675  -.06024096  .03723407  .03723407
10001 20563 7.309999942779541 675  -.06024096  .03723407  .03723407
10001 20564 7.179999828338623 675  -.06024096  .03723407  .03723407
10001 20565                 7 675  -.06024096  .03723407  .03723407
10001 20566 7.179999828338623 675  -.06024096  .03723407  .03723407
10001 20569 7.199999809265137 675  -.06024096  .03723407  .03723407
10001 20570 7.349999904632568 675  -.06024096  .03723407  .03723407
10001 20571  7.28000020980835 675  -.06024096  .03723407  .03723407
10001 20572 7.329999923706055 675  -.06024096  .03723407  .03723407
10001 20573 7.300000190734863 675  -.06024096  .03723407  .03723407
10001 20576 6.909999847412109 676   -.1141026 -.17933494 -.08111704
10001 20577 7.039999961853027 676   -.1141026 -.17933494 -.08111704
10001 20578 7.039999961853027 676   -.1141026 -.17933494 -.08111704
10001 20579 7.014999866485596 676   -.1141026 -.17933494 -.08111704
10001 20580 6.880000114440918 676   -.1141026 -.17933494 -.08111704
10001 20583 6.889999866485596 676   -.1141026 -.17933494 -.08111704
10001 20584  6.96999979019165 676   -.1141026 -.17933494 -.08111704
10001 20585 7.050000190734863 676   -.1141026 -.17933494 -.08111704
10001 20586 7.099999904632568 676   -.1141026 -.17933494 -.08111704
10001 20587 7.150000095367432 676   -.1141026 -.17933494 -.08111704
10001 20590 7.119999885559082 676   -.1141026 -.17933494 -.08111704
10001 20591 7.139999866485596 676   -.1141026 -.17933494 -.08111704
10001 20592 7.059999942779541 676   -.1141026 -.17933494 -.08111704
10001 20593 6.900000095367432 676   -.1141026 -.17933494 -.08111704
10001 20594 6.880000114440918 676   -.1141026 -.17933494 -.08111704
10001 20597 6.900000095367432 676   -.1141026 -.17933494 -.08111704
10001 20598 6.900000095367432 676   -.1141026 -.17933494 -.08111704
10001 20599 6.949999809265137 676   -.1141026 -.17933494 -.08111704
end
format %d date
format %tm mdate
------------------ copy up to and including the previous line ------------------

Listed 100 out of 5502429 observations
Use the count() option to list more
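
A minimal sketch for the 1-month case, assuming past_ret_1 really is constant within each permno-month as described (shift mdate by 3 or 6 for the other horizons):

Code:
* collapse to one row per permno-month, shift the month back by one,
* and merge the value back onto the daily data as future_ret1
preserve
keep permno mdate past_ret_1
duplicates drop
rename past_ret_1 future_ret1
replace mdate = mdate - 1
tempfile leads
save `leads'
restore
merge m:1 permno mdate using `leads', keep(master match) nogenerate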