I'm working with some raw financial data from a local business covering about 4 years, 2015-2019. I haven't worked with daily deltas very often, but I noticed that the CPI and other related indices aren't published at a daily frequency. What's the best way of going about this without converting everything to a lower frequency? Is it possible? It's not a terribly long range, so perhaps I can run the analysis without deflating anything. I'll be running a regression afterwards.
I appreciate any help.
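One possible approach, sketched below rather than prescribed: keep the daily frequency and merge a monthly CPI series onto the daily records by month, then deflate. The file name cpi_monthly.dta and the variable names date, mdate, cpi, and sales are hypothetical placeholders.
Code:
* Sketch: deflate daily values using monthly CPI (hypothetical names)
gen mdate = mofd(date)                       // monthly date from the daily date
format mdate %tm
merge m:1 mdate using cpi_monthly, keep(master match) nogenerate
gen sales_real = sales / (cpi/100)           // express values in base-period prices
Whether deflation matters much over a roughly four-year span is an empirical question; running the regression on both nominal and deflated series is a cheap robustness check.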
Saturday, August 31, 2019
LD calculation by logit regression
Hello,
I would like to know how I can calculate the LD50, LD90 and LD95 (lethal dose) for bioassays using logit regression.
I found probit regression commands; however, I would like to program the formula in Stata. Could someone please help me?
1) For LD50 in probit and logit regression the formula is the same:
local LD50 = exp(-_b[_cons]/_b[logdose])
2) However, for the other LD values there is a change.
In the links listed at the end, I found the following for the probit model:
Probit: LDp = (invnormal(p) - constant) / coef
STATA command:
local LD90 = exp((invnormal(0.9) - _b[_cons]) / _b[logdose])
However, for the logit model the recommended formula is:
Logit: LDp = (log(p/(1-p)) - constant) / coef
How do I program this in Stata for a logit regression?
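A minimal sketch of how the logit-based LD values might be computed, assuming a fitted model of the form logit dead logdose with the dose entering in logs (variable names follow the formulas above):
Code:
logit dead logdose
* LD50: logit(0.5) = 0, so only the intercept and slope are needed
scalar LD50 = exp(-_b[_cons]/_b[logdose])
* LDp for other p: logit(p) = log(p/(1-p)) is available as the logit() function
scalar LD90 = exp((logit(0.90) - _b[_cons])/_b[logdose])
scalar LD95 = exp((logit(0.95) - _b[_cons])/_b[logdose])
display "LD50 = " LD50 "   LD90 = " LD90 "   LD95 = " LD95
* delta-method standard errors, if needed
nlcom (LD90: exp((logit(0.90) - _b[_cons])/_b[logdose]))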
Another question is about normality: if the distribution is not normal, should I calculate the LD values from the logit regression? I have read in some places that if the distribution is not normal, logit regression should be used for LD.
Would you have another model suggestion for calculating LD with non-normal distribution?
Thanks
LINKS:
https://www.stata.com/statalist/arch.../msg00184.html
https://www.statalist.org/forums/for...r-logit-probit
https://stats.idre.ucla.edu/other/mu...git-or-probit/
How to solve the problem of endogeneity of a policy (gravity model)?
Hi everyone,
I am analyzing the impact of an agreement concluded between 4 countries on the mobility of people in these 4 countries. The agreement came into force at the same time for the 4 countries, in 1995. In addition to these 4 countries, I added another 15 countries as a control group. I use a matrix of 19 countries of origin and 19 countries of destination (a 19 x 19 matrix).
The database covers the period from 1990 to 2000.
lnN(ijt): log of the number of people from country i who go to country j at time t
lngdp(it) : log of gdp (constant 2000 US$) in the country of origin
lngdp(jt): log of gdp (constant 2000 US$) in the country of destination
lndist(ij): Log of distance between origin and destination countries
contig(ij): common border between origin and destination countries
lang(ij): common language in origin and destination countries
lnstock1990(ij): stock of migrants from country i and country j at year 1990
bothcountry1995 = 1 if i and j are both members of agreement at time t and 0 otherwise
onecountry1995 = 1 if the country (i) belongs to agreement but the country (j) does not
l(it): time-varying origin-country dummies which control for multilateral resistances
l(jt): time-varying destination-country dummies that account for multilateral resistances
l(ij): captures the country-pair fixed effects
The augmented gravity equation is:
lnN(ijt) = b0 + b1 lngdp(it) + b2 lngdp(jt) + b3 lndist(ij) + b4 contig(ij) + b5 lang(ij) + b6 lnstock1990(ij) + b7 bothcountry1995(ijt) + b8 onecountry1995(ijt) + e(ijt)
The variables created in stata are the following:
gen logN_ij=log(N_ij)
gen loggdp_i=log(gdp_i)
gen loggdp_j=log(gdp_j)
gen logdist_ij=log(dist_ij)
gen logstock1990_ij=log(stock1990_ij)
egen orig_time= group(origin year)
tabulate orig_time, generate(ORIGIN_TIME_FE)
egen dest_time = group(destination year)
tabulate dest_time, generate(DESTINATION_TIME_FE)
egen pair_id = group(origin destination)
tabulate pair_id, generate(PAIR_FE)
egen id = group(origin destination)
tsset id year
1) To address the endogeneity of the agreement, should I include time-varying origin-country dummies and time-varying destination-country dummies (which control for multilateral resistances) together with country-pair fixed effects? Or should I include country-pair fixed effects and year effects?
2) Do I use vce(robust) or cluster(pair_id)?
xi: xtreg logN_ij loggdp_i loggdp_j logstock1990_ij logdist_ij lang_ij contig_ij bothcountry1995 onecountry1995 ORIGIN_TIME_FE* DESTINATION_TIME_FE*, fe vce(robust)
reg logN_ij bothcountry1995 onecountry1995 PAIR_FE* ORIGIN_TIME_FE* DESTINATION_TIME_FE*, cluster(pair_id)
* country-pair fixed effects and year effects:
xi: xtreg logN_ij loggdp_i loggdp_j logstock1990_ij logdist_ij lang_ij contig_ij bothcountry1995 onecountry1995 i.year, fe vce(robust)
reg logN_ij bothcountry1995 onecountry1995 PAIR_FE* i.year, cluster(pair_id)
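A possibly cleaner equivalent using factor variables instead of the tabulate-generated dummies, sketched with the variables created above; which specification (time-varying origin/destination effects versus pair plus year effects) is appropriate is exactly the question being asked, so both are shown:
Code:
xtset pair_id year
* pair FE via -fe-, with origin-time and destination-time effects as factor variables
xtreg logN_ij bothcountry1995 onecountry1995 i.orig_time i.dest_time, fe vce(cluster pair_id)
* pair FE and year effects only
xtreg logN_ij bothcountry1995 onecountry1995 i.year, fe vce(cluster pair_id)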
Thank you so much for your help in advance
Matching variables in CEM.
Hi all - I am using CEM in Stata and have a quick question. I am using four variables in my dataset for matching and using the CEM weights in the regression model as controls. When I also include the four matching variables in the regression, I find that two of them are significant. Is it odd that these variables are significant even though they were used for matching/balancing? Let me know what could be going on or if you need more details. Thanks, Rajat.
Checking Residuals after running Mswitch
I am working on a Markov switching model (using the mswitch command) and have calculated standardized residuals. How can I calculate p-values for normality, ARCH(1), and autocorrelation tests of the standardized residuals in Stata to compare different models?
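A sketch of one way to obtain such p-values from the standardized residuals, assuming the data are tsset and the standardized residuals are stored in a variable called std_resid (the ARCH(1) check below is a simple LM-type regression rather than an official post-estimation command):
Code:
sktest std_resid                        // skewness/kurtosis normality test (reports p-values)
swilk std_resid                         // Shapiro-Wilk normality test
wntestq std_resid, lags(12)             // Ljung-Box Q test for autocorrelation
gen sq_resid = std_resid^2
quietly reg sq_resid L.sq_resid         // ARCH(1): squared residuals on their own lag
display "ARCH(1) LM p-value = " chi2tail(1, e(N)*e(r2))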
Construct matrix counting pairwise agreements
I'm interested in a set of problems in which the data from each individual is a vector of integer-valued responses, and I want to know the number of times that each distinct pair of individuals gives responses that agree with each other. Thus, I want an agreement matrix A, where A[i,j] contains the number of responses on which subject i agrees with (is identical to) subject j. It's like a similarity matrix, where similarity is the number of agreements across the response vector. I could not figure out a way to do this with - matrix dissimilarity-, which seems to implement a matching option only for binary data. I found no reasonable way to create this agreement matrix in Stata per se, and I need to work with it in Mata anyway.
The Mata code below is the best I came up with for speed. I'm interested in any suggestions or critique. Avoiding looping over measurements certainly helps, and it would be nice to avoid explicitly looping over observations as well.
Code:
// Simulate data
set seed 1243
mata mata clear
local nmeas = 15 // number of measurements
local top = 4 // measurements go from 1, ... top
local nobs = 500
local nreps = 50 // for timing only
mata: X = ceil(`top' * runiform(`nobs', `nmeas'))
//
// Count up agreements
mata:
N = rows(X)
A = J(N, N, .)
timer_clear(1)
timer_on(1)
for (reps = 1; reps <= `nreps'; reps++) {          // outer loop just for timing
    for (i = 1; i <= N - 1; i++) {
        for (j = i+1; j <= N; j++) {
            A[i,j] = rowsum(X[i,.] :== X[j,.])     // avoid loop over measurements
        }
    }
}
timer_off(1)
timer()
end
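One alternative that avoids the explicit double loop over observations is to loop over the (typically few) measurements instead and let the colon operator build each N x N agreement layer; a sketch, assuming X is the matrix simulated above:
Code:
mata:
A2 = J(rows(X), rows(X), 0)
for (m = 1; m <= cols(X); m++) {
    // X[.,m] :== X[.,m]' is c-conformable: an N x 1 column against a 1 x N
    // row produces an N x N 0/1 matrix of agreements on measurement m
    A2 = A2 + (X[.,m] :== X[.,m]')
}
end
This fills the whole symmetric matrix (diagonal included) rather than only the upper triangle, so it does more arithmetic per layer but only nmeas loop iterations instead of roughly N^2/2.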
Rolling standard dev
I have an unbalanced panel dataset with many (15000+) firms and their daily stock returns from 1960-2018.
From these daily stock returns I want to calculate the one-year "rolling standard deviation" (according to the paper I'm following).
I have managed to come up with the command using -rangestat- and I'm just wondering if I'm doing it correctly as there is no other way for me to check whether the resulting standard deviations from rangestat are correct.
I have also noticed a couple of things while using -rangestat-:
I use the count option to see how many observations it uses; sometimes it uses 1 observation (which is fine), but other times it uses 2 observations, and I'm not sure why, since I want to use one-day returns (the previous day).
Also, the standard deviation is sometimes 0 and other times "." (missing). I wonder why this is and how I should interpret the difference between these two in the rangestat output.
I will post below an example of the data I'm using, but since there is so many observations I cannot post a completely representative sample here. So I will just go with the returns of 1 company (Apple):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long date str8 ticker str36 comnam double permco float Returns
8417 "AAPL" "APPLE COMPUTER INC" 7   .03319269
8418 "AAPL" "APPLE COMPUTER INC" 7 -.022553137
8419 "AAPL" "APPLE COMPUTER INC" 7  .007132721
8420 "AAPL" "APPLE COMPUTER INC" 7    .1054022
8421 "AAPL" "APPLE COMPUTER INC" 7     -.00033
8424 "AAPL" "APPLE COMPUTER INC" 7   -.0588665
8425 "AAPL" "APPLE COMPUTER INC" 7   .03793554
8426 "AAPL" "APPLE COMPUTER INC" 7   .03980917
8427 "AAPL" "APPLE COMPUTER INC" 7   .06625561
8428 "AAPL" "APPLE COMPUTER INC" 7  .005786227
8431 "AAPL" "APPLE COMPUTER INC" 7 -.0033834374
8432 "AAPL" "APPLE COMPUTER INC" 7   .02086082
8433 "AAPL" "APPLE COMPUTER INC" 7  .026269455
8434 "AAPL" "APPLE COMPUTER INC" 7   .03968534
8435 "AAPL" "APPLE COMPUTER INC" 7 -.014424606
8438 "AAPL" "APPLE COMPUTER INC" 7  -.04090528
8439 "AAPL" "APPLE COMPUTER INC" 7 -.009235363
end
format %d date
note: permco is simply a unique identifier that each company in the dataset has, a unique number per company.
I then use the following command:
rangestat (sd) Returns (count) Returns, interval(date -1 0) by(permco)
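Not an answer to the 1-versus-2 observation puzzle, but for a one-year rolling window (rather than only the previous day) something along these lines may be closer to what the paper describes; a sketch, where -365 refers to calendar days and the current day is included as in the original command:
Code:
rangestat (sd) Returns (count) Returns, interval(date -365 0) by(permco)
Note that when a window contains a single observation the sample standard deviation is undefined, which is why rangestat returns a missing value there.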
Thanks in advance
Dropping all panel observations if any one of them have missing value for any one of the variables
Hi there,
I have a firm-level data for two years. I want to drop a firm entirely (both years' observations) if there is missing value in any one year for any one of the three variables, var1, var2, and var3.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id year var1 var2 var3)
1 1 17 1 3
1 2 19 1 3
2 1 20 0 3
2 2 35 1 .
3 1  . 0 1
3 2 18 0 5
4 1 19 1 4
4 2 14 0 2
end
In this case, firm 2 and 3 will be removed entirely (all four lines of data), and only the data for firm 1 and 4 will remain.
I was going for something like:
Code:
foreach var in var1 var2 var3 {
    drop _n _n+1 if missing(`var') & year==1
    drop _n _n-1 if missing(`var') & year==2
}
Obviously, this is flawed. Any suggestions would be highly appreciated, thanks!
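One possible approach, sketched with the variable names above: flag any missing value per observation, then drop every observation of any firm whose flag is ever positive.
Code:
egen anymiss = rowmiss(var1 var2 var3)
bysort id: egen dropfirm = max(anymiss > 0)
drop if dropfirm == 1
drop anymiss dropfirm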
Creating Mills Ratios
Hi all,
This is my first post here so I apologise in advance if I've broken any rules, and I also apologise if this is a very simple question, as I am new to Stata.
To give some background, I am attempting to see whether or not alcohol consumption affects the level of income one earns. Observations have been split into 3 groups within one variable: heavy drinkers, moderate drinkers and non-drinkers.
My plan is to first run a multinomial logit with the drinking variable as the dependent variable and moderate drinker as the base. I am doing this in order to obtain the mills ratios for each of the types of drinker, which will then be used in 3 OLS regressions with log income as the dependent variable in order to correct for selection bias.
My question is how do I obtain these Mills ratios?
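A sketch of one common route, a Lee/Dubin-McFadden-style correction built from the multinomial logit's predicted probabilities; the variable names drinker, x1-x3, and logincome are placeholders, the assumed coding has 2 = moderate drinkers, and whether this particular correction is appropriate is a modelling choice in its own right:
Code:
mlogit drinker x1 x2 x3, baseoutcome(2)       // 2 = moderate drinkers (assumed coding)
predict p1 p2 p3, pr                          // one predicted probability per category
gen mills1 = normalden(invnormal(p1))/p1      // selection-correction term for group 1
gen mills2 = normalden(invnormal(p2))/p2
gen mills3 = normalden(invnormal(p3))/p3
* then, for example, the earnings equation for heavy drinkers (group 1):
reg logincome x1 x2 mills1 if drinker == 1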
How to create variable of relative change?
Dear experts,
I have a dataset of monthly observations from June 2017 to June 2019, and I would like to calculate the 12-month relative change in price for a specific product (i.e. p_it / p_i,t-12). The first possible value will then be June 2018.
Variable y_m is year and month such as 1804 corresponds to April 2018.
Any good advice on how to do this in STATA?
This is how the data looks like:
(data shown as an image attachment in the original post)
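A sketch of one way to do this, assuming y_m is numeric in the yymm form described, the price variable is called price, and there is one observation per product and month (the product identifier is a placeholder):
Code:
gen mdate = ym(2000 + floor(y_m/100), mod(y_m, 100))   // 1804 -> 2018m4
format mdate %tm
xtset product mdate            // or -tsset mdate- if there is a single product
gen relchange12 = price / L12.price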
Thank you,
Sofie
Annualising coefficients in a loop
Hi all,
I am currently working on a project about volatility timing.
The data I am working with is daily, and I am not entirely sure how to annualise the coefficients from the loop. The coefficients I need to annualise are alpha (the constant) and beta, and if possible the t-stat on alpha.
foreach var of varlist United_States-Global_Ex_US {
    foreach var2 of varlist vUnited_States-vGlobal_Ex_US {
        if "v`var'" == "`var2'" {
            asdoc reg `var2' `var', robust nest save(volatilityI) dec(2)
        }
    }
}
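A sketch of one common convention, with several explicit assumptions: 252 trading days per year, each return series has a v-prefixed counterpart as in the loop above, and only the intercept is rescaled (the slope is a unit-free sensitivity, and the t-statistic is unaffected by rescaling):
Code:
foreach var of varlist United_States-Global_Ex_US {
    quietly reg v`var' `var', robust
    local alpha_ann = _b[_cons]*252
    local t_alpha   = _b[_cons]/_se[_cons]
    display "`var': annualised alpha = " %9.4f `alpha_ann' "   t(alpha) = " %6.2f `t_alpha'
}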
Any help would be appreciated.
Thanks,
Marc
extract table from multiple pages of the web by Stata ?
Hi ,
I'm trying to extract data from the web page " https://ift.tt/2HE5woy ". There are a lot of pages at this web address; how can I import the tables of data into Stata? There are 30,956 observations in table format on this site.
Currently I am using the following, but it is not effective. Please suggest a more efficient way:
readhtmltable https://cdd.drdmyanmar.org/en/vdp-files
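If the site paginates through a query parameter, one possible pattern is to loop over page numbers and append the results. This is only a sketch: the ?page= parameter, the number of pages (200 here), and the assumption that readhtmltable loads each page's table into memory all need to be verified against the actual site and the command's help file.
Code:
tempfile all
clear
save `all', emptyok
forvalues p = 1/200 {
    readhtmltable https://cdd.drdmyanmar.org/en/vdp-files?page=`p'
    append using `all'
    save `all', replace
}
use `all', clear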
Binary Variable Graph
Hello all,
I simply want to plot a graph with the number of times a binary variable (civil war incidence) = 1 for each year on the y-axis and the year on the x-axis, with five plots indicating five different regions.
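A sketch of one way to do this, assuming variables named incidence, year, and region; collapse turns the 0/1 indicator into a yearly count per region, and by() draws one panel per region:
Code:
preserve
collapse (sum) incidents = incidence, by(region year)
twoway line incidents year, by(region) ytitle("Civil war incidences") xtitle("Year")
restore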
Help would be much appreciated.
Thanks,
Rishi
Inclusion of interaction terms and interpretation of margins output in RE xtprobit
Hi all,
I am conducting a study into the transmission of time preferences in spouses through their smoking behaviours. I have a balanced dataset where N=1464 and T=3. I am using a random effects probit model with errors clustered at their personal ID numbers.
The model is as follows:
xtprobit smokes_f c.sctimeimpulse_f i.smokes_m_1##c.sctimeimpulse_m c.age_f c.age_m i.employment_f i.employment_m i.good_ghealth_f i.reduc_f i.reduc_m c.rellength i.child_under15 c.loghhincome i.rural i.deprivation_scale i.state##i.year, re vce(cluster pid)
sctimeimpulse_f = Female time preference measure
sctimeimpulse_m = Male time preference measure
smokes_f = dummy variable of female's smoking status at t
smokes_m_1 = dummy variable of male's smoking status t-1
I have 2 problems which I would appreciate advice upon. I have attached the STATA outputs within a notepad file since the line size was too large.
Model 1 double interaction.txt
1) I know that there is collinearity between smokes_m and sctimeimpulse_m, yet what I am interested in is whether the likelihood of a female smoking, given that her partner smokes, is increased by his time preference rate. Would I be making a mistake by including sctimeimpulse_m as an independent variable as well as in the interaction, or is it more correct not to do so because of the collinearity? As an independent variable, sctimeimpulse_m is insignificant.
I perform the margins command:
margins r.smokes_m_1, dydx(sctimeimpulse_m) atmeans
The result is also insignificant, yet the interaction term is significant in the xtprobit output. Why is this?
2) I performed margins, dydx(*) out of general interest for the other variables in the model. The p values change a fair bit from the xtprobit output. Why is this? Which output should I refer to for the general significance of the variables: xtprobit or margins?
How to compare two datasets
Hi,
I want to verify/confirm that two datasets are identical. I reproduced my dataset from my colleague's work and want to compare it to her dataset. What is one way I could do that? Thanks.
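One built-in option is cf, which compares the dataset in memory with a file on disk observation by observation; the file names below are hypothetical, and both datasets need their observations in the same order (sort on the identifiers first if necessary):
Code:
use my_reproduced_dataset, clear
cf _all using colleagues_dataset, verbose
Another route is a 1:1 merge on the identifying variables followed by variable-by-variable comparisons, which is more forgiving about sort order.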
Is there a way to get ey/ex marginal effects after xtreg command with fixed effects?
Hello all,
I would like to see the proportional marginal effects (i.e.: ey/ex) after a regression with fixed effects. There is always an error message which says:
"could not calculate numerical derivatives -- discontinuous region with missing values
encountered r(459);"
I am able to obtain dy/ex and dy/dx, but because of how the variables are measured I need ey/ex to be consistent. ey/ex is possible with random effects; however, random effects are not appropriate in my model.
Is there a way to calculate ey/ex?
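Not a direct fix for the margins error, but one possible workaround when both variables are strictly positive is to estimate the model in logs, so the coefficient itself is the elasticity ey/ex; a sketch with hypothetical variable names:
Code:
gen ln_y = ln(y)
gen ln_x = ln(x)
xtreg ln_y ln_x i.year, fe vce(cluster id)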
Thank you,
Karen.
Testing for equivalence of marginal effects of the same variable at different '_at'
Hello all,
I am trying to test for the equivalence of marginal effects of the same variable at two different indicator-variable settings after a regression with an interaction term. Here is the gist of my commands:
reg Y c.X1 i.dummy c.X1#i.dummy
margins, dydx(X1) at(dummy=(0 1)) post
Then I would like to test whether the marginal effect of X1 at dummy=0 is different from the marginal effect of X1 at dummy=1. I am having trouble referring to the same variable within the same test command, i.e. test X1=X1. Thank you in advance for what I am sure is a very simple solution.
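In this linear specification the two marginal effects differ by exactly the interaction coefficient, so the equality test reduces to a test that the interaction term is zero; a sketch (alternatively, adding coeflegend after margins, ... post reveals the coefficient names usable in test):
Code:
reg Y c.X1 i.dummy c.X1#i.dummy
* dY/dX1 is _b[X1] when dummy==0 and _b[X1] + _b[1.dummy#c.X1] when dummy==1
test 1.dummy#c.X1 = 0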
Karen.
Instrument variable TSLS
Dear members,
I am just confused about one thing. I want to use the lagged dependent and lagged independent variables as instruments in an IV regression (two-stage least squares). Can I use these instruments, or do I need to find instrumental variables other than the lagged variables? Kindly give your valuable comments.
Regards,
Rabiya
"By" and "Margins" command error; alternative approach?
Hi Everyone,
I am writing a thesis for my Honours year (advanced undergraduate degree) on adolescent drinking levels, using an excellent panel dataset. However, the panel data is split into two "cohorts"; for example, it is set up as follows:
| ID | Wave | Cohort | X | Y | Z |
| 33 | 1 | K | 3 | 5 | 7 |
| 55 | 1 | B | 4.3 | 6 | 1 |
| 33 | 2 | K | 23 | 123 | 213 |
| 55 | 2 | B | 312 | 32 | 1231 |
| 33 | 3 | K | 231 | 44 | 34 |
| 55 | 3 | B | 213 | 23 | 43 |
I want to know the effects on the different cohorts because they are different ages and have different effect levels. My (simplified) code is as follows:
sort cohort
by cohort: ologit alckid i.alcfreqp1 i.alcfreqp2 i.friendsalc, ro
by cohort: margins, dydx(alcfreqp1 alcfreqp2 friendsalc) post
However, I receive an error that the by command and the margins command may not be combined:
margins may not be combined with by
Does anyone have any suggestions for how I can run the margins command to find the margins by each separate cohort? I am struggling to find an alternative at this point.
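One possible workaround is to loop over the cohorts and restrict each estimation with if, running margins immediately after each fit; a sketch that assumes cohort is a string variable as in the table above:
Code:
levelsof cohort, local(cohorts)
foreach c of local cohorts {
    display as text _newline "Cohort `c'"
    ologit alckid i.alcfreqp1 i.alcfreqp2 i.friendsalc if cohort == "`c'", vce(robust)
    margins, dydx(alcfreqp1 alcfreqp2 friendsalc)
}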
Thanks in advance,
Jason
Can I use a zero inflated negative binomial regression?
Hello everyone,
this is my first post, so please be kind and understanding if I don't meet the forum norms.
So my regression equation was: reg amount12 ib1.lng_origins c.pca_generaltrst1 ib6.religion controls
The outcome variable is amount12=amount remitted in past 12 months, lng_origins=language origins in SA such as (Sotho, Venda, Tsonga etc) and religion=Atheists, christians etc.
So the very first problem I have is that if I only look at the amount remitted by those who remit, I may have selection bias. I cannot use a Heckman model because my selection equation does not have a variable excluded from the second stage, so the exclusion restriction is violated.
Then I talked to a professor who said I should simply recode the missing values of the amount remitted to zeros, because those people are not remitting any amount. So I did that, and I also recoded to zero two other variables with missings that I want to include as controls, because I figured that otherwise Stata only takes the non-missing values into account, but to account for selection bias it has to take all the observations into account, right? These are the controls that I recoded: (1) relationship to remittance receiver, (2) frequency of remittances.
Now I can't use OLS because the error terms are not normally distributed and I have a lot of zeros, which is why I thought I might be able to use a zero-inflated negative binomial regression. In inflate() I would plug in my logit regression (all variables and controls without the outcome variable) that estimated whether a person remits or not:
zinb new_amount12 ib1.pop_lngorigins c.pca_generaltrst1 ib6.religion controls, inflate(ib1.pop_lngorigins c.pca_generaltrst1 ib6.religion other controls)
Unfortunately, the inflate equation does not give me the same or similar results as the logit regression that I ran already. Why is that? Can I still use the coefficients that I get for the amount remitted?
Please note that this is a master's thesis and it does not have to be perfect (I would like it to be, but I am pretty new to these models, so I think it is normal that it will not be perfect right away). Thank you so much for your help in advance!
STATA seems to add spaces into path where there are none although using double quotes
Dear statalist,
I am using Stata/SE 15.1 for Mac. I am using a global to specify my path at the beginning of my do file. I write it like this:
Code:
global path "/Users/felix/Dropbox/Felix/F&Vs PHD/EINITE/Aggregation/Paper/Aggregation_21082019/"
However, in the command window Stata returns it like this, with additional spaces, e.g. after "felix":
Code:
global path "/Users/felix /Dropbox /Felix/F&Vs PHD / EINITE/ Aggregation/Paper/Aggregation_21082019"
This later causes problems when I refer to my path using other commands, such as import excel, as STATA tells me that it cannot find the specified file. I specify my command like this:
Code:
import excel "$path/TOTAL WITH PROPERTYLESS/Germany_total_1400.xlsx", firstrow
and then STATA returns the command like this, with additional spaces in the command window and an error message "not found":
Code:
file /Users/felix /Dropbox /Felix/F&Vs PHD / EINITE/ Aggregation/Paper/Aggregation_21082019/TOTAL WITH PROPERTYLESS/Germany_total_1400.xlsx
To sum up, the problem seems to be that STATA adds spaces to my command that are not included in the folder names, and this seems to misspecify the path when I try to draw on it. I checked spelling and everything is in double quotes.
Many thanks in advance for your help.
event-study (event-study2)
Hi all,
I am using Stata 16.
My study: I have two periods that I will evaluate; 2008 - 2013 and then 2014 - 2018. I have a total of 63 baseline events - 23 in the 2008 - 2013 period and 40 in the 2014 - 2018 period - and I am using open/close data. Due to data access, I was unable to obtain intraday data (for less than £1500) but I feel using the open/close data will enable me to provide a broad view and still comment on any pre-announcement drift and lasting effects of FOMC news (thus commenting on efficient market theory). To that end, I would like to evaluate the way in which the Dow anticipates FOMC news 3 days prior to the announcement and the way in which news is incorporated, 3 days after the event.
The majority of event studies I have seen appear to evaluate a stock against an index (e.g. Apple against the S&P 500 is very common) to estimate normal performance, abnormal and cumulative abnormal returns, and to test for significance. Is it possible to conduct an event study with just the Dow Jones?
I am unsure whether anyone has any direction regarding the use of eventstudy2, but I also had the idea of evaluating the effects of unscheduled meetings - effectively, another set of events. I considered using dummies/changes (d.var, d2.var etc.) for x days, leading and lagging (1, 2, 3, 4), to see if there is any reaction and whether it is more dramatic than that of the scheduled meetings. I suppose they would be in reaction to shocks and as such not working in the same way as the scheduled ones. I have done something similar on a project that was not finance related, so excuse the novice approach to the event study application.
I have read 'help eventstudy2' and also used the Princeton 'Event Studies with Stata' guide. However, I am generating missing values when setting up my data and I cannot figure out what is wrong with the way I have prepared my data in Excel.
Attached is a copy of my data prepared in Excel. I have manually calculated cumulative returns for the whole index for both periods, and I have further calculated cumulative returns for the index over the whole period with the event days removed - this will allow me to compare event days versus all other trading days. Any direction at this stage would be much appreciated.
Thanks in advance.
Friday, August 30, 2019
Linear interpolation
Hi!
I have a variable called year in my dataset, but it contains only the years 2006, 2009, 2011, 2013, 2015 and 2017. I want to add the intermediate years but I do not know how; the other variables should take an average (interpolated) value in those years.
Example: I have 2006 and 2009 in years and I have population for those years. I want to add the intermediate years and fill in an average of the population for those intermediate years.
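A sketch of one way to do this for a single yearly series: declare the data as yearly, insert the missing years with tsfill, and interpolate linearly (with several units, an xtset panel declaration and a bysort prefix on ipolate would be needed):
Code:
tsset year
tsfill
ipolate population year, gen(population_ipol)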
I would be grateful if you can help me.
Thanks!
Taking Results from Tab2 and Using them to Generate/Recode a new variable
I have a big dataset (1.7million obs) and I need to create contour plots by a member_id within the dataset.
I have the member_id, and each member has about 45,000 obs. After filtering and looking at a subset, I end up with about 35,000 obs for each member.
I tried to use histogram/contour plot to create a contour plot/heatmap and with all of the data it took a really long time to render.
To increase the processing speed and distill the data I did the following....
For the 35,000 obs, I have two continuous variables that serve as the X and Y variables, and I want their incidence/intensity displayed in a contour plot or a heat map.
I created two categorical variables, a 15-category variable and an 11-category variable: one was defined by substantive concerns (known categories with substantive import) and the other by distributional properties (to create reasonably sized groups in each interval).
I cross-tabbed the two categorical variables and then created a counter/incidence variable using the crosstab results....
I did it using brute force...
generate counter=0
recode counter 0=(number in crosstab) if v1==1 & v2==1
recode counter 0=(number in crosstab) if v1==1 & v2==2
I did this for all of the cells in an 11x15 crosstab....
I then created a contour plot with the x variable, the y variable, and used the 'counter' as the z-variable.
The plots work and, with some options, convey the appropriate information.
BUT, this is hugely clumsy and labor intensive when I have 40 'members' and need to do this for 3-4 files per week.
What I would like to do is run the crosstab -
tab v1 v2 - generate the results, and use a mat command to automate the recode of the 'counter_group1' and place each of the cell values in the code to create my counter.
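A sketch of how the counter might be filled automatically with matcell, assuming v1 and v2 are coded with consecutive integer categories 1-15 and 1-11 (otherwise the matrix rows and columns have to be mapped to the actual category values first):
Code:
quietly tab v1 v2, matcell(cellcounts)
generate counter = 0
forvalues i = 1/15 {
    forvalues j = 1/11 {
        quietly replace counter = cellcounts[`i', `j'] if v1 == `i' & v2 == `j'
    }
}
For this particular purpose, though, bysort v1 v2: generate counter = _N produces the same per-cell count in one line, without going through the crosstab at all.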
I hope this is clear and I have data to use for an example...
I have set up matrices and run tabs and then used the mat commands to export data into tables, I assume I can do the same and use the mat command to place 'results' into a recode statement that prevents me from manually taking crosstab results and typing them into recode statements.
Any thoughts?
Quick questions about the interaction methods
Dear Researchers,
Greetings,
I have quick questions about interaction variables, please.
At first, I am going to examine the effect of specific standards on financial reporting quality in the same country for industrial firms over the period 2006-2018. The standards were issued, let's say, in 2006, and firms started adopting them voluntarily. In other words, firms that did not adopt in 2006 adopted in 2007, those that did not adopt in 2007 adopted in 2008, and so on. I've read a lot about the Difference-in-Differences (DID) method, but it seems that this method will not work in my case, as the DID approach requires a control sample (non-adopters), and in my case all non-adopters become adopters during the period. Now I am confused about which methods I can use.
To illustrate:
Let us assume the following model:
This is the basic model before studying the effect of the standards.
- Financial Reporting quality = α0 + β1 size + β2 Audit + β3 Growth + ε
Financial Reporting quality, size, and growth are scales (ratio) variables while Audit is a binary variable.
Now, I want to examine the effect of the standards on the financial reporting quality as below:
Financial Reporting quality = α0 + β1 size + β2 Audit + β3 Growth + β4 GG + β5 GG*size + β6 GG*Audit + β7 GG*Growth + ε
Where:
GG = indicator for whether the new standards have been adopted
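A sketch of how this specification might be estimated with firm and year fixed effects, treating GG as a firm-specific post-adoption indicator; the names frq, firm_id, and year are placeholders, and whether this staggered adoption design identifies the effect cleanly is a separate question:
Code:
xtset firm_id year
xtreg frq c.size i.Audit c.Growth i.GG i.GG#c.size i.GG#i.Audit i.GG#c.Growth i.year, fe vce(cluster firm_id)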
Q(1) Can I code GG as 1 for a firm (X) for its observations in the adoption period and 0 for the same firm's observations under the old standards?
Q(2) Can I consider the observations coded 0 (under the old standards) as a control?
Q(3) Are the following correct?
| β4 | capture the effect of the standards on the financial reporting quality and if it is positive means that the average values of financial reporting quality are higher under the new standards than the old standards keeping other factors constant. |
| β1 | Capture the effect of size before the adoption of the new standards |
| β5 | Capture the effect of size after the adoption of the new standards |
| β1+ β5 | Capture the overall effect (before and after). |
Thank you very much in advance for your cooperation.
Creating first-order difference variables in repeated cross-sectional (i.e. pseudo panel) data
Greetings,
I'm running Stata 15.1 on OSX. I've created a repeated cross-sectional dataset to assess variations in attitudes towards crime (the dependent variable). My time variable essentially stores the month and year in which each survey (31 in total) was conducted. For my independent variable, I created and merged (using the time variable) a newspaper issue-salience index that stores the percent of monthly New York Times articles that refer to crime-related issues. My expectation is that in months in which crime is salient in the media, we will see an increase in the percent of respondents saying crime 'is a serious issue'. To prepare the dataset for analysis, I created (using 'collapse') a variable that stores the mean percent of respondents that gave the 'serious issue' response in each survey (i.e. by year/month). I did the same with the salience index variable (separate dataset) and merged it into the collapsed survey dataset. I ran a simple Pearson correlation between the index and the survey response variable and uncovered a strong relationship (r=0.87). However, a colleague of mine who saw the resulting graph warned me that I 'shouldn't correlate time series with strong autocorrelation' and that, instead, I should 'create first-order difference sequences and correlate those'. I'm not quite sure how to go about doing this. The dataset has no panel ID, so I tried creating one:
Code:
gen id=_n
(note that the id is then simply a number assigned to each survey--31 in total)
I then entered the following:
Code:
xtset id year
To create the 'first difference' variable I tried the following:
Code:
gen indexdiff=D.index
However, Stata subsequently gave me the '31 missing values generated' message.
What am I doing wrong here and how do I get it right? Thanks for your time!
Sample data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(index crime_serious yearmonth year id)
1.0898919 37.634632 441 1996 1
1.141105 41.15658 449 1997 2
.9819449 31.580814 453 1997 3
1.1344688 35.43958 475 1999 4
1.2987403 39.7779 487 2000 5
1.1022217 39.37875 488 2000 6
1.045117 32.872364 521 2003 7
.7872596 35.538055 522 2003 8
.8885489 38.24273 523 2003 9
.9927688 35.79262 524 2003 10
.7067459 39.30157 539 2004 11
1.0929303 36.767914 548 2005 12
1.0707874 25.04893 572 2007 13
1.0773966 34.76981 573 2007 14
1.0685753 29.70381 576 2008 15
1.118886 27.0324 580 2008 16
.9239349 31.63132 584 2008 17
.7300239 23.623867 597 2009 18
.7975035 28.98842 598 2009 19
1.1477937 34.304623 613 2011 20
1.0149189 38.20615 614 2011 21
1.1804827 34.5046 624 2012 22
1.3056893 39.55238 648 2014 23
1.2751036 41.03848 649 2014 24
1.369863 42.47158 650 2014 25
1.8246716 52.22675 662 2015 26
2.096708 48.12559 667 2015 27
1.6774454 47.23487 668 2015 28
1.5856438 42.08379 669 2015 29
2.575059 57.32762 686 2017 30
2.7088645 64.2695 689 2017 31
end
format %tm yearmonth
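A sketch of one way to build the differenced series here: with id = _n every panel has a single observation, so D.index has no within-panel lag to draw on, which is why all 31 values come back missing. Treating the 31 surveys as one series ordered by yearmonth and differencing against the previous survey avoids both that problem and the calendar gaps between surveys:
Code:
sort yearmonth
gen indexdiff = index - index[_n-1]
gen crimediff = crime_serious - crime_serious[_n-1]
corr indexdiff crimediff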
No observations r(2000) with Regress and SEM
Hello all. I am working with a dataset that contains many categorical variables whose values are still in written-word format, which I am trying to convert into numbers. Initially, when I typed "summarize", no observations would register for any of the variables, even though I could see the values when I manually scrolled through the dataset. When I tried to run PCA and regression on some of the variables, these commands would obviously not work because the data were still in string format and needed to be converted to numbers. I fixed that by using generate oldvar=real(oldvar) for all of the variables I was planning to include in my analyses. After running the real() function I was able to obtain summary statistics and observation counts for all of the variables I needed for my regression. So I thought my "no observations" error would be fixed when I ran a regression using the new numerical variables. This was not the case, however. I am able to run PCA on several of the variables successfully (Vo1-Vo10), but the regression model I ran still returned the "no observations" error when I included the now-numerical Race, Age, Education, Ethnicity, and Income variables. The same error arises when I try to run an SEM with some of the variables. Has anyone found a fix for this "no observations" error arising even after string variables have been converted to numeric? I'll copy and paste my code below:
*Create race dummy variable with 1 as white, 0 as non white
gen race=""
replace race=Q101
replace race="1" if Q101=="White"
replace race="0" if Q101=="Black or African American"
replace race="0" if Q101=="American Indian or Alaska Native"
replace race="0" if Q101=="Asian"
replace race="0" if Q101=="Native Hawaiian or Pacific Islander"
replace race="0" if Q101=="Other"
*Create ethnicity categorical variable
gen ethnicity="."
replace ethnicity=Q102
replace ethnicity="1" if Q102=="Yes"
replace ethnicity="0" if Q102=="No"
*create income categorical variable
gen income="."
replace income=Q100
replace income="1" if Q100=="Less than $19,999"
replace income="2" if Q100=="$20,000 - $39,999"
replace income="3" if Q100=="$40,000 - $59,999"
replace income="4" if Q100=="$60,000 - $79,999"
replace income="5" if Q100=="$80,000 - $99,999"
replace income="6" if Q100=="$100,000 - $149,999"
replace income="7" if Q100=="More than $150,000"
*Create education categorical variable
gen education="."
replace education=Q99
replace education="1" if Q99=="Less than high school"
replace education="2" if Q99=="High school graduate"
replace education="3" if Q99=="Some college"
replace education="4" if Q99=="2 year degree"
replace education="5" if Q99=="4 year degree"
replace education="6" if Q99=="Professional degree"
replace education="7" if Q99=="Doctorate"
*Create age categorical variable
gen age="."
replace age=Q98
replace age="1" if Q98=="16 - 24"
replace age="2" if Q98=="25 - 34"
replace age="3" if Q98=="35 - 44"
replace age="4" if Q98=="45 - 54"
replace age="5" if Q98=="55 - 64"
replace age="6" if Q98=="65 - or older"
*Code value oriententation questions to prep for PCA
gen vo1="."
replace vo1=DT
replace vo1="1" if DT=="Strongly disagree"
replace vo1="2" if DT=="Somewhat disagree"
replace vo1="3" if DT=="Neither agree nor disagree"
replace vo1="4" if DT=="Somewhat agree"
replace vo1="5" if DT=="Strongly agree"
gen vo2="."
replace vo2=DU
replace vo2="1" if DU=="Strongly disagree"
replace vo2="2" if DU=="Somewhat disagree"
replace vo2="3" if DU=="Neither agree nor disagree"
replace vo2="4" if DU=="Somewhat agree"
replace vo2="5" if DU=="Strongly agree"
gen vo3="."
replace vo3=DV
replace vo3="1" if DV=="Strongly disagree"
replace vo3="2" if DV=="Somewhat disagree"
replace vo3="3" if DV=="Neither agree nor disagree"
replace vo3="4" if DV=="Somewhat agree"
replace vo3="5" if DV=="Strongly agree"
gen vo4="."
replace vo4=DW
replace vo4="1" if DW=="Strongly disagree"
replace vo4="2" if DW=="Somewhat disagree"
replace vo4="3" if DW=="Neither agree nor disagree"
replace vo4="4" if DW=="Somewhat agree"
replace vo4="5" if DW=="Strongly agree"
gen vo5="."
replace vo5=DX
replace vo5="1" if DX=="Strongly disagree"
replace vo5="2" if DX=="Somewhat disagree"
replace vo5="3" if DX=="Neither agree nor disagree"
replace vo5="4" if DX=="Somewhat agree"
replace vo5="5" if DX=="Strongly agree"
gen vo6="."
replace vo6=DY
replace vo6="1" if DY=="Strongly disagree"
replace vo6="2" if DY=="Somewhat disagree"
replace vo6="3" if DY=="Neither agree nor disagree"
replace vo6="4" if DY=="Somewhat agree"
replace vo6="5" if DY=="Strongly agree"
gen vo7="."
replace vo7=DZ
replace vo7="1" if DZ=="Strongly disagree"
replace vo7="2" if DZ=="Somewhat disagree"
replace vo7="3" if DZ=="Neither agree nor disagree"
replace vo7="4" if DZ=="Somewhat agree"
replace vo7="5" if DZ=="Strongly agree"
gen vo8="."
replace vo8=EA
replace vo8="1" if EA=="Strongly disagree"
replace vo8="2" if EA=="Somewhat disagree"
replace vo8="3" if EA=="Neither agree nor disagree"
replace vo8="4" if EA=="Somewhat agree"
replace vo8="5" if EA=="Strongly agree"
gen vo9="."
replace vo9=EB
replace vo9="1" if EB=="Strongly disagree"
replace vo9="2" if EB=="Somewhat disagree"
replace vo9="3" if EB=="Neither agree nor disagree"
replace vo9="4" if EB=="Somewhat agree"
replace vo9="5" if EB=="Strongly agree"
gen vo10="."
replace vo10=EC
replace vo10="1" if EC=="Strongly disagree"
replace vo10="2" if EC=="Somewhat disagree"
replace vo10="3" if EC=="Neither agree nor disagree"
replace vo10="4" if EC=="Somewhat agree"
replace vo10="5" if EC=="Strongly agree"
*Convert model variables from string to numeric
generate Race=real(race)
generate Age=real(age)
generate Education=real(education)
generate Ethnicity=real(ethnicity)
generate Income=real(income)
generate Vo1=real(vo1)
generate Vo2=real(vo2)
generate Vo3=real(vo3)
generate Vo4=real(vo4)
generate Vo5=real(vo5)
generate Vo6=real(vo6)
generate Vo7=real(vo7)
generate Vo8=real(vo8)
generate Vo9=real(vo9)
generate Vo10=real(vo10)
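A minimal diagnostic sketch (not the original code): -regress- and -sem- drop every observation that is missing on any listed variable, so a single recode that came out all or mostly missing is enough to trigger r(2000). Checking missingness variable by variable usually reveals the culprit, and -encode- with a predefined value label avoids the string round-trip for the Likert items (the new variable name vo1_n is hypothetical):
Code:
* count missing values in each model variable
misstable summarize Race Age Education Ethnicity Income Vo1-Vo10
* alternative recoding that skips the string-to-string step entirely
label define agree 1 "Strongly disagree" 2 "Somewhat disagree" ///
    3 "Neither agree nor disagree" 4 "Somewhat agree" 5 "Strongly agree"
encode DT, gen(vo1_n) label(agree)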
One-sided test of coefficients obtained by mixed
Dear all,
I ran mixed 'outcome' 'treatment' 'binary_covariates' 'metric_covariates' [fweight=w] || block: . The coefficient for treatment is non-significant, but I would like to be more certain that it is not negative, so I would like to perform a one-sided test. How can I best do that?
output: [attached as an image]
The treatment variable is called SRI. All variables except area_aux_rice are binary. The sample contains cross-sectional data on farming households that were matched into groups using Mahalanobis matching.
regards,
Sarah
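A minimal sketch of one way to get a one-sided p-value after -mixed- (the variable names are the placeholders used above; -mixed- reports z statistics, so the normal distribution is the relevant reference):
Code:
* after the mixed model has been fit
local z = _b[SRI] / _se[SRI]
display "one-sided p-value, Ha: coefficient > 0 = " 1 - normal(`z')
display "one-sided p-value, Ha: coefficient < 0 = " normal(`z')
* equivalently, halve the reported two-sided p-value when the estimate
* has the sign specified by the alternative hypothesis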
Looping in Stata to extract values from one file to another
Hi all!
So I am a bit stuck on something I have to do.
1. I have a dataset which is multi level, and includes several individual observations for different countries (total 40), for different years/waves. (So for each year, there are 40 countries each having several individual observations of its own). So this is an individual level file.
2. In another dataset, I have the country level variables for all these countries and many more, which has variables like gdp of the country in that particular year
3. I have created an id variable - such that (country-wave) which is the same in both the files.
So if the code is 47, it represents country 4 in wave 7.
I want to import the gdp values for each country-wave from the dataset in 2. so that they get filled in for all the individual-level observations corresponding to that particular country-wave.
All I need is a loop (or other code) that would take the gdp value corresponding to the country-wave number (e.g. 47) and put it into the individual file for the observations that also correspond to the same country-wave number.
These are two different stata files.
Please help if you can! Will be very well appreciated.
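A minimal sketch (file and variable names are hypothetical): no loop is needed, because -merge m:1- copies each country-wave's gdp onto every matching individual-level observation through the shared id:
Code:
use individual_level.dta, clear
merge m:1 country_wave using country_level.dta, keepusing(gdp)
drop if _merge == 2    // country-waves that appear only in the country-level file
drop _merge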
rho value in heckman model
Hello dear professor,
Could you help me please with one small issue? I have panel data with n=7 and t=8. I use the bootstrap method with a Heckman selection model, but I find it difficult to interpret the results, especially the validity of the rho value, as it is equal to 1; more detail is in the attachment.
I appreciate any help from you.
Problem with estimation of coefficient in panel regression
My aim is to estimate the association between a country's real per capita growth (realgthpc) and the export-weighted real per capita growth of its trading partners (tpg_aenia for advanced economies, tpg_em for emerging markets, tpg_chn for China). I am using a panel of 26 countries with data at annual frequency. I control for the usual variables that are associated with growth, such as inflation, physical and human capital, global growth, etc. The moderation, crisis, and recovery dummies capture the periods 2003-07, 2008-09, and 2010-11 respectively.
I first run the regression for the period 2001 to 2018. Guided by the Hausman test, I have used country fixed effects. Also, standard errors are clustered at the country-level. I find that a 1% increase in growth of EM trade partners is associated with a 0.5% increase in the growth of the reporting country.
Code:
xtreg realgthpc l.tpg_aenia l.tpg_em l.tpg_chn realpcgdp_5yago_bn gfcf_nomgdp prim_enrol cpi_ep_inflatn depratio open_gdp globalgthpc oiltrdbal_gdp oiltrdbal_gdp_5y gth_commexp_prind crisis_dummy moderation_dummy recovery_dummy, fe vce(cluster CountryNo)
Code:
Fixed-effects (within) regression Number of obs = 343
Group variable: CountryNo Number of groups = 26
R-sq: Obs per group:
within = 0.3832 min = 4
between = 0.0080 avg = 13.2
overall = 0.1749 max = 17
F(16,25) = 15.10
corr(u_i, Xb) = -0.5907 Prob > F = 0.0000
(Std. Err. adjusted for 26 clusters in CountryNo)
------------------------------------------------------------------------------------
| Robust
realgthpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
tpg_aenia |
L1. | -.0420772 .1956026 -0.22 0.831 -.4449283 .3607739
|
tpg_em |
L1. | .5208979 .2481788 2.10 0.046 .009764 1.032032
|
tpg_chn |
L1. | .1076752 .5621433 0.19 0.850 -1.050081 1.265431
|
realpcgdp_5yago_bn | .0309591 .0211014 1.47 0.155 -.0125001 .0744183
gfcf_nomgdp | .2013728 .0805251 2.50 0.019 .0355283 .3672173
prim_enrol | .0969751 .0289507 3.35 0.003 .0373501 .1566001
cpi_ep_inflatn | -.1490856 .0699263 -2.13 0.043 -.2931015 -.0050697
depratio | -.0002485 .0571743 -0.00 0.997 -.1180011 .1175041
open_gdp | .0272978 .023562 1.16 0.258 -.0212292 .0758247
globalgthpc | .7846087 .2894573 2.71 0.012 .1884603 1.380757
oiltrdbal_gdp | .061986 .079659 0.78 0.444 -.1020748 .2260467
oiltrdbal_gdp_5y | .0130573 .1709793 0.08 0.940 -.339081 .3651957
gth_commexp_prind | .0252099 .0140975 1.79 0.086 -.0038245 .0542443
crisis_dummy | .134008 .9356604 0.14 0.887 -1.793021 2.061037
moderation_dummy | .8694931 .6094907 1.43 0.166 -.3857765 2.124763
recovery_dummy | .5936562 .6663803 0.89 0.381 -.7787797 1.966092
_cons | -14.92877 6.315646 -2.36 0.026 -27.93609 -1.921454
-------------------+----------------------------------------------------------------
sigma_u | 2.7801935
sigma_e | 2.6945358
rho | .51564218 (fraction of variance due to u_i)
------------------------------------------------------------------------------------
Redistributing a Group's percentiles in panel data
Hi, this is my first post in Statalist and I'm hoping you can help me.
I am using a panel data set with really good coverage for 2001-2018; however, the DTI for 1999 and 2000 is shaky at best for a specific group (DTI drops to 50 obs where there should be 10,000). We used multiple imputation as a fix, but when compared to a benchmark, the imputed values fell 7 points lower than the expected DTIs.
I have a benchmark for the 10th, 25th, 50th, 75th, and 90th percentiles from a reliable source for 1999 and 2000 with a total count for the year.
I have never created weights before/ have barely used them.
But should I make sample weights for my specific group for 2000, or create buckets?
I know this is vague, but I'm not sure what information would help for clarity.
Best and Thanks,
Sian
Geonear producing unstable results
I'm trying to use geonear to identify the nearest neighbors within a dataset at the household-day level.
I've got longitude with a level of precision ranging between the 4th and 6th decimal place, and latitude always to the 6th decimal place.
I ran a do-file, which produces the six nearest neighbors, four times. The first two times I ran it, they produced the same results. The second two times I ran it, the final neighbor identification varied.
Is there a situation in which I can expect this type of variation in output to occur? Is it less likely if I specify a within()?
For what it is worth, here is my code.
Code:
preserve
rename hhd_id nbor_index
sort nbor_index, stable
tempfile nbor
save `nbor'
restore
sort hhd_id, stable
geonear hhd_id lat lon using `nbor', n(nbor_index lat lon) ign wide near(6)
Generating mean returns, based on xtile returns
Hi community,
I seem to have problems with calculating mean returns (ret_sup in my data). My dataset has weekly observations for returns (customer and supplier). The lagged customer returns are split into quintiles with xtile. Based on the rank of the lagged customer returns, I want to calculate the weekly return of the associated supplier returns per quintile.
I used the following code, but when I manually calculate the returns in Excel for double checking, it doesn't give me the same results:
My dataex consists of exactly 1 week of panel data.
Thanks in advance, help would be appreciated here.
Code:
by stic ctic: gen L_ret = ret_cus[_n-1]
egen retQ = xtile(L_ret), by(week_id) nq(5)
by week_id: egen retQ1 = mean(ret_cus) if retQ == 1
by week_id: egen retQ2 = mean(ret_cus) if retQ == 2
by week_id: egen retQ3 = mean(ret_cus) if retQ == 3
by week_id: egen retQ4 = mean(ret_cus) if retQ == 4
by week_id: egen retQ5 = mean(ret_cus) if retQ == 5
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str8(stic ctic) float(week_id ret_sup ret_cus L_ret retQ retQ1 retQ2 retQ3 retQ4 retQ5) "PCZ" "CWG" 1926 .023622047 .069767445 .01775148 3 . . .0150178 . . "CPCI" "SNE" 1926 .067307696 .02152642 .02152642 3 . . .0150178 . . "CSDS" "GND" 1926 -.11594203 -.06666667 -.0625 1 .05393586 . . . . "PGEX" "FTR" 1926 .05633803 .03 .0033444816 2 . .03446858 . . . "HMTT" "IOM" 1926 .04918033 .1119403 -.02189781 1 .05393586 . . . . "SRI" "GM" 1926 . .016393442 .029535865 3 . . .0150178 . . "VICL" "RPR" 1926 -.007352941 .011666667 0 2 . .03446858 . . . "KOPN" "ROK" 1926 -.04807692 .030674847 -.012121212 1 .05393586 . . . . "MXC" "AQP" 1926 -.06451613 .008130081 -.008064516 2 . .03446858 . . . "TSGI" "NYN" 1926 -.06666667 .012562814 .015306123 3 . . .0150178 . . "AKS" "GM" 1926 .04153354 .016393442 .029535865 3 . . .0150178 . . "LEA" "GM" 1926 -.006493506 .016393442 .029535865 3 . . .0150178 . . "JCI" "GM" 1926 -.0056899 .016393442 .029535865 3 . . .0150178 . . "VCOM" "DSCP" 1926 -.014084507 -.010526316 .11111111 5 . . . . .0038411536 "SHLO" "GM" 1926 .028169014 .016393442 .029535865 3 . . .0150178 . . "IVAC" "HMTT" 1926 .04635762 .04918033 .18217054 5 . . . . .0038411536 "CSPLF" "CWG" 1926 -.01724138 .069767445 .01775148 3 . . .0150178 . . "BJICA" "DRYR" 1926 -.0825688 -.016666668 -.024390243 1 .05393586 . . . . "MNR" "AJL" 1926 -.02777778 .032786883 -.07575758 1 .05393586 . . . . "ITT" "GM" 1926 .005780347 .016393442 .029535865 3 . . .0150178 . . "ALET" "ALET" 1926 .04166667 .04166667 .04166667 5 . . . . .0038411536 "MCCL" "WMX" 1926 -.08571429 -.011235955 .011363637 3 . . .0150178 . . "SPTR" "OCC" 1926 .02352941 .06498195 -.017730497 1 .05393586 . . . . "LRCI" "BKCO" 1926 .0925926 .03448276 .035714287 5 . . . . .0038411536 "JPMX" "DBD" 1926 0 -.07212476 . . . . . . . "DSPT" "GM" 1926 0 .016393442 .029535865 3 . . .0150178 . . "PRCP" "GM" 1926 -.028368793 .016393442 .029535865 3 . . .0150178 . . "CPCI" "SNE" 1926 .067307696 .02152642 -.04307116 1 .05393586 . . . . "CVTX" "BGEN" 1926 .0877193 .15337424 -.006097561 2 . .03446858 . . . "PCZ" "TRP" 1926 .023622047 .02142857 .007194245 2 . .03446858 . . . "CEPH" "CHIR" 1926 .11229946 -.040625 .012658228 3 . . .0150178 . . "CFCI" "REY" 1926 .02479339 .03809524 .024390243 3 . . .0150178 . . "ROG" "AFLX" 1926 -.009090909 -.10784314 .05154639 5 . . . . .0038411536 "AFLX" "SFLX" 1926 -.10784314 -.015151516 .023255814 3 . . .0150178 . . "KLOC" "CBS" 1926 -.06666667 0 .018072288 3 . . .0150178 . . "HIW" "HIW" 1926 -.003610108 -.003610108 .029739777 5 . . . . .0038411536 "DEFI" "GM" 1926 -.01923077 .016393442 .029535865 3 . . .0150178 . . "RGLD" "PDG" 1926 -.1 .03030303 .012269938 3 . . .0150178 . . "SIND" "SHX" 1926 .11764706 .07692308 -.02803738 1 .05393586 . . . . "STCL" "RCII" 1926 .1935484 .02255639 .11764706 5 . . . . .0038411536 "STRB" "CBS" 1926 0 0 .018072288 3 . . .0150178 . . "WN" "GM" 1926 -.3415638 .016393442 .029535865 3 . . .0150178 . . "MCCL" "MCCL" 1926 -.08571429 -.08571429 .14130434 5 . . . . .0038411536 "DECO" "RBD" 1926 0 .03910615 .017045455 3 . . .0150178 . . "CSYI" "SGMA" 1926 -.01369863 .05755396 .007246377 3 . . .0150178 . . "SMPS" "GM" 1926 -.011494253 .016393442 .029535865 3 . . .0150178 . . "BGG" "ELUXY" 1926 .013812155 -.02484472 .07572383 5 . . . . .0038411536 "HILI" "GM" 1926 .15789473 .016393442 .029535865 3 . . .0150178 . . "ALET" "ALET" 1926 .04166667 .04166667 0 2 . .03446858 . . . "NFG" "UGI" 1926 -.02285714 .005586592 0 2 . .03446858 . . . 
"ZRAN" "IOM" 1926 .22826087 .1119403 -.02189781 1 .05393586 . . . . "SUP" "GM" 1926 -.010869565 .016393442 .029535865 3 . . .0150178 . . "GLFD" "RPR" 1926 -.02 .011666667 0 2 . .03446858 . . . "TIII" "NYN" 1926 .1122449 .012562814 .015306123 3 . . .0150178 . . "CHK" "AQP" 1926 -.1440678 .008130081 -.008064516 2 . .03446858 . . . "PTSI" "GM" 1926 -.020833334 .016393442 .029535865 3 . . .0150178 . . "NAII" "AJL" 1926 -.01369863 .032786883 -.07575758 1 .05393586 . . . . "SPZE" "EIRE" 1926 .1875 .13414635 .031446543 5 . . . . .0038411536 "HICKA" "GM" 1926 -.0125 .016393442 .029535865 3 . . .0150178 . . "ORCI" "GM" 1926 -.03448276 .016393442 .029535865 3 . . .0150178 . . "AMSC" "CWP" 1926 -.032967035 .037037037 -.010471204 2 . .03446858 . . . "HAVA" "GM" 1926 .35 .016393442 .029535865 3 . . .0150178 . . "GNTX" "GM" 1926 -.07317073 .016393442 .029535865 3 . . .0150178 . . "ARRS" "TCOMA" 1926 .07476635 .04424779 -.02586207 1 .05393586 . . . . "GGNS" "IIII" 1926 -.11764706 0 -.006993007 2 . .03446858 . . . "NSCC" "OHM" 1926 .04347826 -.014492754 0 2 . .03446858 . . . "RGR" "ELY" 1926 .006535948 .12820514 -.012658228 1 .05393586 . . . . "IMPX" "ROK" 1926 .071428575 .030674847 -.012121212 1 .05393586 . . . . "AWRE" "USRX" 1926 .25714287 .008561644 .06181818 5 . . . . .0038411536 "STRT" "GM" 1926 0 .016393442 .029535865 3 . . .0150178 . . "IMPX" "IOM" 1926 .071428575 .1119403 -.02189781 1 .05393586 . . . . "MAYS" "FULT" 1926 0 .10843374 0 2 . .03446858 . . . "SWTX" "SNE" 1926 -.05357143 .02152642 -.04307116 1 .05393586 . . . . "AOS" "YRK" 1926 -.06072874 .009433962 -.023041476 1 .05393586 . . . . "LCBM" "CHIR" 1926 -.006849315 -.040625 .012658228 3 . . .0150178 . . "FSCR" "GM" 1926 .022900764 .016393442 .029535865 3 . . .0150178 . . "INVI" "GM" 1926 0 .016393442 .029535865 3 . . .0150178 . . "ORBT" "GM" 1926 .0882353 .016393442 .029535865 3 . . .0150178 . . "SPA" "WAT" 1926 .008064516 .06521739 -.033613447 1 .05393586 . . . . "NAII" "NUS" 1926 -.01369863 -.041841 .012711864 3 . . .0150178 . . "NFG" "NUI" 1926 -.02285714 0 0 2 . .03446858 . . . "CTS" "GM" 1926 .028744327 .016393442 .029535865 3 . . .0150178 . . "KVHI" "GM" 1926 0 .016393442 .029535865 3 . . .0150178 . . "WALB" "GM" 1926 0 .016393442 .029535865 3 . . .0150178 . . "SEC" "MCD" 1926 .037037037 .04519774 0 2 . .03446858 . . . "DASTY" "DASTY" 1926 -.011961723 -.011961723 .14520548 5 . . . . .0038411536 "DRTE" "RPR" 1926 -.0625 .011666667 0 2 . .03446858 . . . "PMD" "GM" 1926 .02654867 .016393442 .029535865 3 . . .0150178 . . "KRT" "HCDV" 1926 .014492754 0 -.032967035 1 .05393586 . . . . "ASI" "ASI" 1926 .29801324 .29801324 -.01948052 1 .05393586 . . . . "PAR" "MCD" 1926 .14049587 .04519774 0 2 . .03446858 . . . "PDKL" "COGE" 1926 -.023809524 .09574468 -.005291005 2 . .03446858 . . . "PSDS" "DRI" 1926 .10204082 -.03076923 -.029850746 1 .05393586 . . . . "WCSTF" "GM" 1926 -.01599016 .016393442 .029535865 3 . . .0150178 . . "BWA" "GM" 1926 .04140127 .016393442 .029535865 3 . . .0150178 . . end format %tw week_id
Misstable
Hello,
Looking to do misstable summarize. I have a variable split into cases and controls. Then I have another variable that I want to see how many missing values there are. But I want to know how many missing values there are for the cases and controls. Is there a code for this so that I don't just see the total number of missing values but rather have it split via another variable?
Thanks
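A minimal sketch (casecontrol and y are hypothetical variable names): -misstable summarize- accepts an -if- qualifier, so it can be run per group, or a missingness indicator can be tabulated against the grouping variable:
Code:
misstable summarize y if casecontrol == 1
misstable summarize y if casecontrol == 0
* or, in one table:
generate byte miss_y = missing(y)
tabulate casecontrol miss_y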
Age and Gender
Bonjour,
First and foremost, I have to warn you that I'm a real beginner at Stata and statistics overall...
But for my master's thesis on fashion window displays and consumer behavior, I had to do some statistical analyses!
Anyway, I'm now facing the issue of including gender and age as covariates/moderator variables in the model.
gender: male (116), female (97), non-binary (27)
age: 18-24 (195), 25-34 (27), 35-44 (7), 45-60+ (11)
In total 240 participants, but the sample sizes are different! I think it is a problem.
I tried to code both as dummy variables, but received the message "no observations".
I want to study whether gender identity influences Entry Intention (the Y outcome variable) with regard to 4 different Fashion Window Displays (independent variables: X1Mannequins X2PunchyWarmColors X3InteractiveTechnology X4CreativeDesign).
And I want to know if age has an impact on the attraction towards technology-based window.
- maybe: dummy variables are not the solution?
- maybe other models should be used, like a binary response model?
- maybe ANOVA? by separating the population using a median split?
I have already run the regression by gender, and the three models are significant (p-value < 0.05, R² < 1).
Also, F-Test is OK.
VIF: checked as well.
I hope that I have expressed myself correctly.
I include here my Excel file.
Brgrds,
Isabelle Deschamps
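A minimal sketch (the outcome and demographic variable names are hypothetical; the display variables follow the post): factor-variable notation builds the gender and age dummies automatically, and the regression header then shows how many observations were actually used, which helps track down the "no observations" message:
Code:
encode gender_str, gen(gender)        // only needed if gender is stored as a string
encode age_str, gen(agegroup)
regress EntryIntention X1Mannequins X2PunchyWarmColors ///
    X3InteractiveTechnology X4CreativeDesign i.gender i.agegroup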
event study - eventstudy2
I am a postgraduate student writing my thesis. I am somewhat of a novice in the field I have chosen to study; however, this has undoubtedly broadened my horizons. I am attempting to evaluate the impact of FOMC news on the Dow Jones Industrial Average. Please excuse me for resorting to the forum for help, but I am getting quite confused with all the material I have seen on event studies and would really appreciate some specific direction at this stage.
My study: I have two periods that I will evaluate; 2008 - 2013 and then 2013 - 2018. I have a total of 63 events - 23 in the 2008 - 2013 period and 40 in the 2013 - 2018 period - and I am using open/close data. Due to data access, I was unable to obtain intraday data (for less than £1500) but I feel using the open/close data will enable me to provide a broad view and still comment on any pre-announcement drift and lasting effects of FOMC news (thus commenting on efficient market theory). To that end, I would like to evaluate the way in which the Dow anticipates FOMC news 3 days prior to the announcement and the way in which news is incorporated, 3 days after the event.
I have used Stata for past projects unrelated to finance and feel it would be quicker than attempting to evaluate each event in Excel; furthermore, I would like to make use of estout or outreg2 for output tables. I also had the idea of evaluating the effects of unscheduled meetings - effectively another set of events. I considered using dummies/differencing (d.var, d2.var, etc.) for x days - leading and lagging (1, 2, 3, 4) - to see if there is any reaction and whether it is more dramatic than for the scheduled meetings. I suppose they would be in reaction to shocks and as such would not work in the same way as the scheduled ones. I have done something similar on a project that was not finance related, so excuse the novice approach, but is this something that could still be done using, for example, eventstudy2?
The majority of event studies I have seen appear to evaluate a stock against an index (eg. apple against the S&P 500 is very common) to thus estimate normal performance, abnormal and cumulative abnormal returns, as well as being able to test for significance. Is it possible to conduct an event study with just the Dow Jones?
If all seems feasible, does anyone have direction of how I could go about starting my study using eventstudy2. There is usually an abundance of information regarding the stata packages but there seems to be little on eventstudy2.
Thanks in advance.
Dummy Variable??
Hi All,
I have tried to find this answer in the forum but can't seem to.
I have a set of panel data for 15 countries over 10 years and want to create a dummy based on per capita income levels, per year but split into 3. Hope this makes sense
Usually, with a cross section, I would first create the median income per year (as I want equal observations in each half (75)):
egen MEDINCOME=median(income), by (year)
then gen dummy = income>MEDINCOME
However, I would like to have 3 splits instead of just two, e.g. 50 observations in each, to get the equivalent of high income, middle income and lower income.
Is this possible?
Thank you
Safia
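A minimal sketch: the xtile() egen function from the egenmore package on SSC splits income into terciles within each year, giving roughly equal group sizes (about 50 observations per group overall with 15 countries over 10 years):
Code:
ssc install egenmore
egen inc_group = xtile(income), by(year) nq(3)
* inc_group = 1 (low), 2 (middle), 3 (high) within each year;
* use i.inc_group in a regression, or generate separate 0/1 dummies from it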
Suest - Can I cluster just 1 model?
Dear all,
I'm trying to compare the coefficients of an OLS regression with a multilevel regression. I tried to use the Hausman command but got an error saying that
Code:
"model fitted on these data fails to meet the asymptotic assumptions of the Hausman test; see suest for a generalized test"
I'm trying to use the Suest command as a proper solution, but this command doesn't work after the Mixed command. I understand that there's an option to use
Code:
vce(cluster clustervar)
in order to cluster 1-level models and thereby use this command for mixed models.
However, in my case there's only one model with clustering (the other is just a normal linear regression), so if I try this syntax -
Code:
regress sqrt_Relative_time_days White Hispanic Other Age Sex i.Seriousness_of_crime_7 Death_Penalty i.year_of_conviction_groups
estimates store fixed
mixed sqrt_Relative_time_days White Hispanic Other Age Sex i.Seriousness_of_crime_7 Death_Penalty i.year_of_conviction_groups || states_combined:
estimates store random
suest fixed random, vce(cluster clustervar)
- the command creates two identical clustered regressions, where I want just the second one to be clustered (the model I call "random").
Is there a way to cluster just 1 model in the suest command?
Thanks!
Eran
Thursday, August 29, 2019
Adjusted variable and deviation
Hi,
I want to generate a new variable (RC) with the sum of industry-year (-Risky_Ind- and -year-) adjusted -INV- and -REC- as a percentage of -TA-, and expressed as deviation from the corresponding industry-year mean.
I would be grateful if you could help me with generating the new variable.
Note: IS is the company identifier.
example:
. dataex IS INV REC TA year risky_ind in 1/50
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str11 IS double(INV REC TA) int year float risky_ind "IQ1001135" 2.20403 2.52538 42.87435 2013 0 "IQ1001135" 2.83681 3.1106 24.37639 2014 0 "IQ10038253" 10.03707 9.87853 67.32937 2017 0 "IQ100475337" 140.42434 64.60702 1320.80513 2013 0 "IQ100475337" 95.74748 65.44843 857.81385 2014 0 "IQ100475337" 53.54069 52.1743 574.70545 2015 0 "IQ100475337" 58.86901 38.4869 501.58682 2016 0 "IQ100475337" 46.88747 37.78913 427.04402 2017 0 "IQ100475337" 17.20336 47.58633 450.97572 2018 0 "IQ10059716" 13.97032 2.20655 64.91229 2014 0 "IQ10059716" 12.58804 1.39677 79.95269 2015 0 "IQ10059716" 13.46922 1.33011 71.59447 2016 0 "IQ10059716" 15.93507 2.52412 76.18798 2017 0 "IQ10059716" 16.80082 1.94413 83.10289 2018 0 "IQ100601859" 59.32227 15.65512 200.16184 2013 0 "IQ100601859" 69.46581 9.0277 250.01989 2014 0 "IQ100601859" 76.5801 13.21964 289.17414 2015 0 "IQ100601859" 57.17937 9.38205 265.69254 2016 0 "IQ100601859" 45.53861 6.08402 329.99212 2017 0 "IQ100601859" 41.12032 5.39557 338.10262 2018 0 "IQ10063218" .10101 .55322 16.63223 2016 0 "IQ10063218" .13055 1.21617 18.80271 2017 0 "IQ10063218" .09417 1.42922 15.64643 2018 0 "IQ1007880" 44.6 175.1 3016.5 2014 0 "IQ1007880" 39.5 168.1 2765.1 2015 0 "IQ102377327" 16.528 7.584 63.839 2013 0 "IQ102377327" 12.146 3.755 40.429 2014 0 "IQ102377327" 10.037 3.182 29.465 2015 0 "IQ102377327" 12.977 3.546 35.056 2016 0 "IQ102377327" 19.175 3.427 66.001 2018 0 "IQ1027520" .13204 39.70413 174.29746 2013 1 "IQ1027520" .10263 40.96519 164.38773 2014 1 "IQ1027520" .1358 47.1315 206.26628 2015 1 "IQ1027520" .1389 44.97404 207.2047 2016 1 "IQ1027520" .10174 54.73586 267.06623 2017 1 "IQ1027520" .09828 58.96796 302.21079 2018 1 "IQ10332217" 3.54973 3.542 10.19086 2013 0 "IQ10332217" 4.0283 3.5752 11.44259 2014 0 "IQ10332217" 4.58241 3.68841 12.71978 2015 0 "IQ10332217" 5.49894 4.2628 14.14477 2016 0 "IQ10332217" 4.99721 4.58772 15.03865 2017 0 "IQ10332217" 4.97193 4.85585 16.30446 2018 0 "IQ1034090" . 348.4 1094.6 2013 0 "IQ1034090" . 326.1 1031.5 2014 0 "IQ1034090" . 300.8 1086.3 2015 0 "IQ1034090" . 339.8 1074.5 2016 0 "IQ1034090" . 364.9 1149.9 2017 0 "IQ1034090" . 335.4 1197.6 2018 0 "IQ1034269" 2.198 1.781 18.498 2013 0 "IQ1034269" .067 .465 7.56 2014 0 end
Listed 50 out of 4887 observations
.
Best,
Mahmoud
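A minimal sketch, under the assumption that "adjusted ... and expressed as deviation from the corresponding industry-year mean" means subtracting the mean of (INV + REC)/TA computed within each risky_ind-year cell (variable names follow the dataex):
Code:
gen ratio = (INV + REC) / TA
egen ratio_mean = mean(ratio), by(risky_ind year)
gen RC = ratio - ratio_mean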
Error in converting time (hourly) variable
Hi. I have hourly-level data in the format "2010123123", representing "23:00 Dec 31, 2010". I can convert it into a time variable by using "double", but there is a problem for some observations. The data show "00:00 Jan 1, 2011" as 2010123124, and Stata does not automatically roll this over to midnight of the next day; instead it generates an empty cell, as shown in the attached images. I would really appreciate it if you could help me convert it correctly into the next-day format. Thank you very much!
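A minimal sketch (rawhour is a hypothetical name for the YYYYMMDDHH variable, which needs to be stored as a double or a string to hold all ten digits): building the clock value as midnight of the stated date plus the hour in milliseconds makes hour 24 roll over to 00:00 of the next day automatically:
Code:
gen str10 s = string(rawhour, "%10.0f")        // skip this step if rawhour is already a string
gen double eventtime = clock(substr(s, 1, 8), "YMD") + real(substr(s, 9, 2)) * 3600000
format eventtime %tc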
taking a number from a returned matrix?
Dear All, I run the following xthreg (please search xthreg, and install) command
Note that you have to (ssc) install outreg2.
My question is: how can I use something like f[1,3] (this is not correct, but what is the correct syntax?) instead of 35.196397 in the adds(F, 35.196397) option?
Code:
xthreg i q1 q2 q3 d1 qd1, rx(cf1) qx(d1) thnum(1) grid(400) trim(0.01) bs(100)
matrix f = e(Fstat)
matrix list f
/*
f[1,7]
RSS MSE Fstat Prob Crit10 Crit5 Crit1
Single 17.781836 .00225201 35.196397 0 12.908204 15.651497 21.814463
*/
outreg2 using "thrpd", word dec(4) adds(F, 35.196397) adec(3) ctitle(linear) replace
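A minimal sketch: pull the element out of the stored matrix into a local macro and pass that to the option instead of the hard-coded number:
Code:
matrix f = e(Fstat)
local Fval = f[1,3]
outreg2 using "thrpd", word dec(4) adds(F, `Fval') adec(3) ctitle(linear) replace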
Sub-group meta-analysis in Stata
I have two questions:
1. Does Stata allow sub-group meta-analysis using mixed effects? If there is code available, please share.
2. How can I get the Q-value and its corresponding p-value for the difference between subgroups in a meta-analysis? What is the code for that? Many thanks for your support. I am doing a meta-analysis of prevalence studies, but if you have an answer/code for any other kind of studies, that is fine too.
Reshaping Datastream Data (Currency) for Stata
I have been trying to transpose Datastream data by date. I know how to do this in SAS but I am having trouble replicating this on STATA. I have provided a dataex sample below
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date uk australia brazil bulgaria)
-1065 2.7982 . . .
-1037 2.7975 . . .
-1008 2.7875 . . .
-976 2.7912 . . .
-945 2.7918 . . .
-917 2.7912 . . .
-884 2.785 . . .
-854 2.7832 . . .
-823 2.7956 . . .
-792 2.7968 . . .
-763 2.8062 . . .
-731 2.8594 . . .
-700 2.8162 . . .
-672 2.8162 . . .
-641 2.8162 . . .
-611 2.8182 . . .
-581 2.8168 . . .
-550 2.8032 . . .
-519 2.805 . . .
-490 2.8018 . . .
-458 2.8068 . . .
-427 2.8062 . . .
-399 2.8044 . . .
-366 2.8025 . . .
-336 2.8094 . . .
-308 2.8055 . . .
-276 2.8162 . . .
-246 2.8162 . . .
-217 2.8118 . . .
-185 2.8138 . . .
-154 2.8112 . . .
-123 2.8056 . . .
-93 2.8044 . . .
-63 2.8056 . . .
-32 2.8018 . . .
-1 2.8 . . .
28 2.8025 . . .
59 2.8038 . . .
90 2.8075 . . .
119 2.8088 . . .
151 2.8025 . . .
181 2.8069 . . .
210 2.8089 . . .
243 2.8113 . . .
273 2.8106 . . .
304 2.8156 . . .
334 2.8106 . . .
364 2.8038 . . .
396 2.8044 . . .
424 2.7956 . . .
455 2.7975 . . .
483 2.7962 . . .
516 2.7919 . . .
546 2.7881 . . .
577 2.795 . . .
608 2.8056 . . .
637 2.8144 . . .
669 2.815 . . .
699 2.8125 . . .
728 2.8081 . . .
761 2.8119 . . .
789 2.815 . . .
819 2.8138 . . .
850 2.8125 . . .
881 2.81 . . .
910 2.8088 . . .
942 2.805 . . .
973 2.8019 . . .
1001 2.8012 . . .
1034 2.8012 . . .
1064 2.8031 . . .
1095 2.8025 . . .
1126 2.8044 . . .
1154 2.8025 . . .
1183 2.7995 . . .
1215 2.8006 . . .
1246 2.8002 . . .
1274 2.8009 . . .
1307 2.8007 . . .
1337 2.7979 . . .
1368 2.7977 . . .
1399 2.7981 . . .
1428 2.7973 . . .
1460 2.7966 . . .
1491 2.7978 . . .
1519 2.7982 . . .
1551 2.7984 . . .
1581 2.7999 . . .
1610 2.7979 . . .
1642 2.7917 . . .
1673 2.7882 . . .
1704 2.7839 . . .
1734 2.7833 . . .
1764 2.785 . . .
1795 2.7912 . . .
1826 2.7901 1.1118151 . .
1855 2.792 1.1120847 . .
1883 2.7941 1.1135865 . .
1916 2.7905 1.1125064 . .
1946 2.7991 1.1139809 . .
end
I am trying to transpose the data so that the data is sorted by date and there is an ID called 'country' for the column variables which are currently listed as country names. This is how the datastream currency data appears in wide form. I tried doing this using the reshape long command but nothing seems to work. I've also uploaded the raw dataset in csv form if one needs more info about the dataset.
Thank you in advance.
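A minimal sketch (the stub and the long-form variable name are arbitrary choices): give the country columns a common stub so that -reshape long- can move the country name into a j() identifier:
Code:
rename (uk australia brazil bulgaria) (rate_=)
reshape long rate_, i(date) j(country) string
rename rate_ exchrate
sort country date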
Productivity effects
Dear Statalist users,
I am trying to examine the impact of a firm's export participation on its productivity. To this end, I want to examine how the productivity of the firm changes one year after its participation in the export market, two years after participation, and so on, to understand whether the effect of participating in the export market on productivity dissipates or accelerates over time. I have an unbalanced panel for the period 2001-2013. TFP is my dependent variable and export status (t1) is my independent variable, which is a dummy. I think I need to proceed with a psm-did, but I am not able to understand how to set it up for my dataset.
If I could get some inputs on how to achieve this in stata would be much appreciated
A small sample of my data
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double id float(year t1 ltfp) 365 2001 0 .39686665 365 2002 1 .3967393 365 2003 1 .6374343 365 2004 0 .6560543 365 2005 1 .6640503 365 2006 1 .7698439 365 2007 0 .8161478 365 2008 0 .87566 365 2009 0 1.1369226 365 2010 0 1.2058983 365 2011 0 1.0512627 365 2012 0 1.0222095 365 2013 0 .9116122 381 2001 1 .28473413 381 2002 1 .2790297 381 2003 1 .4228754 381 2004 1 .4589705 381 2005 1 .443521 381 2006 1 .6932002 381 2007 1 .51895165 381 2008 1 .4412919 381 2009 1 .4278684 381 2010 1 .4917918 569 2001 0 .4084495 569 2002 0 .32361555 569 2003 0 .312629 569 2004 0 .3549544 569 2005 0 .4214703 569 2006 0 .5034326 569 2007 0 .52623117 569 2008 0 .58975697 569 2009 0 .55094194 569 2010 0 .3773664 595 2006 0 .2977381 595 2007 0 .3644855 595 2008 0 .3293638 595 2009 0 .3146657 595 2010 0 .3185207 595 2011 0 .4280073 595 2012 0 .5122329 595 2013 0 .4275252 600 2009 0 .7721639 600 2010 0 .787946 600 2011 0 .6881346 600 2012 0 .6645668 783 2001 0 .729115 783 2002 0 .7449946 783 2003 0 .7614059 783 2004 0 .870962 783 2005 0 .8693857 783 2006 0 .8836148 783 2007 0 1.0283844 783 2008 0 1.0673563 783 2009 0 1.2047342 783 2010 0 1.2074325 783 2011 0 1.2030094 783 2012 0 1.209522 783 2013 0 1.193356 870 2010 0 .4383901 934 2004 0 .6618422 934 2005 0 .677335 934 2006 0 .698532 934 2007 0 .6574489 934 2008 0 .602454 934 2009 0 .9306454 1120 2001 1 1.0948371 1120 2002 1 1.1048775 1120 2003 1 1.2104284 1120 2004 1 1.1987357 1120 2005 1 1.3653517 1120 2006 1 1.479699 1120 2007 1 1.614259 1120 2008 1 1.8814026 1120 2009 1 2.1619618 1120 2010 1 1.977049 1120 2011 1 2.043859 1120 2012 1 2.134581 1120 2013 1 2.2729049 1621 2007 0 .3721843 1621 2008 0 .4280191 1621 2009 0 .435424 1621 2010 0 .4076315 1621 2011 0 .3898215 1621 2012 0 .4241333 1954 2009 0 .4050439 1954 2010 0 .41772455 2015 2004 0 .17348923 2015 2005 0 .25310284 2015 2006 0 .3803225 2015 2007 0 .4936106 2015 2008 0 .5817891 2015 2009 0 .4436422 2015 2010 0 .42046455 2015 2011 0 .51623964 2015 2012 0 .5415064 2015 2013 0 .4652313 2216 2001 1 .3285122 2216 2002 0 .3717326 2216 2003 0 .40993 2216 2004 0 .3402351 end
Univariate analysis
Hello,
So I want to apply this univariate test, subdividing my sample into high and low groups by the mean values of independent variables to report the differences in the dependent variable.
My question is: how do I subdivide a sample using such a condition, and what is the command in Stata to do so and apply the test?
My literature reference is Thanatawee, Y. (2019). ‘Foreign Institutional Ownership and Liquidity: Evidence from Thailand’. ABAC Journal, Vol.39, No.4, Forthcoming. pp. 1-21.
Thank you!
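A minimal sketch (x stands for one independent variable and y for the dependent variable; both names are placeholders): split the sample at the mean of x and compare y across the high and low groups with a t-test:
Code:
summarize x, meanonly
generate byte high_x = (x > r(mean)) if !missing(x)
ttest y, by(high_x)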
Is it okay to use multiple imputation to create imputed values in a second dataset?
Say I have two datasets A & B. They have the same variables in each except that set A has one additional variable.
Would it be valid to impute the variable for dataset B, and then to only conduct analysis on dataset B?
For example:
Code:
webuse lbw, clear
set seed 12345
generate wave = round(uniform(),1)
replace bwt = . if wave==1    // artificially creating missing data in half of dataset
mi set mlong
mi register imputed bwt
mi impute chained (pmm, knn(5)) bwt = age lwt i.race, add(5) rseed(12345)
mi estimate : glm age lwt i.race bwt if wave==1    // command runs, but is only conducted on part of sample where all data was missing originally
Using outreg2 to export Heckman results to LaTeX
Hi, I'm having problems using outreg2 to export the results from a Heckman model to LaTeX (the same problem occurs when exporting to a .doc file). The following code produces a hypothetical Heckman model and exports the results to LaTeX. For some reason, the 3rd column replicates the 2nd column. I would like to keep only columns 1 and 2.
A similar problem, but no solution, was posted here: https://www.statalist.org/forums/for...ach-regression
clear
input yvar xvar1 xvar2
3 2 1
4 0 3
. 3 1
. 1 2
2 7 0
6 0 1
. 3 7
0 1 1
. 2 2
1 1 5
end
heckman yvar xvar1 xvar2, select (xvar1 xvar2) vce(robust) iterate(10)
outreg2 using "test_outreg2.tex", replace tex(pretty) keep(xvar1 xvar2)
Creating Lead variables with respect to month (1, 3, 6) when I have daily data
hello.
I'm working on the U.S. stock market.
I have entire stock market data ( thousands of stocks). It's panel data( sorted by firms and dates).
My question is how I can create lead variables with respect to month when I have daily data. I should not collapse data since I have to use daily return later.
In detail, I have past 1 month return on each month of each stock. I should create future return which is dragged from the past return.
For example, If 1 month past return on Feb of firm A is 0.1, I have to make a new variable (future return) on Jan of firm A 0.1.
1 month past return of Feb 01, 02, 03, 04..(they are same). into 1 month past return of Jan 01, 02, 03, 04...(they are same)
I created 1 month mean past returns and they are same within the same month. For example, past_ret_1 of Jan 01is same as past_ret_1 of Jan 02, Jan 03, Jan 04...
So, I have to put past_ret_1 of Feb 01, Feb 02, Feb 03 which are same into past_ret_1 of Jan 01, 02, 03....
permno is the firm identifier.
date is the daily date.
past_ret_1 is the 1-month past return.
mdate is the month identifier.
My data look like the table below, and I add my dataex output below as well. I have to create the B' values from the B values (B and B' are the same).
| permno | date        | mdate  | past_ret_1 | future_ret1 |
| 10001  | Jan 01 2019 | 2019m1 | A          | B'          |
| 10001  | Jan 02 2019 | 2019m1 | A          | B'          |
| 10001  | Jan 03 2019 | 2019m1 | A          | B'          |
| ...    | ...         | ...    | ...        | ...         |
| 10001  | Feb 01 2019 | 2019m2 | B          |             |
| 10001  | Feb 02 2019 | 2019m2 | B          |             |
| ...    | ...         | ...    | ...        | ...         |
| 10002  | ...         | 2019m1 | 0.1        |             |
| 10002  | ...         | 2019m1 | 0.1        |             |
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input double permno long date double prc int mdate float(past_ret_1 past_ret_3 past_ret_6) 10001 20457 7.519999980926514 672 . . . 10001 20458 7.420000076293945 672 . . . 10001 20459 7.53000020980835 672 . . . 10001 20460 7.639999866485596 672 . . . 10001 20461 7.789999961853027 672 . . . 10001 20464 8.0600004196167 672 . . . 10001 20465 8.25 672 . . . 10001 20466 8.25 672 . . . 10001 20467 8.130000114440918 672 . . . 10001 20468 7.989999771118164 672 . . . 10001 20472 8.119999885559082 672 . . . 10001 20473 8.079999923706055 672 . . . 10001 20474 8.100000381469727 672 . . . 10001 20475 8.220000267028809 672 . . . 10001 20478 8 672 . . . 10001 20479 8.100000381469727 672 . . . 10001 20480 8.199999809265137 672 . . . 10001 20481 8.199999809265137 672 . . . 10001 20482 8.319999694824219 672 . . . 10001 20485 8.420000076293945 673 .11968087 .11968087 .11968087 10001 20486 8.489899635314941 673 .11968087 .11968087 .11968087 10001 20487 8.260000228881836 673 .11968087 .11968087 .11968087 10001 20488 8 673 .11968087 .11968087 .11968087 10001 20489 8.34000015258789 673 .11968087 .11968087 .11968087 10001 20492 8.630000114440918 673 .11968087 .11968087 .11968087 10001 20493 8.210000038146973 673 .11968087 .11968087 .11968087 10001 20494 8.300000190734863 673 .11968087 .11968087 .11968087 10001 20495 8.050000190734863 673 .11968087 .11968087 .11968087 10001 20496 8.300000190734863 673 .11968087 .11968087 .11968087 10001 20500 8.25 673 .11968087 .11968087 .11968087 10001 20501 8.199999809265137 673 .11968087 .11968087 .11968087 10001 20502 8.100000381469727 673 .11968087 .11968087 .11968087 10001 20503 7.900000095367432 673 .11968087 .11968087 .11968087 10001 20506 7.860000133514404 673 .11968087 .11968087 .11968087 10001 20507 7.940000057220459 673 .11968087 .11968087 .11968087 10001 20508 7.880000114440918 673 .11968087 .11968087 .11968087 10001 20509 7.940000057220459 673 .11968087 .11968087 .11968087 10001 20510 7.840000152587891 673 .11968087 .11968087 .11968087 10001 20513 7.860000133514404 673 .11968087 .11968087 .11968087 10001 20514 8.300000190734863 674 -.014251768 .10372343 .10372343 10001 20515 8.15999984741211 674 -.014251768 .10372343 .10372343 10001 20516 8.199999809265137 674 -.014251768 .10372343 .10372343 10001 20517 8.199999809265137 674 -.014251768 .10372343 .10372343 10001 20520 8.1899995803833 674 -.014251768 .10372343 .10372343 10001 20521 8.079999923706055 674 -.014251768 .10372343 .10372343 10001 20522 8.029999732971191 674 -.014251768 .10372343 .10372343 10001 20523 8.119999885559082 674 -.014251768 .10372343 .10372343 10001 20524 8 674 -.014251768 .10372343 .10372343 10001 20527 7.980000019073486 674 -.014251768 .10372343 .10372343 10001 20528 7.900000095367432 674 -.014251768 .10372343 .10372343 10001 20529 8.069999694824219 674 -.014251768 .10372343 .10372343 10001 20530 8.119999885559082 674 -.014251768 .10372343 .10372343 10001 20531 8.199999809265137 674 -.014251768 .10372343 .10372343 10001 20534 8.199999809265137 674 -.014251768 .10372343 .10372343 10001 20535 8.199999809265137 674 -.014251768 .10372343 .10372343 10001 20536 8.239999771118164 674 -.014251768 .10372343 .10372343 10001 20537 8.079999923706055 674 -.014251768 .10372343 .10372343 10001 20541 7.900000095367432 674 -.014251768 .10372343 .10372343 10001 20542 7.889999866485596 674 -.014251768 .10372343 .10372343 10001 20543 7.889999866485596 674 -.014251768 .10372343 .10372343 10001 20544 7.809999942779541 674 -.014251768 .10372343 
.10372343 10001 20545 7.800000190734863 675 -.06024096 .03723407 .03723407 10001 20548 7.769999980926514 675 -.06024096 .03723407 .03723407 10001 20549 7.230000019073486 675 -.06024096 .03723407 .03723407 10001 20550 7.079999923706055 675 -.06024096 .03723407 .03723407 10001 20551 7.369999885559082 675 -.06024096 .03723407 .03723407 10001 20552 7.579999923706055 675 -.06024096 .03723407 .03723407 10001 20555 7.769999980926514 675 -.06024096 .03723407 .03723407 10001 20556 7.869999885559082 675 -.06024096 .03723407 .03723407 10001 20557 7.800000190734863 675 -.06024096 .03723407 .03723407 10001 20558 7.599999904632568 675 -.06024096 .03723407 .03723407 10001 20559 7.449999809265137 675 -.06024096 .03723407 .03723407 10001 20562 7.21999979019165 675 -.06024096 .03723407 .03723407 10001 20563 7.309999942779541 675 -.06024096 .03723407 .03723407 10001 20564 7.179999828338623 675 -.06024096 .03723407 .03723407 10001 20565 7 675 -.06024096 .03723407 .03723407 10001 20566 7.179999828338623 675 -.06024096 .03723407 .03723407 10001 20569 7.199999809265137 675 -.06024096 .03723407 .03723407 10001 20570 7.349999904632568 675 -.06024096 .03723407 .03723407 10001 20571 7.28000020980835 675 -.06024096 .03723407 .03723407 10001 20572 7.329999923706055 675 -.06024096 .03723407 .03723407 10001 20573 7.300000190734863 675 -.06024096 .03723407 .03723407 10001 20576 6.909999847412109 676 -.1141026 -.17933494 -.08111704 10001 20577 7.039999961853027 676 -.1141026 -.17933494 -.08111704 10001 20578 7.039999961853027 676 -.1141026 -.17933494 -.08111704 10001 20579 7.014999866485596 676 -.1141026 -.17933494 -.08111704 10001 20580 6.880000114440918 676 -.1141026 -.17933494 -.08111704 10001 20583 6.889999866485596 676 -.1141026 -.17933494 -.08111704 10001 20584 6.96999979019165 676 -.1141026 -.17933494 -.08111704 10001 20585 7.050000190734863 676 -.1141026 -.17933494 -.08111704 10001 20586 7.099999904632568 676 -.1141026 -.17933494 -.08111704 10001 20587 7.150000095367432 676 -.1141026 -.17933494 -.08111704 10001 20590 7.119999885559082 676 -.1141026 -.17933494 -.08111704 10001 20591 7.139999866485596 676 -.1141026 -.17933494 -.08111704 10001 20592 7.059999942779541 676 -.1141026 -.17933494 -.08111704 10001 20593 6.900000095367432 676 -.1141026 -.17933494 -.08111704 10001 20594 6.880000114440918 676 -.1141026 -.17933494 -.08111704 10001 20597 6.900000095367432 676 -.1141026 -.17933494 -.08111704 10001 20598 6.900000095367432 676 -.1141026 -.17933494 -.08111704 10001 20599 6.949999809265137 676 -.1141026 -.17933494 -.08111704 end format %d date format %tm mdate
Listed 100 out of 5502429 observations
Use the count() option to list more
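A minimal sketch of one possible approach, untested on the full data and using the variable names from the dataex above: temporarily reduce the data to one row per firm-month, take the one-month lead of past_ret_1, and merge the result back onto the daily panel, so the daily observations are never collapsed away.
Code:
* build a firm-month file with the lead of past_ret_1, then merge it back
preserve
    keep permno mdate past_ret_1
    duplicates drop permno mdate, force       // keep one row per firm-month
    xtset permno mdate
    generate future_ret1 = F1.past_ret_1      // next month's past 1-month return
    keep permno mdate future_ret1
    tempfile monthly
    save `monthly'
restore
merge m:1 permno mdate using `monthly', nogenerate keep(master match)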