Hello world of stats and analysis,
I dabbled a bit in Stata during my university years, but my work now requires me to become much more engaged with the software.
I am currently working on a project (using Stata 13.0) that aims to determine seasonal sales patterns in order to implement a sales sampling strategy across multiple provinces in Cambodia.
I have a large dataset with 24 variables and 120,000+ observations. My first step is to restructure the dataset so that sales years follow Cambodia's seasonal pattern: a sales year makes much more sense running from November 1 to October 31 of the following year. The dataset records every sale made by any supplier, by date (SUPPLIER_ID / DAY / MONTH / YEAR / SALE_AMOUNT).
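Here is a minimal sketch of the November-to-October sales year in Stata. It rests on two assumptions that are not part of the original question. First, the date parts are numeric variables named day, month and year, so adjust these to the actual variable names. Second, each sales year is labelled by the calendar year in which it starts.

```stata
* Assumption: month and year are numeric variables (rename to match your data).
* Assumption: a sales year runs 1 November to 31 October and is labelled
* by the calendar year in which it starts (Nov 2014 to Oct 2015 -> 2014).
* (month < 11) evaluates to 1 for January-October and 0 for November-December,
* so January-October sales are assigned to the previous year's sales year.
generate sales_year = year - (month < 11) if !missing(year, month)
```

For example, a sale on 15/03/2015 gets sales_year 2014, and a sale on 15/11/2015 gets sales_year 2015.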
I am having trouble writing code that systematically eliminates, for each supplier, all sales made before November of the year that supplier joined our records. For example, supplier 43 joined on 01/04/2014 and has sales until 01/09/2017 (dates are DD/MM/YYYY). I would like to delete all sales prior to 01/11/2014 for supplier 43, and then do the same for every supplier in my dataset.
I have been fiddling with the bysort prefix: bysort SPID year month: generate y=1, then replace y=. if SPID==SPID[_n-1] & month>=11. I feel like I'm close, but I'm missing the code that tells Stata to identify and flag only the first year each supplier joined, so that I can later drop all the flagged (missing) observations.
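One possible approach skips the y flag and computes each supplier's cutoff date directly, then drops everything before it. This sketch makes several assumptions. It treats a supplier's earliest recorded sale as its joining date. It assumes variables named SPID, day, month and year. The helper variables sale_date, first_sale and cutoff are hypothetical names introduced only for illustration.

```stata
* Assumed variable names: SPID (supplier ID), day, month, year (all numeric).
* Build a Stata daily date from the separate day/month/year variables.
generate sale_date = mdy(month, day, year)
format sale_date %td

* Earliest recorded sale per supplier, treated here as its joining date
egen first_sale = min(sale_date), by(SPID)
format first_sale %td

* Cutoff: 1 November of the calendar year in which the supplier joined
generate cutoff = mdy(11, 1, year(first_sale))
format cutoff %td

* Drop sales before the cutoff. Observations with a missing sale_date are
* kept, because Stata treats missing as larger than any number.
drop if sale_date < cutoff
```

For supplier 43, the earliest sale is 01/04/2014, so the cutoff is 01/11/2014 and the sales from April to October 2014 are dropped. Running count if sale_date < cutoff before the drop line shows how many observations would be removed.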
I hope all that makes sense,
Thank you all for your help!