Essentially, I have data that looks like this:
rowid | date | earnings_date | stock_ticker |
1 | 18001 | 18002 | AAPL |
2 | 18002 | . | AAPL |
3 | 18003 | . | AAPL |
4 | 18001 | 18003 | MSFT |
5 | 18002 | . | MSFT |
6 | 18003 | . | MSFT |
7 | 18001 | . | TSLA |
8 | 18002 | 18001 | TSLA |
9 | 18003 | . | TSLA |
The "date" variable covers every working day of the year, sequentially, in Stata daily-date format (number of days since 1 January 1960).
The earnings_date column is messy: it lists the dates on which the various companies announced earnings. It has some duplicates and a great many missing values, since there are far fewer earnings dates than days in a year. Every earnings_date is, however, on a row with the correct company ticker, something like this:
date | earnings_date | stock_ticker |
18001 | 18002 | AAPL |
18002 | 18032 | AAPL |
18003 | 18097 | AAPL |
18004 | . | AAPL |
18005 | 18097 | AAPL |
18006 | . | AAPL |
18007 | . | AAPL |
18008 | . | AAPL |
18009 | . | AAPL |
I would just like to add a variable ("categ") that is 1 when a row's date is an earnings date for that row's ticker and 0 otherwise:
rowid | date | earnings_date | stock_ticker | categ |
1 | 18001 | 18002 | AAPL | 0 |
2 | 18002 | . | AAPL | 1 |
3 | 18003 | . | AAPL | 0 |
4 | 18001 | 18003 | MSFT | 0 |
5 | 18002 | . | MSFT | 0 |
6 | 18003 | . | MSFT | 1 |
7 | 18001 | . | TSLA | 1 |
8 | 18002 | 18001 | TSLA | 0 |
9 | 18003 | . | TSLA | 0 |
In other words, my code needs to compare each value of "date" with all the values of "earnings_date" for a particular stock ticker.
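One way this comparison might be done without looping over observations is sketched below. It assumes the variable names shown above (rowid, date, earnings_date, stock_ticker) and has only been checked against the toy example. The idea is to build a small lookup table of the distinct ticker/earnings-date pairs, rename earnings_date to date in that table, and then merge it back onto the data by ticker and date, so every row whose date appears among its ticker's earnings dates gets categ = 1. The tempfile name edates is an arbitrary choice for illustration.

```stata
* Example data from the post (dates are Stata daily dates)
clear
input rowid date earnings_date str4 stock_ticker
1 18001 18002 "AAPL"
2 18002 .     "AAPL"
3 18003 .     "AAPL"
4 18001 18003 "MSFT"
5 18002 .     "MSFT"
6 18003 .     "MSFT"
7 18001 .     "TSLA"
8 18002 18001 "TSLA"
9 18003 .     "TSLA"
end

* Build a lookup table of distinct (ticker, earnings date) pairs
preserve
keep stock_ticker earnings_date
drop if missing(earnings_date)
duplicates drop
* Rename so the key matches the "date" variable in the main data
rename earnings_date date
generate byte categ = 1
* edates is an arbitrary tempfile name chosen for this sketch
tempfile edates
save `edates'
restore

* Flag rows whose date is an earnings date for the same ticker
merge m:1 stock_ticker date using `edates', keep(master match) nogenerate
* Rows with no match in the lookup table are not earnings dates
replace categ = 0 if missing(categ)
sort rowid
list, sepby(stock_ticker)
```

Because the lookup table keeps only distinct pairs, duplicate earnings dates do not create extra rows; if they somehow survived, merge m:1 would stop with an error rather than silently duplicate observations.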
I've looked for inspiration in various examples, but I only got as far as comparing each value of "date" with the value of "earnings_date" on the same row:
Code:
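    gen categ=.
    forvalues f = 1/`=_N' {
        * store this row's date in a local macro
        quietly sum date if rowid==`f', meanonly
        local testvalue = r(min)
        * 1 in every row whose earnings_date equals testvalue
        quietly egen testvariable = anymatch(earnings_date), values(`testvalue')
        * keep only the result for row f (its inline earnings_date)
        quietly replace categ = testvariable if rowid==`f'
        drop testvariable
    }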
I do understand what this does. It takes the value of "date" for one row, stores it in a local macro, then uses egen's anymatch() to compare it with "earnings_date". This works.
I just don't know how to compare the stored "testvalue" with ALL values of earnings_date for the same ticker, not just with the inline value of earnings_date. I've tinkered for hours and everything has failed.
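A more literal extension of the loop above might replace egen's anymatch() with count: for each observation, count the rows that have the same ticker and an earnings_date equal to that observation's date, using explicit subscripting on the observation number. This is a sketch checked only against the toy example, assuming the data are in memory with the variable names above. It scans the whole dataset once per row, so it will be slow on a large file.

```stata
* Example data from the post
clear
input rowid date earnings_date str4 stock_ticker
1 18001 18002 "AAPL"
2 18002 .     "AAPL"
3 18003 .     "AAPL"
4 18001 18003 "MSFT"
5 18002 .     "MSFT"
6 18003 .     "MSFT"
7 18001 .     "TSLA"
8 18002 18001 "TSLA"
9 18003 .     "TSLA"
end

generate byte categ = 0
forvalues i = 1/`=_N' {
    * Rows of the same ticker whose earnings_date equals row i's date
    * (missing earnings_date never equals a nonmissing date)
    quietly count if stock_ticker == stock_ticker[`i'] & earnings_date == date[`i']
    * Set the flag for row i only
    quietly replace categ = (r(N) > 0) in `i'
}
list, sepby(stock_ticker)
```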
If you have some ideas, I would greatly appreciate hearing from you. Many thanks in advance!