I'm running Stata 15.1 on OS X. I've built a repeated cross-sectional dataset to assess variation in attitudes towards crime (the dependent variable). My time variable stores the month and year in which each of the 31 surveys was conducted. For my independent variable, I created a newspaper issue-salience index that stores the percent of monthly New York Times articles referring to crime-related issues. My expectation is that in months when crime is salient in the media, the percent of respondents saying crime 'is a serious issue' will increase.

To prepare the dataset for analysis, I used 'collapse' to create a variable that stores the mean percent of respondents who gave the 'serious issue' response in each survey (i.e. by year/month). I did the same with the salience index variable (in a separate dataset) and merged it into the collapsed survey dataset using the time variable.

I ran a simple Pearson correlation between the index and the survey response variable and found a strong relationship (r=0.87). However, a colleague who saw the resulting graph warned me that I 'shouldn't correlate time series with strong autocorrelation' and that I should instead 'create first-order difference sequences and correlate those'. I'm not quite sure how to go about doing this. The dataset has no panel ID, so I tried creating one:
Code:
gen id=_n
I then entered the following:
Code:
xtset id year
Code:
gen indexdiff=D.index
What am I doing wrong here and how do I get it right? Thanks for your time!
Sample data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(index crime_serious yearmonth year id)
1.0898919 37.634632 441 1996 1
1.141105 41.15658 449 1997 2
.9819449 31.580814 453 1997 3
1.1344688 35.43958 475 1999 4
1.2987403 39.7779 487 2000 5
1.1022217 39.37875 488 2000 6
1.045117 32.872364 521 2003 7
.7872596 35.538055 522 2003 8
.8885489 38.24273 523 2003 9
.9927688 35.79262 524 2003 10
.7067459 39.30157 539 2004 11
1.0929303 36.767914 548 2005 12
1.0707874 25.04893 572 2007 13
1.0773966 34.76981 573 2007 14
1.0685753 29.70381 576 2008 15
1.118886 27.0324 580 2008 16
.9239349 31.63132 584 2008 17
.7300239 23.623867 597 2009 18
.7975035 28.98842 598 2009 19
1.1477937 34.304623 613 2011 20
1.0149189 38.20615 614 2011 21
1.1804827 34.5046 624 2012 22
1.3056893 39.55238 648 2014 23
1.2751036 41.03848 649 2014 24
1.369863 42.47158 650 2014 25
1.8246716 52.22675 662 2015 26
2.096708 48.12559 667 2015 27
1.6774454 47.23487 668 2015 28
1.5856438 42.08379 669 2015 29
2.575059 57.32762 686 2017 30
2.7088645 64.2695 689 2017 31
end
format %tm yearmonth
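For reference, here is a minimal sketch of one way the differencing might be approached with the sample data above. It is not a definitive answer. With `id=_n`, every observation becomes its own panel containing a single time point, so `D.index` has no earlier value to subtract and comes out missing everywhere. The collapsed data is really a single time series, so the sketch declares it on `yearmonth` alone. The variable names `indexdiff_ts`, `indexdiff`, and `crimediff` are hypothetical, and differencing against the previous survey rather than the previous calendar month is an assumption about what the colleague meant.

```stata
* Sketch only: assumes the collapsed data above is in memory,
* with one row per survey month.

* Declare a single monthly time series (no panel id is needed)
tsset yearmonth, monthly

* D. subtracts the value from the previous calendar month, so it is
* missing wherever the preceding month has no survey
gen indexdiff_ts = D.index

* Alternative (an assumption about the intent): difference against the
* previous survey, whatever the gap between the two
sort yearmonth
gen indexdiff = index - index[_n-1]
gen crimediff = crime_serious - crime_serious[_n-1]

* Correlate the differenced series; the first row is missing in both
pwcorr indexdiff crimediff, obs sig
```

Because the surveys are unevenly spaced, the second approach mixes one-month changes with changes over a year or more, so the resulting correlation may need cautious interpretation.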