Creating first-order difference variables in repeated cross-sectional (i.e. pseudo panel) data

Greetings,

I'm running Stata 15.1 on OSX. I've built a repeated cross-sectional dataset to assess variation in attitudes towards crime (the dependent variable). My time variable stores the month and year in which each of the 31 surveys was conducted. For my independent variable, I created a newspaper issue-salience index that stores the percentage of monthly New York Times articles referring to crime-related issues, and merged it in using the time variable. My expectation is that in months when crime is salient in the media, a larger percentage of respondents will say crime 'is a serious issue'.

To prepare the dataset for analysis, I used 'collapse' to create a variable that stores the mean percentage of respondents giving the 'serious issue' response in each survey (i.e. by year/month). I did the same with the salience index variable (in a separate dataset) and merged it into the collapsed survey dataset.

I ran a simple Pearson correlation between the index and the survey response variable and found a strong relationship (r=0.87). However, a colleague who saw the resulting graph warned me that I 'shouldn't correlate time series with strong autocorrelation' and that I should instead 'create first-order difference sequences and correlate those'. I'm not quite sure how to go about doing this. The dataset has no panel ID, so I tried creating one:

Code:

gen id=_n

(note that the id is then simply a number assigned to each survey--31 in total)

I then entered the following:

Code:

xtset id year

To create the 'first difference' variable I tried the following:

Code:

gen indexdiff=D.index

However, Stata then reported '(31 missing values generated)', so the new variable is missing for every one of the 31 surveys.

What am I doing wrong here and how do I get it right? Thanks for your time!

Sample data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(index crime_serious yearmonth year id)
1.0898919 37.634632 441 1996  1
 1.141105  41.15658 449 1997  2
 .9819449 31.580814 453 1997  3
1.1344688  35.43958 475 1999  4
1.2987403   39.7779 487 2000  5
1.1022217  39.37875 488 2000  6
 1.045117 32.872364 521 2003  7
 .7872596 35.538055 522 2003  8
 .8885489  38.24273 523 2003  9
 .9927688  35.79262 524 2003 10
 .7067459  39.30157 539 2004 11
1.0929303 36.767914 548 2005 12
1.0707874  25.04893 572 2007 13
1.0773966  34.76981 573 2007 14
1.0685753  29.70381 576 2008 15
 1.118886   27.0324 580 2008 16
 .9239349  31.63132 584 2008 17
 .7300239 23.623867 597 2009 18
 .7975035  28.98842 598 2009 19
1.1477937 34.304623 613 2011 20
1.0149189  38.20615 614 2011 21
1.1804827   34.5046 624 2012 22
1.3056893  39.55238 648 2014 23
1.2751036  41.03848 649 2014 24
 1.369863  42.47158 650 2014 25
1.8246716  52.22675 662 2015 26
 2.096708  48.12559 667 2015 27
1.6774454  47.23487 668 2015 28
1.5856438  42.08379 669 2015 29
 2.575059  57.32762 686 2017 30
2.7088645   64.2695 689 2017 31
end
format %tm yearmonth
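
One possible reading of the problem, offered as a sketch rather than a definitive answer: 'xtset id year' declares id as the panel identifier, so each of the 31 surveys becomes its own one-observation panel, and D.index never has an earlier observation within the same panel to subtract. Assuming the -dataex- example above is in memory, and assuming that differences between consecutive surveys (rather than consecutive calendar months) are what is wanted, the data could instead be declared as a single time series ordered by the survey sequence number. The names d_index, d_crime and gap are made up for illustration.

```stata
* Assumes the -dataex- example above has been run, so index, crime_serious,
* yearmonth and id (1 to 31, in date order) are in memory.
sort yearmonth

* Declare one time series indexed by survey sequence number, not a panel.
* D. then compares each survey with the survey immediately before it.
tsset id

* First differences; only the first survey has no predecessor.
gen d_index = D.index
gen d_crime = D.crime_serious

* Months elapsed between consecutive surveys. The spacing is irregular,
* so a one-step difference does not always span the same amount of time.
gen gap = D.yearmonth

* Correlate the differenced series, showing the number of pairs used
* and the significance level.
pwcorr d_index d_crime, obs sig
```

Declaring the series with 'tsset yearmonth' instead would make D. non-missing only for surveys fielded in consecutive calendar months, because every month without a survey is a gap in the series. Which of the two definitions is appropriate depends on how much weight the irregular spacing of the surveys should carry.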

BJ Data Tech Solution
