Help needed: Automate extracting of data

Hi all,

I am using Stata 16 and I need some help in automating the process of extracting data. I am trying to do charts of unemployment data for OECD countries. I am basically trying to find the difference between unemployment data at a given quarter and the unemployment data of a recession. Below is a sample of my dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str3 location str7 time str6 recession_date byte pt_output float(spell quarters_to_start unemptime new_variable recession1 unemp_data)
  1 "AUS" "1966-Q3" "1971Q3" . 0  20 -29   .             .      1.8
  2 "AUS" "1966-Q4" "1975Q2" . 0  20 -28   .             .      1.7
  3 "AUS" "1967-Q1" "1977Q2" . 0  20 -27   .             .      1.8
  4 "AUS" "1967-Q2" "1981Q3" . 0  20 -26   .             .        2
  5 "AUS" "1967-Q3" "1990Q4" . 0  20 -25   .             .      1.9
  6 "AUS" "1967-Q4" ""       . 0  20 -24   .             .      1.8
  7 "AUS" "1968-Q1" ""       . 0  20 -23   .             .      1.9
  8 "AUS" "1968-Q2" ""       . 0  20 -22   .             .      1.9
  9 "AUS" "1968-Q3" ""       . 0  20 -21   .             .      1.8
 10 "AUS" "1968-Q4" ""       . 0  20 -20 1.8    -.10000005      1.7
 11 "AUS" "1969-Q1" ""       . 0  20 -19 1.7    -.19999996      1.8
 12 "AUS" "1969-Q2" ""       . 0  20 -18 1.8    -.10000005      1.8
 13 "AUS" "1969-Q3" ""       . 0  20 -17   2            .1      1.7
 14 "AUS" "1969-Q4" ""       . 0  20 -16 1.9 -2.384186e-08      1.9
 15 "AUS" "1970-Q1" ""       . 0  20 -15 1.8    -.10000005      1.6
 16 "AUS" "1970-Q2" ""       . 0  20 -14 1.9 -2.384186e-08      1.7
 17 "AUS" "1970-Q3" ""       . 0  20 -13 1.9 -2.384186e-08      1.7
 18 "AUS" "1970-Q4" ""       . 0  20 -12 1.8    -.10000005      1.6
 19 "AUS" "1971-Q1" ""       . 0  20 -11 1.7    -.19999996      1.7
 20 "AUS" "1971-Q2" ""       . 0  20 -10 1.8    -.10000005      1.9
 21 "AUS" "1971-Q3" ""       1 1  15  -9 1.8    -.10000005      1.9
 22 "AUS" "1971-Q4" ""       . 1  15  -8 1.7    -.19999996      2.2
 23 "AUS" "1972-Q1" ""       2 1  15  -7 1.9 -2.384186e-08      2.4
 24 "AUS" "1972-Q2" ""       . 1  15  -6 1.6           -.3      2.4
 25 "AUS" "1972-Q3" ""       . 1  15  -5 1.7    -.19999996      2.9
 26 "AUS" "1972-Q4" ""       . 1  15  -4 1.7    -.19999996      2.8
 27 "AUS" "1973-Q1" ""       . 1  15  -3 1.6           -.3      2.6
 28 "AUS" "1973-Q2" ""       . 1  15  -2 1.7    -.19999996      2.3
 29 "AUS" "1973-Q3" ""       . 1  15  -1 1.9 -2.384186e-08      2.1
 30 "AUS" "1973-Q4" ""       . 1  15   0 1.9 -2.384186e-08      2.1
 31 "AUS" "1974-Q1" ""       . 1  15   1 2.2            .3      2.1
 32 "AUS" "1974-Q2" ""       . 1  15   2 2.4      .5000001      2.1
 33 "AUS" "1974-Q3" ""       . 1  15   3 2.4      .5000001      2.7
 34 "AUS" "1974-Q4" ""       . 1  15   4 2.9     1.0000001        4
 35 "AUS" "1975-Q1" ""       . 1  15   5 2.8            .9      4.8
 36 "AUS" "1975-Q2" ""       1 2   8   6 2.6      .6999999      4.8
 37 "AUS" "1975-Q3" ""       . 2   8   7 2.3     .39999995      4.7
 38 "AUS" "1975-Q4" ""       2 2   8   8 2.1      .1999999      5.4
 39 "AUS" "1976-Q1" ""       . 2   8   9 2.1      .1999999      4.9
 40 "AUS" "1976-Q2" ""       . 2   8  10 2.1      .1999999      4.5
 41 "AUS" "1976-Q3" ""       . 2   8  11 2.1      .1999999      4.8
 42 "AUS" "1976-Q4" ""       . 2   8  12 2.7      .8000001      4.9
end

The variable "pt_output" ==1 when the date is a recession date. As you can see from the variable "quarters-to-start", different recessions have different number of quarters before the recession and as such the data is inputted into different rows for different recessions. For example, recession1 has 20 quarters before the start of the recession quarter(1971Q3) and the second recession date(1975Q2) has 15 quarters before the start of the recession. The variable "quarters-to-start" shows the number of quarters before a recession date.

The main interest variables are unemptime, recession1. What I intend to create is a time series chart that shows the differences in unemployment rates between a given quarter and a recession. So when unemptime = 0, this means that the quarter is the recession quarter and the difference would be zero.

I need to extract unemployment data of the recession dates. For example, extract 1.9% from 1971-Q3 and 4.8% from 1975-Q2. My intention is to create columns of such data. For example, a column called "recession1" that contains 1.9% on all rows and "recession2" that contains 4.8% on all rows. The purpose of such columns is to find the difference between unemployment rates.

I am unsure of any such commands to automate the process. Any help in this area would be appreciated.

As for the extraction of data in the first recession, I used the following code:

Code:

gen new_variable = unemp_data[_n-9]  in 1/42

This produces unemployment data that starts from unemptime= -20 and ends up on the 42th row. But as you can see from the code, there is still some form of "eyeballing" of data to determine which row the data starts and ends.

For the second recession date, I am not sure which command to use in order to extract data from row 21 to 42; so data would start from row 15 in new column "recession2" but gets unemployment data from row 21(1971Q3 onwards).

I would appreciate greatly any help given. Thanks!

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Help needed: Automate extracting of data
Help needed: Automate extracting of data

0 Response to Help needed: Automate extracting of data

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Help needed: Automate extracting of data Help needed: Automate extracting of data

Related Posts with Help needed: Automate extracting of data

0 Response to Help needed: Automate extracting of data

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Help needed: Automate extracting of data
Help needed: Automate extracting of data