Identifying matched observations in a large database

Dear all,

This is relevant to the question I asked some days ago and can be found here:

I have encountered some difficulties due to the size of my database. Hence, I am starting a new thread regarding that topic.

I will start with the following example. Assume that we have a database with the following variables: id, year, treated, x1, and x2.

Treated is an indicator and informs us that a firm had an event at a specific year. My aim is to to construct a new variable that shows me the id of matched firms based on specific criteria that I name below.

There are three requirments that need to hold in order to match a firm.

1. If a company is treated at time t, the matched company must not be treated in years (-5, +5) from the event year of the treated firm. For example, if a treated firm had an event in 1990, the matched company must not be treated in the period [1985, 1995].

2. Create a variable z that is the ratio of x1 of the untreated firm to the value of x1 of the treated firm one year before the event. Then, based on z, the matched company should have values of z in the range [0.70 to 1.3]. When we compare z, we care about the values one year before the event.

3. The treated company's x2 one year before the event is closest to the x2 of the matched company one year before the event. So, for two companies that qualify for points 1 and 2, we choose as a match the company with the closest x2 to our treated company one year before the event.

In my previous thread, a solution was provided using the rangejoin command. Alas, although this works for this small sample, there are problems for large databases. In my case, there are more than 5,000 unique ids, 30+ years, and more than 20,000 events.

The problem arises with the rangejoin command and I get a Op. sys. refuses to provide memory error.
Is there a way to do the comparisons without the rangejoin command?

Thank you for your time.

A simplistic example is the following:

Code:

input id    year    treated    x1    x2
1    1980    0    157    18
1    1981    0    165    6
1    1982    0    28    4
1    1983    0    167    13
1    1984    0    77    8
1    1985    0    25    3
1    1986    0    156    11
1    1987    0    159    2
1    1988    0    101    2
1    1989    0    128    2
1    1990    1    44    15
1    1991    0    161    15
1    1992    0    195    20
1    1993    0    10    19
1    1994    0    34    7
1    1995    0    88    4
1    1996    0    83    12
1    1997    0    191    7
1    1998    0    180    20
1    1999    0    96    7
1    2000    0    177    10
2    1980    0    119    20
2    1981    0    71    18
2    1982    0    192    15
2    1983    0    142    4
2    1984    0    187    12
2    1985    0    41    8
2    1986    0    68    17
2    1987    0    74    20
2    1988    0    192    3
2    1989    0    178    7
2    1990    0    159    6
2    1991    0    43    16
2    1992    1    146    20
2    1993    0    103    2
2    1994    0    169    19
2    1995    0    99    11
2    1996    0    194    9
2    1997    0    48    19
2    1998    0    103    4
2    1999    0    106    18
2    2000    0    65    11
3    1980    0    15    8
3    1981    0    188    7
3    1982    0    138    7
3    1983    0    85    18
3    1984    1    114    7
3    1985    0    50    17
3    1986    0    122    1
3    1987    0    180    13
3    1988    0    67    4
3    1989    0    48    13
3    1990    0    63    18
3    1991    0    23    11
3    1992    0    36    4
3    1993    0    95    16
3    1994    0    188    18
3    1995    0    31    18
3    1996    1    22    3
3    1997    0    10    1
3    1998    1    11    1
3    1999    0    60    7
3    2000    0    25    7
4    1980    0    112    18
4    1981    0    197    6
4    1982    0    96    2
4    1983    0    165    2
4    1984    0    64    4
4    1985    0    71    19
4    1986    0    15    17
4    1987    0    158    7
4    1988    0    140    11
4    1989    0    156    3
4    1990    0    38    14
4    1991    0    33    14
4    1992    0    17    10
4    1993    0    183    1
4    1994    0    111    20
4    1995    0    69    12
4    1996    0    127    15
4    1997    0    24    12
4    1998    0    102    2
4    1999    0    130    17
4    2000    0    130    15
end

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Identifying matched observations in a large database
Identifying matched observations in a large database

0 Response to Identifying matched observations in a large database

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Identifying matched observations in a large database Identifying matched observations in a large database

Related Posts with Identifying matched observations in a large database

0 Response to Identifying matched observations in a large database

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Identifying matched observations in a large database
Identifying matched observations in a large database