I have been working on a dataset in Stata for the past month or so, but have recently run into a couple issues that go far beyond my current skills. I am using Stata version 14.0.
Some quick background information: I've been working with a fairly large dataset (~70k obs) that describes contracts between car manufacturers and parts suppliers. The components range from tires to alternators to carpets, but are unfortunately not uniquely described (eg descriptions for tires can be "tires" "tyres" "tires: all four" "tires: performance" "tyre: 225/40R18" ....).
What I've done is selected the observations that all describe the same component and have done some data cleaning such as dropping duplicates etc.
The end result is as follows:
comp_id | regcode | parentgr | oembrand | uniq_modelname | launchyear | suppliername |
tires | WEU | GM | Opel | WEU Astra | 1998 | Continental |
tires | WEU | GM | Opel | WEU Astra | 2004 | Bridgestone |
tires | WEU | GM | Opel | WEU Astra | 2009 | Bridgestone |
tires | WEU | VW Group | Porsche | WEU Boxster | 1996 | Sumitomo |
tires | WEU | VW Group | Porsche | WEU Boxster | 1996 | Continental |
tires | WEU | VW Group | Porsche | WEU Boxster | 1996 | Bridgestone |
tires | WEU | VW Group | Porsche | WEU Boxster | 2004 | Bridgestone |
tires | WEU | VW Group | Porsche | WEU Boxster | 2012 | Pirelli |
tires | NAF | Toyota | Toyota | NAF Camry | 2001 | Goodyear |
tires | ... | ... | ... | ... | ... | ... |
I also have or I am planning to add other descriptive information in extra columns such as 'total # of cars sold' or a binary 'big refresh YN' , ...
A couple observations:
- The variable 'uniq_modelname' uniquely defines the car model and is a concatenated string of regcode + an earlier 'modelname' variable. The reasoning being that I sometimes have the same model in multiple regions, but for the sake of this application, I treat these as separate models.
- The variable 'launchyear' denotes the different modelyears of each car model, and may later be replaced by a 'generation proxy' variable that simply counts the number of generations (1 2 3 4 ..). I have not done this yet as I don't have full information for each component: for instance you can see in the table above I am missing information for the WEU Boxster from 2008, and so renaming the three generations I have to 1 2 and 3 would not be correct.
- I sometimes have multiple suppliers per launchyear, as can be seen by the 1996 generation of the WEU Boxster. I would like to reshape this into 'supplier1' 'supplier2' and 'supplier3', but am unsure when or how to do this. I will come back to this in my first main question.
comp_id | regcode | parentgr | oembrand | uniq_modelname | launchyear | genproxy | supplier1 | supplier2 | supplier3 |
tires | WEU | VW Group | Porsche | WEU Boxster | 1996 | 1 | Sumitomo | Continental | Bridgestone |
tires | WEU | VW Group | Porsche | WEU Boxster | 2004 | 2 | Bridgestone | ||
tires | WEU | VW Group | Porsche | WEU Boxster | 2008 | 3 | |||
tires | WEU | VW Group | Porsche | WEU Boxster | 2012 | 4 | Pirelli | ||
alternator | WEU | VW Group | Porsche | WEU Boxster | 1996 | 1 | |||
alternator | WEU | VW Group | Porsche | WEU Boxster | 2004 | 2 | Bosch | Denso | |
alternator | WEU | VW Group | Porsche | WEU Boxster | 2008 | 3 | |||
alternator | WEU | VW Group | Porsche | WEU Boxster | 2012 | 4 | Denso | ||
.. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
tires | NAF | Toyota | Toyota | NAF Camry | 2001 | 1 | Goodyear | ||
tires | NAF | Toyota | Toyota | NAF Camry | 2006 | 2 | |||
tires | NAF | Toyota | Toyota | NAF Camry | 2010 | 3 | Goodyear | Bridgestone | |
alternator | NAF | Toyota | Toyota | NAF Camry | 2001 | 1 | Denso | ||
.. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
As such my two main questions are the following:
- I want to reshape the data, but I am unable to do so when it is not consistently the same size in terms of i(i) and j(j). I want to reshape it into wide format, but for each launchyear I can have anywhere from 0 to 5 suppliers. Every model can also have between 3 and 6 generations. Both of these prevent me from using the reshape command correctly.
>To help with the amount of suppliers each generation has, I wanted to add 'extra' observations where the suppliername variable is empty, to ensure that each launchyear has exactly 5 observations, empty or otherwise.
This would mean the 2008 'WEU Boxster' + 'tires' generation would get 5 observations added where the suppliername is empty, while the 2004 generation would get 4 obs added (since it already has 1 non-empty observation). However, I am unable to code this correctly to ensure each generation gets exactly the required number of empty observations added.
>To combat the issue of the different number of generations per modelname, I honestly have no clue. I wanted to use bys uniq_modelname reshape to do this for each distinct number of generations per modelname, but bysort is not allowed with reshape. - After reshaping the dataset, I want to impute the suppliers for a certain set of observations. I assume that if there is no information regarding a certain component, then that means that the suppliers from the previous generation stay the same (for instance, long term multi-generational contracts). If, however, this empty observation is in fact the first generation, then obviously there is nothing to impute as I don't have the information from the previous generation.
From the tabel above I have included in red the instances of the suppliers that I wish to impute. As you can see I wish to repeat the suppliers from the previous generation (no matter if this means imputing 1 or more suppliers, or if this information is itself imputed from the generation before it).
The 1996 'WEU Boxster' + 'tires' observation is the first in the dataset and has no information regarding the suppliers of this part, so this needs to remain empty as I cannot know this information.
In a sense I want to only ""impute downwards""
comp_id | regcode | parentgr | oembrand | uniq_modelname | launchyear | genproxy | supplier1 | supplier2 | supplier3 |
tires | WEU | VW Group | Porsche | WEU Boxster | 1996 | 1 | Sumitomo | Continental | Bridgestone |
tires | WEU | VW Group | Porsche | WEU Boxster | 2004 | 2 | Bridgestone | ||
tires | WEU | VW Group | Porsche | WEU Boxster | 2008 | 3 | (Bridgestone) | ||
tires | WEU | VW Group | Porsche | WEU Boxster | 2012 | 4 | Pirelli | ||
alternator | WEU | VW Group | Porsche | WEU Boxster | 1996 | 1 | (empty) | (empty) | (empty) |
alternator | WEU | VW Group | Porsche | WEU Boxster | 2004 | 2 | Bosch | Denso | |
alternator | WEU | VW Group | Porsche | WEU Boxster | 2008 | 3 | (Bosch) | (Denso) | |
alternator | WEU | VW Group | Porsche | WEU Boxster | 2012 | 4 | (Bosch) | (Denso) | |
.. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
tires | NAF | Toyota | Toyota | NAF Camry | 2001 | 1 | Goodyear | ||
tires | NAF | Toyota | Toyota | NAF Camry | 2006 | 2 | (Goodyear) | ||
tires | NAF | Toyota | Toyota | NAF Camry | 2010 | 3 | Goodyear | Bridgestone | |
alternator | NAF | Toyota | Toyota | NAF Camry | 2001 | 1 | Denso | ||
.. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
To anyone who has taken the time to read this very long post, THANK YOU. If anyone could help me solve these issues, I would be forever grateful, as this would save me days if not weeks of struggling with the Stata syntax.
Kind regards,
Tom
0 Response to Readying My Data for Panelwise Analysis
Post a Comment