I am in the process of cleaning data on historical greenhouse gas emissions. Since the data came formatted with a separate row for each gas and each sector for a given country and year there are multiple lines for a single country and year (i.e. many duplicates). I created variables for each sector and gas combinations. Eventually, I hope to have one line for each year by country (i.e. no duplicates) with these variables of each sector/gas combo.
The problem is that I'm stuck halfway in cleaning. My data looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str41 country int year float PIK_CH4_Agr "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 7.60e-07 "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1850 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 8.075e-07 "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1851 . "Aruba" 1852 8.625e-07 "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1852 . "Aruba" 1853 9.275e-07 "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1853 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 1.0025e-06 "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1854 . "Aruba" 1855 . "Aruba" 1855 . "Aruba" 1855 . "Aruba" 1855 . "Aruba" 1855 . end
My problem is filling out the data. There are a number of commands to fill down values including carryforward. But I can't fill backward. I tried Nick Cox's simple trick to use gsort -year and then carryforward but suffice to say this doesn't work. I am also unable to use anything with tsset because I have so many duplicates.
I am also limited in do this:
Code:
bysort ISO year : replace PIK_CH4_Agr = PIK_CH4_Agr[_n-1] if missing(PIK_CH4_Agr)
One solution I came up with was to 1.) identify the first value where the value is not missing, 2.) identify the first occurance and 3.) replace the first occurrence with the first non-missing value 4.) from there I can carryforward. I'm stuck on #3.
My code so far is:
Code:
bysort country year: gen first_PIK_CH4_Agr = sum(PIK_CH4_Agr !=. ) == 1 bysort country year: carryforward PIK_CH4_Agr, replace bysort country year: gen start_PIK_CH4_Agr = sum(PIK_CH4_Agr==. ) == 1
0 Response to Filling in Values in Panel Data Above and Below with Unique Orders
Post a Comment