Hello everyone,
I need help reading in a 10GB .csv dataset with very limited physical memory on my computer. I believe the problem is with my dictionary file.
I am trying to read in the FCC dataset https://opendata.fcc.gov/Wireline/Fi...2019/whue-6pnt, which I've downloaded to a USB drive. I need only the data for California (CA). Because I have only about 10GB of memory available on my machine (and I can't change that today), I want to infile only the observations for the state I need and only the variables I need.
I can see the data if I run
import delim using "Fixed_Broadband_Deployment_Data__December_2019.csv", rowr(1:100) clear
but can't run
infile using fccdict in 1, clear
as I get the following:
'LogicalReco' cannot be read as a number for logicalrecordnumber[1]
'rdNumber,Pr' cannot be read as a number for providerid[1]
'oviderID,FR' cannot be read as a number for frn[1]
'rName,DBA' cannot be read as a number for censusblockfipscode[1]
'Name,Ho' cannot be read as a number for consumer[1]
'ldingCo' cannot be read as a number for maxadvertiseddownstreamspeedmbps[1]
'mpanyNam' cannot be read as a number for maxadvertisedupstreamspeedmbps[1]
(1 observation read)
.
. list
+-------------------------------------------------------------------------------+
| logica~r provid~d frn state census~e consumer m~down~s m~upst~s |
|-------------------------------------------------------------------------------|
1. | . . . N, . . . . |
+-------------------------------------------------------------------------------+
with the dictionary file fccdict.dct being
dictionary using "Fixed_Broadband_Deployment_Data__December_2019.csv" {
* for fcc data
*
*1 logicalrecord~r long %12.0g Logical Record Number
*2 providerid long %12.0g Provider ID
*3 frn long %12.0g FRN
*4 providername str46 %46s Provider Name
*5 dbaname str39 %39s DBA Name
*6 holdingcompan~e str46 %46s Holding Company Name
*7 holdingcompan~r long %12.0g Holding Company Number
*8 holdingcompan~l str46 %46s Holding Company Final
*9 state str2 %9s State
*10 censusblockfi~e double %10.0g Census Block FIPS Code
*11 technologycode byte %8.0g Technology Code
*12 consumer byte %8.0g Consumer
*13 m~downstreams~s int %8.0g Max Advertised Downstream Speed (mbps)
*14 m~upstreamspe~s float %9.0g Max Advertised Upstream Speed (mbps)
*15 business byte %8.0g Business
*
_first(1)
_lines(1)
long logicalrecordnumber %12.0g
long providerid %12.0g
long frn %12.0g
str2 state %9s
double censusblockfipscode %10.0g
byte consumer %8.0g
int maxadvertiseddownstreamspeedmbps %8.0g
float maxadvertisedupstreamspeedmbps %9.0g
}
Doing
import delim using "Fixed_Broadband_Deployment_Data__December_2019.csv"
just makes Stata hang, and Stata reports that it needs a lot of memory. I am on my work computer, so getting more memory is a whole bureaucratic affair that will have to wait until Monday, and I need to process my dataset today.
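For concreteness, the goal of loading only the CA rows and the needed variables could be sketched as a chunked read along the following lines. This is an untested sketch, not a known solution: the chunk size of 2,000,000 rows and the output name fcc_ca are made-up placeholders, and it assumes that varnames(1) still picks up the header row when rowrange() starts further down the file. Each pass may have to scan the file from the top, so it could be slow.

```stata
* Sketch only: chunk size and output file name are assumptions.
local csv   "Fixed_Broadband_Deployment_Data__December_2019.csv"
local chunk 2000000              // rows per pass; lower it if memory runs short
local start 2                    // row 1 of the file holds the variable names
local i     0                    // number of chunks saved so far
tempfile ca

while 1 {
    local end = `start' + `chunk' - 1
    * Read one block of rows; stop if the read fails or returns nothing.
    capture import delimited using "`csv'", varnames(1) ///
        rowrange(`start':`end') clear
    if (_rc != 0) | (_N == 0) continue, break
    local nread = _N
    * Keep only California rows and the variables of interest.
    keep if state == "CA"
    keep logicalrecordnumber providerid frn state censusblockfipscode ///
        consumer maxadvertiseddownstreamspeedmbps maxadvertisedupstreamspeedmbps
    * Stack this block on top of the blocks saved so far.
    if `i' > 0 append using `ca'
    save `ca', replace
    local ++i
    if `nread' < `chunk' continue, break   // short block: end of file reached
    local start = `end' + 1
}

* Write the combined CA data to a permanent file (hypothetical name).
if `i' > 0 {
    use `ca', clear
    save fcc_ca, replace
}
```

Because capture also hides genuine errors such as running out of memory, it would be worth checking how many chunks were actually read before trusting the result.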
THANKS IN ADVANCE!
Information about my machine:
Stata/MP 14.2 for Windows (64-bit x86-64)
Revision 29 Jan 2018
Copyright 1985-2015 StataCorp LLC
Total physical memory: 16594036 KB
Available physical memory: 9371188 KB
538-user 4-core Stata network perpetual license:
Licensed to: Stata/MP 14 (4 cores)