Hello everyone,
I need help reading in a 10GB .csv dataset with very limited physical memory on my computer. I believe the trouble is with my dictionary file.
I am trying to read in the FCC dataset https://opendata.fcc.gov/Wireline/Fi...2019/whue-6pnt, which I've downloaded to a USB drive. I need only the data for California (CA). Because I have only about 10GB of memory available on my machine (and I can't change that today), I want to infile only the observations for the state I need and only the variables I need.
I can see the data if I run
import delim using "Fixed_Broadband_Deployment_Data__December_2019.csv" , rowr(1:100) clear
but can't run
infile using fccdict in 1, clear
as I get the following:
'LogicalReco' cannot be read as a number for logicalrecordnumber[1]
'rdNumber,Pr' cannot be read as a number for providerid[1]
'oviderID,FR' cannot be read as a number for frn[1]
'rName,DBA' cannot be read as a number for censusblockfipscode[1]
'Name,Ho' cannot be read as a number for consumer[1]
'ldingCo' cannot be read as a number for maxadvertiseddownstreamspeedmbps[1]
'mpanyNam' cannot be read as a number for maxadvertisedupstreamspeedmbps[1]
(1 observation read)
. list
     | logica~r   provid~d   frn   state   census~e   consumer   m~down~s   m~upst~s |
  1. |        .          .     .      N,          .          .          .          . |
with the dictionary file fccdict.dct being
dictionary using "Fixed_Broadband_Deployment_Data__December_2019.csv" {
* for fcc data
*1 logicalrecord~r long %12.0g Logical Record Number
*2 providerid long %12.0g Provider ID
*3 frn long %12.0g FRN
*4 providername str46 %46s Provider Name
*5 dbaname str39 %39s DBA Name
*6 holdingcompan~e str46 %46s Holding Company Name
*7 holdingcompan~r long %12.0g Holding Company Number
*8 holdingcompan~l str46 %46s Holding Company Final
*9 state str2 %9s State
*10 censusblockfi~e double %10.0g Census Block FIPS Code
*11 technologycode byte %8.0g Technology Code
*12 consumer byte %8.0g Consumer
*13 m~downstreams~s int %8.0g Max Advertised Downstream Speed (mbps)
*14 m~upstreamspe~s float %9.0g Max Advertised Upstream Speed (mbps)
*15 business byte %8.0g Business
long logicalrecordnumber %12.0g
long providerid %12.0g
long frn %12.0g
str2 state %9s
double censusblockfipscode %10.0g
byte consumer %8.0g
int maxadvertiseddownstreamspeedmbps %8.0g
float maxadvertisedupstreamspeedmbps %9.0g
}
Importing the whole file with
import delim using "Fixed_Broadband_Deployment_Data__December_2019.csv"
just gets Stata stuck, and according to Stata I need a lot more memory. I am on my work computer, so getting more memory is a whole bureaucratic affair that will have to wait until Monday, and I need to process my dataset today.
Information about my machine:
Stata/MP 14.2 for Windows (64-bit x86-64)
Revision 29 Jan 2018
Copyright 1985-2015 StataCorp LLC
Total physical memory: 16594036 KB
Available physical memory: 9371188 KB
538-user 4-core Stata network perpetual license:
Licensed to: Stata/MP 14 (4 cores)
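For reference, one possible way to stay within the available memory is to read the file in blocks of rows with import delimited, keep only the CA observations and the needed variables from each block, and append the pieces. The following is a minimal sketch, not a tested solution. The values of nrows and chunk are placeholders that would have to be set for the real file, and the sketch assumes that import delimited still takes the variable names from row 1 when rowrange() starts at a later row, and that the names come out in lowercase as in the dictionary above.

```stata
* Sketch only. ASSUMPTIONS: nrows and chunk are placeholder values, and the
* variable names are the lowercase names derived from the header row.
local csv "Fixed_Broadband_Deployment_Data__December_2019.csv"
local nrows = 1000000     // placeholder: total lines in the file, header included
local chunk = 100000      // placeholder: rows per pass, tune to available memory

tempfile ca               // accumulates the CA rows across passes
local started = 0
local first = 2           // row 1 is the header, data start at row 2

while `first' <= `nrows' {
    local last = min(`first' + `chunk' - 1, `nrows')

    * Read one block of rows; names are expected to come from row 1.
    import delimited using "`csv'", varnames(1) rowrange(`first':`last') clear

    * Keep only California and the variables of interest.
    keep if state == "CA"
    keep logicalrecordnumber providerid frn state censusblockfipscode ///
        consumer maxadvertiseddownstreamspeedmbps maxadvertisedupstreamspeedmbps

    * Add the rows collected so far, then store the running result.
    if `started' {
        append using `ca'
    }
    save `ca', replace emptyok
    local started = 1

    local first = `last' + 1
}

* Load the combined CA-only data.
use `ca', clear
```

Each pass rescans the file from the top, so a larger chunk means fewer passes but more memory per pass. If a column is read as numeric in one block and as string in another, append will complain, and the storage types would then need to be set explicitly, for example with the stringcols() option.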