RAM required to load a 15 GB file in Python
2.5 years ago
anasjamshed ▴ 120

I want to know how much RAM is required to load a 15 GB data file and clean the dataset with pandas in Python.

I have a 14.8 GB file containing gene information. It has 45,804,630 rows and 39 columns in total.

When I read only the first 1,000,000 rows with pandas, it worked fine.

Code:

import pandas as pd
data = pd.read_csv("CosmicGenomeScreensMutantExport.tsv", sep="\t", nrows=1000000)

But when I try to read the whole dataset at once, my PC hangs. I only have 4 GB of RAM. Should I increase my RAM?

python pandas • 1.3k views

If you need to hold a 15 GB file in memory, then you obviously need more than 15 GB of RAM, possibly a lot more depending on the precision at which you read your data (e.g. float16, float32, etc.). The pandas concat function could be of use to you, check e.g. this
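To illustrate the point about precision, here is a minimal sketch on a tiny synthetic table. The column names (`gene`, `position`, `score`) are made up for the example and are not taken from the COSMIC file. It only shows that passing narrower dtypes to `pd.read_csv` shrinks the in-memory size reported by `DataFrame.memory_usage(deep=True)`.

```python
import io
import pandas as pd

# Tiny synthetic TSV standing in for the real file (hypothetical columns).
sample_tsv = "gene\tposition\tscore\n" + "".join(
    f"GENE{i % 5}\t{i}\t{i * 0.5}\n" for i in range(1000)
)

# Default parsing: 64-bit integers and floats, plain strings for text.
default_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Same data read with explicit, narrower dtypes.
compact_df = pd.read_csv(
    io.StringIO(sample_tsv),
    sep="\t",
    dtype={"gene": "category", "position": "int32", "score": "float32"},
)

# deep=True also counts the memory held by the string values themselves.
default_bytes = int(default_df.memory_usage(deep=True).sum())
compact_bytes = int(compact_df.memory_usage(deep=True).sum())
print(f"default dtypes: {default_bytes} bytes, compact dtypes: {compact_bytes} bytes")
```

How much this saves on the real file depends on how many columns are repetitive text or small numbers, so it is worth measuring on a sample of rows first.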


I am trying to use this code:

import pandas as pd

# Read the file in chunks because the data is very large
chunks = pd.read_csv("CosmicGenomeScreensMutantExport.tsv", chunksize=1000000, sep="\t")

# Build a single DataFrame by concatenating the list of chunks
df = pd.concat(list(chunks), ignore_index=True)

on my friend's PC, which has 8 GB of RAM. Will it work?


I don't understand why you would need the whole file in memory if you only need to "clean" it, whatever that means. It sounds like something that shouldn't require all the rows at the same time, so why not just process the file in chunks?
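A rough sketch of what chunked processing could look like, assuming the cleaning can be done row by row. The `clean_chunk` step here (dropping rows with missing values) is only a placeholder, and the function names and output path are hypothetical. Each chunk is cleaned and appended to an output file, so only one chunk is in memory at a time.

```python
import pandas as pd


def clean_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Placeholder cleaning step: drop rows that contain any missing value."""
    return chunk.dropna()


def clean_in_chunks(src: str, dst: str, chunksize: int = 1_000_000) -> int:
    """Clean a tab-separated file chunk by chunk; return the number of rows written."""
    rows_written = 0
    reader = pd.read_csv(src, sep="\t", chunksize=chunksize)
    for i, chunk in enumerate(reader):
        cleaned = clean_chunk(chunk)
        # Write the header once, then append the following chunks.
        cleaned.to_csv(
            dst,
            sep="\t",
            index=False,
            mode="w" if i == 0 else "a",
            header=(i == 0),
        )
        rows_written += len(cleaned)
    return rows_written


# Example call (output file name is hypothetical):
# clean_in_chunks("CosmicGenomeScreensMutantExport.tsv", "cleaned.tsv")
```

This does not cover cleaning that needs to see all rows together, such as removing duplicates across the whole file. That kind of step needs a different approach, for example keeping only a set of already-seen keys in memory.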


But I want to clean the whole dataset, which is why I need to load it all at once.


I'm just guessing here, but you probably need a minimum of about 64 GB of RAM if you want to hold your file in memory.
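A guess like this can be sanity-checked with a back-of-envelope calculation from the row and column counts given in the question. The sketch below gives two rough bounds: every cell stored as an 8-byte number, and every cell stored as a separate Python string object. The example string is a made-up placeholder for a typical value, and the real footprint depends on the actual mix of column types.

```python
import sys

ROWS = 45_804_630  # row count from the question
COLS = 39          # column count from the question


def numeric_estimate_gib(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Size in GiB if every cell were a fixed-width number (8 bytes = int64/float64)."""
    return rows * cols * bytes_per_value / 2**30


def object_string_estimate_gib(rows: int, cols: int, typical_value: str) -> float:
    """Rough size in GiB if every cell were its own Python string object.

    Each cell costs an 8-byte pointer in the column plus the string object itself.
    """
    per_cell = 8 + sys.getsizeof(typical_value)
    return rows * cols * per_cell / 2**30


numeric_gib = numeric_estimate_gib(ROWS, COLS)
# "ENSG00000141510" is only an assumed example of a typical cell value.
string_gib = object_string_estimate_gib(ROWS, COLS, "ENSG00000141510")
print(f"all 8-byte numbers: about {numeric_gib:.1f} GiB")
print(f"all separate string objects: about {string_gib:.1f} GiB")
```

The all-numeric case works out to about 13.3 GiB, and text columns push the total well above that, which is why the requirement can end up several times the size of the file on disk.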


Oh no. How can I get 64 GB?
