Remove NA from file while reading
19 months ago
gubrins ▴ 290

Hi,

I have a large dataset (37 GB) with a lot of missing data coded as NA. I know how to remove NAs once the file has been read, but I was wondering whether there is a way to remove them while reading the file (or without reading it at all), so that the loaded data already contain no NAs. Otherwise I have to read the whole file, which, as you can imagine, needs a lot of RAM.

Thanks in advance!!

python R • 938 views

You don't need to read the entire file into memory to remove the NAs. You can read and write the file line by line in Python or R, dropping the lines that contain NA as you go. You may also be able to edit the file in place with sed.
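As a rough sketch of the line-by-line idea in Python, assuming the file is tab-delimited and that a row should be dropped whenever any field is exactly NA (both are assumptions, since the question does not describe the layout). The function name drop_na_rows and its parameters are made up for illustration. Only one row is held in memory at a time, so the 37 GB never has to fit in RAM:

```python
import csv


def drop_na_rows(src_path, dst_path, delimiter="\t", na_token="NA"):
    """Copy src_path to dst_path one row at a time, skipping rows with an NA field.

    Hypothetical helper: the delimiter and the "drop the whole row" rule are
    assumptions about the data, not something stated in the question.
    Returns the number of rows written (including the header, if any).
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        for row in reader:
            # Exact field match, so a value such as "NANA" or "DNA" is kept.
            if na_token in row:
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

Something like drop_na_rows("bigfile.txt", "bigfile.noNA.txt") would then produce a smaller file that can be read normally. If NAs should only disqualify a row when they occur in particular columns, the test inside the loop would need to check those column positions instead.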

19 months ago

Assuming you want a solution in R, I would do something along these lines using data.table::fread:

library(data.table)

dat <- fread(cmd="grep -v -w 'NA' bigfile.txt")

Of course you would need to adapt the shell command passed to cmd to your case (as written, grep drops every line that contains NA as a whole word anywhere on the line), but hopefully you get the idea.

19 months ago

Use a stream-oriented approach so that the data are processed piece by piece rather than loaded all at once. For example, with readr in R:

https://readr.tidyverse.org/reference/read_lines.html
