How to remove NAs in multiple CSV files in a folder
3.7 years ago

I have a folder containing 15000 CSV files and I need to remove all NAs in all those files.

R • 1.8k views

By "remove NAs", do you mean:

  1. Replace NAs with "" ?
  2. Remove rows with NAs?

I need to remove rows with NAs.


You will need to write a script that iterates over each CSV file and drops the rows that contain NA.


Do you want to remove any rows/columns with NA values, or replace NA with something? For this question you may want to include an example of what one of the files looks like.

3.7 years ago

If the files do not have headers, try running the following command (back up your files and create a directory named "output" before you proceed):

$ parallel  'grep -v "NA" {}  > output/new_{.}.csv' ::: *.csv

If the files have headers, install tsv-utils and run the following command:

$ parallel  'keep-header {} -- grep -v "NA" > output/new_{.}.csv' ::: *.csv
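One caveat with grep -v "NA": it drops every line that contains the letters NA anywhere, so a row with a value such as "DNA" or "NAD" is removed as well. As an alternative, here is a minimal sketch that only drops rows in which a whole field is exactly NA. The function name drop_na_rows and its second argument are made up for this illustration, and the sketch assumes simple comma-separated files with Unix line endings and no quoted fields containing commas.

```shell
# drop_na_rows FILE [HAS_HEADER]
# Prints FILE without the rows in which any comma-separated field is exactly "NA".
# HAS_HEADER=1 passes the first line through unchanged (hypothetical helper, not part of any tool).
drop_na_rows() {
  awk -F',' -v header="${2:-0}" '
    header == 1 && NR == 1 { print; next }   # keep the header line as it is
    {
      for (i = 1; i <= NF; i++)
        if ($i == "NA") next                 # whole-field match only, so "DNA" or "NAD" survive
      print
    }
  ' "$1"
}

# Write cleaned copies into output/ and leave the originals untouched.
mkdir -p output
for f in *.csv; do
  [ -e "$f" ] || continue                    # nothing to do if the glob matched no files
  drop_na_rows "$f" 1 > "output/new_$f"
done
```

For files without a header, drop the trailing 1 in the loop. A plain for loop is enough for 15000 files; wrapping the function in parallel should only matter if the files are large.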
3.7 years ago
zx8754 11k

This job is probably better suited to bash, but here is a solution using R: find all the files, keep only complete cases (that is, remove any row that has an NA), then write the result out under a new file name.

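library(data.table)

# Loop over every file in the folder whose name matches the pattern
for(i in list.files("path/to/files", pattern = ".*\\.csv", full.names = TRUE)){
  # fread() reads "NA" strings as missing values by default
  d <- fread(i)
  # Keep only rows without any NA; the output sits next to the input,
  # with ".clean.csv" appended to the original name (x.csv -> x.csv.clean.csv)
  fwrite(d[ complete.cases(d), ], file = paste0(i, ".clean.csv"))
}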

R uses POSIX extended regex, so * means 0 or more times, and . means any character except newline. Your regex as written should most likely be .*\\.csv, however, csv$ should be sufficient.


I am far from a regex expert, but "*.csv" works fine for me. In any case, OP can test and amend as needed.


It will work, but not as intended or expected. It's better to use properly formatted regular expressions to avoid capturing Mechanicsville, Virginia.
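To make the point about unanchored patterns concrete, here is a small sketch using grep -E, which also uses POSIX extended regular expressions. The file names are invented for the comparison, and the pattern '.csv' stands in for the loosely written one, since a leading * is not portable in grep.

```shell
# Hypothetical file names, used only to compare the two patterns.
names='data.csv
data.csv.bak
mycsv.txt
Mechanicsville.txt'

# Helper for this illustration: print the names that match a given pattern.
match_names() {
  printf '%s\n' "$names" | grep -E "$1"
}

# Unescaped and unanchored: "." matches any character, so anything containing
# some character followed by "csv" is picked up, including Mechanicsville.txt.
match_names '.csv'

# Escaped dot, anchored at the end: only names that really end in ".csv".
match_names '\.csv$'
```

The first call lists all four names, the second lists only data.csv.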


OK, convinced :) I have updated the post, thank you.


Take it from someone who's had all kinds of strange unintended regex captures: it's worth making sure the pattern is properly formatted, haha.


This code doesn't produce a ".clean.csv" file.
