Deleting specific columns in multiple CSV files using RStudio
3.6 years ago

Hello everyone,

I have a folder containing 15,000 separate CSV files. All of them have the same header (date, time, event, ...). I want to remove the event column from every file. How can I do that?

Thank you

3.6 years ago

You can do this with cut, assuming event is the third column in every file and no field contains a quoted comma.

mkdir -p cleaned
for file in ./*.csv; do cut -d"," -f1,2,4- "$file" > "cleaned/$(basename "$file")"; done
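If the position of the event column is not known in advance, the column can be looked up by name in the header instead. The following is only a sketch: drop_column is a hypothetical helper name, and it assumes plain comma-separated files with no quoted fields that contain commas.

```shell
# Hypothetical helper: print FILE without the column whose header equals NAME.
# Assumes simple CSV (no quoted fields containing commas).
# Usage: drop_column NAME FILE
drop_column() {
  awk -F',' -v OFS=',' -v name="$1" '
    # On the header line, remember the index of the column to drop.
    FNR == 1 { drop = 0; for (i = 1; i <= NF; i++) if ($i == name) drop = i }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (i == drop) continue      # skip the unwanted column
        out = out sep $i
        sep = OFS
      }
      print out
    }' "$2"
}
```

It could then replace the cut call in the loop above, for example: drop_column event "$file" > "cleaned/$(basename "$file")". If the named column is not found, the file is printed unchanged.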

If you really want to do it in R:

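library("data.table")  # fread(), fwrite(), and := for removing columns by reference
library("purrr")       # walk() iterates over the files for its side effects

# Write cleaned copies to a separate folder so the originals stay untouched.
dir.create("cleaned", showWarnings=FALSE)
files <- list.files(pattern="\\.csv$", full.names=TRUE)
walk(files, function(x) {
  DT <- fread(x, sep=",")
  DT[, event := NULL]  # drop the event column
  fwrite(DT, file.path("cleaned", basename(x)), sep=",", col.names=TRUE, row.names=FALSE, quote=FALSE)
})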

EDIT: awk command amended to a cut command after the kind correction from @Alex Reynolds


Awesome!

How can I remove two or more columns, such as event, time, ...?

Thank you so much


Alex Reynolds gives a great explanation in his reply if you choose to use the cut option. For the R option you can change DT[, event := NULL] to DT[, c("event", "time") := NULL] to remove as many columns as you wish.

EDIT: Alex's post seems to have disappeared, so I'll expand on cut a little. The part of cut that selects columns is -f1,2,4-. In that example you keep the first and second columns, plus the fourth column onward. -f1-5 would select columns one through five. -f2-5,7- would select columns two through five, and then the seventh column onward.
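To see the field lists in action, here is a small sketch run on a made-up eight-field row (the sample line is invented for illustration, not taken from the files in question):

```shell
# Sample row with eight comma-separated fields (made up for illustration).
line='a,b,c,d,e,f,g,h'

echo "$line" | cut -d, -f1,2,4-   # prints a,b,d,e,f,g,h
echo "$line" | cut -d, -f1-5      # prints a,b,c,d,e
echo "$line" | cut -d, -f2-5,7-   # prints b,c,d,e,g,h
echo "$line" | cut -d, -f1,4-     # prints a,d,e,f,g,h
```

The last form is the one to use for dropping both time and event, assuming they are the second and third columns as in the question.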


Thank you for your prompt and helpful answer to this problem. I really appreciate it.

3.6 years ago

Do you have a specific need to use R to remove the columns? You can do it outside R. With GNU Parallel and tsv-utils:

$ parallel 'tsv-select -H -d "," -e "event" {} > output_directory/{.}_new.csv' ::: *.csv

With GNU Parallel and csvtk:

$ parallel 'csvtk cut -f -event {} -o output_directory/{.}_new.csv' ::: *.csv

Create the output directory before running either command.
