Question

Renaming one column in multiple csv files using the filename itself.

0

Entering edit mode

3.2 years ago

pramach1 ▴ 40

I have 52 files with same number of columns (5) with same titles (same headers), but different rows. I want to change one column named as "Hits" to the .csv file name itself.

for example filenames are 10235678.csv, 1405872.csv and so on. I want to change the "Hits" in the first csv as 10235678, and in the second csv - Hits column has to be changed into 1405872.

files_to_read <- list.files(pattern = ".csv", full.names = T) # reading all csv files in the directory into list of files.

myfiles = lapply(files_to_read, read.delim, sep = ",") # The list is read into separate data frames with 5 columns in each csv files.

I can see that the myfiles has 52 files as separate data frame with 5 column and their column heading. I need to change one column named "Hits" into the file name itself.

R • 2.4k views

ADD COMMENT • link updated 3.2 years ago by rpolicastro 13k • written 3.2 years ago by pramach1 ▴ 40

0

Entering edit mode

You have a list of file names, and a list of file contents of the same length. Why not write a loop that picks the appropriate part from each corresponding file name and assigns it to the 5th column of each file content?

Even better, you could just write each file with a col.names parameter derived from the original colnames in which the 5th value is substituted with your computed value.

ADD REPLY • link 3.2 years ago by Ram 43k

0

Entering edit mode

I tried this one...

for (x in 1:52){names(myfiles[[x]])[names(myfiles[[x]]) == "Hits"] = paste0("****numbers****", x)}

This can replace some numbers...but not the file name. Not sure how to write a loop for the file names in the list.

ADD REPLY • link 3.2 years ago by pramach1 ▴ 40

0

Entering edit mode

You're replacing the column name with the index, not with the file name at that index. Use files_to_read[x], not just x. Plus, if every file has the same number of columns, I'm assuming the format is fixed. Why do you need to lookup where Hits is each time instead of just using a simple index for the column? It's also bizarre that while you pick that dynamically, you supply the 1:52 manually. Normally, one would pick the end point of a loop dynamically and alter a statically indexed column.

ADD REPLY • link 3.2 years ago by Ram 43k

0

Entering edit mode

Even better, you could just write each file with a col.names parameter derived from the original colnames in which the 5th value is substituted with your computed value.

Also I don't know how to do this one.

ADD REPLY • link 3.2 years ago by pramach1 ▴ 40

0

Entering edit mode

Run ?write.table and look at the col.names parameter. You can specify column names on the fly while writing output tables to file.

ADD REPLY • link 3.2 years ago by Ram 43k

score 3 · Accepted Answer · 2021-01-30

3

Entering edit mode

3.2 years ago

rpolicastro 13k

I'll provide a data.table solution here since it will be quicker and more memory efficient than base R or tidyverse.

library("data.table")

files_to_read <- list.files(pattern = ".csv", full.names = TRUE)

lapply(files_to_read, function(file) {
  DT <- fread(file, sep=",")
  setnames(DT, old="Hits", new=file_path_sans_ext(basename(file)))
  fwrite(DT, file, sep=",")
})

You can also do this on the command line with sed and modify the file in place.

for file in $(find . -name "*\.csv"); do sed -i "1s/Hits/$(basename $file .csv)/" $file; done