Trying to read multiple CSV files in a folder and add filename to each column in R
2.7 years ago
mxz707 • 0

Hello Everyone,

I have multiple CSV files (27 files of extracted features from MRI patients). Each CSV file has a different number of rows but the same set of columns. I want to read each file and append the file name to each of its column names. For instance, for file 27, named "AX_FLAIR-parcellationROI_27", I want to add "AX_FLAIR-parcellationROI_27" to each column name. To work on all the files at once, I put them in a list and use a loop to read each one. I am using the "paste" function to add the new string to each column name, but I got an error.

files <- list.files(pattern = "/Users/mostafa/Documents/Mostafa/UM/Papers/Data/TCIA/TCGA-GBM/RadiomicsData_EC/TCGA-02-0009/RadiomicsFeatures-csv/AX_FLAIR")

for(f in files){ 
  ReadInFile <- read.csv(file=f, header=T, na.strings="NULL")
  colnames(ReadIFile[f])=paste(colnames(ReadInFile[f]),"_AX_FLAIR-parcellationROI_",f)
}

Is there a way to fix this last piece of code? Thanks.

R Merging • 5.0k views

There is an easier way (works in zsh and bash):

$ for i in *.csv; do awk 'NR==1{gsub(/,|$/,"_"FILENAME","); sub(/,$/,"")}1' "$i" > "${i%.csv}_new.csv"; done

The new files will have a "_new.csv" suffix. It would help if you could post some example data. This assumes that text within quotes does not contain commas.


Thanks for your guidance. Is your command in R? If not, how can I do it in R?


Try this in R:

files = list.files(pattern = ".csv", path = "~/Desktop/")

for (i in files){
    iname=sub("\\.csv","",i)
    f=read.csv(i, header = T)
    colnames(f)=paste(colnames(f),iname,sep = "_")
    assign(paste0(iname,"_new"),f)
}
ls()

This appends the file name to each column name and stores each modified table as an R object for further manipulation. ls() lists the objects in the workspace; those ending in "_new" have the file name appended to every column name. Change the path in the code to match the location of your data on your local machine.


Great!! I tried it, but I got the following error:

Error in file(file, "rt") : cannot open the connection

How can I deal with that?

PS: I am using a mac to run the code. Maybe there is a problem with this command:

iname=sub("\\.csv","",i)

Please post the code you used and the directory tree of the folder you ran it from. Run getwd() in R to print the current working directory.
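A common cause of "cannot open the connection" is that list.files() returns bare file names unless full.names = TRUE, so read.csv() looks for them in the working directory rather than in the data folder. A minimal sketch of how to check this, assuming a throwaway demo folder (demo_dir and roi_1.csv are made up for illustration):

```r
# Hypothetical demo folder standing in for the real data directory.
demo_dir <- file.path(tempdir(), "csv_path_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2, b = 3:4),
          file.path(demo_dir, "roi_1.csv"), row.names = FALSE)

# Without full.names, list.files() returns bare file names ...
bare <- list.files(path = demo_dir, pattern = "\\.csv$")
print(bare)

# ... which read.csv() can only open if the working directory is demo_dir.
print(getwd())
print(file.exists(bare))

# With full.names = TRUE the paths resolve from any working directory.
full <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)
print(file.exists(full))
```

If file.exists() reports FALSE for the bare names, either pass full.names = TRUE or setwd() to the data folder before reading.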


Thanks, the issue is solved!!


It would be helpful for others if you could post how the issue was resolved.


Here is how the issue was resolved:

files<- list.files(path = "/Users/mostafa/Documents/Mostafa/UM/Papers/Data/TCIA/TCGA-GBM/RadiomicsData_EC/TCGA-02-0009/RadiomicsFeatures-csv/AX_FLAIR_Copy", pattern=".csv", full.names=T)

for (i in files){
    iname=sub("\\.csv","",i)
    f=read.csv(i, header = T)
    colnames(f)=paste(colnames(f),iname,sep = "_")
    assign(paste0(iname,"_new"),f)
}
ls()
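
One thing to watch with full.names = TRUE: each element of files is then a full path, so sub("\\.csv", "", i) leaves the directory in iname, and the whole path ends up in the column names and object names. A possible variant, sketched under the assumption that only the bare file name is wanted, strips the directory with basename() and collects the tables in a named list instead of using assign(). The demo folder, the column names, and the tables list are invented for illustration:

```r
# Hypothetical demo folder with one small CSV standing in for the real data.
demo_dir <- file.path(tempdir(), "colname_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(volume = c(1.5, 2.5), mean = c(10, 20)),
          file.path(demo_dir, "AX_FLAIR-parcellationROI_27.csv"),
          row.names = FALSE)

files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

tables <- list()
for (i in files) {
  # basename() drops the directory; sub() then drops the ".csv" extension.
  iname <- sub("\\.csv$", "", basename(i))
  f <- read.csv(i, header = TRUE)
  # Append the bare file name to every column name.
  colnames(f) <- paste(colnames(f), iname, sep = "_")
  # Store the table under its file name instead of creating a global object.
  tables[[iname]] <- f
}

print(names(tables))
print(colnames(tables[[1]]))
```

A named list keeps all 27 tables together, so they can be processed with lapply() rather than looked up through ls().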
2.7 years ago
Sam ★ 4.7k

You want to do

colnames(ReadInFile)=paste0(colnames(ReadInFile),"_AX_FLAIR-parcellationROI_",f)

Though your file format will be horrible and not easy to analyze. An easier way might be to add an additional column to your data frame indicating which file each row came from:

files <- list.files(pattern = "/Users/mostafa/Documents/Mostafa/UM/Papers/Data/TCIA/TCGA-GBM/RadiomicsData_EC/TCGA-02-0009/RadiomicsFeatures-csv/AX_FLAIR")
res <- NULL
for(f in files){ 
  ReadInFile <- read.csv(file=f, header=T, na.strings="NULL")
  ReadInFile$File <- f
  res <- rbind(res, ReadInFile)
}
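
The same idea can also be written without growing res inside the loop. This is only a sketch, assuming all files share the same columns (as stated in the question); the demo folder, the file names, and the read_one helper are hypothetical:

```r
# Hypothetical demo folder with two small CSVs of different lengths.
demo_dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(volume = 1:2, intensity = c(0.1, 0.2)),
          file.path(demo_dir, "roi_1.csv"), row.names = FALSE)
write.csv(data.frame(volume = 3:5, intensity = c(0.3, 0.4, 0.5)),
          file.path(demo_dir, "roi_2.csv"), row.names = FALSE)

files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and tag every row with the file it came from.
read_one <- function(path) {
  d <- read.csv(path, header = TRUE, na.strings = "NULL")
  d$File <- basename(path)
  d
}

# Read all files, then stack them into a single data frame in one step.
res <- do.call(rbind, lapply(files, read_one))
print(table(res$File))
```

Stacking once at the end avoids re-copying res on every iteration, and the File column keeps the source of each row available for grouping later.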

Thanks, Sam. I see your concerns. I know it might be horrible, but this is one approach I am considering for building a data matrix of the extracted features. For now, I am trying to see whether it is doable. If not, I will drop it and think about another approach. I tried the correction that you sent me, but I got the following error:

Error in file(file, "rt") : cannot open the connection

How should I fix that? Thanks for your help.


Try doing

files <- list.files(path = "/Users/mostafa/Documents/Mostafa/UM/Papers/Data/TCIA/TCGA-GBM/RadiomicsData_EC/TCGA-02-0009/RadiomicsFeatures-csv", pattern="AX_FLAIR", full.names=T)

Thanks, Sam!! I changed your command a little bit and it worked perfectly.

files2 <- list.files(path = "/Users/mostafa/Documents/Mostafa/UM/Papers/Data/TCIA/TCGA-GBM/RadiomicsData_EC/TCGA-02-0009/RadiomicsFeatures-csv/AX_FLAIR_Copy", pattern=".csv", full.names=T)
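
A side note on pattern=".csv": list.files() treats pattern as a regular expression, so the dot matches any character and the match can occur anywhere in the name. A small sketch of the difference, using made-up file names and grepl() to mimic the matching:

```r
# Made-up file names to illustrate how the pattern is matched.
names_in_dir <- c("roi_1.csv", "roi_2.csv", "notes_csv.txt", "roi_1.csv.bak")

# ".csv": the dot matches any character, and the match may be anywhere.
loose <- grepl(".csv", names_in_dir)

# "\\.csv$": a literal dot followed by "csv" at the end of the name.
strict <- grepl("\\.csv$", names_in_dir)

print(names_in_dir[loose])
print(names_in_dir[strict])
```

In a folder that contains only the CSV files, both patterns select the same files, so the command above works as posted; the anchored form only matters when other file types sit alongside them.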
