Question: Process multiple files
Lila M 460 wrote:

Hi everyone, I'm new to R and I have a question. I have this code to annotate peaks from a BED file (narrowPeak):

peak <- readPeakFile("file", header=F)
peakAnno <- annotatePeak(peak, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db")

write.table(peakAnno,"new_name", sep="\t", col.names=T, row.names = F)

The code works, but I would like to know how I can create a loop that processes more than one bed file.

Thank you!!

chip-seq R • 933 views
ADD COMMENTlink modified 2.3 years ago by Biostar ♦♦ 20 • written 2.3 years ago by Lila M 460

Applying a task to several files in R

ADD REPLYlink written 2.3 years ago by genomax64k

Thank you, but when I process two files, the code only writes one file:

files = Sys.glob("*.txt")
files
[1] "1.txt"  "2.txt"
for(i in files){
peak <- readPeakFile(i, header=F)
peak
peakAnno <- annotatePeak(peak, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db")
write.table(peakAnno,"proof", sep="\t", col.names=T, row.names = F)
}

How can I get the two new tables?

Thanks!

ADD REPLYlink modified 2.3 years ago by WouterDeCoster37k • written 2.3 years ago by Lila M 460

Please use ADD REPLY to respond to earlier comments or posts, so that the thread remains logically structured and easy to follow. I have moved your answer, but as you can see the result is not optimal.

You have write.table inside the loop with a fixed output name, so each iteration overwrites the previous result. Either write to a separate file per iteration, with a name that depends on the value of i, or keep the results in memory and rbind() them together (depending on the size of your dataset this may or may not be possible), then write the output to a single file after the for loop has completed.
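
Both options can be sketched roughly as follows. This is only an illustration: annotate_file() is a hypothetical stand-in for readPeakFile() plus annotatePeak() so that the example runs without ChIPseeker or a TxDb, and the input files, the "annotated_" prefix and the "annotated_all.tsv" name are made up.

```r
# Hypothetical stand-in for readPeakFile() + annotatePeak(); it only reads the
# table and adds a column so the sketch runs without ChIPseeker or a TxDb.
annotate_file <- function(path) {
  peaks <- read.table(path, header = FALSE, sep = "\t")
  peaks$annotation <- "placeholder"
  peaks
}

# Two tiny tab-delimited input files in a temporary directory (made-up data).
in_dir <- tempfile("peaks_")
dir.create(in_dir)
for (n in c("1.txt", "2.txt")) {
  write.table(data.frame(chr = "chr1", start = 1:2, end = 11:12),
              file.path(in_dir, n), sep = "\t",
              col.names = FALSE, row.names = FALSE, quote = FALSE)
}
files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)

# Option 1: one output file per input, named after the input file.
for (i in files) {
  out <- file.path(in_dir, paste0("annotated_", basename(i)))
  write.table(annotate_file(i), out, sep = "\t",
              col.names = TRUE, row.names = FALSE, quote = FALSE)
}

# Option 2: keep results in memory, rbind them, and write once after the loop.
results <- list()
for (i in files) {
  res <- annotate_file(i)
  res$source <- basename(i)   # remember which input each row came from
  results[[basename(i)]] <- res
}
combined <- do.call(rbind, results)
write.table(combined, file.path(in_dir, "annotated_all.tsv"), sep = "\t",
            col.names = TRUE, row.names = FALSE, quote = FALSE)
```

In real use, the body of annotate_file() would be replaced by the readPeakFile() and annotatePeak() calls from the question.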

ADD REPLYlink written 2.3 years ago by WouterDeCoster37k

Can you show an example of what the files look like?

ADD REPLYlink written 2.3 years ago by Ron910

They are tab-delimited files. I can't fix the problem :( Can anybody write an example, please? Thanks

ADD REPLYlink written 2.3 years ago by Lila M 460

Try this. This code can be used to rbind the tab-delimited files (concatenating them row-wise). You can replace rbind with another function.

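# pattern is a regular expression, so escape the dot and anchor the extension
fileList <- list.files(pattern = "\\.txt$")

# read each file, tag its rows with the file name, and stack the results
new_df <- do.call(rbind, lapply(fileList, function(X) {
  data.frame(id = basename(X), tryCatch(read.table(X), error = function(e) NULL))
}))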
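
One detail worth checking when selecting files: list.files() interprets pattern as a regular expression rather than a shell glob, so an unescaped dot matches any character. A small sketch with made-up file names, using grepl() to show which names each pattern would select:

```r
# Made-up file names, only for illustrating the two patterns.
file_names <- c("1.txt", "2.txt", "mytxtfile.csv", "notes.txt.bak")

loose  <- grepl(".txt", file_names)       # "." matches any character
strict <- grepl("\\.txt$", file_names)    # literal ".txt" at the end only

print(data.frame(file_names, loose, strict))
```

The loose pattern selects all four names, while the anchored one selects only the two that actually end in ".txt".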
ADD REPLYlink written 2.3 years ago by Ron910

I think my problem is simpler. This is my code, which works:

files <- list("1", "2")
peakAnno <- lapply(files, annotatePeak, TxDb=txdb, tssRegion=c(-3000, 3000), annoDb="org.Hs.eg.db")
print (peakAnno)
for (i in peakAnno){
    write.table(i, xxxx , sep="\t", col.names=T, row.names = F)
}

I only need "xxxx" to be different on each iteration of the loop. Is that possible?

Thanks!

ADD REPLYlink modified 2.3 years ago by genomax64k • written 2.3 years ago by Lila M 460

If I understood correctly, you just want the output file name to depend on i while looping, right? I don't understand why you use files <- list("1", "2"); Ron's solution is far better. I modified it a bit:

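fileList <- list.files(pattern="\\.txt$")
peakAnno <- lapply(fileList, annotatePeak, TxDb=txdb, tssRegion=c(-3000, 3000), annoDb="org.Hs.eg.db")
# name each result after its input file so it can be looked up inside the loop
names(peakAnno) <- fileList
for (i in fileList){
    write.table(peakAnno[[i]], paste("peaks_", i, sep="") , sep="\t", col.names=T, row.names = F)
}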

But you could also write peakAnno to one output file, I guess. As you can see, I'm reusing the names from the fileList object and adding a prefix to them, which you can of course freely modify.
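
If you would rather change the extension as well as add a prefix, something like the sketch below may help. It is only an illustration: fake_annotate() is a hypothetical placeholder for annotatePeak() so the example runs on its own, and the input names and the ".tsv" extension are assumptions.

```r
# Hypothetical stand-in for annotatePeak(); returns a small data frame so the
# sketch runs without ChIPseeker.
fake_annotate <- function(path) {
  data.frame(file = basename(path), peak = 1:3)
}

file_list <- c("1.txt", "2.txt")           # assumed input names
peak_anno <- lapply(file_list, fake_annotate)
names(peak_anno) <- file_list              # tie each result to its input

out_dir <- tempdir()
for (i in names(peak_anno)) {
  # strip the old extension, then add the prefix and the new extension
  out_name <- paste0("peaks_", tools::file_path_sans_ext(i), ".tsv")
  write.table(peak_anno[[i]], file.path(out_dir, out_name), sep = "\t",
              col.names = TRUE, row.names = FALSE, quote = FALSE)
}
```

Looping over names(peak_anno) keeps each output name tied to the input it came from, which is what replaces the "xxxx" placeholder.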

ADD REPLYlink written 2.3 years ago by WouterDeCoster37k