Question: All results in one file
amandinelecerfdefer wrote, 5 days ago:

Hello, I want to run a BioMart query on each of several files in the same folder (that part works). But I would like the output of the queries to be saved in one of two ways: either each input file gets its own output file (1 query = 1 file), or the outputs for all 500 files go into the same file (500 input files = 1 output file accumulating the output of every query). Here is the code I'm using, but in the final file only the result of the last query is saved.

library(biomaRt)
files<-list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = (".txt$"))
files
myList2 <- list()

for (k in 1:length(files)) {
  setwd("/Users/amandinelecerfdefer/Desktop/poi/data/")
  myList2[[k]] <- read.delim(files[k])
  snpmart <-
    useMart(biomart = "ENSEMBL_MART_SNP", dataset="hsapiens_snp")

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = myList2[[k]]$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  write.csv(res[[k]], file = "recovery_gene_trans.txt")
  # or, alternatively, instead of the write.csv() line above:
  # for (k in 1:length(files)) {
  #   setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  #   write.csv(res[[k]], file = "recovery_gene_trans.txt")
  # }

}

Either way, I always get the same issue.

How can I do this?

manuel.belmadani (Canada) wrote, 5 days ago:

The easiest fix is to change the file name each time you write it, for example:

write.csv(res, file = paste0("recovery_gene_trans_", k, ".txt"))

So for the k-th input file, the output file name will end in _k.txt (e.g. recovery_gene_trans_3.txt for k = 3), and no result overwrites another.
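To see why the original loop keeps only the last result, here is a minimal sketch that runs offline. It assumes a hypothetical helper, fake_query(), standing in for getBM() (it just builds a small data frame from the IDs it is given), and writes into a temporary directory rather than the real result folder. Writing to the same file name on every iteration replaces the file each time, while a name suffixed with k keeps one file per input.

```r
# Hypothetical stand-in for getBM(): builds a small data frame from the IDs.
fake_query <- function(ids) {
  data.frame(refsnp_id = ids,
             ensembl_gene_stable_id = sprintf("GENE_%s", ids),
             stringsAsFactors = FALSE)
}

# Hypothetical inputs: one vector of rsIDs per input "file".
inputs <- list(c("rs1", "rs2"), "rs3", c("rs4", "rs5"))
out_dir <- tempfile("biomart_demo_")
dir.create(out_dir)

for (k in seq_along(inputs)) {
  res <- fake_query(inputs[[k]])

  # Same name every iteration: each write replaces the previous file.
  write.csv(res, file = file.path(out_dir, "recovery_gene_trans.txt"),
            row.names = FALSE)

  # Name suffixed with k: one output file per input file.
  write.csv(res, file = file.path(out_dir, paste0("recovery_gene_trans_", k, ".txt")),
            row.names = FALSE)
}

overwritten <- read.csv(file.path(out_dir, "recovery_gene_trans.txt"))
print(nrow(overwritten))   # only the rows from the last input survive
print(list.files(out_dir))
```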

But what is i in your code? It appears in your write.csv call but never seems to be set beforehand, so if it is always the same, maybe you want to replace i with a constant?

Another approach that might interest you, if each file has the same columns, is to combine all the results into a single table, something like:

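library(biomaRt)

# Set full.names = TRUE so we get the full path and file name, and won't need to change the working directory with setwd().
files <- list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = "\\.txt$", full.names = TRUE)

# Connect to the SNP mart once instead of once per file.
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

combined.files <-
  do.call(rbind,
          lapply(files, function(filename_k) {
            file_k <- read.delim(filename_k)

            res <- getBM(
              attributes = c(
                "refsnp_id",
                "ensembl_gene_stable_id",
                "ensembl_transcript_stable_id"
              ),
              filters = "snp_filter",
              values = file_k$rsID,
              mart = snpmart,
              uniqueRows = TRUE
            )

            # getBM() returns a data frame: return all of it, not a single element.
            return(res)
          }))

# Save the combined table in a single file.
write.csv(combined.files, file = "/Users/amandinelecerfdefer/Desktop/poi/result/recovery_gene_trans.txt", row.names = FALSE)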

lapply() returns a list with one data frame per input file, and do.call(rbind, ...) stacks them into a single table. You could even add a column to res to identify which input file each row comes from.
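As a sketch of that column idea, the example below tags each result with the name of the file it came from before stacking everything with do.call(rbind, ...). It runs offline: fake_query() is a hypothetical stand-in for getBM(), and the file names and rsIDs are made up. The column is filled with rep(fname, nrow(res)) rather than a single value, because assigning a length-1 value to a zero-row data frame fails, so files with no BioMart hits would otherwise break the combine step.

```r
# Hypothetical stand-in for getBM(); returns zero rows for an empty input.
fake_query <- function(ids) {
  data.frame(refsnp_id = ids,
             ensembl_gene_stable_id = sprintf("GENE_%s", ids),
             stringsAsFactors = FALSE)
}

# Hypothetical file names mapped to the rsIDs each file contains.
inputs <- list(a.txt = c("rs1", "rs2"), b.txt = character(0), c.txt = "rs3")

combined <- do.call(rbind, lapply(names(inputs), function(fname) {
  res <- fake_query(inputs[[fname]])
  # rep() gives a vector of the right length even when res has 0 rows.
  res$source_file <- rep(fname, nrow(res))
  res
}))

print(combined)
```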


Thank you for your answer. Sorry, I made a mistake: there is no i in my code. It's a bad habit; it should be k, not i.

(reply by amandinelecerfdefer, 5 days ago)

Unfortunately, I just tried your suggestions, and they don't work.

Edit: I'm answering here because the site won't let me comment on your answer. I want to retrieve the full output of each query, not just one element of what BioMart returns.

(reply by amandinelecerfdefer, 5 days ago)

Which one? Does it give you an error message, or does it not merge them properly?

The main problem I see is that res comes straight from BioMart. getBM() returns a single data frame and doesn't know that you have k files, so calling res[[k]] doesn't make sense: it gives you the k-th column of that data frame, not the result for file k. That's why I assumed you were using i in res[[i]] to access a specific element of the BioMart output.

Check whether you want the whole res table or a specific element of it, but it seems unlikely that you want element k on each iteration.
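The sketch below makes this concrete with a made-up data frame in place of real getBM() output: [[1]] on a data frame returns its first column as a plain vector. If the goal of res[[k]] was to keep every iteration's result, a common pattern is a separate list (here called results, an illustrative name) that stores the whole data frame for each k and is combined once after the loop. The rsIDs in inputs are hypothetical.

```r
# Made-up stand-in for a getBM() result: a data frame with 3 columns.
res <- data.frame(refsnp_id = c("rs1", "rs2"),
                  ensembl_gene_stable_id = c("GENE_A", "GENE_B"),
                  ensembl_transcript_stable_id = c("TX_A", "TX_B"),
                  stringsAsFactors = FALSE)

# [[k]] on a data frame returns the k-th column, not a "k-th result".
first_column <- res[[1]]
print(first_column)  # [1] "rs1" "rs2"

# Keep each iteration's full result in a list, then combine once after the loop.
inputs <- list(c("rs1", "rs2"), "rs3")  # hypothetical rsIDs per input file
results <- vector("list", length(inputs))
for (k in seq_along(inputs)) {
  # In real code this would be the getBM() call for file k.
  results[[k]] <- data.frame(refsnp_id = inputs[[k]], stringsAsFactors = FALSE)
}
combined <- do.call(rbind, results)
print(nrow(combined))  # [1] 3
```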

(reply by manuel.belmadani, 5 days ago)

SMK (Ghent, Belgium) wrote, 5 days ago:

Hi amandinelecerfdefer,

To write each input file's results to its own output file, you can do something like:

library(biomaRt)
# Paths below are relative to this folder: inputs in data/, outputs in result/.
setwd("/Users/amandinelecerfdefer/Desktop/poi")
files <- list.files(path = "data", pattern = "\\.txt$")

# Connect to the SNP mart once, outside the loop.
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
for (k in seq_along(files)) {
  fname <- files[k]
  cat(paste0("Now parsing data/", fname, "...\n"))
  data <- read.delim(paste0("data/", fname))

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = data$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  # One output file per input file, named after the input file.
  write.csv(res, file = paste0("result/recovery_gene_trans_", fname))
  rm(data, res)
  # Short pause between queries to go easy on the BioMart server.
  Sys.sleep(5)
}

To write everything to the same file instead, first delete any existing result/recovery_gene_trans.txt (otherwise the new rows are appended to the results of earlier runs), then change the write.csv call to:

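  # Write the header only when the file doesn't exist yet (first iteration),
  # and append on every later iteration.
  first_write <- !file.exists("result/recovery_gene_trans.txt")
  write.table(
    res,
    file = "result/recovery_gene_trans.txt",
    append = !first_write,
    row.names = FALSE,
    col.names = first_write,
    sep = ","
  )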
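Here is an offline sketch of that append pattern, checking that the header ends up in the file only once while every query's rows accumulate. fake_query() is a hypothetical stand-in for getBM(), the rsIDs are made up, and the output goes to a temporary file instead of result/recovery_gene_trans.txt. It passes append = FALSE on the first write, which avoids the "appending column names to file" warning that write.table() gives when append and col.names are both TRUE.

```r
# Hypothetical stand-in for getBM().
fake_query <- function(ids) {
  data.frame(refsnp_id = ids,
             ensembl_gene_stable_id = sprintf("GENE_%s", ids),
             stringsAsFactors = FALSE)
}

inputs <- list(c("rs1", "rs2"), "rs3", c("rs4", "rs5"))  # hypothetical rsIDs per file
out_file <- tempfile(fileext = ".txt")  # stands in for result/recovery_gene_trans.txt

for (k in seq_along(inputs)) {
  res <- fake_query(inputs[[k]])
  first_write <- !file.exists(out_file)
  write.table(res, file = out_file,
              append = !first_write,    # create the file on the first write
              col.names = first_write,  # header only once
              row.names = FALSE, sep = ",")
}

all_rows <- read.csv(out_file)
print(nrow(all_rows))  # rows from every query, under a single header
```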

Thank you, it works!

(reply by amandinelecerfdefer, 5 days ago)