Question: All results in one file
11 months ago by
amandinelecerfdefer20 wrote:

Hello, I want to run the same biomaRt query on several files in the same folder (this part works). I would like the output of each query to be saved in one of two ways: either each query goes to its own output file (1 query = 1 file), or the output of all queries accumulates in a single file (500 input files = 1 output file). Here is the code I used, but the final file contains only the output of the last query.

library(biomaRt)
files<-list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = (".txt$"))
files
myList2 <- list()

for (k in 1:length(files)) {
  setwd("/Users/amandinelecerfdefer/Desktop/poi/data/")
  myList2[[k]] <- read.delim(files[k])
  snpmart <-
    useMart(biomart = "ENSEMBL_MART_SNP", dataset="hsapiens_snp")

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = myList2[[k]]$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  write.csv(res[[k]], file = "recovery_gene_trans.txt")
  # or, alternatively:
    for(k in 1:length(files)){
         setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
         write.csv(res[[k]], file = "recovery_gene_trans.txt")
    }

}

Either way, I get the same problem.

How can I do this?

11 months ago by
Canada
manuel.belmadani1.2k wrote:

The easiest way to fix that would be to change the file name each time you write it, for example:

write.csv(res[[i]], file = paste0("recovery_gene_trans_",k,".txt"))

So for each file k, the output file name will end in _k.txt, and no iteration overwrites the previous one.
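As a minimal sketch of why only the last result survives, the following uses small made-up data frames in place of the real biomaRt results and a temporary folder in place of result/ (both are assumptions for illustration). Writing to a fixed file name replaces the file on every iteration, while a name that depends on k keeps one file per iteration.

```r
# Toy stand-ins for the per-file biomaRt results (hypothetical data).
results <- list(
  data.frame(refsnp_id = "rs1", gene = "G1"),
  data.frame(refsnp_id = "rs2", gene = "G2")
)

# Temporary folder used here instead of the real result/ directory.
out_dir <- tempfile("result_")
dir.create(out_dir)

# Same file name on every iteration: each write.csv() call replaces the file.
for (k in seq_along(results)) {
  write.csv(results[[k]],
            file = file.path(out_dir, "recovery_gene_trans.txt"),
            row.names = FALSE)
}
overwritten <- read.csv(file.path(out_dir, "recovery_gene_trans.txt"))

# File name that depends on k: one output file per iteration.
for (k in seq_along(results)) {
  write.csv(results[[k]],
            file = file.path(out_dir, paste0("recovery_gene_trans_", k, ".txt")),
            row.names = FALSE)
}
per_file <- list.files(out_dir, pattern = "^recovery_gene_trans_[0-9]+\\.txt$")
```

Here overwritten holds only the rows of the last data frame, while per_file lists one file for each element of results.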

But what is i here? It shows up in your write.csv call but doesn't seem to be set anywhere before, so if it is always the same you may want to replace it with a constant.

Another way that might be of interest to you, if each file has the same columns, is to do something like:

# Set full.names = TRUE so we get the full path of each file and don't need to change the working directory with setwd().
files <- list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = "\\.txt$", full.names = TRUE)

combined.files <-
  do.call(rbind,
          lapply(files, function(filename_k) {
            file_k <- read.delim(filename_k)
            snpmart <-
              useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

            res <- getBM(
              attributes = c(
                "refsnp_id",
                "ensembl_gene_stable_id",
                "ensembl_transcript_stable_id"
              ),
              filters = "snp_filter",
              values =  file_k$rsID,
              mart = snpmart,
              uniqueRows = TRUE
            )

            # i is carried over from the question's code; use return(res) to keep the whole result
            return(res[[i]])
          }))

This calls lapply, which returns a list with one result per file, and do.call(rbind, ...) then combines that list into one table. You could even add a column to your res[[i]] to identify which k file it's coming from.
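A minimal sketch of the "add a column to identify the file" idea, assuming each per-file result is a data frame with the same columns. The data frames and the column name source_file below are made up for illustration; no biomaRt query is run.

```r
# Hypothetical per-file results, named by input file (stand-ins for getBM() output).
results <- list(
  "sample_A.txt" = data.frame(refsnp_id = c("rs1", "rs2"),
                              ensembl_gene_stable_id = c("G1", "G2")),
  "sample_B.txt" = data.frame(refsnp_id = "rs3",
                              ensembl_gene_stable_id = "G3")
)

combined <- do.call(rbind, lapply(names(results), function(fname) {
  res <- results[[fname]]
  # Tag every row with the file it came from; rep() also copes with 0-row results.
  res$source_file <- rep(fname, nrow(res))
  res
}))
rownames(combined) <- NULL  # drop the row names that rbind() carries over
```

The combined table can then be written once with write.csv(combined, ...), so nothing has to be appended inside the loop.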


Thank you for your answer. Sorry, I made a mistake: there is no i in my code. Writing i is a bad habit of mine; it should be k instead.

written 11 months ago by amandinelecerfdefer20

I just tried your suggestions, and unfortunately they don't work.

Edit: I am answering here because the site won't let me comment on your answer. I want to retrieve the full output of each request, not just one item of what BioMart returns.

modified 11 months ago • written 11 months ago by amandinelecerfdefer20

Which one? Does it give you an error message, or does it fail to merge the results properly?

The main problem I see is that res comes from biomaRt, so calling res[[k]] doesn't seem to make sense: biomaRt doesn't know that you have k files. That's why I assumed you were using i in res[[i]] to access a specific element of the biomaRt output.

Check whether you want the whole res object or a specific element of it; it seems unlikely that you want element k on each iteration.
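To make the point concrete: getBM() returns a data frame, so [[k]] picks the k-th column rather than "the result for file k". The sketch below uses a hypothetical stand-in data frame with the three requested attributes instead of a real biomaRt call.

```r
# Hypothetical stand-in for a getBM() result: one column per requested attribute.
res <- data.frame(
  refsnp_id = c("rs1", "rs2"),
  ensembl_gene_stable_id = c("G1", "G2"),
  ensembl_transcript_stable_id = c("T1", "T2")
)

# [[1]] is just the refsnp_id column, not the result for the first file.
first_col <- res[[1]]

# With three columns, an index of 4 or more is out of bounds and raises an error.
fourth <- tryCatch(res[[4]], error = function(e) "error")
```

Passing res itself to write.csv() keeps all rows and columns for the current file.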

written 11 months ago by manuel.belmadani1.2k
11 months ago by
SMK1.9k wrote:

Hi amandinelecerfdefer,

To write the result for each input file to its own output file, you can do something like:

library(biomaRt)
setwd("/Users/amandinelecerfdefer/Desktop/poi")
# Escape the dot so that only file names ending in ".txt" match.
files <- list.files(path = "data", pattern = "\\.txt$")

# Connect to the SNP mart once, outside the loop.
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
for (k in seq_along(files)) {
  fname <- files[k]
  cat(paste0("Now parsing data/", fname, "...\n"))
  data <- read.delim(paste0("data/", fname))

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = data$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  write.csv(res, file = paste0("result/recovery_gene_trans_", fname))
  rm(data, res)
  Sys.sleep(5)
}

To output to the same file, just remove any existing result/recovery_gene_trans.txt and change write.csv to:

  write.table(
    res,
    file = "result/recovery_gene_trans.txt",
    append = TRUE,
    row.names = FALSE,
    col.names = !file.exists("result/recovery_gene_trans.txt"),
    sep = ","
  )
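The append pattern can be tried without biomaRt. The sketch below is a variation on the call above, not the exact code from the answer: it uses made-up data frames, a temporary file in place of result/recovery_gene_trans.txt, and a single file.exists() check (stored in the hypothetical variable already_there) that drives both append and col.names.

```r
# Hypothetical per-file results (stand-ins for getBM() output).
results <- list(
  data.frame(refsnp_id = c("rs1", "rs2"), ensembl_gene_stable_id = c("G1", "G2")),
  data.frame(refsnp_id = "rs3", ensembl_gene_stable_id = "G3")
)

# Temporary path; tempfile() returns a name without creating the file.
out_file <- tempfile(fileext = ".txt")

for (res in results) {
  already_there <- file.exists(out_file)
  write.table(
    res,
    file = out_file,
    append = already_there,       # first write creates the file, later writes append
    row.names = FALSE,
    col.names = !already_there,   # header only when the file is first created
    sep = ","
  )
}

lines <- readLines(out_file)
combined <- read.csv(out_file)
```

The file ends up with one header line followed by the rows of every result, which is why any stale output file should be removed before the loop starts.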

Thank you, it works!

written 11 months ago by amandinelecerfdefer20