All results in one file
3.3 years ago

Hello, I want to run a BioMart request on several files in the same folder (this part works). I would like the output saved in one of two ways: either each file's request goes to its own output file (1 request = 1 file), or the output of all requests, for example for 500 files, accumulates in a single file (500 files = 1 output file). Here is the code I used, but the final file contains only the output of the last request.

library(biomaRt)
files<-list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = (".txt$"))
files
myList2 <- list()

for (k in 1:length(files)) {
  setwd("/Users/amandinelecerfdefer/Desktop/poi/data/")
  myList2[[k]] <- read.delim(files[k])
  snpmart <-
    useMart(biomart = "ENSEMBL_MART_SNP", dataset="hsapiens_snp")

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = myList2[[k]]$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  write.csv(res[[k]], file = "recovery_gene_trans.txt")
  or 
    for(k in 1:length(files)){
         setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
         write.csv(res[[k]], file = "recovery_gene_trans.txt")
    }

}

Either way, I get the same issue.

How to do this?

R • 679 views
3.3 years ago

The easiest way to fix that would be to change the file name each time you write it, for example:

write.csv(res[[i]], file = paste0("recovery_gene_trans_",k,".txt"))

So for each file k, the output file name will end in _k.txt, and no file overwrites another.
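
To make the naming idea concrete, here is a minimal sketch that only builds the output names and does not query BioMart. The input file names are hypothetical placeholders standing in for what list.files() would return.

```r
# Hypothetical input file names, standing in for the output of list.files()
files <- c("chr1_snps.txt", "chr2_snps.txt", "chr3_snps.txt")

# Option A: suffix each output with the loop index k
by_index <- paste0("recovery_gene_trans_", seq_along(files), ".txt")

# Option B: reuse the input name, so each output can be traced to its source
by_name <- paste0("recovery_gene_trans_", basename(files))

print(by_index)
print(by_name)
```

Either vector can then be indexed with k inside the loop, as in file = by_index[k].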

But what is i here? It shows up in your write.csv call but doesn't seem to be set anywhere before that. If it is always the same value, you may want to replace it with a constant.

Another way that might be of interest to you, if each file has the same columns, is to do something like:

# Set full.names = TRUE to get full paths, so there is no need to change the working directory with setwd().
files <- list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = "\\.txt$", full.names = TRUE)

combined.files <-
  do.call(rbind,
          lapply(files, function(filename_k) {
            file_k <- read.delim(filename_k)
            snpmart <-
              useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

            res <- getBM(
              attributes = c(
                "refsnp_id",
                "ensembl_gene_stable_id",
                "ensembl_transcript_stable_id"
              ),
              filters = "snp_filter",
              values =  file_k$rsID,
              mart = snpmart,
              uniqueRows = TRUE
            )

            # Note: i is not defined inside this function; use res to return the whole data frame.
            return(res[[i]])
          }))

Here lapply returns a list with one result per file, and do.call(rbind, ...) combines that list into a single table. You could even add a column to each result to identify which file k it came from.
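
As a rough illustration of tagging each result with its source file before combining, the sketch below runs without a BioMart connection. fake_query() is a hypothetical stand-in for getBM(), and the two input files are created in a temporary directory just for the demo.

```r
# Create two small stand-in input files in a temporary directory
tmp <- tempfile("poi_demo_")
dir.create(tmp)
write.table(data.frame(rsID = c("rs1", "rs2")), file.path(tmp, "a.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(rsID = "rs3"), file.path(tmp, "b.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical placeholder for getBM(): returns one row per rsID
fake_query <- function(ids) {
  data.frame(refsnp_id = ids,
             ensembl_gene_stable_id = paste0("GENE_", ids),
             stringsAsFactors = FALSE)
}

combined <- do.call(rbind, lapply(files, function(filename_k) {
  file_k <- read.delim(filename_k)
  res <- fake_query(file_k$rsID)
  # Tag every row with the file it came from; rep() also copes with zero rows
  res$source_file <- rep(basename(filename_k), nrow(res))
  res
}))

print(combined)
```

With a real query, fake_query(file_k$rsID) would be replaced by the getBM() call shown above.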


Thank you for your answer. Excuse me, I made a mistake, there is no i in my code, it's a bad habit, it's a k instead of the i.


Unfortunately, I just tried your proposals and they don't work.

Edit: I am answering here because the site won't let me comment on your answer. I want to retrieve the full output of each request, not just one item returned by BioMart.


Which one? Does it give you an error message, or does it fail to merge the results properly?

The main problem I see is that you're getting res from biomart. So calling res[[k]] doesn't seem to make sense since biomart doesn't know that you have k files, that's why I assumed you were using i in res[[i]] to access a specific element of the biomart output.

Check if you want the whole res list or a specific element of it, but it seems unlikely that you'll want element k for each iteration.
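
To see what res[[k]] actually extracts, here is a small sketch using a hand-made data frame shaped like a getBM() result (getBM() returns a data frame with one column per requested attribute). The IDs are invented placeholders.

```r
# Stand-in for a getBM() result: one column per requested attribute
res <- data.frame(
  refsnp_id = c("rs1", "rs2"),
  ensembl_gene_stable_id = c("GENE_A", "GENE_B"),
  ensembl_transcript_stable_id = c("TX_A", "TX_B"),
  stringsAsFactors = FALSE
)

# res[[2]] is the second COLUMN as a vector, not "the result for file 2"
second <- res[[2]]
print(second)

# With only three columns, res[[4]] is out of range and raises an error
out_of_range <- inherits(try(res[[4]], silent = TRUE), "try-error")
print(out_of_range)
```

So inside the loop, writing res itself keeps the whole table for the current file.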

3.3 years ago
AK ★ 2.1k

Hi amandinelecerfdefer,

To write the result for each input file to its own output file, you can do something like:

library(biomaRt)
setwd("/Users/amandinelecerfdefer/Desktop/poi")
files <- list.files(path = "data", pattern = "\\.txt$")

# Connect to the mart once, outside the loop
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
for (k in seq_along(files)) {
  fname <- files[k]
  cat(paste0("Now parsing data/", fname, "...\n"))
  data <- read.delim(paste0("data/", fname))

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = data$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  write.csv(res, file = paste0("result/recovery_gene_trans_", fname))
  rm(data, res)
  Sys.sleep(5)
}

To output to the same file, just remove any existing result/recovery_gene_trans.txt and change write.csv to:

  write.table(
    res,
    file = "result/recovery_gene_trans.txt",
    append = TRUE,
    row.names = FALSE,
    col.names = !file.exists("result/recovery_gene_trans.txt"),
    sep = ","
  )
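
For reference, the append pattern can be tried without BioMart. The sketch below uses a temporary file and two invented data frames in place of getBM() results. It is a slight variation on the call above: it sets append only once the file exists, which, as far as I know, avoids the warning write.table gives when column names are written in append mode.

```r
# Temporary output path; tempfile() only returns a name, the file does not exist yet
out <- tempfile(fileext = ".txt")

# Invented stand-ins for the getBM() result of each input file
chunks <- list(
  data.frame(refsnp_id = c("rs1", "rs2"),
             ensembl_gene_stable_id = c("GENE_A", "GENE_B")),
  data.frame(refsnp_id = "rs3",
             ensembl_gene_stable_id = "GENE_C")
)

for (res in chunks) {
  first_write <- !file.exists(out)
  write.table(
    res,
    file = out,
    append = !first_write,     # create the file on the first pass, append afterwards
    row.names = FALSE,
    col.names = first_write,   # write the header only once
    sep = ","
  )
}

combined <- read.csv(out)
print(combined)
```

The result is one header line followed by the rows of every chunk in order.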

Thank you, it works!
