All results in one file
3.3 years ago

Hello, I want to run a BioMart query for several files in the same folder (this part works). I would like the output saved in one of two ways: either each file's query result goes to its own output file (1 query = 1 file), or the results for all 500 files accumulate in a single output file (500 files = 1 output file). Here is the code I used; in the final file, only the last query's result is saved.

library(biomaRt)
files <- list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = (".txt$"))
files
myList2 <- list()
for (k in 1:length(files)) {
  setwd("/Users/amandinelecerfdefer/Desktop/poi/data/")
  myList2[[k]] <- read.delim(files[k])
  snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = myList2[[k]]$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  write.csv(res[[k]], file = "recovery_gene_trans.txt")
or
  for (k in 1:length(files)) {
    setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
    write.csv(res[[k]], file = "recovery_gene_trans.txt")
  }

}


Either way, I get the same problem.

How can I do this?

R • 679 views
3.3 years ago

The easiest way to fix that would be to change your filename as you're writing it, for example:

write.csv(res[[i]], file = paste0("recovery_gene_trans_",k,".txt"))


So for each file k, the output file name will be suffixed with k.txt, and no iteration overwrites the previous one.
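As a rough sketch of the naming idea (the input file names below are made up, standing in for whatever list.files() returns), you could build the output names from either the loop index or the input file name:

```r
# Hypothetical input file names, standing in for the list.files() output
files <- c("sample_A.txt", "sample_B.txt", "sample_C.txt")

# Option 1: suffix each output name with the loop index k
by_index <- paste0("recovery_gene_trans_", seq_along(files), ".txt")

# Option 2: reuse the input file name, so each output is traceable to its input
by_name <- paste0("recovery_gene_trans_", basename(files))

print(by_index)
print(by_name)
```

Reusing the input name is probably easier to trace back later than a bare index, but either one avoids overwriting.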

But what is i here? It shows up in your write.csv call but doesn't seem to be set anywhere before, so if it's always the same, maybe you want to replace it with a constant.

Another way that might be of interest to you, if each file has the same columns, is to do something like:

# Set full.names=TRUE, so we get the full path and filename, and won't need to change working directory: setwd().
files <- list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = "*.txt$", full.names = TRUE)
combined.files <- do.call(rbind, lapply(files, function(filename_k) {
  file_k <- read.delim(filename_k)
  snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = file_k$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  return(res[[i]])
}))


This will call lapply and return a list of results, one per file, which do.call(rbind, ...) then combines into one table. You could even add a column to your res[[i]] to identify which file k it came from.
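Here is a minimal sketch of the "add a column" idea. To keep it runnable offline, fake_query() is a made-up stand-in for the getBM() call, and the file names and IDs are invented:

```r
# Hypothetical stand-in for the getBM() call; returns a small data frame
fake_query <- function(filename) {
  data.frame(
    refsnp_id = c("rs1", "rs2"),
    ensembl_gene_stable_id = c("ENSG_a", "ENSG_b"),
    stringsAsFactors = FALSE
  )
}

# Made-up input paths, standing in for list.files(..., full.names = TRUE)
files <- c("data/file_1.txt", "data/file_2.txt")

combined <- do.call(rbind, lapply(files, function(filename_k) {
  res <- fake_query(filename_k)
  # Tag every row with its source file; rep() also copes with zero-row results
  res$source_file <- rep(basename(filename_k), nrow(res))
  res
}))

print(combined)
```

With the source_file column in place, the combined table can still be split back per input file later if needed.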


Thank you for your answer. Sorry, I made a mistake: there is no i in my code. It's a bad habit of mine; it should be k instead of i.


Unfortunately, I just tried your proposals and they don't work.

Edit: I am answering here because the site won't let me comment on your answer. I want to retrieve the full output of each query, not just a single element returned by BioMart.


Which one? Does it give you an error message, or does it fail to merge them properly?

The main problem I see is that you're getting res from biomart. So calling res[[k]] doesn't seem to make sense since biomart doesn't know that you have k files, that's why I assumed you were using i in res[[i]] to access a specific element of the biomart output.

Check if you want the whole res list or a specific element of it, but it seems unlikely that you'll want element k for each iteration.
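To illustrate the point: getBM() returns a data frame, so res[[k]] picks out column k rather than "the result for file k". A small sketch with a made-up data frame shaped like the query result (the IDs are invented):

```r
# Toy data frame shaped like a getBM() result; the values are made up
res <- data.frame(
  refsnp_id = c("rs1", "rs2"),
  ensembl_gene_stable_id = c("ENSG_a", "ENSG_b"),
  ensembl_transcript_stable_id = c("ENST_a", "ENST_b"),
  stringsAsFactors = FALSE
)

k <- 2
col_k <- res[[k]]  # only the second column, as a plain vector
print(col_k)

# With three columns, k = 4 is out of bounds and raises an error
out_of_bounds <- tryCatch(res[[4]], error = function(e) e)
print(inherits(out_of_bounds, "error"))

# Writing the whole table keeps every column
print(dim(res))
```

So with three attributes requested, the loop would write one column for k = 1 to 3 and fail from k = 4 onward; passing res itself to write.csv keeps the whole output.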

3.3 years ago
AK ★ 2.1k

To write 1 file to its own output file, you can do something like:

library(biomaRt)
setwd("/Users/amandinelecerfdefer/Desktop/poi")
files <- list.files(path = "data", pattern = (".txt$"))
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
for (k in 1:length(files)) {
  fname <- files[k]
  cat(paste0("Now parsing data/", fname, "...\n"))
  data <- read.delim(paste0("data/", fname))
  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = data$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  write.csv(res, file = paste0("result/recovery_gene_trans_", fname))
  rm(data, res)
  Sys.sleep(5)
}


To output to the same file, just remove any existing result/recovery_gene_trans.txt and change write.csv to:

  write.table(
    res,
    file = "result/recovery_gene_trans.txt",
    append = TRUE,
    row.names = FALSE,
    col.names = !file.exists("result/recovery_gene_trans.txt"),
    sep = ","
  )


Thank you, it works!