Get all DNAStrings from a StringSet as individual FASTA files with corresponding names in R
3.3 years ago
cechersa • 0

I have this DNAStringSet (genomes) and need to write each genome to its own FASTA file in a directory, with each one remaining a StringSet of length 1. The file names are stored in a character vector (names). Since there are too many to write by hand, I created this loop:

n <- seq_len(length(names))
   for(i in 1:n){
      for(j in seq_len(length(names))){
         writeXStringSet(genomes[i],names[j])
      }
   }

and returned this:

>Warning message:
In 1:n : numerical expression has 195 elements: only the first used

I checked the results: the files were there with the right names, but every one of them contained the first sequence. I changed the loop:

for(i in seq_len(length(genomes))){
   for(j in seq_len(length(names))){
      writeXStringSet(genomes[i],names[j])
   }
}

No error appeared, but now all files contain the last sequence of the StringSet.

I'm new to R and the Biostrings package. Is there a way to fix this, or another approach, so that each file contains its corresponding genome?

Thank you in advance!

Sample of the objects:

>genomes
DNAStringSet object of length 3:
    width seq                                           names
[1] 47 TATAAAACACCCTCAATTCAAGGGTTTAATTTTTCACAATCATTAAA HP83
[2] 47 TAAAACACCCTCAATTCAAGGGTTTCATTTTTTAAAACTATTAAATA HPS49
[3] 47 AAAAACCTTGTTTTAGTCTTTTTTATAGATTTCATGTTCAAGTCTTC P49

>names <- c("HP83.fasta","HPS49.fasta","P49.fasta")
R genome DNA StringSet • 1.8k views

Please don't use all caps, it's bad etiquette. I've edited your post and made the necessary changes this time.

3.3 years ago
ATpoint 81k

I am not sure I can reproduce this. Can you please make a small reproducible example showing what your data look like? For example, type dput(genomes) and see whether the output is short enough to paste here. So you have a single DNAStringSet with multiple entries (each being a genome), and you want to save each one to disk as FASTA?

Here is a little example, I guess similar to what you did. I do not get the error with it, so please provide a reproducible example:

#/ Example data: BiocManager::install("drosophila2probe")
library(drosophila2probe)
library(Biostrings)
probes <- DNAStringSet(drosophila2probe)[1:3]
names(probes) <- paste0("genome_", seq(1,length(probes)))

#/ Access each genome by name:
for(s in names(probes)) writeXStringSet(probes[s], paste0(s, ".fasta"))
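
For comparison with the second loop in the question, here is a minimal base-R sketch of why nesting the two loops leaves the last sequence in every file, and how a single shared index avoids it. The data are toy stand-ins (a named character vector instead of a DNAStringSet, `writeLines` instead of `writeXStringSet`), and `out_dir` is a hypothetical temporary directory; with Biostrings, the paired call would presumably be `writeXStringSet(genomes[i], names[i])`.

```r
# Toy stand-ins, not real data: a named character vector plays the
# role of the DNAStringSet `genomes` from the question.
genomes <- c(HP83 = "AAAA", HPS49 = "CCCC", P49 = "GGGG")
names <- c("HP83.fasta", "HPS49.fasta", "P49.fasta")

out_dir <- tempfile("fasta_demo_")  # hypothetical output directory
dir.create(out_dir)

read_all <- function() {
  vapply(names, function(f) readLines(file.path(out_dir, f)), character(1))
}

# Nested loops: every file is rewritten once per sequence, so each
# file ends up holding whatever was written on the final pass of i.
for (i in seq_along(genomes)) {
  for (j in seq_along(names)) {
    writeLines(genomes[[i]], file.path(out_dir, names[j]))
  }
}
nested <- read_all()

# One loop, one shared index: sequence i goes to file i.
for (i in seq_along(genomes)) {
  writeLines(genomes[[i]], file.path(out_dir, names[i]))
}
paired <- read_all()
```

After the nested version, every file holds the last toy sequence. After the single-index version, each file holds its own.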

Thanks a lot! Yes, the real StringSet is one object with 195 entries. All the genomes are in that one object, so I want to write each one out to its own .fasta file. The object looks like the sample I posted and the example data you wrote, except that it has 195 entries, each about 1.6 Mb wide.

But, thank you, this worked very well on a subset of my data, just tried it.

3.3 years ago

For starters,

Are you completely sure you want this

n <- seq_len(length(names))
for(i in 1:n){

And not

n <- seq_len(length(names))
for(i in n){

or

n <- length(names)
for(i in 1:n){

seq_len() returns a vector of numbers; length() returns just one number.
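
To make the difference concrete, here is a small base-R sketch. It assumes the same three-element `names` vector as in the question; the `visited_*` variables are hypothetical and only record which indices each loop reaches. Since `1:n` with a vector `n` warns and uses only the first element, that case is written out explicitly as `1:n_vec[1]` so the snippet runs without the warning.

```r
# Stand-in for the question's file-name vector.
names <- c("HP83.fasta", "HPS49.fasta", "P49.fasta")

n_vec <- seq_len(length(names))  # integer vector: 1 2 3
n_one <- length(names)           # a single number: 3

# What `1:n_vec` effectively evaluates to (after the warning):
# only n_vec[1] is used, so the range is 1:1.
first_only <- 1:n_vec[1]

visited_wrong <- integer(0)
for (i in first_only) visited_wrong <- c(visited_wrong, i)

# Iterating over the vector itself visits every index.
visited_vec <- integer(0)
for (i in n_vec) visited_vec <- c(visited_vec, i)

# So does 1:n when n is a single number.
visited_one <- integer(0)
for (i in 1:n_one) visited_one <- c(visited_one, i)
```

The first loop runs once with `i` equal to 1, which matches the symptom in the question: every file received the first sequence.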


Yes. To tell R to iterate a number of times, you have to give it a sequence to iterate over. For example:

> for(i in 1:4){
print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4

But, if you don't give a sequence of integers, you get this:

> for(i in 4){
print(i)
}
[1] 4

seq_len produces that integer sequence from the length of an object, and I needed each of those values to tell writeXStringSet which element of genomes to write as FASTA.


But are you giving it 1:4, or are you giving it 1:c(1:4)?

n <- seq_len(length(names))
n

Is n a single number, or a list of numbers?


n is a number originally, but yes you could save the sequence as n as well
