Question: Get all DNAStrings from a StringSet as individual FASTA files with corresponding names in R
cechersa0 wrote:

I have a DNAStringSet (genomes) and need to write each genome to its own FASTA file in a directory, with each one remaining a StringSet of length 1. The file names are stored in a vector (names). Since there are too many to do by hand, I wrote this loop:

n <- seq_len(length(names))
   for(i in 1:n){
      for(j in seq_len(length(names))){
         writeXStringSet(genomes[i],names[j])
      }
   }

and returned this:

>Warning message:
In 1:n : numerical expression has 195 elements: only the first used

I checked the results: the files were there with the right names, but every one of them contained the first sequence. I changed the loop:

for(i in seq_len(length(genomes))){
   for(j in seq_len(length(names))){
      writeXStringSet(genomes[i],names[j])
   }
}

No error appeared, but now all files contain the last sequence of the StringSet.

I'm new to R and the Biostrings package. Is there a way to fix this, or another approach, so that each file contains its corresponding genome?

Thank you in advance!

Sample of the objects:

>genomes
DNAStringSet object of length 3:
    width seq                                           names
[1] 47 TATAAAACACCCTCAATTCAAGGGTTTAATTTTTCACAATCATTAAA HP83
[2] 47 TAAAACACCCTCAATTCAAGGGTTTCATTTTTTAAAACTATTAAATA HPS49
[3] 47 AAAAACCTTGTTTTAGTCTTTTTTATAGATTTCATGTTCAAGTCTTC P49

>names <- c("HP83.fasta","HPS49.fasta","P49.fasta")
dna stringset R genome • 233 views
modified 10 weeks ago by swbarnes29.6k • written 10 weeks ago by cechersa0

Please don't use all caps, it's bad etiquette. I've edited your post and made the necessary changes this time.

written 10 weeks ago by Ram32k
ATpoint46k wrote:

I am not sure I can reproduce this. Can you please make a small reproducible example of what your data look like? For example, type dput(genomes) and paste the output here if it is short enough. So you have a single DNAStringSet with multiple entries (each being a genome), and you want to save each one to disk as FASTA?

Here is a small example, I guess similar to what you did. I do not get the error with it, so please provide a reproducible example:

#/ Example data: BiocManager::install("drosophila2probe")
library(drosophila2probe)
library(Biostrings)
probes <- DNAStringSet(drosophila2probe)[1:3]
names(probes) <- paste0("genome_", seq(1,length(probes)))

#/ Access each genome by name:
for(s in names(probes)) writeXStringSet(probes[s], paste0(s, ".fasta"))
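
For comparison, here is a minimal sketch of an index-based version using the sample objects from the question. It assumes the file names are stored in the same order as the sequences. The vector is called file_names here (rather than names) to avoid confusion with the names() function, and out_dir is a hypothetical output directory, not something from the original post.

```r
library(Biostrings)

# Sample data copied from the question
genomes <- DNAStringSet(c(
  HP83  = "TATAAAACACCCTCAATTCAAGGGTTTAATTTTTCACAATCATTAAA",
  HPS49 = "TAAAACACCCTCAATTCAAGGGTTTCATTTTTTAAAACTATTAAATA",
  P49   = "AAAAACCTTGTTTTAGTCTTTTTTATAGATTTCATGTTCAAGTCTTC"
))
file_names <- c("HP83.fasta", "HPS49.fasta", "P49.fasta")

# Hypothetical output directory (assumption, not from the post)
out_dir <- file.path(tempdir(), "genomes_fasta")
dir.create(out_dir, showWarnings = FALSE)

# One index for both objects: sequence i goes into file i
stopifnot(length(genomes) == length(file_names))
for (i in seq_along(genomes)) {
  # genomes[i] (single bracket) is still a DNAStringSet of length 1
  writeXStringSet(genomes[i], file.path(out_dir, file_names[i]))
}
```

Single-bracket subsetting keeps each element as a StringSet of length 1, which is what the question asked for.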
written 10 weeks ago by ATpoint46k

Thanks a lot! Yes, the real StringSet is one object with 195 entries. All the genomes are in that one object, and I want to write each one to its own .fasta file. The object looks like the sample I posted and the example data you wrote, except that it has 195 entries, each about 1.6 Mb wide.

But thank you, I just tried this on a subset of my data and it worked very well.

written 10 weeks ago by cechersa0
swbarnes29.6k wrote:

For starters,

Are you completely sure you want this

n <- seq_len(length(names))
for(i in 1:n){

And not

n <- seq_len(length(names))
for(i in n){

or

n <- length(names)
for(i in 1:n){

seq_len() returns a vector of numbers; length() returns just one number.
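
A small base-R sketch of that difference, and of what happens when the vector is passed to the colon operator. The variable names are made up for illustration. By default R only warns here, as in the question, and uses the first element.

```r
file_names <- c("HP83.fasta", "HPS49.fasta", "P49.fasta")

n_vec <- seq_len(length(file_names))  # an integer vector: 1 2 3
n_len <- length(file_names)           # a single number: 3

# `:` uses only the first element of a longer vector (with a warning),
# so 1:n_vec collapses to 1:1 and a loop over it runs exactly once.
bad  <- suppressWarnings(1:n_vec)
good <- 1:n_len

print(bad)
print(good)

# seq_along() gives the full index sequence directly
print(seq_along(file_names))
```

With 1:n_vec the loop body runs only for i = 1, which matches every file ending up with the first sequence.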

modified 10 weeks ago • written 10 weeks ago by swbarnes29.6k

Yes, to tell R to iterate a number of times you have to give it a sequence to iterate over. For example:

> for(i in 1:4){
print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4

But if you give it a single integer instead of a sequence, you get this:

> for(i in 4){
print(i)
}
[1] 4

seq_len gets that integer sequence from the length of an object, and I needed each of those values to tell writeXStringSet which element of genomes to write as FASTA.
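
To see why the nested version in the question leaves every file with the last sequence, here is a toy base-R sketch. It replaces the FASTA files with entries in a named character vector, so nothing is written to disk. The sequences and file names are made-up stand-ins, not the real data.

```r
seqs <- c("AAA", "CCC", "GGG")                     # toy stand-ins for the genomes
file_names <- c("a.fasta", "b.fasta", "c.fasta")   # toy file names

# Nested loops: for every sequence i, every file j is rewritten,
# so each "file" ends up holding whichever sequence came last.
nested <- character(0)
for (i in seq_along(seqs)) {
  for (j in seq_along(file_names)) {
    nested[file_names[j]] <- seqs[i]
  }
}

# One shared index: sequence i goes into file i and nothing is overwritten.
paired <- character(0)
for (i in seq_along(seqs)) {
  paired[file_names[i]] <- seqs[i]
}

print(nested)
print(paired)
```

The same reasoning applies to writeXStringSet, which by default overwrites an existing file rather than appending to it.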

written 10 weeks ago by cechersa0

But are you giving it 1:4, or are you giving it 1:c(1:4)?

n <- seq_len(length(names))
n

Is n a single number, or a vector of numbers?

written 10 weeks ago by swbarnes29.6k

n is a number originally, but yes you could save the sequence as n as well

written 10 weeks ago by cechersa0