Question: How to write pairwiseAlignment in R in Fasta Format with Headers?
0
gravatar for Yosi
4.2 years ago by
Yosi0
Germany
Yosi0 wrote:

Hi there,

I try to write an R-script with which I can align a bench of sequences in one file with a single sequence in another file.

So far, I'm really happy with the results, but I have one big problem. How to write the pairwiseAlignment output in fasta format WITH header names?

I did this (among others):

 

seq1 <- readDNAStringSet("file1.fasta", use.names=T)
seq2 <- readDNAStringSet("file2.fasta", use.names=T)

mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly= F)

# here I will introduce a for-loop to align every sequence against the sequence in file2
globalAlign <- pairwiseAlignment(seq1[1], seq2[1], type='global-local', substitutionMatrix=mat, gapOpening=10, gapExtension=-5)

r = BStringSet( c(toString(subject(globalAlign)), toString(pattern(globalAlign))) )
writeXStringSet(r,"out1.txt")

 

But my "out1.txt"-outfile looks like this:
>
ATGCGATGCTAGCTGCATAGCTCGATCG
>
ATGCGAT---AGCTGCATAGCT---TCG

 

Has anyone of you an idea how to include the sequence names so that it will look like this:
>seq_name_1
ATGCGATGCTAGCTGCATAGCTCGATCG
>seq_name_2
ATGCGAT---AGCTGCATAGCT---TCG

 

Many thanks in advance!

alignment sequence R • 1.7k views
ADD COMMENTlink modified 4.2 years ago by Brice Sarver3.2k • written 4.2 years ago by Yosi0
0
gravatar for Brice Sarver
4.2 years ago by
Brice Sarver3.2k
United States
Brice Sarver3.2k wrote:

I tried this with two DNAString instances, a and b, with sequences "ACTG" and "GTCA". I then performed the global alignment the way you did, and I converted them to a BStringSet the same way you did. This is 'r'.

Objects of class XStringSet can have names assigned, but pairwiseAlignments cannot as far as I know. This makes sense; it's just a comparison, and you ought to know what you're comparing beforehand. This doesn't help when you want to write out the files. You can add the names this way:

> r
  A BStringSet instance of length 2
    width seq
[1]     4 GTCA
[2]     4 ACTG
> names(r)
NULL
> names(r) <- c("a", "b")
> r
  A BStringSet instance of length 2
    width seq                                               names
[1]     4 GTCA                                              a
[2]     4 ACTG                                              b​
ADD COMMENTlink modified 3 days ago by RamRS24k • written 4.2 years ago by Brice Sarver3.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 704 users visited in the last hour