Question: How to write pairwiseAlignment in R in Fasta Format with Headers?
0
gravatar for Yosi
3.6 years ago by
Yosi0
Germany
Yosi0 wrote:

Hi there,

I try to write an R-script with which I can align a bench of sequences in one file with a single sequence in another file.

So far, I'm really happy with the results, but I have one big problem. How to write the pairwiseAlignment output in fasta format WITH header names?

I did this (among others):

 

seq1 <- readDNAStringSet("file1.fasta", use.names=T)
seq2 <- readDNAStringSet("file2.fasta", use.names=T)

mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly= F)

# here I will introduce a for-loop to align every sequence against the sequence in file2
globalAlign <- pairwiseAlignment(seq1[1], seq2[1], type='global-local', substitutionMatrix=mat, gapOpening=10, gapExtension=-5)

r = BStringSet( c(toString(subject(globalAlign)), toString(pattern(globalAlign))) )
writeXStringSet(r,"out1.txt")

 

But my "out1.txt"-outfile looks like this:
>
ATGCGATGCTAGCTGCATAGCTCGATCG
>
ATGCGAT---AGCTGCATAGCT---TCG

 

Has anyone of you an idea how to include the sequence names so that it will look like this:
>seq_name_1
ATGCGATGCTAGCTGCATAGCTCGATCG
>seq_name_2
ATGCGAT---AGCTGCATAGCT---TCG

 

Many thanks in advance!

alignment sequence R • 1.4k views
ADD COMMENTlink modified 3.5 years ago by Brice Sarver2.5k • written 3.6 years ago by Yosi0
0
gravatar for Brice Sarver
3.5 years ago by
Brice Sarver2.5k
United States
Brice Sarver2.5k wrote:

I tried this with two DNAString instances, a and b, with sequences "ACTG" and "GTCA." I then performed the global alignment the way you did, and I converted them to a BStringSet the same way you did. This is 'r.'

Objects of class XStringSet can have names assigned, but pairwiseAlignments cannot as far as I know. This makes sense; it's just a comparison, and you ought to know what you're comparing beforehand. This doesn't help when you want to write out the files. You can add the names this way:

> r
  A BStringSet instance of length 2
    width seq
[1]     4 GTCA
[2]     4 ACTG
> names(r)
NULL
> names(r) <- c("a", "b")
> r
  A BStringSet instance of length 2
    width seq                                               names
[1]     4 GTCA                                              a
[2]     4 ACTG                                              b​
ADD COMMENTlink written 3.5 years ago by Brice Sarver2.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2190 users visited in the last hour