Question: How to align DNA reads against a database of protein references
4.6 years ago, bioinfo (740) wrote:

I was wondering whether Bowtie, BWA etc. can map nucleotide reads to a protein reference database, or whether they are simply DNA aligners. I found a tool called PAUDA that could possibly be useful, but has anyone here used it before?

bwa bowtie alignment • 3.9k views
modified 4.6 years ago by Len Trigg (1.3k) • written 4.6 years ago by bioinfo (740)
4.6 years ago, thackl (2.7k) wrote:

Bowtie2, BWA etc. only do DNA-to-DNA alignment. I don't know PAUDA, but from its documentation it sounds reasonable.

Update: I thought about reverse translation a bit more and would like to revise my original statement: it is probably not a good idea ;)

(My idea was to convert the proteins to pseudo-transcripts by reverse-translating them to DNA and then try a standard mapper. But of course there are ambiguity issues, because the genetic code is degenerate: most amino acids are encoded by more than one codon. Still, a sensitive mapper, for example bwa mem, could work.)
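To put a number on that ambiguity, here is a minimal sketch, assuming the standard genetic code (NCBI translation table 1), that counts how many distinct DNA sequences encode a given peptide. The names `CODON_TABLE`, `DEGENERACY` and `n_backtranslations` are made up for this illustration.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (e.g. 6 for L, 1 for M).
DEGENERACY = Counter(CODON_TABLE.values())


def n_backtranslations(peptide):
    """Count the DNA sequences that translate to `peptide`."""
    return prod(DEGENERACY[aa] for aa in peptide)


print(n_backtranslations("MW"))    # 1
print(n_backtranslations("MLSR"))  # 216
```

Even a four-residue peptide containing Leu, Ser and Arg has 216 possible encodings, so a single arbitrary reverse translation will rarely use the same codons as the read it is supposed to match.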

modified 4.6 years ago • written 4.6 years ago by thackl (2.7k)

I wrote a tool for this purpose - TranslateSixFrames.  It translates back and forth between amino acids and nucleotides.  Theoretically, the way you would use it in order to do mapping with a nucleotide aligner is:

Translate the reads to proteins in all six frames.

Translate the aa-encoded reads back to nucleotides, selecting one canonical codon per amino acid (TranslateSixFrames does this automatically for aa->nt translation).  So, for each initial read, you end up with 6 nucleotide reads.

Translate the proteins to nt-space.

Finally, map the double-translated reads to the translated proteins, and select the best mapping of each of the six read frames.

In theory, this should work fine, at least for RNA-seq reads.  For DNA reads, most will be intronic, but the coding ones should still generally be OK.  I wrote this with the intention of integrating it into BBMap to do this automatically, but I have not had time.  I might do it in the future.  You can still follow this workflow using TranslateSixFrames as a standalone tool.
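The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TranslateSixFrames implementation: it assumes the standard genetic code, and the "canonical" codon here is simply the first codon listed for each amino acid in TCAG order (the tool's actual choice may differ). All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: the canonical codon is the first one seen per amino acid.
CANONICAL = {}
for codon, aa in CODON_TABLE.items():
    CANONICAL.setdefault(aa, codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    # Codons containing anything other than A, C, G, T become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(read):
    """Translate a read in three forward and three reverse frames."""
    return [translate(strand[offset:])
            for strand in (read, revcomp(read))
            for offset in range(3)]


def back_translate(peptide):
    """aa -> nt using one canonical codon per amino acid ('X' -> 'NNN')."""
    return "".join(CANONICAL.get(aa, "NNN") for aa in peptide)


def normalize_read(read):
    """nt -> aa -> nt: six codon-normalized versions of one read."""
    return [back_translate(p) for p in six_frames(read)]


# Two reads encoding the same peptide (MLSR) with different codons.
read_a = "ATGCTGAGCCGC"
read_b = "ATGTTATCTCGT"
reference = back_translate("MLSR")  # protein reference in nt-space

print(normalize_read(read_a)[0] == reference)
print(normalize_read(read_b)[0] == reference)
```

After normalization, synonymous codon differences disappear, so both reads match the translated reference exactly in frame 0. The six normalized reads and the translated proteins would then go to an ordinary nucleotide mapper, keeping the best-scoring frame per read.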


modified 4.6 years ago • written 4.6 years ago by Brian Bushnell (17k)

I very much like the nt*->aa->nt + aa->nt idea to get consistent codon usage.

written 4.6 years ago by thackl (2.7k)
4.6 years ago, Len Trigg (1.3k, New Zealand) wrote:

The RTG metagenomics tools include a command called mapx, which is analogous to (but orders of magnitude faster than) blastx and which we developed for use on the HMP project. It internally translates the DNA reads into amino acids in the possible reading frames and performs protein alignment against your protein database (including support for protein scoring matrices such as BLOSUM, which the nucleotide-space approach would not permit).
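To illustrate why protein-space scoring matters, here is a toy sketch of a translated search: six-frame translation followed by ungapped scoring against a protein. It is not how mapx is implemented. The matrix holds only the I/L/M/V entries of BLOSUM62; the `PLACEHOLDER` penalty for every other pair is an assumption made for this example, not a BLOSUM62 value, and all names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Subset of BLOSUM62 (I, L, M, V only), keyed by sorted residue pairs.
SUBSET = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "M"): 1, ("I", "V"): 3,
    ("L", "L"): 4, ("L", "M"): 2, ("L", "V"): 1,
    ("M", "M"): 5, ("M", "V"): 1, ("V", "V"): 4,
}
PLACEHOLDER = -4  # assumption for pairs outside the subset, not from BLOSUM62


def six_frames(read):
    rc = read.translate(COMPLEMENT)[::-1]
    return ["".join(CODON_TABLE.get(s[i:i + 3], "X")
                    for i in range(off, len(s) - 2, 3))
            for s in (read, rc) for off in range(3)]


def ungapped_score(query, target):
    return sum(SUBSET.get(tuple(sorted(pair)), PLACEHOLDER)
               for pair in zip(query, target))


def best_hit(read, protein):
    """Return (score, frame index, offset in protein) of the best placement."""
    hits = []
    for frame, pep in enumerate(six_frames(read)):
        for off in range(len(protein) - len(pep) + 1):
            hits.append((ungapped_score(pep, protein[off:off + len(pep)]),
                         frame, off))
    return max(hits)


# The read encodes M-I-V; the reference protein is M-L-V.
print(best_hit("ATGATTGTT", "MLV"))  # (11, 0, 0)
```

The I-to-L substitution still earns a positive score (+2) in protein space, whereas after codon normalization the corresponding codons would simply count as nucleotide mismatches to a DNA mapper.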

written 4.6 years ago by Len Trigg (1.3k)