Question: How to remove transcripts that have poor alignment scores in exonerate analysis
Ginsea Chen (Chinese Academy of Tropical Agricultural Sciences, Danzhou, China) wrote, 4.0 years ago:

Dear all

I am a new user of exonerate. I mapped protein evidence to a whole-genome assembly using exonerate with the protein2genome model. After mapping, I want to filter out every transcript in the exonerate output file that has a poor alignment score. Liang et al. (Liang C, Mao L, Ware D, et al. Evidence-based gene predictions in plant genomes. Genome Research, 2009, 19(10): 1912-1923) generally use a sequence identity threshold of 90% for same-species alignments and a protein sequence similarity threshold of 30% for cross-species alignments, but the exonerate output only gives me a raw alignment score (such as 805).

So I don't know how to filter my transcripts based on the exonerate results: I can't find any sequence identity value (e.g. 90%) in them. Is there a way to convert a raw alignment score (e.g. 805) into a sequence identity value (e.g. 90%)?

Thanks all

Michael Dondrup (Bergen, Norway) wrote, 4.0 years ago:

From the manpage:

--ryo <format>
              Roll-your-own output format. This allows specification of a printf-esque format line which is used
              to specify which information to include in the output, and how it is to be shown. The format field
              may contain the following fields:

              %[qt][idlsStT]
                     For either {query,target}, report the {id,definition,length,sequence,Strand,type}. Sequences
                     are reported in a fasta-format like block (no headers).
              %[qt]a[bels]
                     For either {query,target} region which occurs in the alignment, report the
                     {begin,end,length,sequence}
              %[qt]c[bels]
                     For either {query,target} region which occurs in the coding sequence in the alignment, report
                     the {begin,end,length,sequence}
              %s     The raw score
              %r     The rank (in results from a bestn search)
              %m     Model name
              %e[tism]
                     Equivalenced {total,id,similarity,mismatches} (ie. %em == (%et - %ei))
              %p[is] Percent {id,similarity} over the equivalenced portions of the alignment. (ie. %pi == 100*(%ei / %et))
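A raw score such as 805 cannot be converted into a percent identity after the fact, because it depends on the substitution matrix, the gap penalties and the alignment length. The simpler route is to have exonerate print %pi (percent identity) and %ps (percent similarity) directly through --ryo, then filter on those columns. Below is a minimal sketch in Python, assuming exonerate was run with a tab-separated format string like the one in the leading comment. The "RYO" line tag, the file names and the script's --cross-species switch are hypothetical choices made for this example. The 90% identity / 30% similarity thresholds are the ones quoted in the question.

```python
# Filter exonerate hits on percent identity / similarity (a sketch).
#
# Assumes exonerate was run with a tab-separated --ryo line, e.g.:
#   exonerate --model protein2genome --showalignment no --showvulgar no \
#       --ryo "RYO\t%qi\t%ti\t%tab\t%tae\t%s\t%pi\t%ps\n" proteins.fa genome.fa > hits.txt
#
# "RYO" is an arbitrary (hypothetical) tag that separates the hit lines
# from exonerate's own header/footer lines.
import sys
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str             # %qi
    target: str            # %ti
    target_begin: int      # %tab
    target_end: int        # %tae
    raw_score: int         # %s
    pct_identity: float    # %pi
    pct_similarity: float  # %ps


def parse_ryo(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per tagged --ryo line and skip every other line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 8 or fields[0] != "RYO":
            continue  # exonerate header/footer or unrelated output
        _, qi, ti, tab, tae, score, pi, ps = fields
        yield Hit(qi, ti, int(tab), int(tae), int(score), float(pi), float(ps))


def keep(hit: Hit, same_species: bool) -> bool:
    """Thresholds from the question (Liang et al. 2009): at least 90% identity
    for same-species evidence, at least 30% similarity across species."""
    if same_species:
        return hit.pct_identity >= 90.0
    return hit.pct_similarity >= 30.0


def main(argv: List[str]) -> None:
    path = argv[0]
    same_species = "--cross-species" not in argv[1:]  # hypothetical switch
    with open(path) as handle:
        for hit in parse_ryo(handle):
            if keep(hit, same_species):
                print("\t".join(str(value) for value in hit))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Run it as `python filter_exonerate.py hits.txt` for same-species evidence, or append `--cross-species` to switch to the similarity threshold. Note that %pi and %ps are computed over the equivalenced (aligned) positions only, so a short partial hit can still pass; adding %qal and %ql to the format string would let you also require a minimum query coverage.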


Ginsea Chen replied, 3.9 years ago:

I get it! Thanks for your suggestions!

cschu181 replied, 2.8 years ago:

Thank you for pointing this out. I was looking for exactly that but was too lazy to go through the --ryo options...