Question: How to remove transcripts that have poor alignment scores in exonerate analysis
3.4 years ago, Ginsea Chen (Chinese Academy of Tropical Agricultural Sciences, Danzhou, China) wrote:

Dear all

I am a new user of exonerate. I tried to map protein evidence to a whole-genome assembly using exonerate with the protein2genome model. After mapping the protein evidence, I wanted to filter out all transcripts in the exonerate output that have poor alignment scores. In Liang et al. (Liang C, Mao L, Ware D, et al. Evidence-based gene predictions in plant genomes. Genome Research, 2009, 19(10): 1912-1923), the authors generally use a sequence identity threshold of 90% for same-species alignments and of 30% (protein sequence similarity) for cross-species alignments, whereas I could only find a raw alignment score (such as 805) in the exonerate output file.

So I don't know how to filter my transcripts based on the exonerate results. In other words, I can't find any sequence identity value (e.g. 90%) in the exonerate output. I wonder whether there is some way to convert a raw alignment score (e.g. 805) into a sequence identity value (e.g. 90%).

Thanks all

3.4 years ago, Michael Dondrup (Bergen, Norway) wrote:

From the manpage:

--ryo <format>
              Roll-your-own  output  format.  This allows specification of a printf-esque format line which is used
              to specify which information to include in the output, and how it is to be shown.  The  format  field
              may contain the following fields:

              %[qt][idlsSt]
                     For  either  {query,target},  report the {id,definition,length,sequence,Strand,type} Sequences
                     are reported in a fasta-format like block (no headers).
              %[qt]a[bels]
                     For   either   {query,target}   region   which   occurs   in   the   alignment,   report   the
                     {begin,end,length,sequence}
              %[qt]c[bels]
                     For  either {query,target} region which occurs in the coding sequence in the alignment, report
                     the {begin,end,length,sequence}
              %s     The raw score
              %r     The rank (in results from a bestn search)
              %m     Model name
              %e[tism]
                     Equivalenced {total,id,similarity,mismatches} (ie. %em == (%et - %ei))
              %p[is] Percent {id,similarity} over the equivalenced portions of the alignment.  (ie. %pi == 100*(%ei
                     / %et))
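
With these fields, exonerate can print percent identity (%pi) and percent similarity (%ps) directly, so there is no need to convert the raw score. A minimal Python sketch of the filtering step, assuming exonerate was run with a --ryo string that prints one tab-separated line per alignment (query id, target id, raw score, %pi, %ps). The exact format string, the file names and the helper names (parse_ryo, keep_hit, filter_hits) are illustrative assumptions; the 90% identity / 30% similarity cutoffs are the Liang et al. values quoted in the question.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Assumed exonerate invocation (file names are placeholders):
#   exonerate --model protein2genome --showalignment no --showvulgar no \
#       --ryo "%qi\t%ti\t%s\t%pi\t%ps\n" proteins.fa genome.fa > hits.txt
# Columns: query id, target id, raw score, percent identity, percent similarity.


@dataclass
class Hit:
    query: str
    target: str
    raw_score: int
    pct_identity: float    # %pi == 100 * (%ei / %et)
    pct_similarity: float  # %ps, over the same equivalenced positions


def parse_ryo(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from the 5-column --ryo output, skipping any other lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # e.g. "Command line: ..." or "-- completed exonerate analysis"
        try:
            yield Hit(fields[0], fields[1], int(fields[2]),
                      float(fields[3]), float(fields[4]))
        except ValueError:
            continue  # five tab-separated fields, but not a data line


def keep_hit(hit: Hit, same_species: bool) -> bool:
    """Apply the thresholds quoted from Liang et al. (2009) in the question."""
    if same_species:
        return hit.pct_identity >= 90.0
    return hit.pct_similarity >= 30.0


def filter_hits(lines: Iterable[str], same_species: bool = True) -> List[Hit]:
    """Return only the hits that pass the identity/similarity cutoff."""
    return [hit for hit in parse_ryo(lines) if keep_hit(hit, same_species)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_exonerate.py hits.txt [--cross-species]
    cross = "--cross-species" in sys.argv[2:]
    with open(sys.argv[1]) as handle:
        for hit in filter_hits(handle, same_species=not cross):
            print(f"{hit.query}\t{hit.target}\t{hit.raw_score}\t"
                  f"{hit.pct_identity:.2f}\t{hit.pct_similarity:.2f}")
```

Keeping the raw score (%s) as a column is optional, but it makes it easy to sort the surviving hits afterwards. Note that identity and similarity are computed only over the equivalenced (aligned) positions, so a short, high-identity fragment can still pass; you may want an additional length or coverage check depending on your data.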


I got it! Thanks for your suggestion!

written 3.4 years ago by Ginsea Chen

Thank you for pointing this out. I was looking for exactly that but was too lazy to go through the --ryo options...

written 2.2 years ago by cschu181
