How to remove transcripts that have poor alignment scores in exonerate analysis
6.0 years ago
Ginsea Chen ▴ 130

Dear all

I am a new user of exonerate. I mapped protein evidence to a whole-genome assembly using exonerate with the protein2genome model. After mapping, I want to filter out the transcripts (in the exonerate output file) that have poor alignment scores. In Liang et al. (Liang C, Mao L, Ware D, et al. Evidence-based gene predictions in plant genomes. Genome Research, 2009, 19(10): 1912-1923), the authors generally use a sequence identity threshold of 90% for same-species alignments and 30% (protein sequence similarity) for cross-species alignments, but I could only find a raw alignment score (such as 805) in the exonerate output file.

So I don't know how to filter my transcripts based on the exonerate results: I can't find any sequence identity value (e.g. 90%) in the output. Is there a way to convert the raw alignment score (e.g. 805) into a sequence identity value (e.g. 90%)?

Thanks all

genome alignment • 2.0k views
6.0 years ago

From the manpage:

--ryo <format>
              Roll-your-own  output  format.  This allows specification of a printf-esque format line which is used
              to specify which information to include in the output, and how it is to be shown.  The  format  field
              may contain the following fields:

              %[qt][idlsSt]
                     For  either  {query,target},  report the {id,definition,length,sequence,Strand,type} Sequences
                     are reported in a fasta-format like block (no headers).
              %[qt]a[bels]
                     For   either   {query,target}   region   which   occurs   in   the   alignment,   report   the
                     {begin,end,length,sequence}
              %[qt]c[bels]
                     For  either {query,target} region which occurs in the coding sequence in the alignment, report
                     the {begin,end,length,sequence}
              %s     The raw score
              %r     The rank (in results from a bestn search)
              %m     Model name
     --->     %e[tism]
                     Equivalenced {total,id,similarity,mismatches} (ie. %em == (%et - %ei))
     --->     %p[is] Percent {id,similarity} over the equivalenced portions of the alignment.  (ie. %pi == 100*(%ei
                     / %et))
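
As a rough sketch of how this could be used for filtering: assuming exonerate was run with something like `--showalignment no --showvulgar no --ryo "RYO\t%qi\t%ti\t%s\t%pi\t%ps\n"` (the `RYO` prefix, the field order, and the function names below are my own choices for illustration, not anything exonerate requires), the percent identity can be read back from each line and compared against a threshold such as the 90% mentioned in the question.

```python
# Minimal sketch: keep only exonerate hits whose percent identity (%pi)
# meets a threshold. Assumes each hit was written with the hypothetical
# format  --ryo "RYO\t%qi\t%ti\t%s\t%pi\t%ps\n"  (tab-separated, "RYO" tag).

def parse_ryo_line(line, prefix="RYO"):
    """Return a dict for a tagged --ryo line, or None for any other line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6 or fields[0] != prefix:
        return None  # e.g. "Command line: ..." or "-- completed exonerate analysis"
    _, query_id, target_id, raw_score, pct_id, pct_sim = fields
    try:
        return {
            "query_id": query_id,
            "target_id": target_id,
            "raw_score": int(raw_score),
            "percent_identity": float(pct_id),
            "percent_similarity": float(pct_sim),
        }
    except ValueError:
        return None  # malformed numeric field; skip rather than guess


def filter_hits(lines, min_identity=90.0):
    """Return parsed hits with percent identity >= min_identity."""
    kept = []
    for line in lines:
        hit = parse_ryo_line(line)
        if hit is not None and hit["percent_identity"] >= min_identity:
            kept.append(hit)
    return kept


if __name__ == "__main__":
    import sys

    # Usage (file name is hypothetical): python filter_ryo.py exonerate.out
    with open(sys.argv[1]) as handle:
        for hit in filter_hits(handle, min_identity=90.0):
            print(hit["query_id"], hit["target_id"],
                  hit["raw_score"], hit["percent_identity"], sep="\t")
```

For cross-species protein alignments, the same loop could test `percent_similarity` against 30 instead; which field and cutoff to use depends on your data.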

I get it! Thanks for your suggestion!


Thank you for pointing this out, I was looking for exactly that but was too lazy to go through the --ryo options...
