Question: Exonerate GFF output is missing some transcripts
6 months ago
harry.smith wrote:


I have been converting IsoSeq long reads from a FASTA file to GFF/GTF format using exonerate, with the default parameters and the est2genome model in client-server mode. Everything ran without errors, but the final GFF is missing ~2000 of the original 17000 transcripts. Does anyone with exonerate experience know why this might be happening? I suspect that these 2000 transcripts have no matches that pass the default thresholds, but I haven't been able to find any evidence of this in the exonerate user manual.

Thank you, Harry

written 6 months ago by harry.smith

I assume you're mapping the IsoSeq reads to a genomic sequence with exonerate? Or are you 'predicting' genes on each of the IsoSeq sequences?

written 6 months ago by lieven.sterck
6 months ago
h.mon wrote:
-s | --score <threshold>
    This is the overall score threshold. Alignments will not be reported below this threshold. For heuristic alignments, the higher this threshold, the less time the analysis will take.

Check the scores of the alignments that were reported to find the default threshold, then rerun the remaining 2000 transcripts with a lower score.
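
To see which transcripts were dropped, one option is to compare the IDs in the query FASTA with the IDs mentioned in the GFF. Below is a minimal Python sketch. It assumes that each GFF feature line has nine tab-separated columns and that the query ID appears as a separate token in the ninth (attribute) column. The function names are hypothetical helpers, not part of exonerate.

```python
def fasta_ids(lines):
    """Return sequence IDs (first word after '>') in input order."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def gff_attribute_tokens(lines):
    """Collect every whitespace- or semicolon-delimited token from GFF column 9."""
    tokens = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column feature line (e.g. other program output)
        tokens.update(fields[8].replace(";", " ").split())
    return tokens


def missing_ids(fasta_lines, gff_lines):
    """Return FASTA IDs that never appear in the GFF attribute column."""
    seen = gff_attribute_tokens(gff_lines)
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in seen]
```

Both arguments can be open file handles or lists of lines. The returned IDs can then be used to pull the unaligned transcripts into their own FASTA file for a second run. If the IDs in your GFF are embedded differently (for example joined to other text), the token matching would need adjusting.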

edit: from

Applying score thresholds

    There are several ways to apply score thresholds in exonerate. Applying a sensible score threshold not only reduces the number of spurious alignments, but also makes the searches run faster due to the way that BSDP works.
    --score : a simple score threshold
    --percent : report alignment over a percentage of the maximum score attainable by each query
    --bestn : report the best N matches for each query.

    exonerate --score 500

    more to follow ...
modified 6 months ago • written 6 months ago by h.mon

Thank you. This is what I needed. It looks like the default threshold is a score of 100.
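
For anyone wanting to rerun only the unaligned transcripts below the default score of 100, here is a small Python sketch that builds the exonerate command line. The file names and the score of 50 are illustrative assumptions, and the helper function is hypothetical. A lower threshold is also likely to let through more spurious alignments, so the extra hits deserve a closer look.

```python
import shlex


def exonerate_command(query_fasta, target, score):
    """Build an est2genome exonerate command with an explicit --score threshold."""
    if score < 0:
        raise ValueError("score threshold must be non-negative")
    return [
        "exonerate",
        "--model", "est2genome",
        "--score", str(score),       # alignments scoring below this are not reported
        "--showtargetgff", "yes",    # emit GFF on the target sequence
        "--query", query_fasta,
        "--target", target,
    ]


# Hypothetical inputs: the transcripts missing from the first run and the genome.
cmd = exonerate_command("missing_transcripts.fasta", "genome.fasta", 50)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once exonerate is on the PATH. In client-server mode the target would be the server address rather than a FASTA file.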

written 6 months ago by harry.smith
