Question

Exonerate GFF output is missing some transcripts

5.6 years ago

harry.smith • 0

Hello,

I have been converting IsoSeq long reads from a FASTA file to GFF/GTF format using exonerate, with the default parameters and the est2genome model in client-server mode. Everything ran fine, but the final GFF is missing ~2000 of the original 17000 transcripts. Does anyone with exonerate experience know why this might be happening? I imagine that these 2000 transcripts have no matches above the default thresholds, but I haven't been able to find anything confirming this in the exonerate user manual.

Thank you,
Harry

RNA-Seq Assembly alignment exonerate • 1.4k views


I assume you're mapping the isoseq reads to a certain genomic sequence with exonerate? Or are you 'predicting' genes on each of the isoseq sequences?

5.6 years ago by lieven.sterck 15k

score 1 · Answer 1 · 2018-09-07

-s | --score <threshold>
    This is the overall score threshold. Alignments will not be reported below this threshold. For heuristic alignments, the higher this threshold, the less time the analysis will take.

Check the scores of the alignments that were reported: the lowest one shows roughly where the default threshold sits. You can then rerun the remaining ~2000 transcripts with a lower --score.
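To work out which transcripts are missing and what the lowest reported score is, something like the following might help. It is only a sketch. It assumes the output was produced with target GFF enabled, that each hit has a `gene` line with the raw score in column 6, and that the query ID appears in the attribute column as `sequence <id>`. Check a few lines of your own file and adjust the pattern if they look different. The function names are made up for this example.

```python
import re

# Assumption: exonerate gene lines carry attributes such as
# "gene_id 1 ; sequence <query_id> ; gene_orientation +".
QUERY_PATTERN = re.compile(r"\bsequence\s+(\S+)")


def fasta_ids(fasta_path):
    """Return the ID (first word after '>') of every FASTA record, in file order."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.append(fields[0])
    return ids


def gene_lines(gff_path):
    """Yield the tab-split columns of every 'gene' feature, skipping non-GFF lines."""
    with open(gff_path) as handle:
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 9 and not line.startswith("#") and cols[2] == "gene":
                yield cols


def missing_ids(fasta_path, gff_path, pattern=QUERY_PATTERN):
    """Return the FASTA IDs that never show up as a query in the GFF."""
    reported = set()
    for cols in gene_lines(gff_path):
        match = pattern.search(cols[8])
        if match:
            reported.add(match.group(1))
    return [seq_id for seq_id in fasta_ids(fasta_path) if seq_id not in reported]


def lowest_gene_score(gff_path):
    """Return the smallest score (column 6) among gene lines, or None if there are none."""
    scores = []
    for cols in gene_lines(gff_path):
        try:
            scores.append(float(cols[5]))
        except ValueError:
            continue  # score column was '.' or otherwise not numeric
    return min(scores) if scores else None
```

If the lowest reported score sits just above the threshold in effect, that supports the idea that the missing transcripts were filtered by score rather than lost for some other reason.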

edit: from https://www.animalgenome.org/bioinfo/resources/manuals/exonerate/advanced.html#score-thresholds

Applying score thresholds

    There are several ways to apply score thresholds in exonerate - applying a sensible score threshold not only reduces the number of spurious alignments, but will also make the searches run faster due to the way that BSDP works.
    --score : a simple score threshold
    --percent : report alignment over a percentage of the maximum score attainable by each query
    --bestn : report the best N matches for each query.

    exonerate --score 500

    more to follow ...
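For the rerun itself, the three threshold options quoted above can be combined on one command line. Below is a rough sketch of a helper that assembles such a command for the missing transcripts only. The file names and the score of 50 are placeholders, not recommendations, and `build_rerun_command` is a made-up name. The target is passed through unchanged, so use whatever you already give to --target in your client-server setup.

```python
def build_rerun_command(query_fasta, target, score=None, percent=None, bestn=None):
    """Assemble an exonerate est2genome command line as a list of arguments.

    Only the thresholds that are explicitly set are added, so leaving all of
    them as None keeps exonerate's own defaults.
    """
    cmd = [
        "exonerate",
        "--model", "est2genome",
        "--query", query_fasta,
        "--target", target,
        "--showtargetgff", "yes",   # emit GFF on the target sequence
        "--showalignment", "no",    # skip the human-readable alignment blocks
    ]
    if score is not None:
        cmd += ["--score", str(score)]      # simple raw score threshold
    if percent is not None:
        cmd += ["--percent", str(percent)]  # percentage of the maximum attainable score
    if bestn is not None:
        cmd += ["--bestn", str(bestn)]      # keep only the best N hits per query
    return cmd


# Hypothetical example: rerun the missing transcripts with a lower score threshold,
# keeping only the single best hit for each.
command = build_rerun_command("missing.fa", "genome.fa", score=50, bestn=1)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True, stdout=out_handle)` to capture the GFF in a file. Lowering --score will also let more spurious alignments through, so --bestn 1 or a --percent cutoff is worth considering alongside it.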