Question

Tool: ORFanage: by-reference protein annotation and comparison for transcriptome assembly

23 hours ago

Ales ▴ 70

Shamelessly sharing an older tool, ORFanage, for ORF annotation. It might be useful to others who work with transcriptome assemblies and genome annotation.

While longest-ORF, most-upstream-ORF, or de novo prediction approaches often work well, they can miss biologically relevant isoforms, introduce errors, or be inefficient on larger datasets. Our method addresses these issues by selecting, for each transcript, the ORF that is most similar to the reference proteins and therefore most biologically consistent, using an efficient interval-based algorithm.

In short, ORFanage:

- Finds the most likely ORF for each transcript in a GTF/GFF file by maximizing similarity to proteins in one or more reference annotations.
- Quantifies frame shifts and other changes relative to the reference. It can also be used to perform exhaustive comparisons of annotated proteins between annotations.
- Scales efficiently to very large datasets using an interval-based pseudo-alignment algorithm, which avoids costly sequence comparisons in most cases.
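
To make the interval idea a bit more concrete, here is a rough Python sketch of how two CDS chains can be compared using coordinates alone, with no sequence alignment. This is only an illustration and not ORFanage's actual code: the function names `compare_chains` and `best_reference` are made up for this example, and it assumes plus-strand transcripts whose CDS intervals are sorted, non-overlapping, and 1-based inclusive as in GTF.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, as in GTF


def compare_chains(query: List[Interval], ref: List[Interval]) -> Tuple[int, int]:
    """Return (shared, in_frame) base counts for two plus-strand CDS chains.

    shared   = genomic positions that are coding in both chains
    in_frame = shared positions read in the same codon frame by both chains
    """
    i = j = 0
    q_before = r_before = 0  # coding bases in intervals already passed
    shared = in_frame = 0
    while i < len(query) and j < len(ref):
        qs, qe = query[i]
        rs, r_end = ref[j]
        start, end = max(qs, rs), min(qe, r_end)
        if start <= end:
            seg = end - start + 1
            shared += seg
            # Offset of `start` from the first coding base of each chain.
            q_off = q_before + (start - qs)
            r_off = r_before + (start - rs)
            # Both offsets grow together inside the segment, so the frame
            # relationship is constant for the whole segment.
            if (q_off - r_off) % 3 == 0:
                in_frame += seg
        # Advance whichever interval ends first.
        if qe < r_end:
            q_before += qe - qs + 1
            i += 1
        else:
            r_before += r_end - rs + 1
            j += 1
    return shared, in_frame


def best_reference(query: List[Interval],
                   refs: Dict[str, List[Interval]]) -> Optional[str]:
    """Pick the reference CDS sharing the most in-frame bases with the query."""
    if not refs:
        return None
    return max(refs, key=lambda name: compare_chains(query, refs[name])[1])


if __name__ == "__main__":
    reference = [(100, 199), (300, 399)]
    shifted = [(100, 199), (301, 399)]  # second exon starts one base later
    print(compare_chains(shifted, reference))
```

In the usage example, the one-base shift at the second exon keeps the first exon in frame but puts the rest of the overlap out of frame, which is the kind of change relative to the reference that the second point above refers to.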

Additionally, we have recently added a small utility, ORFcompare, to perform all-vs-all comparisons of CDS records between multiple annotation sources.

When applied to large RNA-seq assemblies, ORFanage can help identify relevant transcripts and novel proteins, filter out noise, and take raw assemblies several steps closer to complete annotations. It can also highlight inconsistencies or possible corrections in reference annotations, something we observed when applying it to the RefSeq and GENCODE human datasets.

ORFanage and ORFcompare are both available on GitHub: https://github.com/alevar/ORFanage

You can also read more in the published study: https://pmc.ncbi.nlm.nih.gov/articles/PMC10718564/

Hope the methods are useful and easy to use!

orf rna-seq assembly annotation transcriptome