Question: Annotation with de novo or genome guided transcriptome assembly
Asked 12 months ago by EarlyEvol (University of Arizona):

Hi all,

This might be pretty inconsequential in the end, but should I use a de novo or a genome-guided transcriptome assembly as input to an annotation pipeline (funannotate)? It seems to me that the trade-off is accuracy vs. independence of evidence. A genome-guided assembly might be more accurate, but it is somewhat redundant: the RNA-seq reads are mapped to the genome to create the assembly, and that same mapping is also used directly as evidence. A de novo assembly is more error-prone, yet it is completely independent of the genome structure.
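For anyone comparing the two approaches, here is a minimal sketch of how the two Trinity runs differ on the command line. It only builds and prints the commands; it does not run Trinity. The file names, output directory names, and resource values are placeholders (assumptions, not from this thread), and the BAM is assumed to be coordinate-sorted alignments of the same reads against the genome.

```python
import shlex

# Hypothetical inputs; substitute your own reads and alignments.
LEFT, RIGHT = "reads_R1.fq.gz", "reads_R2.fq.gz"
SORTED_BAM = "rnaseq.coordSorted.bam"  # reads aligned to the genome, coordinate-sorted


def trinity_de_novo(left, right, cpus=8, mem="50G"):
    """Build a Trinity de novo command: assembled from reads alone, no genome."""
    return ["Trinity", "--seqType", "fq",
            "--left", left, "--right", right,
            "--CPU", str(cpus), "--max_memory", mem,
            "--output", "trinity_denovo"]


def trinity_genome_guided(bam, max_intron=10000, cpus=8, mem="50G"):
    """Build a Trinity genome-guided command: reads are grouped by where they map."""
    return ["Trinity", "--genome_guided_bam", bam,
            "--genome_guided_max_intron", str(max_intron),
            "--CPU", str(cpus), "--max_memory", mem,
            "--output", "trinity_gg"]


if __name__ == "__main__":
    for cmd in (trinity_de_novo(LEFT, RIGHT), trinity_genome_guided(SORTED_BAM)):
        print(shlex.join(cmd))
```

The maximum intron length is organism-specific, so the 10000 here is only a placeholder. The point of the sketch is that the genome-guided run takes the mapped reads as input, which is where the dependence on the genome comes from.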

One thing that might change the answer is that I'm really interested in gene paralogs, which Trinity's genome-guided approach is (reportedly) better at identifying.

This probably falls into the category of over-optimization, but I would like to get someone else's take on it for sanity (and knowledge).

Thanks, Earl

Answered 12 months ago by h.mon:

If you have RNA-seq data, funannotate train will do the genome-guided assembly for you (and parse the results, etc.), producing the inputs for funannotate predict. From the manual:

funannotate train

In order to use this script you will need RNA-seq data from the genome you are annotating, if you don't have RNA-seq data then funannotate predict will train Augustus during runtime. This script is a wrapper for genome-guided Trinity RNA-seq assembly followed by PASA assembly. These methods will generate the input data to funannotate predict, i.e. coord-sorted BAM alignments, trinity transcripts, and high quality PASA GFF3 annotation. This script unfortunately has lots of dependencies that include Hisat2, Trinity, Samtools, Fasta, GMAP, Blat, MySQL, PASA, and RapMap. The $PASAHOME and $TRINITYHOME environmental variables need to be set or passed at runtime.

Thankfully, MySQL is not needed; funannotate can use SQLite instead.

In my experience, funannotate finds more genes with the genome-guided assembly than with a de novo assembly followed by mapping of the transcripts.


Thanks for your reply.

Dang! I should have read the manual better. I guess if the "train" command does a GG assembly, that is the recommended method.

I ran Trinity and PASA separately, feeding a Trinity de novo transcript set to PASA. I then fed the PASA GFF, along with the original Trinity assembly and an RNA-seq BAM, to funannotate predict. This seemed to work reasonably well.
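A rough sketch of that last step, for reference. It only builds and prints the funannotate predict command with the three evidence inputs mentioned above. All file names and the species string are placeholders (assumptions for illustration), and the option names should be checked against `funannotate predict --help` for your installed version.

```python
import shlex


def funannotate_predict(genome, out_dir, species,
                        transcripts, rna_bam, pasa_gff, cpus=8):
    """Build a funannotate predict command using transcript, BAM, and PASA evidence."""
    return ["funannotate", "predict",
            "-i", genome, "-o", out_dir, "-s", species,
            "--transcript_evidence", transcripts,  # Trinity assembly (FASTA)
            "--rna_bam", rna_bam,                  # RNA-seq reads aligned to the genome
            "--pasa_gff", pasa_gff,                # PASA gene models (GFF3)
            "--cpus", str(cpus)]


if __name__ == "__main__":
    # Hypothetical paths; substitute your own files.
    cmd = funannotate_predict("genome.masked.fa", "fun_out", "Genus species",
                              "trinity_denovo.fasta", "rnaseq.coordSorted.bam",
                              "pasa_assemblies.gff3")
    print(shlex.join(cmd))
```

Swapping the de novo transcripts for a genome-guided set only changes the `--transcript_evidence` (and matching PASA) inputs, so the two annotation runs can be compared directly.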

I have a genome guided assembly I could use and I'll see what differences show up.

Reply written 11 months ago by EarlyEvol