Question

assessing de novo Trinity assembly quality

7.8 years ago

trish0401 ▴ 20

I used Trinity to make a de novo transcriptome assembly for 1 human sample (I will be using more samples later; this is just the test run). I want to assess the quality of the transcriptome. I've aligned the reads back to the transcriptome using Bowtie and get an alignment rate of ~80%. Does an 80% alignment rate imply that the transcriptome assembly was successful?

I also wanted to align my transcriptome (the Trinity.fasta output file) to the human reference genome, but I keep receiving errors related to the format of the Trinity.fasta file. Does anyone have any advice on how to alter the Trinity.fasta file to make it compatible with Bowtie/Bowtie2? Or is there a different output file (in trinity_out_dir) that would be more appropriate?

I am very new at this - any help would be greatly appreciated! -Trish

If it is helpful, I have already run TrinityStats.pl:

Counts of transcripts, etc.

Total trinity 'genes':  109,215
Total trinity transcripts:  115,144
Percent GC: 45.72


Stats based on ALL transcript contigs:


    Contig N10: 2666
    Contig N20: 1782
    Contig N30: 1251
    Contig N40: 883
    Contig N50: 639

    Median contig length: 321
    Average contig: 518.17
    Total assembled bases: 59,664,141

 Stats based on ONLY LONGEST ISOFORM per 'GENE':

    Contig N10: 2366
    Contig N20: 1548
    Contig N30: 1082
    Contig N40: 772
    Contig N50: 570

    Median contig length: 315
    Average contig: 488.92
    Total assembled bases: 53,397,469

RNA-Seq de novo transcriptome assembly • 3.4k views

updated 6.0 years ago by Biostar • written 7.8 years ago by trish0401

score 2 · Answer 1 · 2016-07-12

Whether 80% is a good number really depends on the quality of your reads: if many of them are very short, or of low to average quality, then even with a perfect genome you'll only see 70-90% align. Software like FastQC can report the quality of your reads. I'd say 80% is a reasonable number, and the Trinity wiki agrees: "A typical Trinity transcriptome assembly will have the vast majority of all reads mapping back to the assembly, and ~70-80% of the mapped fragments found mapped as proper pairs." Furthermore, by default Bowtie 1 only accepts paired-end insert sizes from 0 to 250 (the -I/--minins and -X/--maxins options), so if part of your read library falls outside that range you may not see those reads align.
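If you want to check both numbers from the wiki quote yourself (overall mapping rate and the share of mapped reads in proper pairs), you can count them from the SAM FLAG field. This is only a minimal sketch, assuming SAM-format text output from the aligner; the function name `alignment_summary` and the toy records are made up for illustration, and tools such as `samtools flagstat` report the same kind of counts.

```python
def alignment_summary(sam_lines):
    """Count mapped reads and properly paired reads from SAM text lines.

    Hypothetical helper for illustration; it relies only on the standard
    SAM FLAG bits (0x1 paired, 0x2 proper pair, 0x4 unmapped,
    0x100 secondary, 0x800 supplementary).
    """
    total = mapped = proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # skip secondary/supplementary records so each read counts once
        total += 1
        if not flag & 0x4:
            mapped += 1
            if flag & 0x1 and flag & 0x2:
                proper += 1
    return {
        "total": total,
        "mapped": mapped,
        "proper_pairs": proper,
        "pct_mapped": 100.0 * mapped / total if total else 0.0,
        "pct_proper_of_mapped": 100.0 * proper / mapped if mapped else 0.0,
    }


# Toy records (made-up data): only the first two columns matter here.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t99\tcontig1\t10\t255\t50M\t=\t150\t190\t*\t*",
    "r1\t147\tcontig1\t150\t255\t50M\t=\t10\t-190\t*\t*",
    "r2\t65\tcontig2\t5\t255\t50M\tcontig3\t40\t0\t*\t*",
    "r2\t129\tcontig3\t40\t255\t50M\tcontig2\t5\t0\t*\t*",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

print(alignment_summary(example_sam))
```

On a real run you would pass an open file handle instead of the list, for example `alignment_summary(open("reads_vs_trinity.sam"))`, where the file name is a placeholder.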

As for quality assessment of the transcriptome assembly, do you have a similar number of predicted genes from the Trinity contigs compared to a closely related species? You can predict proteins and their function from Trinity output using the Trinotate pipeline: http://trinotate.github.io/

The Trinity wiki lists a few more tools for quality assessment of Trinity assemblies: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment

And there's also this recent paper with another method: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652379/
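Since you are new to this, it may also help to see what the "Contig N50" lines in your TrinityStats.pl output mean: N50 is the contig length such that contigs of at least that length contain at least half of all assembled bases. Below is a minimal sketch of that calculation, assuming a plain FASTA file; the helper names `fasta_lengths` and `nx` are my own and not part of Trinity.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start counting a new record
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, x=50):
    """Return the Nx value: walking from the longest contig down, the length
    at which the running total first reaches x percent of all bases."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


# Toy FASTA (made-up sequences) standing in for Trinity.fasta.
toy_fasta = [
    ">contig_a",
    "ACGTACGTAC",
    ">contig_b",
    "ACGTAC",
    "GT",
    ">contig_c",
    "ACG",
]
toy_lengths = fasta_lengths(toy_fasta)
print(toy_lengths, nx(toy_lengths, 50))
```

Because N50 depends only on the length distribution, a high value alone does not show that the contigs are correct, which is why the read-mapping and annotation checks above are still worth doing.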