Question

Understanding de-novo transcriptome assembly quality using Transrate

6.8 years ago

nayaraevelin2 • 0

Hi,

I'm working with RNA-seq data and I'm trying to assess the quality of a de novo assembly of the mouse heart transcriptome using Transrate. I got the following results:

Contig metrics

n_seqs (the number of contigs in the assembly): 37.921

smallest (the size of the smallest contig): 201

largest (the size of the largest contig): 11.914

n_bases (the number of bases included in the assembly): 27.936.062

mean_len (the mean length of the contigs): 736.69

n under 200 (the number of contigs shorter than 200 bases): 0

n over 1k (the number of contigs greater than 1,000 bases long): 8.060

n over 10k (the number of contigs greater than 10,000 bases long): 3

n with orf (the number of contigs that had an open reading frame): 10.925

mean orf percent (for contigs with an ORF, the mean % of the contig covered by the ORF): 63.41

N90: 286

N70: 630

N50: 1.226

N30: 2.055

N10: 3.580

gc (% of bases that are G or C): 0.49

bases n (the number of bases that are N): 0

proportion n (the proportion of bases that are N): 0,0

Read mapping metrics

fragments (the number of read pairs provided): 12.282.839

fragments mapped (the total number of read pairs mapping): 7.036.174 (57%)

good mappings (the number of read pairs mapping in a way indicative of good assembly): 6272741 (51%)

bad mappings (the number and proportion of reads pairs mapping in a way indicative of bad assembly): 763.433

potential bridges (the number of potential links between contigs that are supported by the reads): 0

bases uncovered (the number of bases that are not covered by any reads): 7.612.386 (27%)

contigs uncovbase (the number of contigs that contain at least one base with no read coverage): 16814 (44%)

contigs uncovered (the number of contigs that have a mean per-base read coverage of < 1): 37.921

p_contigs_uncovered (the proportion of contigs that have a mean per-base read coverage of < 1): 1.0

contigs_lowcovered (the number of contigs that have a mean per-base read coverage of < 10): 37921

p_contigs_lowcovered (the proportion of contigs that have a mean per-base read coverage of < 10): 1.0

contigs_segmented (the number of contigs that have >=50% estimated chance of being segmented): 2754 (7%)

TRANSRATE ASSEMBLY SCORE: 0.1676

TRANSRATE OPTIMAL SCORE: 0.2625

TRANSRATE OPTIMAL CUTOFF: 0.1275

I would like to know:

Are these quality values good or bad?
Based on your experience, which values should I pay the most attention to? Is there a protocol or set of best practices for evaluating a de novo transcriptome assembly?
Would you suggest any other software besides Transrate?

Thank you!

RNA-Seq Assembly • 3.0k views

updated 6.8 years ago by Carlos Caicedo ▴ 210 • written 6.8 years ago by nayaraevelin2 • 0

score 2 · Answer 1 · 2017-06-27

On THIS page, the developers of Transrate explain the meaning of each metric reported by the software and the ideal value for each one.
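As one example of how a contig metric is defined, N50 is the contig length such that contigs of that length or longer contain at least 50% of all assembled bases (N90, N70 and so on work the same way). Below is a minimal Python sketch of that definition. The function name `nx` and the toy lengths are made up for illustration; this is not Transrate's own code.

```python
def nx(lengths, x):
    """Return the Nx length: walking contigs from longest to shortest,
    the length of the contig at which the running total first reaches
    x percent of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= total * x:
            return length
    return min(lengths)  # only reached if x > 100


# Toy contig lengths (hypothetical, not taken from the assembly above).
toy = [2000, 1500, 1000, 500, 300, 200]
print(nx(toy, 50), nx(toy, 90))  # prints: 1500 500
```

A large gap between N50 and N90, as in the report above (1,226 vs 286), simply means that much of the assembly is made up of short contigs.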

Personally, I think that your TRANSRATE ASSEMBLY SCORE is very low, and so is the fragments mapped value (only 57% of the read pairs map back to the assembled transcriptome).
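To make the percentages concrete, here is a small Python sketch that recomputes them from the counts in the question. The variable names and the helper `pct` are my own; only the numbers come from the posted report.

```python
# Counts copied from the Transrate report in the question.
fragments = 12_282_839        # read pairs provided
mapped = 7_036_174            # read pairs mapping
good = 6_272_741              # "good mappings"
bad = 763_433                 # "bad mappings"
n_bases = 27_936_062          # bases in the assembly
bases_uncovered = 7_612_386   # bases with no read coverage


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print("fragments mapped %:", pct(mapped, fragments))
print("good mappings %:", pct(good, fragments))
print("bad mappings %:", pct(bad, fragments))
print("bases uncovered %:", pct(bases_uncovered, n_bases))

# In this report, good and bad mappings add up exactly to the mapped pairs.
print("good + bad == mapped:", good + bad == mapped)  # prints: True
```

So roughly 43% of the read pairs do not map to the assembly at all, which is the main thing I would look into.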

You could try the following: tune the parameters of your assembler; remove redundancy from the transcriptome; filter out chimeras (see the s(Cseg) value in the contigs.csv file that Transrate produces; if the value is very low, the transcript could be a chimera); and check read quality and adapter content before assembly (I assume this was already done), because in de novo assembly it is very important to remove adapters so that these sequences are not incorporated into the assembled transcripts.
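For the chimera check, something like the following Python sketch could be used to list contigs with a low s(Cseg) value. It assumes that contigs.csv has columns named `contig_name` and `sCseg` (please check the header of your own file), and the 0.5 threshold is an arbitrary value chosen for illustration, not a recommendation from the Transrate developers.

```python
import csv
import io


def flag_possible_chimeras(csv_file, column="sCseg", threshold=0.5):
    """Return names of contigs whose `column` value is below `threshold`.

    The column names and the threshold are assumptions; adjust them to
    match the header of your own contigs.csv.
    """
    flagged = []
    for row in csv.DictReader(csv_file):
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if value < threshold:
            flagged.append(row["contig_name"])
    return flagged


# Made-up example rows standing in for a real contigs.csv.
sample = io.StringIO(
    "contig_name,length,sCseg\n"
    "contig_1,1200,0.98\n"
    "contig_2,3400,0.12\n"
    "contig_3,800,0.45\n"
)
print(flag_possible_chimeras(sample))  # prints: ['contig_2', 'contig_3']
```

With a real file you would pass `open("contigs.csv", newline="")` instead of the in-memory sample, then inspect the flagged contigs before deciding whether to remove them.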

Regards.