Forum: How do we know whether our Trinity de novo assembly is good?
4 months ago, ashokkumar.mb wrote:

I used Trinity 2.8.4 for de novo assembly of my plant RNA-seq data. The assembly has finished, but how do I know whether it is really good, and which statistics should I consider?

N50: 2865
Total Trinity 'genes': 72392
Total Trinity transcripts: 146848
Median contig length: 1071
Average contig length: 1664.26
Total assembled bases: 244393103

Does this look good?


Copied from: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment

"You can see that the Nx values based on the single longest isoform per gene are lower than the Nx stats based on all assembled contigs, as expected, and even though the Nx statistic is really not a reliable indicator of the quality of a transcriptome assembly, the Nx value based on using the longest isoform per gene is perhaps better for reasons described above". (emphasis is mine)

Comment written 4 months ago by cpad0112
4 months ago, toralmanvar wrote:

You should not rely on N50 when choosing an RNA-seq assembly, as it is not an appropriate metric for transcriptomes. You can also run a different assembler and compare the statistics. For a better picture, though, you should also compare the assemblies at the annotation level.

4 months ago, Vijay Lakhujani (India) wrote:

Using the standard N50 metric for transcriptome assemblies can be highly misleading: the goal of a transcriptome assembly is not long contigs and a high N50, but one contig per transcript. Furthermore, the most highly expressed transcripts are not necessarily the longest ones, and the majority of transcripts in a transcriptome assembly will normally have relatively low expression levels. Check out this discussion on Biostars:

Is it true that N50 is not an important parameter for quality in Transcriptome Assembly?
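For reference, here is the usual definition of N50 (and Nx in general) written out, followed by a small worked example. The symbols and the toy contig lengths are my own, chosen only for illustration; they are not taken from the assembly in the question.

```latex
% Contig lengths sorted in decreasing order, with total assembled length T:
%   \ell_1 \ge \ell_2 \ge \dots \ge \ell_n, \qquad T = \sum_{i=1}^{n} \ell_i
% Nx is the length of the contig at which the running sum first reaches x% of T:
N_x = \ell_k, \qquad
k = \min\Bigl\{\, j : \sum_{i=1}^{j} \ell_i \;\ge\; \tfrac{x}{100}\, T \Bigr\}

% Toy example with lengths 3000, 1000, 900, 800, 700, 600, 500:
%   T = 7500, so the N50 threshold is 3750.
%   Running sums: 3000, 4000, ...  The threshold is first reached at the
%   second contig, so N_{50} = 1000.
```

Note that the value depends only on the length distribution. It says nothing about whether the contigs are correct or complete, which is part of why it is a weak quality indicator for transcriptomes.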

N50 values can often be exaggerated because an assembly program generates too many transcript isoforms, especially for the longer transcripts. To mitigate this effect, Trinity also computes the Nx values using only the single longest isoform per 'gene':

  ## Stats based on ONLY LONGEST ISOFORM per 'GENE':


Contig N10: 3685
Contig N20: 1718
Contig N30: 909
Contig N40: 588
Contig N50: 439
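If you want to reproduce this kind of comparison yourself, here is a minimal Python sketch. It assumes Trinity-style transcript IDs such as TRINITY_DN1_c0_g1_i1, where dropping the trailing _i<N> suffix gives the 'gene' ID. The function names and the toy sequences are made up for illustration, and this is not Trinity's own implementation (as far as I know, Trinity ships util/TrinityStats.pl for the real report).

```python
import re
from collections import defaultdict


def read_fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # The ID is the first whitespace-delimited token after '>'.
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def gene_id(transcript_id):
    """Assumed Trinity convention: strip the trailing isoform suffix (_i<N>)."""
    return re.sub(r"_i\d+$", "", transcript_id)


def nx(lengths, x=50):
    """Length at which the running sum of sorted lengths reaches x% of the total."""
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


def longest_isoform_lengths(records):
    """Keep only the longest isoform length for each 'gene'."""
    longest = defaultdict(int)
    for transcript_id, length in records:
        gene = gene_id(transcript_id)
        longest[gene] = max(longest[gene], length)
    return list(longest.values())


# Toy FASTA (hypothetical): one 'gene' with three long isoforms, five short genes.
toy_fasta = """\
>TRINITY_DN1_c0_g1_i1 len=12
ACGTACGTACGT
>TRINITY_DN1_c0_g1_i2 len=10
ACGTACGTAC
>TRINITY_DN1_c0_g1_i3 len=11
ACGTACGTACG
>TRINITY_DN2_c0_g1_i1 len=4
ACGT
>TRINITY_DN3_c0_g1_i1 len=4
ACGT
>TRINITY_DN4_c0_g1_i1 len=4
ACGT
>TRINITY_DN5_c0_g1_i1 len=3
ACG
>TRINITY_DN6_c0_g1_i1 len=3
ACG
""".splitlines()

records = list(read_fasta_lengths(toy_fasta))
all_lengths = [length for _, length in records]
gene_lengths = longest_isoform_lengths(records)

print("N50, all transcripts:", nx(all_lengths))             # 10
print("N50, longest isoform per gene:", nx(gene_lengths))   # 4
```

In the toy data, the three isoforms of the one long 'gene' pull the all-transcript N50 up to 10, while the longest-isoform-per-gene N50 is 4. To run it on a real assembly, pass an open file handle for Trinity.fasta to read_fasta_lengths instead of the toy lines.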

Go through this paper for methods to evaluate transcriptome assemblies.

My take is that even for genome assemblies, N50 should be taken with a pinch of salt, as it can give a misleading picture of assembly quality. If you want to learn more, check out this blog post:

Why is N50 used as an assembly metric (and what's the deal with NG50)?
