Question: 3 million de novo mammalian RNA contigs. Too many?
vlad1 (United States) wrote, 9 months ago:

Has anybody seen a similar number of contigs? Would you consider them real splicing variants or assembly errors? The Trinity assembly was built from HiSeq 2x150 paired-end reads from ~110 mammalian brain samples: ~6.3 billion read pairs in total, 1,907,297 Mbases. I can't say how many reads were discarded after trimming, but the mean read quality doesn't look unusual: % of >= Q30 bases: 90.88; Quality Score: 37.96. Trinity parameters were the defaults, i.e. they included "--max_cov 50". Here are the Trinity assembly metrics:

n_seqs  3236542
smallest    201
largest 20360
n_bases 2340786001
mean_len    723.23671
n_under_200 0
n_over_1k   642187
n_over_10k  609
n_with_orf  222378
mean_orf_percent    34.14792
n90 291
n70 594
n50 1136
n30 1954
n10 3685
gc  0.45103
bases_n 0
proportion_n    0
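
For context, a few ratios can be derived from these metrics. The sketch below is only an illustration: `assembly_ratios` is a hypothetical helper name, and it assumes the metrics are supplied on stdin as whitespace-separated "name value" lines like the ones above. With the numbers posted here it reports a mean length of 723.24, about 19.84% of contigs over 1 kb, and about 6.87% of contigs with an ORF.

```shell
# Derive summary ratios from "name value" assembly metrics read on stdin.
# assembly_ratios is a hypothetical helper, not part of Trinity.
assembly_ratios() {
    awk '
        NF >= 2 { m[$1] = $2 }   # store each metric under its name
        END {
            if (m["n_seqs"] > 0) {
                printf "mean_len\t%.2f\n", m["n_bases"] / m["n_seqs"]
                printf "pct_over_1k\t%.2f\n", 100 * m["n_over_1k"] / m["n_seqs"]
                printf "pct_with_orf\t%.2f\n", 100 * m["n_with_orf"] / m["n_seqs"]
            }
        }'
}
```

Usage, assuming the metrics were saved to a file called metrics.txt: `assembly_ratios < metrics.txt`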

Thanks, Vlad


This is Trinity FAQ #1; it has been asked time and time again. That said, 3 million contigs really is a lot, far more than the "a lot" I have usually observed, which is in the range of 100-500 thousand. I've found the ExN50 statistic to be really useful, particularly this part:

If you want to know how many transcripts correspond to the Ex 90 peak, you could:

grep -v '^#' transcripts.TMM.EXPR.matrix.E-inputs | awk '$1 <= 90' | wc -l
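
The same count can be wrapped in a small function so that other Ex cutoffs are easy to try. This is a minimal sketch, not part of Trinity: `count_transcripts_at_ex` is a hypothetical name, and the only assumption about the E-inputs file is the one the command above already makes, namely that header lines start with `#` and the first column holds the Ex value.

```shell
# Count transcripts whose Ex value (column 1) is at or below a cutoff.
# $1: path to the *.E-inputs file; $2: Ex cutoff (defaults to 90).
# count_transcripts_at_ex is a hypothetical helper name.
count_transcripts_at_ex() {
    awk -v ex="${2:-90}" '
        # skip blank lines and "#" header lines, compare numerically
        NF && !/^#/ && $1 + 0 <= ex + 0 { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Usage: `count_transcripts_at_ex transcripts.TMM.EXPR.matrix.E-inputs 90`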
written 9 months ago by h.mon
colindaven (Hannover Medical School) wrote, 9 months ago:

Map the contigs to the genome with GMAP (or, more recently, minimap2); for GMAP, choose GFF3 output. Then visualize the alignments in your favourite genome browser. I am sure there is a lot of absolute rubbish in there, particularly partial transcripts, so compare locus by locus against, for example, the GENCODE transcript sets.
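
A rough sketch of the minimap2 route, under stated assumptions: `map_contigs` and the `DRY_RUN` switch are hypothetical conveniences for illustration, and the file names in the usage line are placeholders. The minimap2 call itself uses the spliced-alignment preset (`-ax splice`) and writes SAM, which would still need to be sorted and indexed before loading it into a genome browser.

```shell
# Align assembled contigs to a reference genome with spliced alignment.
# $1: genome FASTA, $2: contig FASTA, $3: output SAM, $4: threads (default 4).
# map_contigs and DRY_RUN are hypothetical; DRY_RUN=1 only prints the command.
map_contigs() {
    genome=$1
    contigs=$2
    out=$3
    threads=${4:-4}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "minimap2 -ax splice -t $threads $genome $contigs > $out"
    else
        minimap2 -ax splice -t "$threads" "$genome" "$contigs" > "$out"
    fi
}
```

Usage: `map_contigs genome.fa Trinity.fasta contigs_vs_genome.sam 8`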
