Question: 3 million de novo mammalian RNA contigs. Too many?
3 months ago by
United States
vlad1 wrote:

Has anybody seen a similar number of contigs? Would you consider them real splice variants or assembly errors?

The Trinity assembly was built from HiSeq 2x150 paired-end reads from ~110 mammalian brain samples: ~6.3 billion read pairs in total, 1,907,297 Mbases. I can't say how many reads were discarded after trimming, but the mean read quality doesn't seem unusual: % of >= Q30 bases: 90.88; quality score: 37.96. Trinity parameters were the defaults, i.e. they included "--max_cov 50".

Here are the Trinity assembly metrics:

n_seqs  3236542
smallest    201
largest 20360
n_bases 2340786001
mean_len    723.23671
n_under_200 0
n_over_1k   642187
n_over_10k  609
n_with_orf  222378
mean_orf_percent    34.14792
n90 291
n70 594
n50 1136
n30 1954
n10 3685
gc  0.45103
bases_n 0
proportion_n    0

Thanks, Vlad
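
For readers unfamiliar with the n10 to n90 rows in the metrics above: Nx is the contig length at which contigs of that length or longer hold x% of all assembled bases. Below is a minimal Python sketch of that definition, assuming the contig lengths are available as a list. The function name `nx` and the toy lengths are made up for illustration and are not taken from this assembly.

```python
def nx(lengths, x):
    """Return the Nx length: walking contigs from longest to shortest,
    the length of the contig at which the running total first reaches
    x percent of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    threshold = sum(lengths) * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy assembly, not the data from the post.
toy_lengths = [100, 200, 300, 400, 500]
for x in (10, 50, 90):
    print(f"N{x} = {nx(toy_lengths, x)}")
```

The mean_len row can be checked by hand in the same spirit: n_bases / n_seqs = 2340786001 / 3236542 ≈ 723.24.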

rna-seq next-gen assembly • 217 views

This is Trinity FAQ #1; it has been asked time and time again. That said, 3 million contigs really is a lot, far more than the "a lot" I usually observe, which is in the range of 100-500 thousand. I've found the ExN50 statistic really useful, particularly this part:

If you want to know how many transcripts correspond to the Ex 90 peak, you could:

cat transcripts.TMM.EXPR.matrix.E-inputs |  egrep -v ^\# | awk '$1 <= 90' | wc -l
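
The same count can be sketched in Python, which may be easier to adapt. This mirrors the one-liner above under the same assumption it makes: comment lines start with `#` and the first column of each data row holds the Ex value. The function name `count_up_to_ex` and the sample rows are hypothetical; unlike the awk version, rows that are blank or whose first field is not numeric are skipped.

```python
def count_up_to_ex(lines, max_ex=90.0):
    """Count data rows whose first whitespace-separated field is a
    number <= max_ex, ignoring lines that start with '#'."""
    count = 0
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            value = float(fields[0])
        except ValueError:
            continue  # first field is not numeric
        if value <= max_ex:
            count += 1
    return count


# Made-up rows for illustration; real files have their own columns.
sample = ["#col1\tcol2\n", "1\ta\n", "50\tb\n", "90\tc\n", "95\td\n"]
print(count_up_to_ex(sample))
```

To run it on a real file, pass an open file handle, e.g. `count_up_to_ex(open("transcripts.TMM.EXPR.matrix.E-inputs"))`.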
written 3 months ago by h.mon
3 months ago by
Hannover Medical School
colindaven wrote:

Map the contigs to the genome with GMAP (or, more recently, minimap2); for GMAP, choose GFF3 output. Then visualize the result in your favourite genome browser. I am sure there is a lot of absolute rubbish in there, particularly partial transcripts, so compare locus by locus against a reference annotation such as the Gencode transcript sets.
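
As a rough illustration of the minimap2 route only, the sketch below builds a spliced-alignment command line in Python. The file names `genome.fa` and `Trinity.fasta`, the thread count, and the helper name `minimap2_splice_cmd` are assumptions for the example; `-ax splice` is minimap2's spliced-alignment preset with SAM output and `-t` sets the number of threads. The GMAP route is not shown.

```python
import shlex


def minimap2_splice_cmd(genome_fasta, contigs_fasta, threads=4):
    """Build (but do not run) a minimap2 spliced-alignment command
    that aligns assembled contigs to a reference genome."""
    return [
        "minimap2",
        "-ax", "splice",      # spliced alignment preset, SAM output
        "-t", str(threads),   # number of threads
        genome_fasta,
        contigs_fasta,
    ]


# Hypothetical file names; replace them with your own paths.
cmd = minimap2_splice_cmd("genome.fa", "Trinity.fasta")
print(shlex.join(cmd))
```

The printed command writes SAM to standard output, so in practice it would be redirected to a file (or run through `subprocess.run` with `stdout` set to an open file) before sorting and loading the alignments into a genome browser.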


