Question: 3 million de novo mammalian RNA contigs. Too many?
2.2 years ago by
United States
vlad1 wrote:

Has anybody seen a similar number of contigs? Would you consider them real splice variants or assembly errors?

The Trinity assembly was built from HiSeq 2x150 paired-end reads from ~110 mammalian brain samples: ~6.3 billion read pairs in total (1,907,297 Mbases). I can't say how many reads were discarded after trimming, but the mean read quality doesn't seem unusual (% of >= Q30 bases: 90.88; quality score: 37.96). Trinity parameters were the defaults, i.e. they included "--max_cov 50".

Here are the Trinity assembly metrics:

n_seqs  3236542
smallest    201
largest 20360
n_bases 2340786001
mean_len    723.23671
n_under_200 0
n_over_1k   642187
n_over_10k  609
n_with_orf  222378
mean_orf_percent    34.14792
n90 291
n70 594
n50 1136
n30 1954
n10 3685
gc  0.45103
bases_n 0
proportion_n    0
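
For readers unfamiliar with the n10 to n90 rows above: Nx is the contig length at which the contigs, summed from the longest downwards, first cover x% of all assembled bases. The following is a minimal Python sketch of that calculation. The function name `nx` and the toy lengths are illustrative assumptions; this is not the tool that produced the table.

```python
def nx(lengths, x):
    """Return Nx: the length L such that contigs of length >= L
    together contain at least x percent of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    target = sum(lengths) * x / 100.0
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return min(lengths)  # only reachable through floating-point rounding


# Toy contig lengths (hypothetical, not taken from the real assembly).
toy_lengths = [2000, 1500, 1000, 500, 300, 250, 201]
for x in (10, 50, 90):
    print(f"N{x} = {nx(toy_lengths, x)}")
```

Read this way, the n90 of 291 in the table means that 90% of the assembled bases sit in contigs of at least 291 bases, not far above the 201-base minimum, so the assembly has a very long tail of short contigs.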

Thanks, Vlad

rna-seq next-gen assembly • 621 views

This is Trinity FAQ #1; it has been asked time and time again. That said, 3 million contigs really is a lot, well beyond the "a lot" I usually observe, which is in the range of 100-500 thousand. I've found the ExN50 statistic really useful, particularly this part:

If you want to know how many transcripts correspond to the Ex 90 peak, you could:

cat transcripts.TMM.EXPR.matrix.E-inputs |  egrep -v ^\# | awk '$1 <= 90' | wc -l
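
To make the idea behind that command concrete: ExN50 at level x is the N50 computed only over the most highly expressed transcripts that together account for x% of total normalized expression. The sketch below is a toy Python illustration of that definition, not Trinity's implementation. The function names and the (length, expression) pairs are made up for the example; for real data, use the ExN50 utility that ships with Trinity.

```python
def n50(lengths):
    """N50 of a list of lengths: the length at which the running sum,
    taken from the longest downwards, reaches half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def exn50(transcripts, x):
    """transcripts: list of (length, expression) pairs.
    Returns (N50 of the selected set, number of transcripts selected)."""
    # Rank by expression, highest first.
    ranked = sorted(transcripts, key=lambda t: t[1], reverse=True)
    target = sum(expr for _, expr in ranked) * x / 100.0
    running = 0.0
    chosen = []
    for length, expr in ranked:
        if running >= target:
            break  # the selected set already explains x% of expression
        chosen.append(length)
        running += expr
    return n50(chosen), len(chosen)


# Hypothetical data: three well-expressed long transcripts and a tail of
# short, weakly expressed ones, as (length, normalized expression).
toy_transcripts = [
    (1800, 500), (1200, 300), (900, 100),
    (400, 40), (350, 20), (300, 15), (300, 10),
    (250, 5), (250, 4), (250, 3), (201, 3),
]
print(exn50(toy_transcripts, 90))   # top 90% of expression
print(exn50(toy_transcripts, 100))  # all transcripts, i.e. the plain N50
```

In this toy set, 3 of 11 transcripts carry 90% of the expression, which is the kind of count the `awk '$1 <= 90' | wc -l` pipeline reports for a real assembly.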
written 2.2 years ago by h.mon • modified 2.2 years ago
2.2 years ago by
Hannover Medical School
colindaven wrote:

Map the contigs to the genome with GMAP (or, more recently, minimap2); for GMAP, choose GFF3 output. Then visualize the alignments in your favourite genome browser. I am sure there is a lot of absolute rubbish in there, particularly partial transcripts, so compare locus by locus against a reference annotation such as the Gencode transcript sets.
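
Besides browsing, the per-locus comparison can be roughed out in a few lines of Python. The sketch below relies only on the standard nine-column GFF3 layout. It assumes that contig alignments appear as `mRNA` rows and annotated loci as `gene` rows, which may not match your files, and it ignores strand. The function names and toy records are hypothetical.

```python
from collections import defaultdict


def read_features(gff3_lines, feature_type):
    """Yield (seqid, start, end, attributes) for GFF3 rows of one type."""
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        yield cols[0], int(cols[3]), int(cols[4]), cols[8]


def contigs_per_gene(contig_lines, annotation_lines):
    """Count aligned contigs overlapping each annotated gene.
    Returns (counts keyed by gene attribute string, contigs with no gene)."""
    genes = defaultdict(list)
    for seqid, start, end, attrs in read_features(annotation_lines, "gene"):
        genes[seqid].append((start, end, attrs))
    counts = defaultdict(int)
    unassigned = 0
    for seqid, start, end, _ in read_features(contig_lines, "mRNA"):
        # Closed-interval overlap test against every gene on the same sequence.
        hits = [a for s, e, a in genes.get(seqid, []) if start <= e and s <= end]
        if not hits:
            unassigned += 1
        for attrs in hits:
            counts[attrs] += 1
    return dict(counts), unassigned


# Toy records (hypothetical): two annotated genes, four aligned contigs.
annotation = [
    "##gff-version 3",
    "chr1\tTOY\tgene\t100\t500\t.\t+\t.\tID=geneA",
    "chr1\tTOY\tgene\t1000\t2000\t.\t-\t.\tID=geneB",
]
contigs = [
    "chr1\tTOY\tmRNA\t120\t300\t.\t+\t.\tID=c1",
    "chr1\tTOY\tmRNA\t450\t600\t.\t+\t.\tID=c2",
    "chr1\tTOY\tmRNA\t5000\t5200\t.\t+\t.\tID=c3",
    "chr2\tTOY\tmRNA\t10\t90\t.\t+\t.\tID=c4",
]
per_gene, no_gene = contigs_per_gene(contigs, annotation)
print(per_gene, no_gene)
```

Loci that collect dozens of contigs, and contigs that overlap no annotated gene at all, are the natural places to start looking for fragmented or spurious assemblies. The linear scan here is only suitable for small inputs; with millions of contigs an interval index or a dedicated tool such as bedtools would be the more realistic choice.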
