Question: Extreme normalization using Trinity related to poor mapping efficiency?
Asked 3.8 years ago by wasphunter:


QUESTION: Can the combination of high levels of pre-mRNA and extreme normalization explain a poor mapping rate?

ISSUE: I am getting a poor mapping rate (~64%) when I align my RNA-seq reads back to a Trinity assembly built from a normalized subset of those same reads. There is little evidence of DNA contamination, but there is substantial evidence of what appears to be pre-mRNA: I see many intron/exon and exon/intron fragments in the assembled contigs, and the handful of unaligned read pairs I checked manually map to genes but not to the Trinity contigs assembled from the mRNA of those genes. FastQC reports substantial sequence duplication, and Trinity's normalization step left only ~10% of the reads for the assembly.
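For anyone wanting to recompute a mapping rate like the ~64% above independently of the aligner's own summary, here is a minimal Python sketch. It assumes the alignments are available as uncompressed SAM text and counts only primary records, using the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name `mapping_rate` and the toy records are made up for illustration, and aligners may define their reported rate slightly differently (per pair rather than per read, for example).

```python
def mapping_rate(sam_lines):
    """Return the fraction of primary SAM records that are mapped.

    sam_lines: any iterable of SAM text lines (e.g. an open file handle).
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary (0x100) and supplementary (0x800) records so
        # each read is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Toy records (hypothetical): two mapped reads, one unmapped read,
# and one secondary alignment that should be ignored.
records = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r1\t256\tcontig1\t50\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tcontig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
rate = mapping_rate(records)
print(f"mapping rate: {rate:.1%}")
```

On real data the same function could be fed an open file handle, e.g. `with open("aln.sam") as fh: mapping_rate(fh)`, where `aln.sam` is a placeholder file name.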

BACKGROUND: I have 2x125 bp paired-end RNA-seq read sets (~80 million pairs per set) from my study organism (genome ~500 Mb). FastQC shows Phred scores > 30 across almost 100% of the read length, along with a substantial amount of sequence duplication. Trinity normalized each library to ~10% (!) of its reads, and to a little more once my two data sets were combined (I normalized each set separately, then normalized again after combining the two sets to generate the assembly). The resulting Trinity assembly has ~146,430 transcripts (67,000 'genes').
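As a sanity check on the ~10% figure, the retained fraction can be computed by counting records in the FASTQ files before and after normalization. The sketch below is only an illustration: it assumes plain, uncompressed FASTQ with exactly four lines per record, and the helper names `count_fastq_records` and `retained_fraction` are hypothetical, not part of Trinity.

```python
import io


def count_fastq_records(handle):
    """Count reads in a FASTQ stream, assuming 4 lines per record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4; not plain FASTQ?")
    return n_lines // 4


def retained_fraction(n_before, n_after):
    """Fraction of reads that survived normalization."""
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    return n_after / n_before


# Toy data (made up): 10 reads before normalization, 1 read after.
before = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(10)))
after = io.StringIO("@r0\nACGT\n+\nIIII\n")

n_before = count_fastq_records(before)
n_after = count_fastq_records(after)
fraction = retained_fraction(n_before, n_after)
print(f"{n_after} of {n_before} reads retained ({fraction:.0%})")
```

At the scale described here, ~80 million pairs per set with ~10% retained corresponds to roughly 8 million pairs per set going into the assembly.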

IDEA: Should I generate an assembly without normalizing the data to see whether the mapping rate improves, trusting that most of the transcripts produced will be poorly supported and so will be dropped in later analyses? This is bound to use a lot of memory. Any suggestions would be appreciated.

modified 3.8 years ago by Biostar • written 3.8 years ago by wasphunter

Did you do any QC on your raw reads prior to assembly?

written 3.8 years ago by

I only did the FastQC analysis reported above in BACKGROUND. I found evidence of adapter contamination, which I was readily able to remove using Trimmomatic.

What else would you recommend?

written 3.8 years ago by wasphunter

Did you check for ribosomal RNA contamination?

written 3.8 years ago by h.mon

