Question: How would sequence overrepresentation and duplication level affect the quality of a de novo transcriptome assembly?
Asked 5.1 years ago by guillermo.ponz.segrelles

Hello everyone,

I am trying to prepare two files containing several million Illumina paired-end RNA reads for a de novo assembly with Trinity and, as I posted the other day, I have some doubts about how to prepare the datasets to obtain the best transcriptome assembly.

In this case, my doubt is about how the overrepresentation of some sequences would affect the assembly. My datasets have deep coverage and, as a result, some (non-artifact) sequences are highly overrepresented (some of them account for up to 0.2% of the total number of sequences), and the level of sequence duplication is very high (approx. 73%). Are these parameters important for the quality of the assembly? If so, how can I address this? Should I normalize the datasets before performing the assembly?
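
To make clear what I mean by those two figures, here is a minimal Python sketch of how they could be computed from the raw reads. It assumes a simple definition (duplication = fraction of reads that repeat an already-seen sequence; overrepresentation = share of the most frequent sequence), which may differ from how a given QC tool estimates them. The function names `duplication_stats` and `read_fastq_sequences` are hypothetical, and the reader assumes an uncompressed FASTQ file with exactly four lines per record.

```python
from collections import Counter


def duplication_stats(sequences):
    """Return (duplication_fraction, top_sequence, top_fraction).

    duplication_fraction: share of reads that are exact copies of a
    sequence already seen (1 - distinct / total). This is an assumed,
    simplified definition, not necessarily what a QC tool reports.
    top_fraction: share of all reads taken by the most frequent sequence.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0, None, 0.0
    duplication = 1.0 - len(counts) / total
    top_seq, top_count = counts.most_common(1)[0]
    return duplication, top_seq, top_count / total


def read_fastq_sequences(path):
    """Yield the sequence line of each record.

    Assumes an uncompressed FASTQ file with four lines per record.
    """
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record
                yield line.strip()
```

For example, `duplication_stats(read_fastq_sequences("reads_1.fq"))` would give the figures for one of the two files (the file name is only a placeholder). Note that this keeps every distinct sequence in memory, so for several million reads it may be necessary to work on a subsample.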

I would be very grateful if someone could help me with this (at least for me) puzzling issue.
