How would sequence overrepresentation and duplication level affect the quality of a de novo transcriptome assembly?
9.0 years ago

Hello everyone,

I am trying to prepare two files containing several million Illumina paired-end RNA-Seq reads for a de novo assembly with Trinity. As I posted the other day, I have some doubts about how to prepare the datasets to obtain the best possible transcriptome assembly.

This time my doubt is about how the overrepresentation of some sequences would affect the assembly. My datasets have deep coverage and, as a result, some (non-artifact) sequences are strongly overrepresented (some of them account for up to 0.2% of all sequences) and the sequence duplication level is very high (approx. 73%). Are these parameters important for the quality of the assembly? If they are, how can I address this? Should I normalize the datasets before performing the assembly?
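To make the two metrics and the normalization question concrete, here is a minimal Python sketch. It assumes reads are already loaded as plain sequence strings; the function names are hypothetical, the duplication figure is a simplified exact-match count (not FastQC's actual algorithm), and the normalization step is a generic "digital normalization" idea (keep a read only while the median abundance of its k-mers is below a cutoff). Trinity includes its own in silico read normalization; this is not its implementation. The `k` and `cutoff` values are illustrative assumptions, and for paired-end data both mates of a pair would need to be kept or dropped together, which this single-end sketch does not handle.

```python
from collections import Counter
from statistics import median


def duplication_stats(reads):
    """Exact-match duplication summary (simplified; not FastQC's algorithm)."""
    counts = Counter(reads)
    total = len(reads)
    # Every copy beyond the first of a distinct sequence counts as a duplicate.
    duplicates = total - len(counts)
    return counts, duplicates / total


def overrepresented(counts, total, min_fraction=0.001):
    """Return sequences whose share of all reads is at least min_fraction."""
    return {seq: n / total for seq, n in counts.items() if n / total >= min_fraction}


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers
    (counted over reads kept so far) is below cutoff."""
    kmer_counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: dropped in this sketch
        if median(kmer_counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            kmer_counts.update(read_kmers)
    return kept
```

With a toy input of 50 identical reads plus two unique ones, `duplication_stats` reports about 94% duplication, and `digital_normalize(reads, k=5, cutoff=5)` keeps only three copies of the repeated read plus both unique reads. Highly covered transcripts are thinned out while low-coverage ones are untouched, which is the usual argument for normalizing deep RNA-Seq data before assembly.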

I would be very grateful if someone could help me with this (at least for me) puzzling issue.

next-gen Assembly Trinity RNA-Seq