Question: How to deal with a high duplication rate in FastQC (de novo transcriptome assembly)?
4 months ago by
doinelpierrot10 wrote:

Hello everybody,

I am doing a de novo transcriptome assembly of an alga. I just ran FastQC and all my samples "failed" on the percentage of duplicate reads (70-80% in all my files).

I have a total of 600M reads. Is it OK to just feed my data to Trinity, or is it better to do something about these duplicate reads? I would guess that these duplicates, which are probably caused by PCR, will just make the assembly run a bit longer, right?

Thank you for your help :)

rna-seq assembly • 217 views

Thank you, I've finished my assembly and every indicator seems fine.

written 3 months ago by doinelpierrot10
4 months ago by
h.mon32k wrote:

RNA-seq data will almost always "fail" FastQC's duplication module, because reads from highly expressed genes are flagged as duplicates - check the documentation of the Duplicate Sequences module. You can't be certain the duplication is artifactual (e.g. from PCR) without a more careful analysis - I have used dupRadar for this in the past.
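
To illustrate why expression level alone can drive the duplication rate up, here is a minimal Python sketch. Everything in it is an assumption made for illustration (random toy transcripts, uniform read start positions, exact-match duplicates, the function names); it is not how FastQC computes its metric. A short transcript sampled very deeply has only a limited number of distinct start positions, so most of its reads are necessarily identical copies even with no PCR artifact at all.

```python
import random
from collections import Counter


def duplication_rate(reads):
    """Fraction of reads that are exact copies of another read in the set."""
    distinct = len(Counter(reads))
    return 1 - distinct / len(reads)


def simulate_reads(transcript, n_reads, read_len, rng):
    """Sample reads of fixed length at uniformly random start positions."""
    max_start = len(transcript) - read_len
    starts = (rng.randint(0, max_start) for _ in range(n_reads))
    return [transcript[s:s + read_len] for s in starts]


rng = random.Random(0)

# Hypothetical transcripts: one short and highly expressed, one long and rare.
high_tx = "".join(rng.choice("ACGT") for _ in range(500))
low_tx = "".join(rng.choice("ACGT") for _ in range(5000))

high_reads = simulate_reads(high_tx, n_reads=10_000, read_len=50, rng=rng)
low_reads = simulate_reads(low_tx, n_reads=100, read_len=50, rng=rng)

print("highly expressed:", round(duplication_rate(high_reads), 3))
print("lowly expressed: ", round(duplication_rate(low_reads), 3))
```

The highly expressed transcript has at most 451 possible 50 bp reads, so with 10,000 reads its duplication rate is above 95% by construction, while the rare transcript stays close to zero.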

Trinity performs digital (in silico) normalization by default - see Trinity Insilico Normalization - so there is no need to remove duplicates.
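
For intuition only, the general idea behind coverage-based read normalization can be sketched in a few lines of Python. This is a simplified, diginorm-style toy and not Trinity's implementation: the function names, `k=5` and `max_cov=3` are assumptions picked for the example, and it ignores reverse complements, paired ends and sequencing errors. A read is kept only while the median count of its k-mers among already-kept reads is below a coverage cap, so deeply covered regions are thinned out while rare reads are retained.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """All overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k=5, max_cov=3):
    """Keep a read only if its median k-mer count so far is below max_cov."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read is shorter than k, nothing to estimate from
        if median(counts[x] for x in ks) < max_cov:
            kept.append(read)
            counts.update(ks)
    return kept


# Toy input: ten identical reads from an abundant transcript, one rare read.
reads = ["ACGTTGCATG"] * 10 + ["GGATCCTAGA"]
kept = normalize(reads, k=5, max_cov=3)
print(len(reads), "reads in,", len(kept), "reads kept")
```

In this toy input the abundant read is capped at three copies and the rare read survives, which is why a separate duplicate-removal step adds little before an assembler that already normalizes coverage.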
