Question: How to deal with a high duplication rate in FastQC (de novo transcriptome assembly)?
5 weeks ago by
doinelpierrot0 wrote:

Hello everybody,

I am doing a de novo transcriptome assembly of an alga. I just ran FastQC and all my samples "fail" on the percentage of duplicate reads (70-80% in all my files).

I have a total of 600M reads. Is it OK to just feed my data to Trinity, or is it better to do something about these duplicate reads? I would guess that these duplicates, which are probably caused by PCR, will just make the assembly run a bit longer, right?

Thank you for your help :)

rna-seq assembly

Thank you, I've finished my assembly and every indicator seems fine.

written 24 days ago by doinelpierrot0
5 weeks ago by
h.mon31k wrote:

RNA-seq data will almost always "fail" FastQC's duplication module, because reads from highly expressed genes are flagged as duplicates; check the documentation of the Duplicate Sequences module. You can't be certain the duplication is an artifact (e.g. from PCR) without a more careful analysis. I have used dupRadar for this in the past.
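To see why skewed expression alone can push the duplication rate this high, here is a toy simulation. It is only a sketch under my own assumptions, not how FastQC computes its metric: a read is reduced to a (gene, start position) pair, every gene has the same number of possible start positions, and the function name `duplicate_fraction` and all numbers are made up for illustration. No PCR duplication is simulated at all.

```python
import random

def duplicate_fraction(expression, n_reads, n_starts=1000, seed=0):
    """Fraction of reads whose (gene, start) pair was already seen.

    Toy model: each gene offers `n_starts` possible read start positions,
    and a read is fully described by (gene index, start position).
    """
    rng = random.Random(seed)
    genes = list(range(len(expression)))
    # Sample the gene of origin for each read in proportion to expression.
    picks = rng.choices(genes, weights=expression, k=n_reads)
    seen = set()
    for gene in picks:
        seen.add((gene, rng.randrange(n_starts)))
    return 1 - len(seen) / n_reads

n_genes = 1000
# Hypothetical expression profiles (arbitrary units).
uniform = [1.0] * n_genes
skewed = [1000.0] * 10 + [1.0] * (n_genes - 10)  # ten dominant genes

flat_rate = duplicate_fraction(uniform, 100_000)
skew_rate = duplicate_fraction(skewed, 100_000)
print(f"uniform expression: {flat_rate:.1%} duplicates")
print(f"skewed expression:  {skew_rate:.1%} duplicates")
```

With the same sequencing depth, the skewed profile produces far more duplicates than the flat one, simply because a handful of genes soak up most of the reads and their start positions saturate.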

Trinity performs digital normalization by default (see Trinity Insilico Normalization), so there is no need to remove duplicates.
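For intuition, the general idea behind digital normalization can be sketched in a few lines. This is not Trinity's implementation, just a minimal illustration under my own assumptions: each read is kept with probability `target_cov / median k-mer abundance` (capped at 1), all reads are at least `k` bases long, and the function names, toy sequences and parameter values are made up.

```python
import random
from collections import Counter
from statistics import median

def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def normalize(reads, k=5, target_cov=3, seed=0):
    """Downsample reads from high-coverage regions (toy sketch).

    Assumes every read is at least k bases long.
    """
    rng = random.Random(seed)
    # Abundance of every k-mer across the whole read set.
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = []
    for read in reads:
        # Median k-mer abundance serves as the read's coverage estimate.
        cov = median(counts[km] for km in kmers(read, k))
        # Reads at or below the target coverage are always kept.
        if rng.random() < min(1.0, target_cov / cov):
            kept.append(read)
    return kept

# Hypothetical data: one highly expressed transcript, one rare one.
abundant = "ACGTTGCAAGCTTAGC"
rare = "TTGACCGGTATCGGAT"
reads = [abundant] * 200 + [rare] * 2

kept = normalize(reads)
print(f"kept {kept.count(abundant)} of 200 abundant reads")
print(f"kept {kept.count(rare)} of 2 rare reads")
```

Reads from the abundant transcript are thinned out heavily while the rare reads survive, which is why an extra deduplication step before assembly adds little.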
