Question: How to deal with a high duplication rate in FastQC (de novo transcriptome assembly)?
Asked 5 weeks ago by doinelpierrot:

Hello everybody,

I am doing a de novo transcriptome assembly of an alga. I just ran FastQC, and all my samples "fail" the duplicate sequences check (70-80% duplicate reads in every file).
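For readers unsure what the duplicate percentage measures, here is a minimal sketch that counts exact-sequence duplicates in a list of reads. It is only an approximation of the idea: FastQC itself works on a subset of reads and truncates long reads before comparing them, so its numbers will differ. The function names and the 50% cutoff parameter are illustrative assumptions, not FastQC's code.

```python
def duplication_percent(reads):
    """Percent of reads that are exact repeats of a sequence already seen.

    This equals 100 minus the percent of reads that would remain
    after collapsing identical sequences into one copy.
    """
    if not reads:
        return 0.0
    distinct = len(set(reads))  # number of unique sequences
    return 100.0 * (len(reads) - distinct) / len(reads)


def fails_duplication_check(percent_duplicate, cutoff=50.0):
    # Hypothetical helper: flag a sample when duplicates exceed the cutoff.
    return percent_duplicate > cutoff


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGT", "ACGT", "TTGA"]
    pct = duplication_percent(reads)
    print(pct, fails_duplication_check(pct))  # 60.0 True
```

Note that nothing here asks *why* a sequence repeats. An identical read can come from PCR or simply from sampling the same highly expressed transcript many times.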

I have 600M reads in total. Is it OK to feed my data directly to Trinity, or is it better to do something about these duplicate reads first? I would guess that these duplicates, which are probably caused by PCR, will just make the assembly run a bit longer, right?

Thank you for your help :)


Reply by doinelpierrot, 24 days ago: Thank you, I've finished my assembly and every indicator seems fine.
Answer by h.mon (Brazil), 5 weeks ago:

RNA-seq data will almost always "fail" FastQC's duplication module, because reads from highly expressed genes are flagged as duplicates; check the documentation of the Duplicate Sequences module. You can't be certain that the duplication is an artifact without a more careful analysis. I have used dupRadar for this in the past.
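Why expression alone produces "duplicates" can be sketched with a simple sampling model. Assume (as a simplification, not how FastQC or dupRadar work internally) single-end reads of fixed length placed uniformly on one strand of a transcript, so a read is identified by its start position. If `n_reads` land on `n_positions` possible starts, the expected number of distinct starts is `n_positions * (1 - (1 - 1/n_positions) ** n_reads)`, and every extra read is a duplicate even with no PCR at all.

```python
def expected_duplicate_percent(n_reads, n_positions):
    """Expected % of reads that share a start position with an earlier read,
    when n_reads start uniformly at random among n_positions (no PCR).

    Simplifying assumptions: single-end reads, one strand, uniform
    coverage, no sequencing errors.
    """
    if n_reads == 0:
        return 0.0
    # Expected number of positions hit at least once.
    expected_distinct = n_positions * (1 - (1 - 1 / n_positions) ** n_reads)
    return 100.0 * (1 - expected_distinct / n_reads)


if __name__ == "__main__":
    # Hypothetical 1,500 bp transcript with 100 bp reads -> 1,401 start positions.
    positions = 1500 - 100 + 1
    for n in (10, 1_000, 100_000):
        print(n, round(expected_duplicate_percent(n, positions), 1))
```

A lowly expressed transcript shows almost no duplication, while a highly expressed one approaches 100%, which is why RNA-seq libraries routinely exceed FastQC's duplication thresholds. Tools such as dupRadar look at duplication as a function of expression level for this reason.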

Trinity performs in silico (digital) read normalization by default (see Trinity Insilico Normalization), so there is no need to remove duplicates.
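The general idea behind digital normalization can be illustrated with a simplified single-pass sketch: keep a read only if the median count of its k-mers, among reads already kept, is below a coverage cutoff. Reads from very highly covered transcripts stop being kept once their k-mers are well represented, while reads from rare transcripts survive. This is an assumption-laden illustration of the technique, not Trinity's implementation (Trinity's normalization uses its own k-mer counting and read-selection rules); `k` and `cutoff` here are hypothetical values chosen for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Single-pass digital normalization sketch.

    A read is kept if the median abundance of its k-mers, counted over
    previously kept reads, is below cutoff. Kept reads update the counts.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # 100 copies of an abundant read plus one rare read.
    reads = ["ACGTACGTAC"] * 100 + ["TTTTGGGGCC"]
    kept = digital_normalize(reads, k=5, cutoff=5)
    print(len(reads), "->", len(kept))
```

Because highly redundant reads are discarded before assembly, collapsing exact duplicates beforehand adds little, and it would also discard genuine signal from highly expressed genes.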
