Question: How to deal with a high duplication rate in FastQC (de novo transcriptome assembly)?
4 months ago, doinelpierrot wrote:

Hello everybody,

I am doing a de novo transcriptome assembly of an alga. I just ran FastQC, and all my samples "failed" on the percentage of duplicate reads (70-80% in all my files).

I have a total of 600M reads. Is it OK to feed my data straight into Trinity, or is it better to do something about these duplicate reads? I would guess that the duplicates, which are probably caused by PCR, will just make the assembly run a bit longer, right?

Thank you for your help :)

rna-seq assembly • 217 views

Thank you, I've finished my assembly and every indicator seems fine.

written 3 months ago by doinelpierrot
4 months ago, h.mon (Brazil) wrote:

RNA-seq data will almost always "fail" FastQC's duplication module, because reads from highly expressed genes are flagged as duplicates; check the documentation of the Duplicate Sequences module. You can't be certain the duplication is an artifact without a more careful analysis; I have used dupRadar for this in the past.
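
To see why identical reads alone say little about PCR artifacts, here is a minimal Python sketch that counts exact-duplicate sequences. It is a simplification for illustration and not FastQC's actual algorithm; the function names `duplication_percent` and `read_fastq_sequences` are made up for this example, and the FASTQ reader assumes plain 4-line records.

```python
import gzip
from collections import Counter


def duplication_percent(sequences):
    """Percentage of reads that are exact copies of a read already seen.

    Simplified illustration only: this is not how FastQC computes its
    duplication plot, it just counts identical sequences.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    distinct = len(counts)
    return 100.0 * (total - distinct) / total


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record holds the bases
                yield line.strip()


# Toy data: one highly expressed transcript sampled 8 times plus two
# rare transcripts sampled once each. No PCR is involved at all.
toy_reads = ["ACGT"] * 8 + ["TTGA", "GGCC"]
print(duplication_percent(toy_reads))  # 70.0
```

In the toy data, 7 of the 10 reads are copies of an earlier read purely because one transcript is abundant, so a high percentage by itself does not show that the library is over-amplified. On a real file you could try `duplication_percent(read_fastq_sequences("sample_1.fq.gz"))`, where the file name is a placeholder; note that this holds every distinct sequence in memory.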

Trinity performs digital normalization by default (see Trinity Insilico Normalization), so there is no need to remove duplicates.
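
As a rough sketch of what running Trinity directly on the raw reads might look like, the Python helper below only assembles the command line and prints it. The helper name `build_trinity_command`, the file names, the memory and CPU values, and the assumption of paired-end reads are all placeholders that are not taken from the thread; adjust them to your data and check `Trinity --help` for your installed version.

```python
import shlex


def build_trinity_command(left, right, output_dir="trinity_out",
                          max_memory="50G", cpu=8, normalize=True):
    """Build (but do not run) a Trinity command for paired-end FASTQ input.

    Hypothetical helper for illustration. In silico normalization is left
    at Trinity's default unless normalize=False is requested.
    """
    cmd = [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output_dir,
    ]
    if not normalize:
        # Opt out of the default in silico normalization step.
        cmd.append("--no_normalize_reads")
    return cmd


# Placeholder file names; print the command so it can be reviewed first.
print(shlex.join(build_trinity_command("reads_1.fq.gz", "reads_2.fq.gz")))
```

With the defaults, no normalization flag is passed and no separate deduplication step is needed beforehand. To actually launch the job you could hand the returned list to `subprocess.run(cmd, check=True)`.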
