Question: How to deal with a high duplication rate in FastQC (de novo transcriptome assembly)?
Hello everybody,

I am doing a de novo transcriptome assembly of an alga. I just ran FastQC and all my samples "failed" the duplication check: 70-80% of the reads are flagged as duplicates in every file.

I have a total of 600M reads. Is it OK to feed my data straight into Trinity, or is it better to do something about these duplicate reads first? I would guess that these duplicates, which are probably caused by PCR, will just make the assembly run a bit longer, right?

Thank you for your help :)

Thank you, I've finished my assembly and every indicator seems fine.

RNA-seq data will almost always "fail" the FastQC duplication module, because reads from highly expressed genes are flagged as duplicates - check the documentation of the Duplicate Sequences module. You can't be certain that the duplication is an artifact without a more careful analysis - I have used dupRadar for this in the past.
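To illustrate why expression level alone can produce duplicates, here is a minimal Python sketch. It is a toy simulation, not FastQC's actual algorithm: the transcript lengths, read counts and function names are all made up for illustration, and no PCR step is modelled at all. Reads are sampled uniformly along two random transcripts, one "highly expressed" and one "lowly expressed".

```python
import random


def duplication_fraction(reads):
    """Fraction of reads that are exact copies of another read in the list."""
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)


def simulate_reads(transcript, n_reads, read_len, rng):
    """Sample single-end reads uniformly along a transcript (no PCR, no errors)."""
    n_starts = len(transcript) - read_len + 1
    return [
        transcript[start:start + read_len]
        for start in (rng.randrange(n_starts) for _ in range(n_reads))
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two hypothetical 1 kb transcripts with very different expression levels.
transcript_high = "".join(rng.choice("ACGT") for _ in range(1000))
transcript_low = "".join(rng.choice("ACGT") for _ in range(1000))

# Only 951 distinct 50 bp reads can start on a 1 kb transcript, so 20,000
# reads from it must be mostly exact duplicates, even without any PCR.
high_dup = duplication_fraction(simulate_reads(transcript_high, 20000, 50, rng))
low_dup = duplication_fraction(simulate_reads(transcript_low, 50, 50, rng))

print(f"highly expressed transcript: {high_dup:.1%} duplicated reads")
print(f"lowly expressed transcript:  {low_dup:.1%} duplicated reads")
```

The highly expressed transcript comes out almost entirely "duplicated" while the lowly expressed one does not. This is the kind of expression-dependent pattern that a tool like dupRadar is meant to separate from genuine PCR artifacts.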

Trinity performs digital (in silico) read normalization by default - see Trinity Insilico Normalization - so there is no need to remove duplicates beforehand.
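As a rough sketch of what "just running Trinity" could look like, the Python snippet below only builds and prints a command line; it does not launch anything. The file names, CPU count and memory limit are placeholders, not values from this thread. The point is simply that nothing extra is passed for normalization. To my knowledge `--no_normalize_reads` is the switch that turns it off, so it is left out here; check `Trinity --help` for your installed version.

```python
import shlex


def build_trinity_command(left, right, cpu=8, max_memory="50G",
                          output="trinity_out_dir"):
    """Build a basic paired-end Trinity command as a list of arguments.

    All argument values are placeholders. No normalization flag is added,
    so Trinity's default behaviour applies.
    """
    return [
        "Trinity",
        "--seqType", "fq",        # input reads are FASTQ
        "--left", left,           # R1 reads
        "--right", right,         # R2 reads
        "--CPU", str(cpu),
        "--max_memory", max_memory,
        "--output", output,       # Trinity expects "trinity" in this name
    ]


# Hypothetical input files, for illustration only.
cmd = build_trinity_command("sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(cmd))
```

To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where Trinity is installed.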

