High levels of duplicated reads from Illumina PCR-free libraries
3 months ago
grey ▴ 20

I was checking FastQC results from a HiSeq 4000 run of PCR-free libraries and came across really high Sequence Duplication Levels.

Note: We were trying to sequence at high depth (~240X) to detect somatic mutations.

These can't be PCR duplicates since the libraries are PCR-free, so it's a bit mysterious to me, though others might have ideas. Could they be optical duplicates? Insufficient DNA in the sample? Should we be OK just removing duplicates, or does this indicate something systematically wrong?

fastqc results

fastqc PCR duplicate reads Illumina • 252 views

You can use clumpify.sh from the BBMap suite to check the types and numbers of duplicates you have, and to remove them, without doing alignments: see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files".
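As a rough, alignment-free sanity check alongside clumpify.sh, exact-sequence duplicates can be counted directly from a FASTQ file. The sketch below is not how clumpify.sh or FastQC work internally; the function names `open_fastq` and `duplication_stats` are hypothetical, and it assumes standard 4-line FASTQ records and enough memory to hold every distinct sequence (so run it on a subsample for a 240X dataset).

```python
import gzip
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def duplication_stats(fastq_lines):
    """Count exact-sequence duplicates in an iterable of FASTQ lines.

    Assumes 4-line records, so the sequence is the 2nd line of each record.
    Returns (total_reads, distinct_sequences, duplicate_fraction).
    """
    # Take lines at index 1, 5, 9, ... which are the sequence lines.
    counts = Counter(line.strip() for line in islice(fastq_lines, 1, None, 4))
    total = sum(counts.values())
    distinct = len(counts)
    # Fraction of reads that are extra copies of an already-seen sequence.
    dup_fraction = 1 - distinct / total if total else 0.0
    return total, distinct, dup_fraction
```

It could be used as `with open_fastq("sample_R1.fastq.gz") as fh: print(duplication_stats(fh))`, where the file name is a placeholder. Because it only matches identical sequences, it will miss duplicates that differ by a sequencing error, so treat the result as a lower bound.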

While it is possible that there are some cluster/optical duplicates, it may also simply be a characteristic of this library prep. I am not sure whether your kit used tagmentation; if it did, there is a possibility that similar fragments were generated.
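One further point to weigh against the ~240X depth mentioned in the question: when reads are compared one file at a time (rather than as pairs), some reads will share a start position purely by chance. The sketch below estimates that fraction under strong assumptions: uniform coverage, read starts following a Poisson distribution over both strands, and a read length of 150 bp, which is a guess since the post does not state it. The function name `chance_duplicate_fraction` is hypothetical.

```python
import math


def chance_duplicate_fraction(coverage, read_length):
    """Expected fraction of reads that share a start position and strand
    with another read purely by chance.

    Assumes uniform coverage and Poisson-distributed read starts, with
    lam = coverage / (2 * read_length) reads per position per strand.
    """
    lam = coverage / (2.0 * read_length)
    if lam == 0:
        return 0.0
    # Expected distinct start sites per position is 1 - exp(-lam);
    # dividing by lam gives the fraction of reads that are "first copies".
    return 1.0 - (1.0 - math.exp(-lam)) / lam


# Assumed values: 240X coverage from the question, 150 bp reads (a guess).
print(round(chance_duplicate_fraction(240, 150), 3))
```

Under these assumptions roughly a third of reads would look duplicated in a per-file comparison without anything being wrong with the library. Paired-end duplicate marking, which requires both ends to match, would give a much lower chance rate.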

