ctDNA WES duplication rates from NovaSeq 6000 and HiSeq 4000
4.4 years ago
graeme.thorn ▴ 100

We have performed whole exome sequencing on cell-free DNA from pancreatic cancer patients with both the HiSeq 4000 and the NovaSeq 6000 machines. The HiSeq run was 150bp paired-end, and the NovaSeq was 100bp paired-end, on DNA fragments up to 400bp.

When running these samples through the GATK mapping pipeline, we get vastly different duplication rates: the HiSeq data show about 20-30% duplication (as reported and flagged by MarkDuplicates), but the NovaSeq data show anywhere from 60% to 98% (!) duplication, meaning that our target 1000X coverage is reduced to an effective 20X.
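As a rough check on the arithmetic above, here is a minimal sketch of how a duplication rate translates into effective depth. It assumes duplicates are spread evenly across the target and that every flagged read is discarded; the function name `effective_coverage` is hypothetical and not part of GATK.

```python
def effective_coverage(target_coverage: float, duplication_rate: float) -> float:
    """Depth left after discarding duplicates, assuming they are spread evenly."""
    if not 0.0 <= duplication_rate <= 1.0:
        raise ValueError("duplication_rate must be a fraction between 0 and 1")
    return target_coverage * (1.0 - duplication_rate)


# Duplication rates quoted in the post, applied to the 1000X target.
for rate in (0.20, 0.30, 0.60, 0.98):
    depth = effective_coverage(1000, rate)
    print(f"{rate:.0%} duplication leaves about {depth:.0f}X")
```

At the worst-case 98% rate this gives roughly the 20X quoted above; real per-target depth will vary with capture efficiency and off-target reads.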

We are running our pipeline both with and without the MarkDuplicates step to check whether it materially affects any of our downstream analyses (variant calling/copy number variation), and the massive drop in coverage in the NovaSeq data does appear to affect the number and quality of the variants/CNAs called.

Has anyone experienced this level of duplication before? Is it an indication of low complexity in the input library (100bp paired-end compared to 150bp paired-end), so that many DNA fragments are being falsely called duplicates, or is it an issue with NovaSeq sequencing in general?

If it's either of these possibilities, then we will just have to ignore the MarkDuplicates flags (as was done here: https://www.ncbi.nlm.nih.gov/pubmed/30404863 where the duplicates are marked but not removed, and so contribute to the downstream analysis).

wes sequencing illumina novaseq • 2.5k views

You should check how many of these are optical duplicates. These are a known issue with patterned flowcells (see: Duplicates on Illumina), and if the libraries do not meet narrow criteria (defined insert sizes and loading concentration) you can end up with a problem.

Check this thread on how to distinguish optical duplicates from all duplicates: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files
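Since MarkDuplicates is already in the pipeline, its metrics file offers another way to see how much of the duplication is optical. The sketch below reads the `READ_PAIR_DUPLICATES` and `READ_PAIR_OPTICAL_DUPLICATES` columns of a Picard/GATK duplication metrics file; the helper names are hypothetical, and the parsing assumes the usual layout (a `## METRICS CLASS` line, a tab-separated header, then one row per library, ending at a blank line).

```python
import csv
import io


def read_duplication_metrics(text: str) -> list:
    """Return one dict per library from the text of a MarkDuplicates metrics file."""
    lines = text.splitlines()
    # The metrics table starts on the line after "## METRICS CLASS".
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("## METRICS CLASS"))
    table = []
    for line in lines[start + 1:]:
        if not line.strip():
            break  # a blank line ends the table (a histogram may follow)
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def optical_fraction(row: dict) -> float:
    """Fraction of duplicate read pairs that were flagged as optical."""
    pair_dups = int(row["READ_PAIR_DUPLICATES"])
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    return optical / pair_dups if pair_dups else 0.0
```

To use it, pass the contents of the metrics file (the path given to MarkDuplicates as its metrics output) to `read_duplication_metrics` and call `optical_fraction` on each row. Note that the optical count depends on the `OPTICAL_DUPLICATE_PIXEL_DISTANCE` setting: the default of 100 is tuned for unpatterned flowcells, and the GATK documentation suggests 2500 for patterned ones, so a NovaSeq run marked with the default may under-report optical duplicates.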
