Acceptable duplication levels in ATAC-Seq data
5.5 years ago
bowwow ▴ 10

Hi,

I am in the process of analysing some ATAC-Seq data. I have already performed QC on the data using FastQC, and I noticed that the duplication levels of the samples were quite high in general, ranging from 10% to 95%. I came across the ATACseqQC paper (https://www.ncbi.nlm.nih.gov/pubmed/29490630), where the authors recorded duplication levels of 0.6% to 38% in their data. However, it gives no information on what an acceptable level of duplication would be. Can someone please give some advice on this matter? Thanks!

P.S. I am new to ATAC-Seq data analysis. I have scanned the literature and haven't found much help on this topic.

ATAC-Seq sequencing PCR duplicates • 4.9k views

I hope you removed everything other than nuclear reads?

5.5 years ago

This will end up varying strongly based upon your sequencing depth, which is why there aren't any strict thresholds. If you're looking at differential accessibility then the most important thing is that you have comparable duplication rates across samples/groups. For what it's worth, our most recent ATAC-seq samples had ~60 million reads each and had 20-25% duplication rates. That's pretty normal, in my experience. If you have >50% duplication and haven't thrown a HiSeq lane at it then likely something went wrong during library prep.
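To illustrate why depth matters so much, here is a minimal sketch using the simple sampling-with-replacement (Lander-Waterman style) model that library-complexity estimators are commonly based on. It is not something taken from this thread's data: the function name `expected_duplication_rate` and the library size of 100 million distinct fragments are assumptions chosen purely for illustration, and the model ignores the fact that ATAC-seq reads pile up in accessible regions, so real rates will differ.

```python
import math


def expected_duplication_rate(n_reads: float, library_size: float) -> float:
    """Expected fraction of duplicate reads when n_reads are drawn uniformly,
    with replacement, from library_size distinct fragments (idealised model)."""
    if n_reads <= 0 or library_size <= 0:
        raise ValueError("n_reads and library_size must be positive")
    # Expected number of distinct fragments observed after n_reads draws.
    expected_unique = library_size * (1.0 - math.exp(-n_reads / library_size))
    return 1.0 - expected_unique / n_reads


# Hypothetical library complexity (assumption, not a measured value).
LIBRARY_SIZE = 100e6

if __name__ == "__main__":
    # The same library looks "more duplicated" the deeper it is sequenced.
    for depth in (10e6, 30e6, 60e6, 120e6, 300e6):
        rate = expected_duplication_rate(depth, LIBRARY_SIZE)
        print(f"{depth / 1e6:>5.0f} M reads -> {rate:.1%} expected duplicates")
```

Under this model the duplication rate of one and the same library rises steadily with depth, and a library with fewer distinct fragments (for example from a very small genome) reaches high rates at far lower depth. That is why comparing rates across samples only makes sense at similar depth.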


Is that for the human genome? For a yeast with a 12 Mb genome I get around 55-65% duplication with 30-40 million reads, though I don't know whether that's good or not.


Yeah, human or mouse. For super small genomes like yeast I would expect higher duplication rates like you're observing.


Thank you all for the responses. I forgot to mention that I am dealing with human samples (~60 million reads, PE). We did a couple of different experiments; for example, there is a control versus treatment experiment. We found >50% duplication in the control and >80% in the treatment samples, which points to some issues in the sample prep itself...
