Question

Duplicated reads in ChIP-seq

7.4 years ago

op263 ▴ 50

Hello,

I just received the sequences from the first ChIP-seq experiment done in my lab. I ran the samples in triplicate and used input as a control. I started the analysis on UseGalaxy and already have a problem after the FastQC step: I found a high level of duplicated reads in my samples (the inputs are fine), with only 4% of sequences remaining after deduplication.

Is it worth doing the analysis after removing the duplicates? I was considering removing the duplicated reads and combining the unique reads from the triplicates.

Many thanks for any help!

Olivier

ChIP-Seq • 4.9k views

updated 7.4 years ago by harold.smith.tarheel ★ 4.9k • written 7.4 years ago by op263 ▴ 50

score 5 · Answer 1 · 2016-12-05

Duplication is expected in ChIP-Seq, but 96% duplication is not, unless 1) your depth of coverage is massively excessive, or 2) your binding factor interacts with very few sites. A much more common explanation is that your IP failed and/or the amount of IPed chromatin was too low for efficient library construction, which results in a huge amount of PCR duplication. You can distinguish between these cases by viewing your non-deduplicated data in a genome browser. Bona fide peaks will have multiple overlapping reads with offset start positions, while samples with only PCR duplicates will show reads stacked up perfectly, with no offsets.
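If you want a rough numeric check alongside the browser view, the same distinction can be sketched in a few lines of Python. This is only an illustration on toy data, not part of any established pipeline: the function name `start_site_summary` and the `(chrom, start, strand)` tuple layout are made up for the example, and it assumes you have already extracted the alignment start positions for one candidate region by whatever means you prefer.

```python
from collections import Counter


def start_site_summary(reads):
    """Summarise how reads in one region are distributed over start sites.

    `reads` is an iterable of (chrom, start, strand) tuples (a hypothetical
    layout chosen for this sketch). Reads sharing all three values are
    treated as positional duplicates.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    distinct = len(counts)
    return {
        "total_reads": total,
        "distinct_starts": distinct,
        # Close to 1.0: reads are offset from each other.
        # Close to 0.0: reads pile up on a few identical positions.
        "distinct_fraction": distinct / total if total else 0.0,
        "largest_stack": max(counts.values(), default=0),
    }


# Toy data: 20 reads with staggered starts, as in a genuine peak.
offset_peak = [("chr1", 1000 + 7 * i, "+") for i in range(20)]
# Toy data: 20 reads with the same start and strand, as in a PCR stack.
pcr_stack = [("chr1", 1000, "+")] * 20

print(start_site_summary(offset_peak))
print(start_site_summary(pcr_stack))
```

A region where nearly every read has its own start position looks like the offset pattern described above, while a region dominated by one large stack looks like PCR duplication. Any cut-off between the two would be a judgment call, so treat the numbers as a complement to visual inspection rather than a replacement for it.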

score 2 · Answer 2 · 2016-12-05

7.4 years ago

mastal511 ★ 2.1k

You would expect to find duplicated sequences in ChIP-Seq data, because you are only sequencing the parts of the genome pulled down by the IP procedure. Your data is probably fine, so don't remove the duplicates.
