GC Content In ChIP-Seq Reads
13.4 years ago
Jim ▴ 70

Hi, I am wondering if anybody can share some experience with the GC% bias in the reads of a ChIP-seq or INPUT DNA sample. Thanks.

chip-seq gc • 5.2k views
13.2 years ago

It depends on the sequencing technology, but in every case I have looked at (all on Illumina Genome Analyzers) there has been a clear positive correlation between GC content and ChIP-seq read density. This is especially noticeable in "negative" controls: with an antibody that should not bind anything, you still see clear enrichment in nucleosome-occupied regions. Someone even turned this into a method called Sono-Seq.
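
One rough way to check for this kind of correlation in your own data is to bin the genome, count read starts per bin, and correlate the counts with per-bin GC fraction. The sketch below is only an illustration: the function names, the fixed-size bins, and the use of a plain Pearson correlation are my assumptions, not something taken from a particular tool.

```python
from math import sqrt

def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def gc_vs_density(genome_seq, read_starts, bin_size):
    """Correlate per-bin GC fraction with per-bin read-start counts.

    Assumes a single sequence and 0-based read start positions
    (hypothetical inputs for illustration).
    """
    n_bins = len(genome_seq) // bin_size
    counts = [0] * n_bins
    for start in read_starts:
        b = start // bin_size
        if 0 <= b < n_bins:
            counts[b] += 1
    gcs = [gc_fraction(genome_seq[i * bin_size:(i + 1) * bin_size])
           for i in range(n_bins)]
    return pearson(gcs, counts)
```

Running the same calculation on the INPUT sample gives a reference point for how much of the correlation comes from the library and sequencing steps rather than from the ChIP itself.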

13.1 years ago

First, I have found the HOMER tools to be a reasonably good way of looking at GC bias across samples (see the "Sequence Bias" section of http://biowhat.ucsd.edu/homer/chipseq/qc.html). I have seen curves that are non-linear across the read length (see Dr. Chris Benner's example of a biased sample), and this sometimes correlates, informally, with a higher percentage of clonal reads. Plenty of people have ended up sequencing the adapter 8 million times, and those plots look more skewed than the ones from real data.

The odd observation for me, though, is that the baseline differs between treatments. By baseline I mean the GC% far upstream or downstream of the mapped read, which in theory should equal the genome-wide GC%. I have seen this baseline GC% vary across samples, usually between different treatments.
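
To make the "baseline" idea concrete, here is a minimal sketch of a GC profile computed at fixed offsets from each read start. It is not HOMER's implementation; the function name, the single-sequence input, and ignoring strand are all simplifying assumptions of mine.

```python
def gc_profile(genome_seq, read_starts, offsets):
    """GC fraction at each offset relative to the read start positions.

    For each offset, look at the base at (read start + offset) for every
    read and report the fraction that is G or C. Offsets far from 0
    approximate the baseline. Strand is ignored here (an assumption).
    Returns None for an offset where no read has a usable base.
    """
    genome_seq = genome_seq.upper()
    profile = {}
    for off in offsets:
        gc = 0
        total = 0
        for start in read_starts:
            pos = start + off
            if 0 <= pos < len(genome_seq):
                base = genome_seq[pos]
                if base in ("A", "C", "G", "T"):
                    total += 1
                    if base in ("G", "C"):
                        gc += 1
        profile[off] = gc / total if total else None
    return profile
```

Comparing the values at large offsets between samples is one way to see whether the baseline itself shifts between treatments, independently of what happens inside the reads.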

13.2 years ago
Alex ★ 1.5k

Both the ChIP and the sequencing parts usually include one or more PCR steps for probe enrichment, so you will have an under-representation of GC-rich and AT-rich reads in any case. If you are interested in those reads, you could run several experiments with different temperatures in the PCR steps, although I have never seen such experiments done.

13.4 years ago

I've investigated something relatively similar to your question, which is:

(a) What fraction of all the reads in a ChIP-seq experiment contain a low-complexity signature, according to the 'dust' formula implemented in megablast?

For the datasets I've tried, this value ranges from 5-15% for A/T-rich and C/G-rich reads together, so high C/G is about half of that.

(b) What fraction of all the reads in a ChIP-seq experiment are repeated? Here "repeated" means that the reads overlap in sequence significantly enough to be clustered together, or that they map to multiple repetitive regions in the genome.

Here, for the datasets I've tried, this value ranges from 10-25%.
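
Both fractions can be approximated with a few lines of code. The sketch below is a simplification and not what megablast does internally: `dust_like_score` uses the basic triplet-count form of the DUST score on the whole read (no windowing or masking), the threshold is something you would have to choose yourself, and `repeated_fraction` only counts exact duplicate sequences rather than clustering overlapping reads or checking multi-mapping. All function names are hypothetical.

```python
from collections import Counter

def dust_like_score(seq):
    """Simplified DUST-style score over the whole read.

    Counts each overlapping triplet, sums c * (c - 1) / 2 over the
    triplet counts c, and divides by (number of triplets - 1).
    Higher scores mean lower complexity.
    """
    seq = seq.upper()
    n_triplets = len(seq) - 2
    if n_triplets < 2:
        return 0.0
    counts = Counter(seq[i:i + 3] for i in range(n_triplets))
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n_triplets - 1)

def low_complexity_fraction(reads, threshold):
    """Fraction of reads whose score exceeds a user-chosen threshold."""
    if not reads:
        return 0.0
    flagged = sum(1 for r in reads if dust_like_score(r) > threshold)
    return flagged / len(reads)

def repeated_fraction(reads):
    """Fraction of reads whose exact sequence occurs more than once.

    Exact matching is an assumption; it underestimates repeats compared
    with clustering by sequence overlap.
    """
    if not reads:
        return 0.0
    counts = Counter(r.upper() for r in reads)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(reads)
```

A homopolymer read scores much higher than a mixed-sequence read of the same length, so even this crude version separates the obvious low-complexity cases.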

13.2 years ago
Gjain 5.8k

You can also intersect the ChIP peaks with the CpG islands data in the UCSC Genome Browser. That will give you a good estimate.
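
As a rough illustration of that intersection, the sketch below computes the fraction of peaks that overlap at least one CpG island. It assumes both sets have already been loaded as `(chrom, start, end)` tuples with BED-style half-open coordinates; the function name and input layout are my own, and for real data an interval tool such as bedtools would be the more practical choice.

```python
from collections import defaultdict

def fraction_overlapping(peaks, islands):
    """Fraction of peaks overlapping at least one island.

    peaks, islands: iterables of (chrom, start, end) with half-open
    coordinates, so intervals that merely touch do not overlap.
    Uses a simple per-chromosome scan, fine for a sketch but slow
    for very large inputs.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in islands:
        by_chrom[chrom].append((start, end))
    peaks = list(peaks)
    if not peaks:
        return 0.0
    hits = 0
    for chrom, start, end in peaks:
        if any(start < i_end and i_start < end
               for i_start, i_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks)
```

Doing the same with a set of random regions of matching size gives a background value to compare against.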
