Question

Overlap in Peaks of ChIP data for 2 TFs

Asked 5.0 years ago by yugen30

Hi all,

I have recently started analyzing ChIP-seq data. I have two GEO datasets for two different transcription factors in the same sample, and I want to compare their overlapping binding sites and identify genes that are co-regulated by these TFs. I aligned both datasets with the same pipeline and called peaks with MACS2. I was going to use bedtools intersect to find the overlapping peaks. Before I proceed, however, I wanted to know whether I should normalize for library size. Typically, library size is normalized by the statistical method used to detect differential abundance. In this case, are the peak numbers comparable to each other as they are? If I need to normalize, how should I proceed?
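The overlap step itself is plain interval logic on peak coordinates. The code below is a minimal Python stand-in for what `bedtools intersect -u -a tf1.bed -b tf2.bed` reports (each TF1 peak that overlaps at least one TF2 peak by 1 bp or more); it is not bedtools itself, and the function names `read_bed` and `overlapping` are hypothetical. It reads only the first three columns (chrom, start, end), which MACS2 narrowPeak files share with plain BED.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

# A peak is (chrom, start, end) in BED's 0-based, half-open coordinates.
Peak = tuple[str, int, int]


def read_bed(path: str) -> list[Peak]:
    """Read chrom, start, end (the first three columns) of a BED or narrowPeak file."""
    peaks: list[Peak] = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks


def overlapping(peaks_a: list[Peak], peaks_b: list[Peak]) -> list[Peak]:
    """Return the peaks in A that overlap at least one peak in B by >= 1 bp."""
    # Index B per chromosome: starts sorted ascending, plus a running max of ends.
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)

    hits: list[Peak] = []
    for chrom, start, end in peaks_a:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        i = bisect_left(starts, end)  # B peaks [0, i) start before this A peak ends
        # Overlap iff at least one of those B peaks also ends after this A peak starts.
        if i > 0 and max_ends[i - 1] > start:
            hits.append((chrom, start, end))
    return hits
```

Usage, with hypothetical file names: `overlapping(read_bed("tf1_peaks.narrowPeak"), read_bed("tf2_peaks.narrowPeak"))`. The count is asymmetric: "TF1 peaks overlapping a TF2 peak" can differ from the reverse, because one wide peak of one factor can cover several peaks of the other.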

ChIP-Seq

Written 5.0 years ago by yugen30; updated 5.0 years ago by Prakash

Since these are two different TFs, you presumably want to identify as many peaks as possible for each TF, and that number depends on ChIP efficiency and sequencing depth. So I would suggest calling peaks without library-size normalization and then normalizing by sequencing depth for downstream comparisons. If it were the same TF under different conditions, you should normalize by sequencing depth, scaling to the input or to the sample with the lowest depth.
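The reply does not say which depth normalization to use downstream; counts per million mapped reads (CPM) is one common choice and is assumed here. The sketch takes raw read counts per peak and divides by the library's total mapped reads. The function name `cpm`, the peak IDs, and the numbers are hypothetical.

```python
def cpm(counts: dict[str, int], total_mapped_reads: int) -> dict[str, float]:
    """Convert raw read counts per peak to counts per million mapped reads (CPM).

    Assumes `counts` maps a peak ID to the number of reads falling in that peak,
    and `total_mapped_reads` is the library's sequencing depth.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {peak: n * 1_000_000 / total_mapped_reads for peak, n in counts.items()}


# Hypothetical example: the same raw count means different things at different depths.
tf1 = cpm({"tf1_peak_1": 400}, total_mapped_reads=20_000_000)  # {'tf1_peak_1': 20.0}
tf2 = cpm({"tf2_peak_1": 400}, total_mapped_reads=40_000_000)  # {'tf2_peak_1': 10.0}
```

A peak with 400 reads in a 20M-read library carries twice the relative signal of a peak with 400 reads in a 40M-read library, which is why raw counts from the two TFs are not directly comparable.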

Reply by Prakash, 5.0 years ago

Thank you so much for your reply, Prakash. Yes, I want to identify the maximum number of peaks for each transcription factor. So far I have called peaks without any normalization between the two TFs, although each IP is normalized against its corresponding input. I now have the MACS2 output peak files. As I move on to the downstream comparison, I have some follow-up questions. Also, although the samples are biologically the same, the ChIP experiments were performed years apart and in different labs.

Aren't library size and sequencing depth the same thing in this case (i.e., total read count)?
I am not clear on how to normalize for sequencing depth after peak calling for the two TFs. If I have two BED files listing the peaks for the two TFs and I want to see how many peaks overlap, how do I normalize for sequencing depth at this stage? At the same time, I feel that if I normalize the two libraries first, I'll lose some peaks. I'd appreciate it if you could elaborate on that. Also, are there any tools available for this?

I would really like to understand the normalization aspect for two different ChIP-seq datasets, though this might be a naive question!

Reply by yugen30, 5.0 years ago

score 2 · Answer 1 · 2019-04-11

Aren't library size and sequencing depth the same thing in this case (i.e., total read count)?

Yes, both are synonymous.

How do I normalize for sequencing depth at this stage?

Normalize to the sample with the lowest sequencing depth, i.e., scale each library down by the ratio of the lowest read count to its own read count.
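Scaling to the smallest library amounts to one factor per library: the smallest library gets 1.0, and every other library gets (smallest depth / its own depth). The sketch below computes those factors and applies one to raw per-peak counts; the function names, library labels, and depths are hypothetical.

```python
def scale_to_smallest(depths: dict[str, int]) -> dict[str, float]:
    """Return a factor per library that scales it down to the smallest library.

    `depths` maps a library name to its total mapped reads. The smallest library
    gets a factor of 1.0; every other library gets smallest / its own depth.
    """
    smallest = min(depths.values())
    return {name: smallest / depth for name, depth in depths.items()}


def apply_factor(counts: dict[str, int], factor: float) -> dict[str, float]:
    """Multiply raw per-peak read counts by a library's scale factor."""
    return {peak: n * factor for peak, n in counts.items()}


# Hypothetical depths for the two TF libraries.
factors = scale_to_smallest({"TF1": 20_000_000, "TF2": 50_000_000})
# factors == {"TF1": 1.0, "TF2": 0.4}
scaled_tf2 = apply_factor({"tf2_peak_1": 500}, factors["TF2"])
```

Scaling down rather than up avoids inflating the shallower library's counts beyond what was actually sequenced.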

Are there any tools available for that?

I would suggest using deepTools.
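deepTools is suggested without naming a specific command. One option, assumed here, is deepTools' `bamCoverage`, which writes a coverage track from a BAM file and multiplies it by the value given to `--scaleFactor`. The sketch only builds the command strings (it does not run them). The BAM names and depths are hypothetical; the depths could come from, e.g., `samtools view -c -F 4 file.bam`, which counts mapped alignments.

```python
import shlex


def bamcoverage_commands(depths: dict[str, int], out_suffix: str = ".scaled.bw") -> list[str]:
    """Build one bamCoverage command per BAM, scaling each to the smallest library.

    `depths` maps a BAM path to its total mapped reads (hypothetical values).
    """
    smallest = min(depths.values())
    commands = []
    for bam, depth in depths.items():
        factor = smallest / depth  # 1.0 for the smallest library, < 1.0 otherwise
        out = bam.removesuffix(".bam") + out_suffix
        cmd = ["bamCoverage", "-b", bam, "-o", out, "--scaleFactor", f"{factor:.6f}"]
        commands.append(shlex.join(cmd))  # shell-safe quoting of paths
    return commands


if __name__ == "__main__":
    for line in bamcoverage_commands({"tf1.bam": 20_000_000, "tf2.bam": 50_000_000}):
        print(line)
```

`bamCoverage` can also normalize on its own with `--normalizeUsing CPM`, which avoids computing factors by hand; computing the factors explicitly just makes the "scale to the lowest depth" choice visible.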