Question

Comparing peaks between multiple samples

3.7 years ago

brisbio ▴ 30

I have ChIP-seq peak sets for 4 samples. I want to create one file that lists the intersecting genomic peaks along with how many of the 4 samples each one appeared in. I would also like each merged region to span the lowest start and the highest end coordinates across the samples. For example:

Sample A - Chr 1 349-678
Sample B - Chr 1 328-669
Sample C - Chr 1 330-671
Sample D - Chr 1 351-677

I would like this to come back as Chr 1 328-678, found in 4 samples. I think bedtools multiinter is something I could use, but there doesn’t seem to be much information on it, or on how to make it do what I want. Can anyone suggest the best way to go about this?
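To make the desired output concrete, here is a minimal Python sketch of the merge described above. It is not bedtools or DiffBind code; the function name `merge_peaks` and the input layout (a dict of sample name to peak tuples) are assumptions made for illustration. Overlapping or touching peaks are merged transitively, so a chain of partly overlapping peaks ends up as one region.

```python
def merge_peaks(samples):
    """Merge overlapping peaks across samples.

    samples: dict mapping a sample name to a list of (chrom, start, end).
    Returns a list of (chrom, min_start, max_end, n_samples, sample_names).
    """
    # Flatten all peaks into one list, remembering which sample each came from.
    tagged = [
        (chrom, start, end, name)
        for name, peaks in samples.items()
        for chrom, start, end in peaks
    ]
    tagged.sort(key=lambda p: (p[0], p[1], p[2]))

    merged = []
    for chrom, start, end, name in tagged:
        last = merged[-1] if merged else None
        # Same chromosome and starts at or before the current merged end:
        # extend the merged region (touching peaks are merged as well).
        if last is not None and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)
            last[3].add(name)
        else:
            merged.append([chrom, start, end, {name}])

    return [(c, s, e, len(names), sorted(names)) for c, s, e, names in merged]


# The four peaks from the question.
example = {
    "A": [("chr1", 349, 678)],
    "B": [("chr1", 328, 669)],
    "C": [("chr1", 330, 671)],
    "D": [("chr1", 351, 677)],
}
result = merge_peaks(example)
print(result)  # [('chr1', 328, 678, 4, ['A', 'B', 'C', 'D'])]
```

The count here is the number of distinct samples contributing to each merged region, so two peaks from the same sample falling in one region are counted once.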

ChIP-Seq bedtools • 1.2k views

updated 3.7 years ago by Rory Stark ★ 2.0k • written 3.7 years ago by brisbio ▴ 30

score 0 · Answer 1 · 2020-08-19

3.7 years ago

Rory Stark ★ 2.0k

The Bioconductor tool DiffBind does exactly this, merging peaks and maintaining how many (and which) peaksets they were called in.

That is just what I am after! I have looked through the DiffBind reference manual and can follow how to do this.

I have another question after reading through it, though. Section 6.3 demonstrates how you can identify sites that are unique to a sample group. In that example (resistant vs. responsive in the tamoxifen dataset), you can produce Venn diagrams that nicely show the overlap and the numbers of unique peaks.

I have ChIP-seq results for two transcription factors under different conditions, performed using the same samples. I can see from the tamoxifen example that I will be able to use DiffBind to look at the difference between the two conditions for each transcription factor individually. However, can I also look at the overlap between the two transcription factors at the same condition in a certain % of samples? If so, could I do this by loading a sample sheet that contains the samples for the two transcription factors at one condition, rather than the other way round, where the sample sheet would be for one TF and contain the samples from the two different conditions?

reply • 3.7 years ago by brisbio ▴ 30

score 0 · Answer 2 · 2020-08-25

3.7 years ago

Rory Stark ★ 2.0k

Yes, you can do this. You can divide things up using any combination of meta-factors (Tissue, Factor, Condition, Treatment, Replicate), each of which can contain anything you like, or by calling out any subsets of peaksets using masks.

Cheers-

Rory
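As an illustration of the sample-sheet idea discussed above, the following Python sketch writes a CSV listing two transcription factors at a single condition, with the Factor column distinguishing them. The column names follow the sample-sheet layout as I recall it from the DiffBind vignette and should be checked against the current documentation. All sample IDs, factor names (`TF1`, `TF2`), the condition label, and the file paths are hypothetical placeholders.

```python
import csv
import io

# Column names as recalled from the DiffBind vignette (an assumption to verify).
COLUMNS = [
    "SampleID", "Tissue", "Factor", "Condition", "Treatment", "Replicate",
    "bamReads", "ControlID", "bamControl", "Peaks", "PeakCaller",
]


def build_rows(factors, condition, replicates):
    """Build one row per (factor, replicate) at a single condition."""
    rows = []
    for factor in factors:
        for rep in range(1, replicates + 1):
            sample_id = f"{factor}_{condition}_{rep}"
            rows.append({
                "SampleID": sample_id,
                "Tissue": "cells",                    # hypothetical label
                "Factor": factor,                     # separates the two TFs
                "Condition": condition,               # same for every row
                "Treatment": "none",                  # hypothetical label
                "Replicate": rep,
                "bamReads": f"reads/{sample_id}.bam",           # hypothetical path
                "ControlID": f"input_{condition}_{rep}",        # hypothetical ID
                "bamControl": f"reads/input_{condition}_{rep}.bam",
                "Peaks": f"peaks/{sample_id}.bed",              # hypothetical path
                "PeakCaller": "bed",
            })
    return rows


def write_sheet(rows, handle):
    """Write the rows as CSV to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


rows = build_rows(["TF1", "TF2"], "treated", replicates=2)
buffer = io.StringIO()
write_sheet(rows, buffer)
print(buffer.getvalue())
```

To produce a file instead of printing, pass a handle opened with `open("samples.csv", "w", newline="")` to `write_sheet`.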