Question: Peak calling for ChIP-seq replicates
Ben wrote:

I have many ChIP-seq datasets containing duplicate (replicate) samples. First, I aligned each FASTQ file to the reference genome separately, then merged the resulting BAM files into one larger BAM file and called peaks on it with MACS. However, many papers do not merge the BAM files; instead, they call peaks on each replicate separately and then merge the peak sets produced by MACS. Does anyone know which method is better? And how do I merge the peaks generated by MACS?

chip-seq tutorial • 1.4k views
modified 24 months ago by dnamonk • written 24 months ago by Ben

If these are biological replicates, follow the ENCODE IDR analysis. Assess the signal-to-noise ratio with SPP cross-correlation analysis, as EagleEye suggested. Also run CHANCE in parallel to understand the quality of the signals. Finally, call peaks with MACS2 (I hope you are using the latest version). MACS2 can also call peaks on multiple samples at once, with one input and all the BAM files for your samples.

Check the link

written 24 months ago by ivivek_ngs
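A minimal sketch of the two strategies from the question, written as MACS2 'callpeak' argument lists in Python. MACS2 reads and pools all files given after -t, so the pooled run does not need a merged BAM. The file names, the genome size 'hs' (human), and the output directory are placeholders; the commands are only built and printed here, not run.

```python
"""Sketch: MACS2 'callpeak' command lines for pooled vs per-replicate peak calling.

Hypothetical placeholders: file names, genome size 'hs', output directory 'peaks'.
"""
import shlex
from pathlib import Path


def pooled_command(treatments, control, name, fmt="BAM", genome="hs", outdir="peaks"):
    """One MACS2 run: every treatment file after -t is read and pooled by MACS2 itself."""
    return ["macs2", "callpeak",
            "-t", *treatments,      # several files may follow -t
            "-c", control,          # one shared input/control
            "-f", fmt, "-g", genome,
            "-n", name, "--outdir", outdir]


def per_replicate_commands(treatments, control, fmt="BAM", genome="hs", outdir="peaks"):
    """One MACS2 run per replicate, each against the same input, named after the file stem."""
    return [["macs2", "callpeak",
             "-t", bam, "-c", control,
             "-f", fmt, "-g", genome,
             "-n", Path(bam).stem, "--outdir", outdir]
            for bam in treatments]


if __name__ == "__main__":
    reps = ["rep1.bam", "rep2.bam"]  # hypothetical file names
    print(shlex.join(pooled_command(reps, "input.bam", "pooled")))
    for cmd in per_replicate_commands(reps, "input.bam"):
        print(shlex.join(cmd))
    # To actually run a command: subprocess.run(cmd, check=True)
```

The per-replicate lists produce one peak file per replicate, which is what the "call separately, then merge" approach needs as input.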

Please note that the SPP/IDR protocol can only be used with single-end read data. It is better to use the MACS2 peak caller, which can handle both single-end and paired-end data. Please check my answer for more details.

written 24 months ago by dnamonk

The OP did not mention whether the data are SE or PE.

written 24 months ago by ivivek_ngs
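One way to check the read type is the SAM FLAG field: bit 0x1 is set for reads that come from a paired-end template. The sketch below assumes the SAM text comes from something like 'samtools view file.bam | head -n 10000'; mapping paired data to MACS2's '-f BAMPE' and single-end data to '-f BAM' is the usual choice, and the function names are hypothetical.

```python
"""Sketch: guess single-end vs paired-end from SAM FLAG values and pick a MACS2 -f value."""


def is_paired(sam_lines, max_reads=10000):
    """True if any of the first `max_reads` alignment records has FLAG bit 0x1 (paired) set."""
    seen = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])              # FLAG is the 2nd SAM column
        if flag & 0x1:
            return True
        seen += 1
        if seen >= max_reads:
            break
    return False


def macs2_format(paired):
    """BAMPE lets MACS2 use real fragment spans from read pairs; BAM treats reads as single-end."""
    return "BAMPE" if paired else "BAM"


if __name__ == "__main__":
    # Toy record: FLAG 99 = paired (0x1) + proper pair (0x2) + mate reverse (0x20) + first in pair (0x40).
    demo = ["@HD\tVN:1.6", "r1\t99\tchr1\t100\t60\t4M\t=\t300\t250\tACGT\tIIII"]
    paired = is_paired(demo)
    print("paired-end" if paired else "single-end", "-> use -f", macs2_format(paired))
```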

Thanks for your suggestions! But I have another question: you said that I should merge the common peaks from multiple peak calls. What are the common peaks? In fact, I do not know how to merge peaks from multiple files.

written 24 months ago by Ben

Use BEDtools intersect.

written 24 months ago by EagleEye
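A minimal sketch of what "common peaks" means here: keep each peak of replicate 1 that overlaps at least 1 bp of some peak in replicate 2, roughly what 'bedtools intersect -u -a rep1_peaks.narrowPeak -b rep2_peaks.narrowPeak' reports. File and helper names are hypothetical, and the simple per-chromosome scan is chosen for clarity, not speed.

```python
"""Sketch: peaks in replicate A overlapping any peak in replicate B (like bedtools intersect -u).

BED/narrowPeak coordinates are 0-based, half-open [start, end), so touching
(book-ended) intervals do not count as overlapping.
"""
from collections import defaultdict


def read_bed(path):
    """Return (chrom, start, end, extra_fields) tuples from a BED-like file."""
    peaks = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            peaks.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    return peaks


def common_peaks(a, b):
    """Peaks in `a` sharing >= 1 bp with any peak in `b`; each `a` peak is reported once."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _ in b:
        by_chrom[chrom].append((start, end))
    return [p for p in a
            if any(p[1] < e and s < p[2] for s, e in by_chrom[p[0]])]
```

Usage, with hypothetical MACS2 output names: common = common_peaks(read_bed("rep1_peaks.narrowPeak"), read_bed("rep2_peaks.narrowPeak")).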
EagleEye (Sweden) wrote:

Hi,

I recommend using the phantompeakqualtools cross-correlation analysis.

  • Check the values in column 11. If the replicates have values close to each other, you can merge those samples and call peaks once. Otherwise, call peaks on each replicate separately and then merge the two peak sets, or take the peaks common to both.

    COL11: QualityTag: Quality tag based on thresholded RSC (codes: -2:veryLow,-1:Low,0:Medium,1:High,2:veryHigh)
    
  • Also check/verify the samples using 'plotFingerprint' (deepTools).

modified 24 months ago • written 24 months ago by EagleEye
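The column 11 check can be sketched in Python, assuming each replicate's phantompeakqualtools result ('run_spp.R ... -out=FILE') is one tab-separated line whose columns 9, 10 and 11 hold NSC, RSC and QualityTag. For reference, NSC is the cross-correlation at the estimated fragment length divided by the minimum cross-correlation, and RSC is (fragment-length correlation minus minimum) divided by (phantom-peak correlation minus minimum). The "at most one quality level apart" rule for pooling is a hypothetical threshold, not something the tool defines.

```python
"""Sketch: compare phantompeakqualtools QualityTag (column 11) across replicates.

Assumed layout: 11 tab-separated columns; col 9 = NSC, col 10 = RSC, col 11 = QualityTag.
`max_tag_spread` is a hypothetical pooling rule, not a standard.
"""
QUALITY_LABELS = {-2: "veryLow", -1: "Low", 0: "Medium", 1: "High", 2: "veryHigh"}


def parse_spp_line(line):
    """Return (filename, NSC, RSC, QualityTag) from one output line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(cols)}")
    return cols[0], float(cols[8]), float(cols[9]), int(cols[10])


def can_pool(lines, max_tag_spread=1):
    """True if the replicates' QualityTags differ by at most `max_tag_spread` levels."""
    tags = [parse_spp_line(line)[3] for line in lines]
    return max(tags) - min(tags) <= max_tag_spread
```

For example, read the first line of each replicate's -out file into a list and pass it to can_pool(); QUALITY_LABELS turns a tag back into the label listed for COL11.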
dnamonk (Germany) wrote:

The best approach is to call peaks separately on each replicate (make sure to use an input control) and then, to assess the quality of each replicate, use either phantompeakqualtools if you have single-end read data (reference: https://sites.google.com/site/anshulkundaje/projects/idr).

OR

Use ChiLin (https://www.ncbi.nlm.nih.gov/pubmed/27716038) if you have paired-end data. Please remember that SPP can only be used with single-end read data, so for paired-end data you are better off with the MACS2 peak caller.

Nowadays, in recent papers, calculating Pearson's correlation of read density between replicates is regarded as a better approach than IDR, so you should also give it a try.
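A sketch of the Pearson check, assuming you already have read counts per region (for example, in the union of both replicates' peaks) for each replicate, in the same region order. Counting the reads is left to other tools; the optional log1p transform is a common but not mandatory choice to damp the influence of a few very strong peaks.

```python
"""Sketch: Pearson correlation of per-region read counts between two replicates."""
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: one replicate has constant counts")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(counts1, counts2, log=True):
    """Pearson r of (optionally log1p-transformed) per-region read counts."""
    if log:
        counts1 = [math.log1p(c) for c in counts1]
        counts2 = [math.log1p(c) for c in counts2]
    return pearson(counts1, counts2)
```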

Then select only those replicates that have significant overlaps. Later, you can merge the peaks from each replicate. It is best to perform downstream analysis only on the peaks that overlap. Use BEDTools to merge peaks.
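A sketch of merging replicate peak sets into one non-overlapping set, similar in spirit to concatenating the replicate BED files, sorting them, and running 'bedtools merge' with default settings (which also joins book-ended intervals). Intervals are (chrom, start, end) tuples in 0-based, half-open BED coordinates; the function name is hypothetical.

```python
"""Sketch: union of several peak sets with overlapping/book-ended intervals merged."""


def merge_peaks(*peak_sets):
    """Merge (chrom, start, end) intervals from all sets, per chromosome."""
    intervals = sorted(iv for peaks in peak_sets for iv in peaks)  # by chrom, then start, end
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

If only peaks supported by every replicate should go downstream, an intersection of the peak sets fits better than this union.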

Good luck!

modified 24 months ago • written 24 months ago by dnamonk

I agree about using Pearson's correlation to check read density. If I am not wrong, something like this is implemented in deepTools (multiBamSummary followed by plotCorrelation).

written 24 months ago by ivivek_ngs

Could you comment on the differences between IDR and Pearson correlation? I understand what each approach does, but if the Pearson correlation of read counts at the peak summits is, let's say, >= 0.9, is it still possible that IDR would mark these two replicates as unacceptable? Essentially, is a good linear correlation sufficient to assess the reproducibility of a replicate?

written 23 months ago by ATpoint