How to merge samples of the same cell type to do differential peak calling?
2
1
Entering edit mode
2.9 years ago
zau saa ▴ 150

Hi! Now I have 6 ATAC-seq samples which have been aligned to hg19 and called peaks by MACS2. 3 of them are from A cells and the other are from B cells. Now I'm going to do differential peak calling by MACS2. Before it, I need to merge peaks from 3 samples of the same cell type. I use this command to call peaks.

macs2 callpeak -B -t sample_1.bam -n sample_1 --nomodel --extsize 120 --shift 60  \
--outdir sample_1 --keep-dup all

And then I get 5 output files.

sample_1_control_lambda.bdg
sample_1_peaks.narrowPeak
sample_1_peaks.xls
sample_1_summits.bed
sample_1_treat_pileup.bdg

But I'm confused how to merge them. Shall I use bedtools intersect to merge 3 control_lambda.bdg files and treat_pileup.bdg files of A cell respectively? I'm worried if it will influence the value in the fourth column of the bdg files.

Or shall I merge their bam files before peak calling?

ATAC-seq calling merge differential peak • 2.8k views
ADD COMMENT
4
Entering edit mode
2.9 years ago
ATpoint 85k

In my opinion "differential peak calling" does not exist with macs2 as it is not able to make proper use of experimental replication and as such is not suitable to derive differential calls. It is great at calling peaks because it has been developed for that, but not more than that. I would call peaks on the merged files:

macs2 callpeak -t *.bam (...)

There is no need to merge the bam files as t can read multiple bams and then combine the pipeup internally.

Then convert the narrowPeak file to SAF format:

Converting from BED to SAF/GFF

Then make a count matrix with featureCounts from the subread package:

featureCounts -a your.saf -F SAF -o out.counts *.bam

...and then feed this into something like DESeq2 or edgeR. Alternatively, look into the DiffBind or csaw packages which will instruct you on how to run a proper differential analysis.

ADD COMMENT
0
Entering edit mode

Thanks for your detailed description!

ADD REPLY
0
Entering edit mode

Hi just read your comments, good to know that we actually don't need to merge the bam file prior to peak calling. I have one question, if I merged my three replicates for peak calling, then import the count to Deseq. Due to the lack of replicates Deseq won't give me any p-value. How should I interpret the results of the differential analysis? Thanks in advance.

ADD REPLY
0
Entering edit mode

Without replicates there is no differential analysis I fear. There are several threads here discussion what to do in the absence of replicates.

ADD REPLY
0
Entering edit mode

Thanks! I will look into it.

ADD REPLY
2
Entering edit mode
2.9 years ago
Marco Pannone ▴ 810

My suggestion is to merge .bam files prior to peak calling, rather than merging peak calling output files. You can merge .bam files using samtools merge.

ADD COMMENT
0
Entering edit mode

Thanks for you reply!

ADD REPLY

Login before adding your answer.

Traffic: 1177 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6