Question

ATAC seq: from peaks to differential analysis ??

0

Entering edit mode

4 days ago

Picasa ▴ 680

Hi,

I’m fairly new to ATAC-seq and have successfully run MACS2 separately for each of my samples. I now have individual *.narrowPeak files as output.

My experimental design looks like this:

Sample_ID     Cell_type     Condition     Donor

Sample_1      T_cells       Tumor         Donor_1
Sample_2      T_cells       Normal        Donor_1
Sample_3      T_cells       Tumor         Donor_2
Sample_4      T_cells       Normal        Donor_2
...
Sample_11     Dendritics    Tumor         Donor_10
Sample_12     Dendritics    Normal        Donor_10
Sample_13     Dendritics    Tumor         Donor_11
Sample_14     Dendritics    Normal        Donor_11

As you can see, I have two cell types (T_cells and Dendritics), and for each donor, I have paired Tumor and Normal samples.

My goal is to perform a differential accessibility analysis (Tumor vs Normal), accounting for both Donor and Cell_type. I’m also interested in comparing Tumor (T_cells) vs Tumor (Dendritics).

I heard that it is possible to use DESeq2 for ATAC-seq data, so my design will look like this: https://bioconductor.org/packages/3.21/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

I have a few questions:

1) Can I use the exact same code from the vignette, as typically done for RNA-seq data? Or are there any parameters or steps specific to ATAC-seq that I should consider?

2) I’m struggling with how to convert my individual *.narrowPeak files into a count matrix. Do you have any recommendations or tools to help with this step?

3) Are there alternative methods to DESeq2 that would be better suited for this kind of analysis? I guess limma should work the same no ?

Thank you in advance for your help !

MACS2 DESeq2 ATACseq • 297 views

ADD COMMENT • link updated 4 days ago by rfran010 ★ 1.5k • written 4 days ago by Picasa ▴ 680

score 0 · Answer 1 · 2025-06-16

2:

# make saf file:
awk 'OFS="\t" {print $1":"$2+1"-"$3, $1, $2+1, $3, "+"}' ${sample}_peaks.narrowPeak > featureCounts_peaks.saf

$featureCounts -a featureCounts_peaks.saf \
    -F SAF \
    --read2pos 5 \
    -p \
    -o ${peak_all_dir}${sample}_countMatrix.txt ${rmBL_dir}*.bam

${rmBL_dir}*.bam: all bam files for generating narrowPeaks

3: DEseq2 is oke. design matrix and param depends on research question, not on the ATAC-seq or RNA-seq as they are all readcount type

score 0 · Answer 2 · 2025-06-16

I also recommend converting the peak to SAF file as QX shows, although I just count fragments centered on cut site. There's little difference either way though.

For more guidance on creating a count matrix, you do need to create a common set of peaks first. Two main ways to do this, either take the intersection of peaks present in all samples or the union.

I use bedtools, e.g.

# N just represents number of total samples.
bedtools mutliinter -i ${PEAKS_1} ${PEAKS_2} ... ${PEAKS_N} | awk '$4 == N' | bedtools sort -i - | bedtools merge -i - > All_Samples_Intersection.bed

Or,

cat  ${PEAKS_1} ${PEAKS_2} ... ${PEAKS_N} | bedtools sort -i - | bedtools merge -i - > All_Samples_Union.bed

Then you can convert that bed into SAF and use featurecounts to count reads from each sample and treat similar to RNA-seq.

One thing I would suggest, I usually use a low count filter before moving forward with DE analysis. With ATAC-seq, there's usually more noise (I use relatively lax peak calling), so I tend to use a higher count threshold.