What are the best tools for differential detection between ATAC-seq samples?
2
7
Entering edit mode
6.0 years ago

I'm trying to figure out what tools most people use for differential chromatin detection among samples (condition1 vs condition2). I normally use diffReps for differential detection for ChIP-seq but haven't done it for ATAC. Any suggestions?

ATAC-seq next-gen • 14k views
ADD COMMENT
0
Entering edit mode

You may have a look at this publication: Nfib Promotes Metastasis through a Widespread Increase in Chromatin Accessibility (doi:10.1016/j.cell.2016.05.052). They use ATAC-seq to compare chromatin accessability between metastatic cells and its solid tumor-of-origin and use DESeq2.

ADD REPLY
0
Entering edit mode

I'll check it out thanks

ADD REPLY
15
Entering edit mode
6.0 years ago
  1. Call the peaks, by merging all the replicates (bam files), with very less stringent criteria ( p-value of 0.1 ) such that you end up with union of regions ( peaks ) across two biological conditions, that have some signal enrichment.

  2. Now either use the entire peak or center the peaks on summit with +/- 200bp and count the number of reads with in each peak from all replicates independently using featureCounts. ( You can easily covert the above bed file to saf format )

    awk -v OFS = "\t" 'BEGIN { print "PeakID", "Chr", "Start", "End", "Strand" } { print "Peak_"NR, $1, $2, $3, "." }' test.bed > test.saf
    
  3. Feed this matrix to DESeq2 or edgeR and get differential peaks. Do some filtering of peaks like removing peaks that have very few counts across replicates, otherwise this will be a problem for multiple testing correction to include so many peaks .

Edit:

Most of the differential binding tools run either DESeq or edgeR in the back end and they are suitable for TFs, as the signal is sharp with out any noise.

But with ATAC, the peaks have both characters of TF and histone modifications.

I would

1. Run macs2 with p-value of 0.1 

2. Use the narrowPeaks file, either the entire peak (Better)or take +/- 200bp from the summit (only if peaks called with --summit option in MACS, so that broad peaks are split into multiple sub-peaks).

3. Quantify peaks using featureCounts.

4. Feed the matrix to DESeq2.

You should also upload this file ( from step 2 ) to browser to get a feel of how the regions that you are using for DE analysis looks like. Most likely they will be okay.

ADD COMMENT
0
Entering edit mode

This is a great idea. Did you try this method over some published differential chromatin tools? (ex;Diffbind or diffReps). This seems like a similar method to what DiffBind does, however in the past I have noticed peak independent differential detection tools have performed better with my data. Just curious if you've looked into it?

ADD REPLY
0
Entering edit mode

I updated my answer.

ADD REPLY
0
Entering edit mode

How many replicates do you normally have when doing this? Right now, our pilot data has only one replicate per condition

ADD REPLY
0
Entering edit mode

What are these "peak independent differential detection tools"?

ADD REPLY
0
Entering edit mode

I used to use diffReps for ChIP data and I thought I could apply it to ATAC when starting out but it did not work very well for me. I think because ATAC data tends to be much more noisy than ChIP

ADD REPLY
0
Entering edit mode

Its not because ATAC is more noisy, Its because the open chromatin is not very dynamic unless the cell environment changes drastically. If you compare the ATAC profiles between two different cell types, you will see differential open chromatin regions but may not with in the same cell type exposed to different environments. It could be pure biological, as ATAC measures openness of chromatin but not the binding ( as what you expect in Chip-Seq experiments ).

ADD REPLY
0
Entering edit mode

Aka more biological noise in signal than ChIP data

ADD REPLY
0
Entering edit mode

Its not "noise".

ADD REPLY
0
Entering edit mode

How do you easily convert from BED to SAF?

ADD REPLY
0
Entering edit mode

test.bed

chr1    123 456

.

awk -v OFS="\t" 'BEGIN {print "GeneID","Chr","Start","End","Strand"} { print "Peak_"NR,$1,$2,$3,"."}' test.bed

Output:

GeneID  Chr Start   End Strand
Peak_1  chr1    123 456 .
ADD REPLY
0
Entering edit mode

What if you have only a single replicate for each sample? I think DESeq2 requires replicates for each sample. I have ATAC-seq peaks for 8 time points of 4 conditions --> 32 samples in total. There are no replicates.

I'd appreciate any advice

Kenneth.

ADD REPLY
1
Entering edit mode

I think that you have to learn about design formula and design matrices, that DESseq2 and other packages use to describe experimental designs like yours. The topic is covered in the DESeq2 manual and many posts on the Bioconductor support forum, and is a bit too complex for me to introduce it briefly here.

ADD REPLY
0
Entering edit mode

Thank you Charles...I'll start there.

ADD REPLY
0
Entering edit mode

Have you tried MANorm?

ADD REPLY
0
Entering edit mode

Hi Goutham,

How could I do your 3rd step: "3. Count the reads for each replicate."?

I have the BAM file and the narrowPeaks from macs2.

Thank you very much!

ADD REPLY
0
Entering edit mode

I use featurecounts from Rsubread, a bioconductor package. There is a pretty good example on pages 5-7 of the vingette under Section 3 "Counting mapped reads for genomic features". You can also do this via HTSeq-count.

ADD REPLY
0
Entering edit mode

feautureCounts should do the trick. It has an R interface or command line version. You need to convert your peak coordinates to SAF format as given on their website.

ADD REPLY
0
Entering edit mode

Is there normally a lot of variability in your data? When I do a spearman rank correlation, all of my samples are not very well related to eachother (spearman < 0.5) but the peaks look good among replicates. Is this abnormal?

ADD REPLY
0
Entering edit mode

Did you do PCA analysis on your replicate samples ?

ADD REPLY
0
Entering edit mode

So I started to draft up a response and then thought it might just be easier to send the data. I'll provide a link to a correlation heat map and MDS plot from edgeR. The problem that we have had with our data is that our sequencing depth has varied pretty substantially so this could have some affect (In the link will also be read depth plots). Also, I should add that there are two replicates for each sample but these are TECHNICAL replicates from the same dog. When it comes to the factor we are running differential analysis on, it's gonna be based upon many factors so these dogs are just normal domestic dogs that we are gonna check differences in sex, bread, etc.

http://rpubs.com/achits/285291

ADD REPLY
0
Entering edit mode

Hi geek_y! I am still a bit confused about how to call differential peaks with DESeq2 since after performing featureCounts I have got several matrixes and the rows of each matrix will not be the same. How can I combine these matrixes into one when DESeq2 only take one input?

ADD REPLY
1
Entering edit mode

I believe before doing featureCounts, you will somehow have to merge the peak calls. So that you end up with a set of "query" genomic regions. Then do featureCounts for these "query" regions across samples. This will make it easier to combine counts from all samples in the end... ready for DESeq2.

ADD REPLY
0
Entering edit mode

You'll find better support on bioconductor for this question

ADD REPLY
0
Entering edit mode
4.5 years ago

Once you have your peaks for each condition, you can try DAStk, a tool recently put out for that purpose:

https://biof-git.colorado.edu/dowelllab/DAStk

ADD COMMENT

Login before adding your answer.

Traffic: 885 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6