Question: ATAC-seq DE analysis
2
gravatar for collmerr
2.1 years ago by
collmerr30
collmerr30 wrote:

Hello all. I'm trying to do differential expression analysis using ATAC-seq data sets, but I'm having trouble getting HTseq-count or featureCounts to actually count the data. Any good protocols to accomplish this?

Thanks!

ADD COMMENTlink modified 2.1 years ago by trausch1.5k • written 2.1 years ago by collmerr30
1

HT-seq or featureCounts are not really suitable (tedious work to prepare an input format with bed regions) for ChIP/ATAC data. Use bamCoverage or multiBamSummary from deeptools. All you need is BAMs and a bed file.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by venu6.6k

Disagreed. See my comment below. bedtools does not really support paired-end data, while featureCounts perfectly does. Preparing a featureCounts input file from BED is as easy as:

awk 'OFS="\t" {print $1"."$2+1"."$3, $1, $2+1, $3, "."}' peaks.narrowPeak > peaks.saf
ADD REPLYlink written 2.1 years ago by ATpoint36k

" peaks.narrowPeak" is it the combination of all the peaks form different condition ?

ADD REPLYlink written 7 months ago by krushnach80810

It is what you give it, but yes, that would imho be a reasonable choice, so the peak list produced when calling peaks on the merged BAM files.

ADD REPLYlink written 7 months ago by ATpoint36k

actually i don't have merged bam files as i have patient data which are not similar .So i thought would be useful if I merge all the bam files. But is there other way where i can do peak call on different condition and merge those peaks?

ADD REPLYlink written 7 months ago by krushnach80810
4
gravatar for Devon Ryan
2.1 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

Differential accessibility, not differential expression.

We tend to use CSAW for this. The full pipeline is here, which is essentially:

  1. Fragments are filtered (alignmentSieve) so only those <=150 bases (i.e., likely from open chromatin) are present.
  2. MACS2 is run on that.
  3. CSAW is run in R and given the peaks from 2 and the unfiltered BAM files that were input into step 1.

Alternatively, you could use diffBind in the last step. As Venu pointed out, you can use multiBamSummary with a BED file to get your counts, but CSAW at least has functions that also do that.

ADD COMMENTlink written 2.1 years ago by Devon Ryan95k
1

Your link does not work anymore. Could you update it? Thanks!

ADD REPLYlink written 20 months ago by tiago2112871.2k
2

Here's the general snakePipes link.

ADD REPLYlink written 20 months ago by Devon Ryan95k

Just a quick question: Would it be possible to identify allele specific chromatin accessibility regions (ATAC-Seq) through snakePipes

ADD REPLYlink written 18 months ago by ancient_learner630

You should be able to, yes. The DNA-mapping workflow has an allele-specific mode, which should produce two BAM files. You can then input them as different groups. I've never personally tried this though.

ADD REPLYlink written 18 months ago by Devon Ryan95k
3
gravatar for Benn
2.1 years ago by
Benn8.0k
Netherlands
Benn8.0k wrote:

The way I approach this, is to first use MACS2 for peak calling using all BAM files together. This will generate a bed file with coordinates of all peaks. Then I use bedtools, something like this:

bedtools coverage -b ${bamFile} -a All_peaks.bed > CoverageFiles/${bamFile%.bam}.txt

And then merge all columns with the read counts from the files for edgeR.

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Benn8.0k
1

The issue with bedtools is one can not input multiple bam files at once. With multiBamSummary from deeptools, you can input as many BAM files as possible, it will generate one counts table for all BAMs.

ADD REPLYlink written 2.1 years ago by venu6.6k
2

Bedtools can count alignments from multiple BAM files (see http://bedtools.readthedocs.io/en/latest/content/tools/multicov.html)

ADD REPLYlink written 2.1 years ago by James Ashmore2.9k

+1. I completely forgot about multicov function. Thanks for adding.

ADD REPLYlink written 2.1 years ago by venu6.6k

so its same a as doing with featurecounts isn't it ? and the the count output can be used for deseq2 for differential accessibility analysis .

ADD REPLYlink written 7 months ago by krushnach80810
1

Yes, but bedtools is slow for these kinds of things and not multithreaded. Use featureCounts.

ADD REPLYlink written 7 months ago by ATpoint36k

"multiBamSummary" i guess this would be faster and straight forward since its multithreaded ,as i have some confusion about the featurecounts in context of Chip/ATAC data regarding the generation of saf file which is needed as i see below. It needs a combined peak file i guess so its is merging all the peak file together which is same as merge function of bedtool or it is something else?

ADD REPLYlink written 7 months ago by krushnach80810

As ATAC-seq is typically paired-end, I recomennd featureCounts from the subread package, which can handle this (bedtools cannot as far as I know), and is multithreaded.

ADD REPLYlink written 2.1 years ago by ATpoint36k

It would be helpful then to show how you do it with featureCount, see OP's remark in the question: "but I'm having trouble getting HTseq-count or featureCounts to actually count the data".

ADD REPLYlink written 2.1 years ago by Benn8.0k
1

Sorry, did not notice that sentence. There we go:

featureCounts cannot take bed or narrowPeak files directly, but wants a custom format called SAF:

## Make SAF file (+1 because SAF is 1-based, BED/narrowPeak is 0-based)
awk 'OFS="\t" {print $1"."$2+1"."$3, $1, $2+1, $3, "+"}' peaks.narrowPeak > peaks.saf

## Then run fc:
featureCounts --read2pos 5 -a peaks.saf -F SAF -T <int> -p -o peaks_countMatrix.txt *.bam

-T = number of threads; -p indicates paired-end data; --read2pos5 limits the read to the 5' end (so the actual cutting site but this is optional)

ADD REPLYlink modified 5 months ago • written 2.1 years ago by ATpoint36k

When I use the featureCount for ATAC-seq. Should I set the length of reads? For now, I just consider reads shorter than 100bp.

subread-1.6.3-source/bin/featureCounts -F SAF -T 64 -P -p -d 0 -D 100 -a test.saf -o count.txt test.bam
ADD REPLYlink written 12 months ago by vw20

In my current pipeline I typically only use the 5' end of each read (so of both the forward and reverse read) as this is the cutting site of the transposome and treat reads as single-end for this purpose. Limiting reads to those with a certain insert size imho throws away the majority of reads so I take all of them. featureCounts has an option to only count 5' ends.

ADD REPLYlink modified 12 months ago • written 12 months ago by ATpoint36k

Thank for your reply. Could you please provide your command line?

ADD REPLYlink written 12 months ago by vw20
"peaks.narrowPeak" I got to this point but a little doubt, let say I have multi condition data ,so the "peaks.narrowPeak" would be combination of all the peak called by any peak calling tool .So to get the "peaks.narrowPeak" file do I have merge all the peak files or overlap the peak files ?
ADD REPLYlink written 5 months ago by krushnach80810
1

There is no clear guideline for this. I typically call peaks with Genrich in ATAC-seq mode (this peak caller has a mode to handle replicates) over all replicates of all groups and then combine the peaks of all groups into a consensus file using bedtools merge. You can also use macs2 to do the same with merged BAM files per condition or all BAM files of the experiment merged together. If you browse BioC support page for this issue you will find different opinions. Probably the safest is to merge all BAM files to get kind of the "average" location of all potential peaks, and differential analysis will then do its part to tell which are indeed significantly different.

ADD REPLYlink written 5 months ago by ATpoint36k

"Probably the safest is to merge all BAM files " never used it but how do I identify which bam file is what sample?

ADD REPLYlink written 5 months ago by krushnach80810

I do not understand, please explain.

ADD REPLYlink written 5 months ago by ATpoint36k

I got it merging bam files you suggested if I use macs2. My doubt is if I have different bam files like from different condition when I merge them, how do I identify which bam file is from what condition ?

ADD REPLYlink written 5 months ago by krushnach80810
1

macs2 can accept bam files like macs2 callpeak -t *.bam. No need for actually merging them. Maybe joint peak calling is the better word. But you can leave files as they are and just use *.bam

ADD REPLYlink written 5 months ago by ATpoint36k

Thanks for your answer! I'm wondering do you use DEseq2 or EdgeR for differential analysis on these merged read counts coverage files? I thought they all need sample replicates as input.

Thank you!

ADD REPLYlink written 3 months ago by Jingyue30
1

It doesn't matter if you use DESeq2 or edgeR, both methods are well-accepted and perform similarly. I personally use edgeR but that has no specific reason. Yes you need replicates. What I am referring to is the merge of the peak regions which is the template for creating the count matrix where every sample is a column and every region is a row.

ADD REPLYlink written 3 months ago by ATpoint36k
2
gravatar for trausch
2.1 years ago by
trausch1.5k
Germany
trausch1.5k wrote:

Another ATAC-Seq pipeline is available here. As part of the pipeline a count matrix is generated and an Rscript template for differential peak calling is part of the pipeline (script is here).

ADD COMMENTlink written 2.1 years ago by trausch1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1306 users visited in the last hour