ATAC peak-calling: same peak ID across multiple experimental conditions?
Entering edit mode
4 months ago

I'm trying to reproduce the ATAC analysis here. I have a few questions about alignment and peak-calling.

  1. In the "ATAC-seq alignment" section in Method Details, they use Trimmomatic to remove adapters. However, I don't know the adapter sequence. Would it still be possible for me to remove adapters?

  2. In "ATAC-seq replicate correlation and peak calling", it seems like they did peak calling on samples from each time point separately. However, in "ATAC-seq peak fuzzy clustering", they cluster the peaks by RPKM across the time points (i.e. the same peak had an RPKM value for each time point). How do I do peak calling across multiple time points, so that there are the same set of peaks for each time point?

Thank you for your help -- sorry I'm new to ATAC-seq!

Genrich ATAC • 631 views
Entering edit mode
4 months ago
rfran010 ▴ 840

1) You can use software to detect adapters, I'll paste some common ones below from the ENCODE detect adapters script (I am able to run the script directly on the command line with python. Just pass your fastq file name as an argument. Example included below as well.


python3 sample_R1.fq.gz

2) It is also a little unclear to me exactly what they did, but either they used something like diff bind to take into account different peak sets, or the "Union peak set of time course" section is out of order, and they generated that before clustering.

A simpler way, that I have seen, to create a union peak set could be to call peaks for each timepoint individually, then manually combine them into one file, then sort and merge using bedtools. Not sure of the benefits or disadvantages of this way though.

Entering edit mode

Thank you for the help!

1) That makes sense, I'll try it

manually combine them into one file, then sort and merge using bedtools

How do I do this and have a separate score column for each time point?

Entering edit mode

You want to use the MACS2 score columns? You can retain these with bedtools option -c . This will retain the score columns from any merged regions in a column with the scores comma-separated. Then you could parse them out afterwards. To keep track of the source timepoint (e.g. some regions may be merged from 3 timepoints, while some will be merged from different timepoints, or only one timepoint) I would add a column of the timepoint name to the indvidual files before merging, and retain this column during merging as the keys.

Easier though would be to generate the merged regions, then run any counting/scoring over those regions with all timepoints. Probably depends what exactly you want to calculate. Example of what I sort of have in mind:

#concatenate peaks, sort, and merge.
cat atacPeaks1.bed atacPeaks2.bed atacPeaks3.bed > compiled_atacPeaks.bed
sort -k1,1 -k2,2n compiled_atacPeaks.bed > compiled_atacPeaks.sort.bed
bedtools merge -i compiled_atacPeaks.sort.bed > merged_atacPeaks.bed

multiBamSummary BED-file \
  --BED merged_atacPeaks.bed \
  --bamfiles atacPeaks1.bam atacPeaks2.bam atacPeaks3.bam \
  -o results.npz \
  --outRawCounts results.rawCounts.txt

#use results.rawCounts.txt for downstream analysis (e.g. FPKM, clustering, fold-change...)

Login before adding your answer.

Traffic: 1306 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6