Hi all, I'm relatively new to ATAC-seq and bioinformatics in general. I am processing an ATAC-seq dataset (have treatment and control), in particular looking to find differential peaks and then seek for TF binding sites by motif finding and JASPAR.
- Reads were trimmed, aligned to reference genome using
bowtie2, and filtered
- Peaks called using
macs2. For instance:
macs2 callpeak --nomodel -t Sample_WT_dedup.bam -n Sample_WT --keep-dup all --gsize 11250000000 --shift -37 --extsize 73 --outdir macs2_output -B
- The above was done for both the WT (control) and treatment data. Then I tried to use
bdgdiffto call differential peaks, following the guidelines here
macs2 bdgdiff --t1 Sample_LMN_output/macs2_output/Sample_LMN_treat_pileup.bdg --t2 Sample_WT_output/macs2_output/Sample_WT_treat_pileup.bdg --c1 Sample_LMN_output/macs2_output/Sample_LMN_control_lambda.bdg --c2 Sample_WT_output/macs2_output/Sample_WT_control_lambda.bdg --d1 15430860 --d2 23095133 -g 60 -l 120 --outdir WT_LMN_diff --o-prefix WT_LMN
Here, I have taken --d1 and --d2 to be respectively the number of tags encountered by macs2 in the respective samples (extracted from
.xls output). From what I understand, MACS2 uses these inputs for read-count normalization so the fact that I have more reads in WT than treatment samples should be corrected for.
However, the output is otherwise: only 223 peaks are enriched in the treatment (relative to WT), whilst ~10000 peaks are enriched in the WT (relative to treatment).
Furthermore, some of the log-likelihood ratios are crazy... see last column
chr2 19825317 19825545 WT_LMN_cond1_89 -632933.75000 chr2 19826135 19826362 WT_LMN_cond1_90 -647127.06250 chr2 19827419 19827635 WT_LMN_cond1_91 -683938.56250 chr2 19832164 19832295 WT_LMN_cond1_92 -2365761.25000 chr2 19832486 19832752 WT_LMN_cond1_93 -736282.37500 chr2 22930658 22930814 WT_LMN_cond1_94 -1535228.75000
Seems like there is still the effect of the amount of reads in each sample. I've looked around for information regarding this, but I haven't really found anything yet. To my understanding MACS2 bdgdiff should be able to deal with normalization itself (since it is taking the --d1 and --d2 params).
Am I correct that MACS2 does its own read count normalization? Or does this have to be a manual step ahead of time?
Would be grateful for any help :)