I'm analyzing a SNP located in a digital genomic footprint, as determined by DNase-seq at UW. My understanding is that DNase-seq read counts reflect how often DNase I cleaves at each position. A footprint should therefore show fewer reads than its surroundings, because the bound transcription factor protects the DNA from cleavage.
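A minimal sketch of how aligned reads become cleavage counts. It assumes each read is given as a hypothetical (start, end, strand) tuple with 0-based, half-open coordinates. The read's 5' end marks the DNase I cut, so a + strand read contributes its start and a - strand read its last aligned base. Real pipelines may apply small strand-specific offsets, which this ignores.

```python
from collections import Counter


def cleavage_counts(reads):
    """Count putative DNase I cut sites from read 5' ends.

    reads: iterable of (start, end, strand) tuples, 0-based half-open,
    strand '+' or '-'. This input format is an assumption for illustration.
    """
    cuts = Counter()
    for start, end, strand in reads:
        # + strand: 5' end is the leftmost base.
        # - strand: 5' end is the rightmost aligned base (end - 1).
        five_prime = start if strand == "+" else end - 1
        cuts[five_prime] += 1
    return cuts


toy_reads = [(100, 136, "+"), (100, 136, "+"), (90, 126, "-"), (150, 186, "+")]
print(sorted(cleavage_counts(toy_reads).items()))
```

For these toy reads this gives two cuts at position 100 and one each at 125 and 150. A read spanning a base is not the same as a cut at that base.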
When I view the raw read coverage at this locus, however, the signal INCREASES towards the center of the peak rather than decreasing. I see the same thing in the BAM files I downloaded for each experiment. Using samtools mpileup to view the read coverage at each base pair, coverage is highest right at the motif bound by the transcription factor. CTCF is known to bind this region from ChIP-seq, and the CTCF core motif has the highest number of reads, with coverage dropping off significantly outside it.
This is the opposite of what the 2012 ENCODE paper from John Stamatoyannopoulos's lab at UW describes for identifying footprints. They call footprints where reads DECREASE, flanked by areas of high read coverage. I'm seeing the reverse, both in the genome browser and in the BAM file via mpileup: an INCREASE in reads, flanked by areas of very low coverage.
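The "depletion flanked by high signal" criterion can be illustrated with a simple ratio. It compares mean cleavage inside a candidate footprint with mean cleavage in equal-width flanks on either side. This ratio is an assumption made for illustration, not the scoring method from the paper. Values well below 1 indicate a footprint-like dip. It expects per-base cleavage counts (cut sites), not read depth.

```python
def depletion_ratio(counts, center_start, center_end, flank, pseudocount=1.0):
    """Mean cleavage in [center_start, center_end) divided by the mean
    cleavage of the two flanks of width `flank` on either side.

    counts: dict mapping position -> cleavage count (missing positions = 0).
    pseudocount: arbitrary value added to both sides to avoid dividing by 0.
    """
    def mean_over(lo, hi):
        return sum(counts.get(p, 0) for p in range(lo, hi)) / (hi - lo)

    central = mean_over(center_start, center_end)
    flanks = (mean_over(center_start - flank, center_start)
              + mean_over(center_end, center_end + flank)) / 2
    return (central + pseudocount) / (flanks + pseudocount)


# Hypothetical cut counts: 10 per base in both flanks, none in the center.
flank_counts = {p: 10 for p in list(range(80, 100)) + list(range(120, 140))}
print(depletion_ratio(flank_counts, 100, 120, 20))
```

With this toy input the ratio is 1/11 (roughly 0.09). Uniform signal across the window gives exactly 1.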
In heterozygous cells, the SNP I'm looking at within this footprint is covered by significantly more reads on the major-allele chromosome than on the minor-allele chromosome. The problem is that I don't know whether this means the major allele is more accessible to DNase cleavage, or less accessible.
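One common way to check whether a major/minor read difference at a heterozygous site is larger than sampling noise is an exact binomial test against a 50:50 expectation. A minimal sketch using only the standard library follows. The counts are hypothetical. The test does not account for reference-mapping bias, and by itself it does not say which allele is more accessible.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k of n reads under p = 0.5.

    Because the null is symmetric, the smaller tail is doubled (capped at 1).
    """
    if n == 0:
        return 1.0
    lo = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lo + 1)) / 2 ** n
    return min(1.0, 2 * tail)


major, minor = 30, 10  # hypothetical read counts at a heterozygous SNP
print(binom_two_sided_p(major, major + minor))
```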
If anyone knows how mpileup computes its per-base coverage, that might help me understand this discrepancy.
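The depth that samtools mpileup reports at a position counts every read whose alignment covers that base, after its read and base-quality filters. It does not count only the reads whose 5' end (the DNase I cut) falls there. The toy sketch below contrasts the two measures on hypothetical 36-bp reads. It ignores mpileup's filters, indels, and spliced alignments. All cuts fall just outside an assumed protected region at [100, 120), yet every read still spans that region. This shows how depth can peak inside a footprint while cleavage there is zero. It does not prove that this is what is happening at any particular locus.

```python
def per_base_depth(reads, lo, hi):
    """Number of reads whose [start, end) interval covers each base in [lo, hi)."""
    return [sum(1 for s, e, _ in reads if s <= p < e) for p in range(lo, hi)]


def five_prime_counts(reads, lo, hi):
    """Number of read 5' ends (putative cut sites) at each base in [lo, hi)."""
    ends = [s if strand == "+" else e - 1 for s, e, strand in reads]
    return [ends.count(p) for p in range(lo, hi)]


READ_LEN = 36
# Hypothetical footprint at [100, 120): cuts only in the flanks.
plus_reads = [(s, s + READ_LEN, "+") for s in range(90, 100)]     # cuts at 90..99
minus_reads = [(e - READ_LEN, e, "-") for e in range(121, 131)]  # cuts at 120..129
reads = plus_reads + minus_reads

depth = per_base_depth(reads, 80, 140)
cuts = five_prime_counts(reads, 80, 140)
inside = slice(100 - 80, 120 - 80)  # list indices for positions 100..119
print("max depth inside footprint:", max(depth[inside]))
print("cuts inside footprint:", sum(cuts[inside]))
print("max depth overall:", max(depth))
```

Here depth is 20 at every base of the protected region, which is the maximum anywhere in the window, while the cut count inside it is 0.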