I want to calculate the coverage for multiple bam files over multiple regions. I have ChIP-seq data for different histones, so I would like to take into consideration the positive and negative stranded reads. Also, I have multiple bam files, where the coverage should be determined over specific genomic regions.
I took a look on samtools depth
and mpileup
.
As far as I understood, depth
and mpileup
are doing exactly what I need. However, the results differ:
samtools depth -r chr11:56824193-57143746 file.bam | awk -F '\t' '{print $3}' | sort | uniq -c | sort -nr
214 16
184 17
99 18
93 19
59 20
39 22
27 21
15 23
4 24
samtools mpileup -r chr11:56824193-57143746 file.bam | awk -F '\t' '{print $3}' | sort | uniq -c | sort -nr
212 16
212 0
174 17
103 18
90 19
56 20
38 22
27 21
15 23
3 24
Why is there a difference in the counts?
Also, concerning my forward and reverse coverage: prior using samtools depth
, I need to filter forward and reverse reads with samtools view
:
samtools view -b -F 20 lala.bam
as a result I am getting an unindexed .bam file, so I can't use depth
directly.
So, will it be appropriate to use samtools mpileup
with --rf 16
to calculate reverse reads and with --ff 20
to calculate positive reads?
Just find an example of a position where there's a difference and look at that in IGV. It's likely that (A) the Phred threshold is different between the two tools and/or (B) depth isn't doing the overlap detection. It's also possible that (C) depth isn't using a MAPQ threshold (there are settings for that, but the default isn't mentioned).