coverage per base using samtools depth and mpileup
2
1
Entering edit mode
8.6 years ago
tonja.r ▴ 600

I want to calculate the coverage for multiple bam files over multiple regions. I have ChIP-seq data for different histones, so I would like to take into consideration the positive and negative stranded reads. Also, I have multiple bam files, where the coverage should be determined over specific genomic regions.

I took a look on samtools depth and mpileup.

As far as I understood, depth and mpileup are doing exactly what I need. However, the results differ:

samtools depth -r chr11:56824193-57143746 file.bam | awk -F '\t' '{print $3}' | sort | uniq -c | sort -nr
214 16
184 17
99 18
93 19
59 20
39 22
27 21
15 23
4 24

samtools mpileup -r chr11:56824193-57143746 file.bam | awk -F '\t' '{print $3}' | sort | uniq -c | sort -nr

212 16
212 0
174 17
103 18
90 19
56 20
38 22
27 21
15 23
3 24

Why is there a difference in the counts?

Also, concerning my forward and reverse coverage: prior using samtools depth, I need to filter forward and reverse reads with samtools view:

samtools view -b -F 20 lala.bam

as a result I am getting an unindexed .bam file, so I can't use depth directly.

So, will it be appropriate to use samtools mpileup with --rf 16 to calculate reverse reads and with --ff 20 to calculate positive reads?

sequencing • 9.3k views
ADD COMMENT
1
Entering edit mode

Just find an example of a position where there's a difference and look at that in IGV. It's likely that (A) the Phred threshold is different between the two tools and/or (B) depth isn't doing the overlap detection. It's also possible that (C) depth isn't using a MAPQ threshold (there are settings for that, but the default isn't mentioned).

ADD REPLY
1
Entering edit mode
8.6 years ago
tonja.r ▴ 600

Yep. mpileup used minimum base quality for a base to be considered = 13.

ADD COMMENT
1
Entering edit mode

Glad it was an easy resolution!

ADD REPLY
1
Entering edit mode
8.6 years ago
Shicheng Guo ★ 9.4k

Actually, the most direct way is use bedtools coverage, there are hundreds options in bedcoverage, you can find all the aim you want to do.

ADD COMMENT
0
Entering edit mode

But isn't it slower? I tried coverageBed -a gene_coord_red.bed -b reads.sort.bam -d with only one region and it took much longer than mpileup with the same region.

ADD REPLY

Login before adding your answer.

Traffic: 2685 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6