BEDtools coverage output
1
1
Entering edit mode
8.3 years ago
novice ★ 1.1k

I'm trying to produce a simple window => coverage table from a bam file. I produced (non-sliding) windows with bedtools makewindows and then ran bedtools coverage -abam alignment.bam -b reference.windows > coverage. The problem is that I cannot make sense of the output I got. Here's a sample:

I    0    215    SRR1297046.176411/2    23    +    0    215    0,0,0    1    215,    0,    1    215    215    1.0000000
I    0    216    SRR1297046.2210768/1    23    +    0    216    0,0,0    1    216,    0,    1    216    216    1.0000000
I    1    222    SRR1297046.178368/1    24    +    1    222    0,0,0    1    221,    0,    1    221    221    1.0000000

The first field is the chromosome name and the fourth one is the read's, but what are the others? This output is inconsistent with the ones described here. How can I convert this output into a "window => coverage" table?

I'd appreciate any help.

coverage alignment bedtools • 11k views
ADD COMMENT
2
Entering edit mode
8.3 years ago

If you want the coverage of each entry in reference.windows then bedtools coverage -a reference.windows -b alignment.bam > coverage is what you want.

BTW, this will probably double count overlapping paired-end reads. You could alternatively use samtools depth with some post-processing and not have to worry about that.

ADD COMMENT
0
Entering edit mode

Thank you Denvon! That worked. I tried samtools depth -b reference.windows alignment.bam but it gave me the coverage at each position, not windows. Did you mean that it'd be more accurate if I post-processed this output to get what I want?

I also tried samtools bedcov reference.windows alignment.bam, which gave me the coverage at each window. However, the coverage values were way bigger (in hundreds of thousands) than those output by bedtools (in thousands). Do you know why that is?

ADD REPLY
1
Entering edit mode

Ah, the "bedcov" command is new. It reports the sum of the per-base coverage, so just divide that by the window size to get the average coverage.

ADD REPLY
1
Entering edit mode

I think both samtools depth and bedcov will double count overlapping read pairs, unless the overlapping part has been clipped in one of the two reads (I use clipOverlap for that). However, samtools mpileup should do the right thing unless the option --ignore-overlaps (disable read-pair overlap detection) is set.

But if you want to count the fragments per window (as in a typical RNA-Seq or ChIP-Seq experiments) than you could use bedtools coverage using only read 1 or better merge read pairs to a single fragment.

ADD REPLY
1
Entering edit mode

Indeed, setting -Q 1 with bedcov (and -q 1 with depth) would presumably get around this, though obviously one is then filtering out phred 0 bases as well.

ADD REPLY
1
Entering edit mode

Mmm... -q 1 will filter for base quality but it will not affect the overlapping issue.

ADD REPLY
1
Entering edit mode

Overlaps are removed in mpileup internally by setting the phred scores to 0. It's a nasty hack, but that's how it's done.

ADD REPLY
0
Entering edit mode

For the record, using samtools mpileup doesn't yield the same result as using samtools bedcov -q 1 or depth -Q 1. At least according to my results. I post-processed the outputs from all of the commands and found that mpileup gives a lower coverage than both depth -Q 1 and bedcov -q 1 (which give the same coverage).

Here's the script I used to get the average coverage per window from mpileup's output. Takes the window size (non-sliding) as an argument. Can be modified easily to work with depth.

Example usage:

samtools mpileup alignment.bam | perl windows.pl 10000 > coverage

I hope someone would find it useful:

#! /usr/bin/perl

my $window = shift @ARGV
  or die "Specify a window size!";

while (<>) {
  my $pos = (split)[1];

  $cov += (split)[3]; #change this to 2 for samtools depth output

  unless ($pos % $window) {
    printf "%s\t%s-%s\t%s\n", (split)[0], ($pos - $window), $pos, ($cov / $window);
    $cov = 0;
  }
}
ADD REPLY
0
Entering edit mode

Mpileup performs significantly more filtering, which is why you get discordant results.

ADD REPLY
0
Entering edit mode
Hi Denvon, Do you know what kind of filtering mpileup does? I'm not sure which of these options I should rely on. BTW, bedtools coverage gives slightly less average coverage per window than mpileup.
ADD REPLY
0
Entering edit mode

See the defaults for -A, -d and -Q (as well as -x).

ADD REPLY

Login before adding your answer.

Traffic: 2321 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6