Question

BEDtools coverage output

1

Entering edit mode

8.3 years ago

novice ★ 1.1k

I'm trying to produce a simple window => coverage table from a bam file. I produced (non-sliding) windows with bedtools makewindows and then ran bedtools coverage -abam alignment.bam -b reference.windows > coverage. The problem is that I cannot make sense of the output I got. Here's a sample:

I    0    215    SRR1297046.176411/2    23    +    0    215    0,0,0    1    215,    0,    1    215    215    1.0000000
I    0    216    SRR1297046.2210768/1    23    +    0    216    0,0,0    1    216,    0,    1    216    216    1.0000000
I    1    222    SRR1297046.178368/1    24    +    1    222    0,0,0    1    221,    0,    1    221    221    1.0000000

The first field is the chromosome name and the fourth one is the read's, but what are the others? This output is inconsistent with the ones described here. How can I convert this output into a "window => coverage" table?

I'd appreciate any help.

coverage alignment bedtools • 11k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.3 years ago by novice ★ 1.1k

Ram · Accepted Answer · 2016-01-10

2

Entering edit mode

8.3 years ago

Devon Ryan 104k

If you want the coverage of each entry in reference.windows then bedtools coverage -a reference.windows -b alignment.bam > coverage is what you want.

BTW, this will probably double count overlapping paired-end reads. You could alternatively use samtools depth with some post-processing and not have to worry about that.

ADD COMMENT • link 8.3 years ago by Devon Ryan 104k

0

Entering edit mode

Thank you Denvon! That worked. I tried samtools depth -b reference.windows alignment.bam but it gave me the coverage at each position, not windows. Did you mean that it'd be more accurate if I post-processed this output to get what I want?

I also tried samtools bedcov reference.windows alignment.bam, which gave me the coverage at each window. However, the coverage values were way bigger (in hundreds of thousands) than those output by bedtools (in thousands). Do you know why that is?

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by novice ★ 1.1k

1

Entering edit mode

Ah, the "bedcov" command is new. It reports the sum of the per-base coverage, so just divide that by the window size to get the average coverage.

ADD REPLY • link 8.3 years ago by Devon Ryan 104k

1

Entering edit mode

I think both samtools depth and bedcov will double count overlapping read pairs, unless the overlapping part has been clipped in one of the two reads (I use clipOverlap for that). However, samtools mpileup should do the right thing unless the option --ignore-overlaps (disable read-pair overlap detection) is set.

But if you want to count the fragments per window (as in a typical RNA-Seq or ChIP-Seq experiments) than you could use bedtools coverage using only read 1 or better merge read pairs to a single fragment.

ADD REPLY • link 8.3 years ago by dariober 14k

1

Entering edit mode

Indeed, setting -Q 1 with bedcov (and -q 1 with depth) would presumably get around this, though obviously one is then filtering out phred 0 bases as well.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Devon Ryan 104k

1

Entering edit mode

Mmm... -q 1 will filter for base quality but it will not affect the overlapping issue.

ADD REPLY • link 8.3 years ago by dariober 14k

1

Entering edit mode

Overlaps are removed in mpileup internally by setting the phred scores to 0. It's a nasty hack, but that's how it's done.

ADD REPLY • link 8.3 years ago by Devon Ryan 104k

0

Entering edit mode

For the record, using samtools mpileup doesn't yield the same result as using samtools bedcov -q 1 or depth -Q 1. At least according to my results. I post-processed the outputs from all of the commands and found that mpileup gives a lower coverage than both depth -Q 1 and bedcov -q 1 (which give the same coverage).

Here's the script I used to get the average coverage per window from mpileup's output. Takes the window size (non-sliding) as an argument. Can be modified easily to work with depth.

Example usage:

samtools mpileup alignment.bam | perl windows.pl 10000 > coverage

I hope someone would find it useful:

#! /usr/bin/perl

my $window = shift @ARGV
  or die "Specify a window size!";

while (<>) {
  my $pos = (split)[1];

  $cov += (split)[3]; #change this to 2 for samtools depth output

  unless ($pos % $window) {
    printf "%s\t%s-%s\t%s\n", (split)[0], ($pos - $window), $pos, ($cov / $window);
    $cov = 0;
  }
}

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by novice ★ 1.1k

0

Entering edit mode

Mpileup performs significantly more filtering, which is why you get discordant results.

ADD REPLY • link 8.3 years ago by Devon Ryan 104k

0

Entering edit mode

Hi Denvon, Do you know what kind of filtering mpileup does? I'm not sure which of these options I should rely on. BTW, bedtools coverage gives slightly less average coverage per window than mpileup.

ADD REPLY • link 8.3 years ago by novice ★ 1.1k

0

Entering edit mode

See the defaults for -A, -d and -Q (as well as -x).

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Devon Ryan 104k