Count strand-specific 5' / 3' coverage across whole genome for paired-end fragments
1
0
Entering edit mode
3.1 years ago
acvill ▴ 350

I'd like to use bedtools to create strand-specific per-base coverage profiles for my paired-end bam alignments of stranded RNA-seq data. Existing Biostars answers here and here suggest using bedtools genomecov with the -5 option to extract only coverage from the 5' ends of fragments. These are the commands that I've tried to implement with bedtools v2.29.2:

bedtools genomecov -pc -3 -d -strand + -ibam sample.bam > sample_plus_3p.txt
bedtools genomecov -pc -3 -d -strand - -ibam sample.bam > sample_minus_3p.txt
bedtools genomecov -pc -5 -d -strand + -ibam sample.bam > sample_plus_5p.txt
bedtools genomecov -pc -5 -d -strand - -ibam sample.bam > sample_minus_5p.txt

My expectation when running these commands is that I will get a per-base genome coverage reports (-d) for the ends (-3/-5) of my paired-end fragments (-pc) in a strand-specific manner (-strand), with zeros given at all positions where the end of a paired-end read does not fall. No error is given running the above commands, and I indeed get a report listing every position in my genome. However, checksum values for the different files indicate that the -3 and -5 options are giving the same results.

Input1

bedtools genomecov -pc -3 -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -pc -3 -d -strand - -ibam sample.bam | md5sum
bedtools genomecov -pc -5 -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -pc -5 -d -strand - -ibam sample.bam | md5sum

Output1

d93beb0747c29913877fc3aa20d2b9f9  -
8010aaaa685d55dd627f48e9cf5d9328  -
d93beb0747c29913877fc3aa20d2b9f9  -
8010aaaa685d55dd627f48e9cf5d9328  -

This is true for both per-base reports (-d) and BedGraph outputs (-bga). I get the same checksum values when I exclude the -3 and -5 options, indicating that these flags are ignored in the presence of the other parameters, and bedtools is giving coverage across full fragment intervals.

Input2

bedtools genomecov -pc -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -pc -d -strand - -ibam sample.bam | md5sum

Output2

d93beb0747c29913877fc3aa20d2b9f9  -
8010aaaa685d55dd627f48e9cf5d9328  -

When I take away the -pc option, I get the desired behavior (i.e. different results for 3' and 5' ends).

Input3

bedtools genomecov -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -d -3 -strand + -ibam sample.bam | md5sum
bedtools genomecov -d -5 -strand + -ibam sample.bam | md5sum
bedtools genomecov -d -strand - -ibam sample.bam | md5sum
bedtools genomecov -d -3 -strand - -ibam sample.bam | md5sum
bedtools genomecov -d -5 -strand - -ibam sample.bam | md5sum

Output3

92459f43d4a9a229545d93552d53eab3  -
3f03d8707c0cdfbc4ebec1f539a91541  -
9d8d6f3275a01781ff044a2fe5ac8368  -
68cd7d76828f1289b71fd934fd032487  -
85dbc711cb38961a6d6d253581544fd6  -
7cca5f34bfd95e296d8e3b3cb8b091a7  -

However, the paired-end nature of my data is not handled correctly without the -pc option. I can tell because paired reads are being mapped to opposite strands, where they should be mapping to the strand indicated by the first read in the pair.

My Questions

Am I misunderstanding how genomecov parses bam files? Do I need to need to include the -du option (which has been eliminated from the help page for v2.30.0)? What are some other strategies by which I can get the per-base coverage reports in the format that I need?

From bedtools genomecov -h:

-du Change strand af [sic] the mate read (so both reads from the same strand) useful for strand specific

Edit

I think the new definition of the -pc option may be enlightening.

bedtools version 2.29.2

-pc Calculate coverage of pair-end fragments.

bedtools version 2.30.0

-pc Calculates coverage of intervals from left point of a pair reads to the right point.

bedtools RNA-seq bam strand-specific • 2.6k views
ADD COMMENT
1
Entering edit mode

Yes, -pc , -5 and -3 options should be mutually exclusive. It would be best if the code threw at least a warning, but -pc has been added relatively recently to the list of bedtools options and is perhaps a bit less stable (I also had a weird and probably unrelated -pc issue a few years ago)/

Regarding your workaround, I used something similar in the past so it should work. Good job in figuring it out by yourself !

ADD REPLY
0
Entering edit mode

Thanks for your reply. I'm finding that bedtools is a great suite of tools, but it has its quirks

ADD REPLY
2
Entering edit mode
3.1 years ago
acvill ▴ 350

My workaround to extract coverage data from the regions of alignments corresponding to fragment ends was to pre-filter my alignments using samtools.

For my library structure, the 5' end of the first read in each pair gives the 3' end of the transcript on the opposite strand, so coverage at 3' fragment ends was calculated like this:

# read paired, mapped in proper pair, and first in pair
# NOT read unmapped, NOT mate unmapped, NOT not primary alignment, NOT secondary alignment
# flip strand

samtools view -@ 8 -f 67 -F 2316 -b $bam | \
        bedtools genomecov -5 -d -strand + -ibam - \
        > minus_3p.cov
samtools view -@ 8 -f 67 -F 2316 -b $bam | \
        bedtools genomecov -5 -d -strand - -ibam - \
        > plus_3p.cov

Likewise, the 5' end of the second read in each pair corresponds to the 5' end of the transcript. However, because reads in proper pairs map to opposite strands, no strand flipping was necessary:

# read paired, mapped in proper pair, and second in pair
# NOT read unmapped, NOT mate unmapped, NOT not primary alignment, NOT secondary alignment

samtools view -@ 10 -f 131 -F 2316 -b $bam | \
        bedtools genomecov -5 -d -strand + -ibam - \
        > plus_5p.cov
samtools view -@ 10 -f 131 -F 2316 -b $bam | \
        bedtools genomecov -5 -d -strand - -ibam - \
        > minus_5p.cov

Tools

  • samtools v1.11
  • bedtools v2.29.2
  • alignments made with bwa v0.7.17
ADD COMMENT

Login before adding your answer.

Traffic: 805 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6