I'd like to use bedtools to create strand-specific per-base coverage profiles for my paired-end bam alignments of stranded RNA-seq data. Existing Biostars answers here and here suggest using bedtools genomecov
with the -5
option to extract only coverage from the 5' ends of fragments. These are the commands that I've tried to implement with bedtools v2.29.2:
bedtools genomecov -pc -3 -d -strand + -ibam sample.bam > sample_plus_3p.txt
bedtools genomecov -pc -3 -d -strand - -ibam sample.bam > sample_minus_3p.txt
bedtools genomecov -pc -5 -d -strand + -ibam sample.bam > sample_plus_5p.txt
bedtools genomecov -pc -5 -d -strand - -ibam sample.bam > sample_minus_5p.txt
My expectation when running these commands is that I will get a per-base genome coverage reports (-d
) for the ends (-3
/-5
) of my paired-end fragments (-pc
) in a strand-specific manner (-strand
), with zeros given at all positions where the end of a paired-end read does not fall. No error is given running the above commands, and I indeed get a report listing every position in my genome. However, checksum values for the different files indicate that the -3
and -5
options are giving the same results.
Input1
bedtools genomecov -pc -3 -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -pc -3 -d -strand - -ibam sample.bam | md5sum
bedtools genomecov -pc -5 -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -pc -5 -d -strand - -ibam sample.bam | md5sum
Output1
d93beb0747c29913877fc3aa20d2b9f9 -
8010aaaa685d55dd627f48e9cf5d9328 -
d93beb0747c29913877fc3aa20d2b9f9 -
8010aaaa685d55dd627f48e9cf5d9328 -
This is true for both per-base reports (-d
) and BedGraph outputs (-bga
). I get the same checksum values when I exclude the -3
and -5
options, indicating that these flags are ignored in the presence of the other parameters, and bedtools is giving coverage across full fragment intervals.
Input2
bedtools genomecov -pc -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -pc -d -strand - -ibam sample.bam | md5sum
Output2
d93beb0747c29913877fc3aa20d2b9f9 -
8010aaaa685d55dd627f48e9cf5d9328 -
When I take away the -pc
option, I get the desired behavior (i.e. different results for 3' and 5' ends).
Input3
bedtools genomecov -d -strand + -ibam sample.bam | md5sum
bedtools genomecov -d -3 -strand + -ibam sample.bam | md5sum
bedtools genomecov -d -5 -strand + -ibam sample.bam | md5sum
bedtools genomecov -d -strand - -ibam sample.bam | md5sum
bedtools genomecov -d -3 -strand - -ibam sample.bam | md5sum
bedtools genomecov -d -5 -strand - -ibam sample.bam | md5sum
Output3
92459f43d4a9a229545d93552d53eab3 -
3f03d8707c0cdfbc4ebec1f539a91541 -
9d8d6f3275a01781ff044a2fe5ac8368 -
68cd7d76828f1289b71fd934fd032487 -
85dbc711cb38961a6d6d253581544fd6 -
7cca5f34bfd95e296d8e3b3cb8b091a7 -
However, the paired-end nature of my data is not handled correctly without the -pc
option. I can tell because paired reads are being mapped to opposite strands, where they should be mapping to the strand indicated by the first read in the pair.
My Questions
Am I misunderstanding how genomecov
parses bam files? Do I need to need to include the -du
option (which has been eliminated from the help page for v2.30.0)? What are some other strategies by which I can get the per-base coverage reports in the format that I need?
From bedtools genomecov -h
:
-du
Change strand af [sic] the mate read (so both reads from the same strand) useful for strand specific
Edit
I think the new definition of the -pc
option may be enlightening.
bedtools version 2.29.2
-pc
Calculate coverage of pair-end fragments.
bedtools version 2.30.0
-pc
Calculates coverage of intervals from left point of a pair reads to the right point.
Yes,
-pc
,-5
and-3
options should be mutually exclusive. It would be best if the code threw at least a warning, but -pc has been added relatively recently to the list of bedtools options and is perhaps a bit less stable (I also had a weird and probably unrelated-pc
issue a few years ago)/Regarding your workaround, I used something similar in the past so it should work. Good job in figuring it out by yourself !
Thanks for your reply. I'm finding that bedtools is a great suite of tools, but it has its quirks