bedGraphToBigWig: Missing Genome Coordinates
1
0
Entering edit mode
8 months ago
vanbelj ▴ 40

I have a somewhat complicated pipeline in which bam files are converted from bam > bed > bedGraph > bigwig. The final bigwig files are missing genome ranges that have a coverage value of 0, despite a chromosome size file, S288C_R64.fa.fai, being referenced during file type conversion. The missing genome coordinates are causing an error in downstream analysis in R: when I import the bigwig files using rtracklayer ranges are missing (because they are missing in the bigwig file).

How can I force genome coordinates with 0 coverage to be included in the final bigwig file, when converting from a bed file? TYIA!

1) Convert bam to bed (bamtobed, bedtools), then modify read coordinates with awk.

bedtools bamtobed -i input.bam | bedtools sort -g S288C_R64.fa.fai > output.bed

output.bed>
chrI    877 878 K00408:348:HT7MVBBXY:5:1207:11454:46592 42  -
chrI    942 943 K00408:348:HT7MVBBXY:5:1211:24982:31523 35  +
chrI    945 946 K00408:348:HT7MVBBXY:5:1119:5954:27180  42  -
chrI    945 946 K00408:348:HT7MVBBXY:5:2119:3853:33141  42  -
chrI    971 972 K00408:348:HT7MVBBXY:5:2201:1793:46065  42  +

2) Convert bed to bedGraph (genomecov, bedtools)

bedtools genomecov -i output.bed -bg -g S288C_R64.fa.fai | sort -k1,1 -k2,2n > output.bedGraph

output.bedGraph>
chrI    877 878 1
chrI    942 943 1
chrI    945 946 2
chrI    971 972 1

4) Convert bedGraph to bigwig (bedGraphToBigWig). I converted bigwig to wig to get a human readable file to troubleshoot.

bedGraphToBigWig output.bedGraph S288C_R64.fa.fai output.bigwig
bigWigToWig output.bigwig output.wig

output.wig>
#bedGraph section chrI:877-32548
chrI    877 878 1
chrI    942 943 1
chrI    945 946 2
chrI    971 972 1

Note the coordinates above chrI:877-32548

If I convert the same bam file with deeptools, I get the proper "headers" for the genome

bamCoverage  -b input.bam -o output2.bigwig -of bigwig -bs 1 -p max --effectiveGenomeSize 12000000
bigWigToWig output2.bigwig output2.wig

output2.wig>
#bedGraph section chrI:0-33818
chrI    0   877 0
chrI    877 878 1
chrI    878 942 0
chrI    942 943 1
chrI    943 945 0
chrI    945 946 2
chrI    946 971 0
chrI    971 972 1

Note the coordinates above chrI:0-33818, and the inclusion of coverage for every base even if the value is 0.

bamtobed bed • 533 views
ADD COMMENT
2
Entering edit mode
8 months ago
vanbelj ▴ 40

I found the issue. In order to include genome locations with 0 coverage in a bedGraph file, you must use the following flag/option: -bga, whereas I was only using -bg. Replacing Step two above with following fixed the issue.

bedtools genomecov -i output.bed -bga -g S288C_R64.fa.fai | sort -k1,1 -k2,2n > output.bedGraph
ADD COMMENT

Login before adding your answer.

Traffic: 1407 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6