Question

making a BIGWIG from BAM file

2.7 years ago

Rajendra KC ▴ 20

Hello everyone,

I have 50 BAM files, some of them single-end and some of them paired-end. I want to make a single bigWig file by combining the reads from all of these BAM files.

For this, I merged all the BAM files into a single giant BAM file (700 GB). However, I run out of memory while sorting this giant BAM file.

Is there any way I could sort this huge bam file?
Is it ok to merge single end and paired end bam files together?
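On the sorting question itself: `samtools sort` keeps memory bounded by sorting chunks of at most `-m` per thread, writing them as temporary files under the `-T` prefix and merging them at the end. Peak memory is therefore roughly the `-m` value times the number of threads, and the scratch location needs free disk space roughly on the order of the input size. A minimal sketch; the function name, the 1G-per-thread and 8-thread settings, and the scratch path are assumptions to adjust:

```shell
# Hypothetical helper: sort a very large BAM with bounded memory.
# samtools sort sorts chunks of at most -m per thread, spills them as
# temporary BAMs under the -T prefix, then merges those chunks.
sort_big_bam() {
    local in_bam="$1" out_bam="$2" tmp_dir="$3"
    mkdir -p "$tmp_dir"
    # assumed settings: about 1G per thread with 8 threads, so roughly 8G RAM
    samtools sort -m 1G -@ 8 -T "$tmp_dir/sort_tmp" -o "$out_bam" "$in_bam"
}
# Usage (assumed paths): sort_big_bam merged.bam merged.sorted.bam /scratch/tmp
```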

Tags: bam, samtools, bigwig (13k views)

Written 2.7 years ago by Rajendra KC; last updated 2.7 years ago by ATpoint 89k

score 1 · Answer 1 · 2023-02-23

2.7 years ago

LChart 5.1k

Is there any reason you need to merge the BAMs before converting to bigWig? You could instead compute the coverage of each BAM separately:

bedtools genomecov -d -bga -ibam "$bam" > "$bam.cov"

Then you can simply sum the depths a la

paste "$cov1" "$cov2" | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4+$8}' > merged.bg
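The `paste` approach is only valid if both coverage files have exactly the same rows (same chromosome, start and end on every line). A minimal sketch, assuming tab-separated four-column input, that sums the values but stops with an error as soon as the coordinates of a row disagree; the function name is hypothetical:

```shell
# Hypothetical helper: sum two bedGraph files whose rows are aligned line by line.
# After paste, file 1 occupies fields 1-4 and file 2 occupies fields 5-8.
sum_aligned_bedgraphs() {
    paste "$1" "$2" | awk 'BEGIN { FS = OFS = "\t" }
        # abort if chrom, start or end differ between the two files on this row
        $1 != $5 || $2 != $6 || $3 != $7 {
            print "coordinate mismatch at line " NR > "/dev/stderr"
            exit 1
        }
        # otherwise keep the coordinates and add the two values
        { print $1, $2, $3, $4 + $8 }'
}
# Usage (assumed names): sum_aligned_bedgraphs s1.cov s2.cov > merged.bg
```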

How you want to do this (sequentially, hierarchically, or all at once) is up to you. Then run bedGraphToBigWig for the conversion.
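For the final conversion step, bedGraphToBigWig expects a bedGraph sorted by chromosome and then start position, with non-overlapping intervals, plus a chromosome sizes file (chromosome name and length). A minimal sketch of that step; the function name and file names are hypothetical:

```shell
# Hypothetical helper: sort a bedGraph and convert it to bigWig.
bedgraph_to_bigwig() {
    local bg="$1" chrom_sizes="$2" out_bw="$3"
    local sorted_bg="${bg%.bg}.sorted.bg"
    # byte-wise chromosome order, then numeric start position
    LC_COLLATE=C sort -k1,1 -k2,2n "$bg" > "$sorted_bg"
    bedGraphToBigWig "$sorted_bg" "$chrom_sizes" "$out_bw"
}
# Usage (assumed names): bedgraph_to_bigwig merged.bg hg38.chrom.sizes merged.bw
```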

Have you tested this? bedGraph format should have (afaik) non-overlapping adjacent bins, so you would also need to parse the coordinates and transform them. That is easy with https://bedtools.readthedocs.io/en/latest/content/tools/unionbedg.html, which gives a proper bedGraph in terms of the coordinates, followed by some awk-fu to sum the coverage values.
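The awk step after unionbedg can look like this. `bedtools unionbedg` prints the shared intervals as `chrom start end value_file1 value_file2 ...`, so summing every column from the 4th onward gives the combined coverage. A minimal sketch; the function name and input file names are hypothetical, and each input bedGraph is assumed to be sorted with `sort -k1,1 -k2,2n` first:

```shell
# Hypothetical helper: collapse unionbedg output into a single summed bedGraph.
sum_union_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
        {
            # add up the per-file values in columns 4..NF
            s = 0
            for (i = 4; i <= NF; i++) s += $i
            print $1, $2, $3, s
        }'
}
# Usage (assumed names):
#   bedtools unionbedg -i s1.bg s2.bg s3.bg | sum_union_columns > merged.bg
```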

Reply by ATpoint 89k, 2.7 years ago

Thanks a lot, ATpoint and LChart. I think that, in the end, I need to normalize the bedGraph by the total number of mapped reads (probably the total of the coverage signal over all bins in this case).
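Because bedGraph bins have different widths, the "total coverage signal" has to weight each value by its interval length: total = sum of value x (end - start), which is roughly the number of aligned bases. A minimal two-pass sketch that rescales every value so that this total equals a chosen constant; the function name is hypothetical, and the default scale of 1,000,000 is an arbitrary assumption, not a standard:

```shell
# Hypothetical helper: rescale a bedGraph so its total signal equals "scale".
normalize_bedgraph() {
    local bg="$1" scale="${2:-1000000}"
    awk -v scale="$scale" 'BEGIN { FS = OFS = "\t" }
        # first pass over the file: total signal = sum of value * interval width
        NR == FNR { total += $4 * ($3 - $2); next }
        # second pass: refuse to divide by zero
        total == 0 { print "total signal is zero" > "/dev/stderr"; exit 1 }
        # second pass: rescale each interval
        { print $1, $2, $3, $4 * scale / total }' "$bg" "$bg"
}
# Usage (assumed names): normalize_bedgraph merged.bg > merged.norm.bg
```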

Reply by Rajendra KC ▴ 20, 2.7 years ago

According to the docs, at least, -d -bga should give per-base coverage for every base, including 0-coverage bases, so the outputs should all line up.

Reply by LChart 5.1k, 2.7 years ago

Yes, but in bedGraph, per-base values with identical coverage get binned, so something like

chr1   1   2   3
chr1   2   3   3
chr1   3   4   3

is displayed as

chr1   1   4   3

so the bin lengths differ between BAM files. Not sure what -d does (I have never used it), but bedGraph is 0-based by definition, so -d is probably ignored.
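To see why rows from different files stop lining up, here is a sketch that mimics this binning: consecutive intervals on the same chromosome that touch and carry the same value are merged into one record. It assumes tab-separated input sorted by chromosome and start; the function name is hypothetical:

```shell
# Hypothetical helper: merge adjacent bedGraph records that have the same value.
collapse_bedgraph_runs() {
    awk 'BEGIN { FS = OFS = "\t" }
        # extend the current run if this record touches it and has the same value
        n && $1 == chrom && $2 == end && $4 == val { end = $3; next }
        # otherwise flush the previous run
        n { print chrom, start, end, val }
        # and start a new run from this record
        { chrom = $1; start = $2; end = $3; val = $4; n = 1 }
        END { if (n) print chrom, start, end, val }'
}
# Usage: printf 'chr1\t1\t2\t3\nchr1\t2\t3\t3\nchr1\t3\t4\t3\n' | collapse_bedgraph_runs
```

Two samples with different coverage patterns end up with different run boundaries, which is why line-by-line pasting cannot be trusted on run-length bedGraphs.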

Regardless, bedtools genomecov does not need sorted files, so you can simply use samtools cat to concatenate all BAMs and stream that straight into genomecov. That avoids the issue I describe entirely.
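A minimal sketch of that concatenate-and-stream idea. Two caveats worth checking: `samtools cat` expects all inputs to share the same sequence dictionary (same reference), and the concatenated stream is not coordinate-sorted as a whole, so confirm in the genomecov documentation for your bedtools version that such input is accepted. The function name and file names are hypothetical:

```shell
# Hypothetical helper: concatenate BAMs (no sort, no merge) and compute
# summed coverage in one pass. "-ibam stdin" reads the BAM from the pipe;
# -bg writes bedGraph (-bga would also report zero-coverage regions).
bams_to_bedgraph() {
    local out_bg="$1"
    shift
    samtools cat "$@" | bedtools genomecov -ibam stdin -bg > "$out_bg"
}
# Usage (assumed names): bams_to_bedgraph all.bg sample1.bam sample2.bam
```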

Reply by ATpoint 89k, 2.7 years ago

score 0 · Answer 2 · 2023-02-24

2.7 years ago

colindaven 8.0k

Why do it the hard way?

deepTools bamCoverage will make a bigWig for you straight from a BAM, without any awk hacking.

https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html
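A minimal sketch of that route for a single coordinate-sorted BAM. bamCoverage needs the BAM to be indexed; the bin size, CPM normalization and processor count below are assumed example settings, and the function name is hypothetical:

```shell
# Hypothetical helper: bigWig directly from one coordinate-sorted BAM.
bam_to_bigwig() {
    local bam="$1" out_bw="$2"
    samtools index "$bam"   # bamCoverage needs a .bai index next to the BAM
    # assumed settings: 10 bp bins, CPM normalization, 8 processes
    bamCoverage -b "$bam" -o "$out_bw" --binSize 10 --normalizeUsing CPM -p 8
}
# Usage (assumed names): bam_to_bigwig merged.sorted.bam merged.bw
```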

Here, I am making a single bigWig file from the reads of multiple BAM files (700 GB in total). If I use bamCoverage, I need a single giant BAM file. With the space I have available, I can't even get a merged BAM file this big sorted, so it was suggested here that I make a bedGraph for each BAM, add them up into a single bedGraph, and then convert that bedGraph to bigWig.

Reply by Rajendra KC ▴ 20, 2.7 years ago