Question: Different bigwig file sizes between BedgraphToBigwig and bamCoverage
m93 wrote (9 months ago):

I am trying to convert BAM files to bigWig format. I was originally using a combination of genomeCoverageBed and bedGraphToBigWig to go from BAM to bigWig in 2 steps:

samtools sort file.bam -o file.sorted.bam;
genomeCoverageBed -bg -split -ibam file.sorted.bam -g file.chrom.sizes > file.bedgraph;
sort -k1,1 -k2,2n file.bedgraph >file.sorted.bedgraph;
bedGraphToBigWig file.sorted.bedgraph file.chrom.sizes file.bw;

After finding out that deepTools (its bamCoverage function) could convert BAM to bigWig AND normalize all in one go, I decided to use it. As a first test, I ran it without normalization:

bamCoverage -b file.bam -o file.bw

This command works, but I can't help noticing that the sizes of the output file.bw in the two scenarios are drastically different. My starting BAM file is 1.1GB. When using bedGraphToBigWig, my output file is 122MB. When using bamCoverage, it's 23MB.

I know that you can lower --binSize in bamCoverage, which leads to a larger file. I tried a bin size of 10 and my file was 45MB.

My question is: what is the relationship between the file sizes produced by the two tools? I am confused as to why the sizes are so different. From what I understand, in bamCoverage the coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. But what is the equivalent setting in bedGraphToBigWig? Is it -blockSize or maybe -itemsPerSlot? Their defaults are 256 and 1024 respectively, so I am a bit confused.

I am not sure whether this is something to worry about. Presumably, all it means is that my 2 files (the one produced with bedGraphToBigWig and the one produced with bamCoverage) will have different 'resolutions' when visualized in something like IGV or the UCSC Genome Browser. Is this correct?

ATpoint wrote (9 months ago):

The reason is that genomeCoverageBed by default writes bedGraphs at the base-pair level, so it reports the depth at every base. In contrast, bamCoverage has a default 50bp window (option -bs, see documentation) that it aggregates reads over. I always use -bs 1 because visually these tracks look much nicer/smoother than the default -bs 50.
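
To see why the bin size drives the number of records (and therefore, roughly, the file size), here is a minimal sketch on made-up data. It does not call bamCoverage or genomeCoverageBed: the helper bin_bedgraph and the toy track are hypothetical, and averaging per-base depth into bins is only an approximation of how bamCoverage counts reads per bin.

```shell
# Toy per-base bedGraph: 200 bases on chr1 whose depth changes at every position.
per_base=$(awk 'BEGIN { for (i = 0; i < 200; i++) printf "chr1\t%d\t%d\t%d\n", i, i + 1, i % 7 }')

# Hypothetical helper: collapse a per-base bedGraph (stdin) into fixed-width bins
# of size $1, reporting the mean depth per bin.
# Assumes a single chromosome, 1 bp intervals and sorted input (true for the toy data).
bin_bedgraph() {
  awk -v bs="$1" 'BEGIN { OFS = "\t" }
    {
      b = int($2 / bs)
      # A new bin starts: flush the previous one.
      if (NR > 1 && b != cur) { print chrom, cur * bs, (cur + 1) * bs, sum / n; sum = 0; n = 0 }
      chrom = $1; cur = b; sum += $4; n++
    }
    END { if (n) print chrom, cur * bs, (cur + 1) * bs, sum / n }'
}

# Count the records at each resolution.
n_bp=$(printf '%s\n' "$per_base" | wc -l | tr -d ' ')
n_bs10=$(printf '%s\n' "$per_base" | bin_bedgraph 10 | wc -l | tr -d ' ')
n_bs50=$(printf '%s\n' "$per_base" | bin_bedgraph 50 | wc -l | tr -d ' ')
echo "records: 1bp=$n_bp 10bp=$n_bs10 50bp=$n_bs50"
```

On this toy track the record count drops from 200 at 1 bp to 20 at 10 bp and 4 at 50 bp. Real files also depend on compression and on how many adjacent intervals share the same value, so the sizes will not scale this cleanly.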


Ok, that makes sense, thanks. I tried running it again with -bs 1 and got a file which is 110M, so still not quite the same size as my original .bw. I'm guessing this might be due to slight differences in the algorithms or something, and perhaps my settings in genomeCoverageBed also play a role?

written 9 months ago by m93

Or a different compression level. deeptools uses libBigWig as far as I know, and bedGraphToBigWig is part of the kentUtils. File sizes are typically not too informative, especially when talking about binary files.
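
One way to check that two tracks carry the same signal, rather than comparing file sizes, is to compare a summary of their contents. The sketch below is a toy under stated assumptions: bedgraph_area is a hypothetical helper, the two tracks are invented, and for real files you would first dump each bigWig to bedGraph (for example with bigWigToBedGraph from kentUtils).

```shell
# Hypothetical helper: total signal of a bedGraph read from stdin,
# computed as the sum of depth * interval length.
bedgraph_area() {
  awk '{ area += ($3 - $2) * $4 } END { printf "%d\n", area }'
}

# Two invented tracks describing the same signal at different resolutions.
fine=$(printf 'chr1\t0\t10\t2\nchr1\t10\t20\t4\nchr1\t20\t40\t3\n')
coarse=$(printf 'chr1\t0\t20\t3\nchr1\t20\t40\t3\n')

area_fine=$(printf '%s\n' "$fine" | bedgraph_area)
area_coarse=$(printf '%s\n' "$coarse" | bedgraph_area)
echo "fine=$area_fine coarse=$area_coarse"
```

Both toy tracks sum to 120 even though one has three records and the other two, which is the sense in which record count and file size say little about the underlying signal.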

written 9 months ago by ATpoint

Thanks so much, you've been a great help!

written 9 months ago by m93

You're very welcome =)

written 9 months ago by ATpoint