Hello everyone,
I have 50 BAM files, some single-end and some paired-end. I want to make a single bigWig file by combining the reads from all of these BAM files.
For this, I merged all the BAM files into a single giant BAM file (700 GB). However, I am running out of memory while sorting this giant BAM file.
- Is there any way I could sort this huge BAM file?
- Is it OK to merge single-end and paired-end BAM files together?
Have you tested this? bedGraph format should have (afaik) non-overlapping adjacent bins, so you would also need to parse the coordinates and transform them. That is easy with bedtools unionbedg (https://bedtools.readthedocs.io/en/latest/content/tools/unionbedg.html), which gives you a proper bedGraph in terms of the coordinates, followed by some awk-fu to sum the coverage values.
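A minimal sketch of the "awk-fu" step, assuming the input is unionbedg output with chrom, start, and end in columns 1-3 and one coverage column per input file after that. The function name sum_unionbedg and the file names in the usage comment are placeholders, not anything from this thread.

```shell
# Sum the per-file coverage columns (4..NF) of unionbedg-style input
# into a single 4-column bedGraph. Reads stdin, writes stdout.
sum_unionbedg() {
    awk 'BEGIN { OFS = "\t" }
         {
             total = 0
             # Columns 4 and up hold one coverage value per input file.
             for (i = 4; i <= NF; i++) total += $i
             print $1, $2, $3, total
         }'
}

# Hypothetical usage (a.bedgraph and b.bedgraph are placeholder names):
#   bedtools unionbedg -i a.bedgraph b.bedgraph | sum_unionbedg > combined.bedgraph
```

Because unionbedg already splits the genome into intervals shared by all inputs, the sum can be taken row by row without touching the coordinates.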
Thanks a lot, ATpoint and LChart. I feel that in the end I need to normalize the bedGraph by the total number of mapped reads (probably the sum of the coverage signals over all bins in this case).
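A rough sketch of such a normalization, assuming a simple per-million scaling is what is wanted. The function name scale_bedgraph and the file names are placeholders, and how to count "mapped reads" across mixed single-end and paired-end data is a judgment call that this sketch does not settle.

```shell
# Multiply the coverage column (4) of a bedGraph by a constant factor.
# Usage: scale_bedgraph FACTOR < in.bedgraph > out.bedgraph
scale_bedgraph() {
    awk -v f="$1" 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4 * f }'
}

# Hypothetical usage (file names are placeholders):
#   # count alignments that are neither unmapped (4) nor secondary (256)
#   total=$(samtools view -c -F 260 merged.bam)
#   factor=$(awk -v t="$total" 'BEGIN { print 1000000 / t }')
#   scale_bedgraph "$factor" < combined.bedgraph > combined.cpm.bedgraph
```

Keeping the factor as an explicit argument leaves the choice of denominator (read count, or summed coverage over all bins) to whoever runs it.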
According to the docs at least, -d -bga should be giving per-base coverage for every base, including 0-coverage bases, so the outputs should all line up.
Yes, but in bedGraph, per-base values with identical coverage get binned, so like
is displayed as
so the length of these bins is different between BAM files. Not sure what -d does, never used it, but bedGraph is 0-based by definition so -d is probably ignored.
Regardless, bedtools genomecov does not need sorted files, so you can simply use samtools cat to concatenate all BAMs and then stream that right into genomecov. That saves you from any of the issues I describe.
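To illustrate the binning point above, here is a small sketch of how runs of identical per-base coverage collapse into bedGraph intervals. It assumes three-column per-base input (chrom, 1-based position, depth) sorted by position. The function name collapse_per_base is made up for illustration, and this is not a description of how bedtools implements it internally.

```shell
# Collapse per-base coverage (chrom, 1-based pos, depth) into 0-based,
# half-open bedGraph intervals by merging adjacent bases of equal depth.
collapse_per_base() {
    awk 'BEGIN { OFS = "\t" }
         # Close the current interval when the chromosome or depth changes,
         # or when positions are not contiguous.
         NR > 1 && ($1 != chrom || $3 != depth || $2 != end + 1) {
             print chrom, start, end, depth
             start = $2 - 1
         }
         NR == 1 { start = $2 - 1 }
         # The 1-based position of the last base equals the 0-based exclusive end.
         { chrom = $1; end = $2; depth = $3 }
         END { if (NR > 0) print chrom, start, end, depth }'
}
```

With this, three bases with depths 2, 2, 5 become two intervals of lengths 2 and 1, which is why interval boundaries differ from one BAM file to the next.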