I have a WGS BAM file that is fairly large (>150 GB) and a smaller BAM file (<5 GB) containing reads from a small 10 Mbp region. I want to merge the two BAM files efficiently, keeping only the reads from the smaller BAM file within that overlapping 10 Mbp region.
My current solution is to first use "bedtools intersect" to remove reads overlapping the 10 Mbp region from the big BAM file, and then merge the resulting BAM file with the smaller one using samtools merge.
bedtools intersect -abam big.bam -b 10Mbp.bed -v > temp.bam
samtools merge -o output.bam temp.bam small.bam
Is there a more efficient way to do this? The bedtools command alone is taking a long time to run on the 150 GB file.
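A possible alternative, sketched under assumptions rather than benchmarked: samtools view can do the split in a single pass, using -L to select alignments overlapping the BED and -U to write every other alignment to a second file, with -@ for multithreaded BGZF compression. This assumes samtools >= 1.13 (for merge -o), coordinate-sorted inputs with compatible @SQ headers, and that samtools' overlap test is close enough to bedtools intersect's for your purposes. replace_region and THREADS are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: samtools >= 1.13 (merge -o), both BAMs are
# coordinate-sorted with compatible @SQ headers. replace_region and THREADS
# are hypothetical names chosen for this example.
set -euo pipefail

replace_region() {
  if [ "$#" -ne 4 ]; then
    echo "usage: replace_region BIG.bam SMALL.bam REGION.bed OUT.bam" >&2
    return 2
  fi
  local big=$1 small=$2 bed=$3 out=$4
  local threads=${THREADS:-8}
  local rest="${out%.bam}.outside.bam"

  # One pass over BIG: alignments overlapping the BED are "selected" and
  # discarded to /dev/null; every other alignment (including unplaced
  # unmapped reads) is written to $rest by -U.
  samtools view -@ "$threads" -b -L "$bed" -o /dev/null -U "$rest" "$big"

  # merge expects coordinate-sorted inputs and keeps the output sorted.
  samtools merge -@ "$threads" -o "$out" "$rest" "$small"
  samtools index -@ "$threads" "$out"
  rm -f "$rest"
}
```

Usage would be something like THREADS=16 replace_region big.bam small.bam 10Mbp.bed output.bam. It still has to decompress all of big.bam once, so any speed-up would come mainly from skipping bedtools and from threading, not from avoiding a full read of the file.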
Thanks for your suggestions, but I think you may have misunderstood my question. I need output.bam to contain all regions in big.bam, not just those in small.bed. Within the small.bed region, I want reads from small.bam but not from big.bam; outside this region, I want reads from big.bam.
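To confirm an output matches that requirement, one hypothetical check (check_replacement is an illustrative name) compares read counts: inside the region, output.bam should hold exactly as many alignments as small.bam does, and overall it should equal big.bam minus its in-region alignments plus all of small.bam. This sketch assumes all three BAMs are indexed (idxstats and view -M read the index) and that the output was built with samtools' own overlap definition, as with view -L/-U; an output built with bedtools may differ slightly at the region edges.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes BIG, SMALL and OUT are indexed (.bai present) and
# that OUT was built with samtools' overlap definition (view -L / -U).
# check_replacement is a hypothetical helper name.
set -euo pipefail

count_total() {   # all records, read from the index (mapped + unmapped)
  samtools idxstats "$1" | awk '{ n += $3 + $4 } END { print n + 0 }'
}

count_in_bed() {  # records overlapping the BED, via the index (-M)
  samtools view -c -M -L "$2" "$1"
}

check_replacement() {
  if [ "$#" -ne 4 ]; then
    echo "usage: check_replacement BIG.bam SMALL.bam REGION.bed OUT.bam" >&2
    return 2
  fi
  local big=$1 small=$2 bed=$3 out=$4
  local in_out in_small in_big total_out total_big total_small

  # Inside the region, every alignment in OUT should come from SMALL.
  in_out=$(count_in_bed "$out" "$bed")
  in_small=$(count_in_bed "$small" "$bed")

  # Overall: OUT = (BIG minus its in-region reads) + all of SMALL.
  in_big=$(count_in_bed "$big" "$bed")
  total_out=$(count_total "$out")
  total_big=$(count_total "$big")
  total_small=$(count_total "$small")

  if [ "$in_out" -ne "$in_small" ] ||
     [ "$total_out" -ne $(( total_big - in_big + total_small )) ]; then
    echo "mismatch: region $in_out vs $in_small, total $total_out" >&2
    return 1
  fi
  echo "ok"
}
```

Usage: check_replacement big.bam small.bam 10Mbp.bed output.bam. Totals come from idxstats, so big.bam is not streamed again in full; only the in-region reads are fetched through the index.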
Ah!
Thanks, this ran in about 11 hours. I'm happy to accept it as the answer if you would like to edit your original reply above.