Is it possible to split BAM without sorting it?
2.2 years ago
tomas4482 ▴ 430

I have some large BAM files to run through a pipeline, but the pipeline runs out of RAM. Sorting the BAM files causes fatal bugs in my pipeline that I cannot debug for now, so sorting is not a solution. The only option is to downsize the input (upgrading the RAM is too expensive for now).

So I thought I could split the original BAM into several smaller BAM files. But I cannot skip the sorting step: e.g., samtools view requires sorting and indexing before splitting.

Is there another way to do this?

Thank you.

samtools BAM bamtools • 2.0k views

samtools view -h should not need a sorted BAM. If you split the BAM file (creating multiple intermediate SAM files), make sure you add the header to all the pieces; otherwise the files may be unusable.


I've tried. It requires an indexed bam. But indexing requires sorted BAM.


How so? You are simply converting a BAM file to SAM using samtools view -h.


Here is the command I used to split the BAM:

samtools view -h -@ 48 sampleAligned.out.bam chr1 > sample_1.bam

Because I specified a region, it requires a sorted and indexed BAM.

May I ask how to create split BAM files using your method? Thanks.


"Because I specified the region"

This is new information. Your original post did not ask about splitting a BAM based on a specific region. If you need to do this, then the file will indeed have to be sorted.

If you don't have the resources, you could split the unsorted file first. Sort and index the pieces, grab the regions you need from each, merge the region-specific files, and then sort the result again. This will be a lot more work but will eventually get you what you need.

2.2 years ago
LChart 4.7k

Using samtools alone, the BAM file needs to be sorted. However, at the cost of materializing the records as SAM text, you can use awk to do this:

samtools view -h "$bam" | awk -F'\t' 'substr($1,1,1) == "@" || $3 == "chr1"' | samtools view -b -o chr1.bam -

etc.
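If you need every reference rather than just chr1, the same idea can be extended to a single pass over the stream. This is only a sketch: split_sam_by_ref and the output naming (prefix.REF.sam) are made up for illustration, it assumes the header lines come first in the stream (as samtools view -h emits them), and it assumes reference names are safe to use in file names. With a very large number of contigs, awk may hit the open-file limit.

```shell
# Hypothetical helper: read SAM text (header + records) on stdin and write
# one SAM file per reference name, copying the full header into each file.
split_sam_by_ref() {
    prefix=$1
    awk -F'\t' -v prefix="$prefix" '
        /^@/ { header = header $0 "\n"; next }   # collect header lines
        {
            # RNAME is column 3; unmapped reads carry "*" there
            ref = ($3 == "*") ? "unmapped" : $3
            out = prefix "." ref ".sam"
            if (!(out in seen)) {                # first record for this reference
                printf "%s", header > out
                seen[out] = 1
            }
            print > out
        }
    '
}

# Intended usage (needs samtools; file names are placeholders):
#   samtools view -h unsorted.bam | split_sam_by_ref sample
#   samtools view -b -o sample.chr1.bam sample.chr1.sam
```

Each piece keeps the original record order, so nothing is sorted along the way.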


Thank you. This is what I need. I also found that bamtools split -reference has a similar effect and is more convenient for this purpose.


May I ask one more question? Is it possible or reasonable to split a BAM by lines? E.g., if a BAM has 10000 lines, can I split it into two 5000-line BAMs?


Not all in one go, because you need to replicate the header for each file; but awk works here too:

awk 'substr($1,1,1) == "@" || (NR >= 5001 && NR <= 10000)'

You will need to increase 5001 and 10000 to also include the number of lines in your header; but you get the idea.
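To avoid doing the header arithmetic by hand, one option is to count only the alignment records and start a new file every n records. A rough sketch, assuming the header lines come first in the stream; split_sam_by_count and the prefix.partNNN.sam naming are hypothetical:

```shell
# Hypothetical helper: read SAM text (header + records) on stdin and write
# chunks of at most n alignment records, each chunk starting with the header.
split_sam_by_count() {
    prefix=$1
    n=$2
    awk -v prefix="$prefix" -v n="$n" '
        /^@/ { header = header $0 "\n"; next }   # header lines are not counted
        {
            if (count % n == 0) {                # time to start a new chunk
                if (out != "") close(out)
                part++
                out = sprintf("%s.part%03d.sam", prefix, part)
                printf "%s", header > out
            }
            print > out
            count++
        }
    '
}

# Intended usage (needs samtools; file names are placeholders):
#   samtools view -h unsorted.bam | split_sam_by_count sample 5000
#   samtools view -b -o sample.part001.bam sample.part001.sam
```

If the BAM is paired-end with mates on adjacent lines, an even n keeps simple pairs in the same chunk, although secondary or supplementary records can still push a pair across a chunk boundary.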


Thank you!
