How to split a BAM by chromosome and unplaced scaffolds


23 months ago

Maxine ▴ 40

I have some big BAM files (more than 200 GB each) that need to be split for downstream analysis. My plan was to split each BAM by chromosome, until I realized there are hundreds of unplaced scaffolds in my reference genome.

The reference genome has 747 sequences in total, of which 11 are chromosomes, so there are 736 unplaced scaffolds. If I split only by chromosome, I would lose a lot of information.

Given that, how can I put all reads aligned to unplaced scaffolds into a single BAM and split the remaining reads by chromosome?

P.S. I would prefer to use samtools. I used to do the splitting with bamtools, but the files it generated somehow lost the EOF marker.

Thanks.

samtools bam • 1.0k views

updated 10 months ago by Ram 43k • written 23 months ago by Maxine ▴ 40


Please see the answer here: samtools: splitting a bam file putting all scaffolds together

23 months ago by GenoMax 141k


Thank you! The accepted answer in that post works!

samtools idxstats in.bam | cut -f1 | grep 'scaffold' | xargs samtools view -o scaffolds.bam in.bam
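The per-chromosome half of the split can probably be driven by the same idxstats listing. The following is only a minimal sketch: it assumes every unplaced scaffold has "scaffold" in its name (as the grep above does) and that in.bam is coordinate-sorted and indexed. `list_chromosomes` is a hypothetical helper name, not a samtools command.

```shell
# Hypothetical helper: read `samtools idxstats` output on stdin and print
# the reference names that are NOT unplaced scaffolds.
# Assumption: scaffold names contain the string "scaffold".
# The final idxstats row, named "*", counts unmapped reads and is skipped.
list_chromosomes() {
    cut -f1 | grep -v -e 'scaffold' -e '^\*$'
}

# Intended use (left commented out because it needs samtools and an indexed BAM):
# samtools idxstats in.bam | list_chromosomes | while read -r chrom; do
#     samtools view -b -o "${chrom}.bam" in.bam "$chrom"
# done
```

Each pass writes one BAM per chromosome, so together with scaffolds.bam every mapped read should end up in exactly one output file. Unmapped reads without a reference name are not covered by either command.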

For the other solution, I'm curious: how do I generate the BED file from the header?

Sorry if my questions are too rudimentary.

23 months ago by Maxine ▴ 40
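On generating the BED file from the header: each `@SQ` header line carries the sequence name (`SN:`) and its length (`LN:`), so one BED interval per scaffold, from 0 to its length, covers it completely. This is a minimal sketch rather than the linked answer's exact method. `header_to_bed` is a hypothetical helper name, and the default "scaffold" pattern is an assumption about how the unplaced scaffolds are named.

```shell
# Hypothetical helper: read SAM header text on stdin and print one BED line
# (name, 0, length) for every @SQ entry whose name matches the pattern.
# The pattern defaults to "scaffold", which is an assumption about naming.
header_to_bed() {
    awk -F'\t' -v pat="${1:-scaffold}" '
        $1 == "@SQ" {
            name = ""; len = ""
            # SN: and LN: may appear in any order among the @SQ fields
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            if (name ~ pat && len != "") print name "\t0\t" len
        }'
}

# Intended use (left commented out because it needs samtools and an indexed BAM):
# samtools view -H in.bam | header_to_bed > scaffolds.bed
# samtools view -b -L scaffolds.bed -o scaffolds.bam in.bam
```

BED coordinates are 0-based and half-open, so `0` to `LN` spans the whole sequence. With hundreds of scaffolds, the `-L` route avoids passing every name on the command line.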
