How to split a BAM by chromosome and unplaced scaffolds
23 months ago
Maxine ▴ 40

I have some big BAM files (more than 200 GB each) that need to be split for downstream analysis. My plan was to split each BAM by chromosome, until I realized there are hundreds of unplaced scaffolds in my reference genome.

The whole reference genome has 747 scaffolds in total, 11 of which are chromosomes, which leaves 736 unplaced scaffolds. If I split only by chromosome, I would lose a lot of information.

In that situation, how can I put all reads aligned to unplaced scaffolds into a single BAM and split the remaining reads by chromosome?

P.S. I would prefer to use samtools. I used to do the splitting with bamtools, but the files it generated were somehow missing the EOF marker.

Thanks.

samtools bam • 1.0k views

Thank you! The accepted answer in that post works!

samtools idxstats in.bam | cut -f1 | grep 'scaffold' | xargs samtools view -o scaffolds.bam in.bam
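For the remaining reads, the same idea can be turned around to get one BAM per chromosome. This is only a minimal sketch: it assumes in.bam is coordinate-sorted and indexed, and that the unplaced scaffolds are exactly the sequences whose names contain "scaffold". The function names chrom_names and split_by_chrom are made up for illustration.

```shell
# Read `samtools idxstats` output on stdin and print the chromosome names.
# Assumption: anything without "scaffold" in its name is a chromosome.
# The trailing "*" row (unmapped reads) is dropped as well.
chrom_names() {
    cut -f1 | grep -v 'scaffold' | grep -v '^\*$'
}

# Write <chrom>.bam for every chromosome found in the given indexed BAM.
split_by_chrom() {
    bam=$1
    samtools idxstats "$bam" | chrom_names | while read -r chrom; do
        # -b forces BAM output; the region argument needs the BAM index.
        samtools view -b -o "${chrom}.bam" "$bam" "$chrom" < /dev/null
    done
}
```

Calling `split_by_chrom in.bam` would then write one file per chromosome next to scaffolds.bam. Reads that are entirely unmapped end up in neither output.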

For the other solution, how would I generate the BED file from the header?

Sorry if my questions are too rudimentary.
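On the BED question, one possible approach is to pull the @SQ lines out of the BAM header, since each carries a sequence name (SN) and length (LN). The sketch below assumes the scaffold names contain "scaffold" and that the BED file is meant to be passed to `samtools view -L`; the function name header_to_bed is hypothetical.

```shell
# Read SAM header text on stdin and print one BED line (name, 0, length)
# for every @SQ record whose sequence name contains "scaffold".
header_to_bed() {
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            # Look the tags up by prefix instead of relying on column order.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (name ~ /scaffold/) print name "\t0\t" len
        }'
}

# Usage (assumes in.bam exists):
#   samtools view -H in.bam | header_to_bed > scaffolds.bed
#   samtools view -b -L scaffolds.bed -o scaffolds.bam in.bam
```

Each BED interval spans a whole scaffold, so `-L` keeps every read aligned to any of them.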
