Split BAM by multiple chromosomes into a single BAM file
8.3 years ago
l0o0 ▴ 220

Hi, I am using a bowtie + samtools pipeline to call SNPs. Splitting the BAM file and calling SNPs per chromosome saves a lot of time.

But the reference genome has many scaffolds, so splitting the BAM by chromosome produces a lot of per-scaffold BAM files. Instead, I want to group scaffolds, so that several scaffolds end up together in one BAM file.

Is there any way to do that?

I tried this command:

samtools view -b in.bam scaffold1 scaffold2 > scaffold1_2.bam

but I don't know how to check whether scaffold1_2.bam contains both scaffold1 and scaffold2.

Thanks

snp genome • 6.0k views

I've tried the command above, and the output contains both scaffolds!

samtools view -b in.bam scaffold1 scaffold2 > scaffold1_2.bam

it works
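For anyone wondering how such a check could be done, here is a minimal sketch. It counts reads per reference name from the third column (RNAME) of the SAM text that samtools view prints. The helper name count_by_ref is made up for this example and is not part of samtools.

```shell
# Hypothetical helper: count SAM records per reference name.
# Reads SAM text on stdin; column 3 is RNAME. Lines starting with "@"
# (header lines, if present) are skipped.
count_by_ref() {
    awk -F'\t' '!/^@/ { n[$3]++ } END { for (r in n) print r "\t" n[r] }' | sort
}

# Usage (assumes samtools is installed and scaffold1_2.bam exists):
#   samtools view scaffold1_2.bam | count_by_ref
```

If only scaffold1 and scaffold2 show up in the first column, the subset contains what was asked for. Unmapped reads, if any, would be listed under "*".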

8.3 years ago

Maybe something along these lines?

First prepare a string of the required scaffold names. You can extract all the scaffold names from the BAM header and use grep to keep only those matching a certain pattern. In this example, keep only scaffolds whose names start with "chr1":

chroms=$(samtools view -H in.bam \
| awk '$1 == "@SQ" {sub("SN:", "", $2); print $2}' \
| grep '^chr1')

Then pass this string to samtools. If the string is really long, you might need xargs to split it, otherwise you may exceed the maximum length of a single command (assuming you are on a *nix system):

echo $chroms | xargs samtools view -b in.bam > scaffold1_2.bam

(Not fully tested...)
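One caveat, as far as I can tell: if xargs really does split the list into several samtools invocations, each one writes its own BAM stream with its own header to the same redirect, and the result is probably not a single valid BAM. A possible alternative, sketched below under that assumption, is to write the selected scaffolds to a BED file and pass it with the -L option of samtools view, so the command line stays short. The helper name header_to_bed and the file name regions.bed are made up for this example.

```shell
# Hypothetical helper: turn @SQ header lines (stdin) into BED intervals
# covering each whole reference whose name matches the regex in $1.
header_to_bed() {
    awk -v pat="$1" -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            # SN and LN can appear in any order, so scan all fields
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (name ~ pat) print name "\t0\t" len
        }'
}

# Usage (assumes samtools is installed and in.bam exists):
#   samtools view -H in.bam | header_to_bed '^chr1' > regions.bed
#   samtools view -b -L regions.bed in.bam > scaffold1_2.bam
```

This is not what the commands above do, just a different route to the same subset, and it is equally untested on real data.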


Hi dariober. I've tested your commands, and they work. Thanks for your reply.
