Custom extract reads from a sam/bam file into a valid sam/bam
2
0
Entering edit mode
6.3 years ago
shwethacm ▴ 240

Hi there,

I am trying to "brute-force" split a bam file using grep. The reason why I am doing this is because I am trying to split by a custom tag that looks like these (bc would be the tag):
bc:B:S,1,1
bc:B:S,0,0
bc:B:S,2,2

(( But I cannot used bamtools because "bc" is not a recognized tag )). My file is small-sih (<3 GB) so I used grep to make my split files and appended the header to each sub-file.

But something is wrong because I cannot index my new sub-files. I ran ValidateSamFile from picard, and it is pointing to a missing tag in the header ("Error parsing SAM header. @RG line missing SM tag.") but not the one that I am interested in.

Does anyone know what I am doing wrong ?

next-gen alignment genome • 3.2k views
ADD COMMENT
2
Entering edit mode
6.3 years ago

ran ValidateSamFile from picard, and it is pointing to a missing tag in the header ("Error parsing SAM header. @RG line missing SM tag."

that because you're ignoring the bam header

use

samtools view -h in.bam | grep  -E '(^@|bc\:B\:S,1,1)' | samtools view -S -b -o out.bam -

EDIT: when using ValidateSamFile you must also specify what should(not) be checked: http://broadinstitute.github.io/picard/command-line-overview.html#ValidateSamFile

e.g:

IGNORE=RECORD_MISSING_READ_GROUP
ADD COMMENT
0
Entering edit mode
6.3 years ago
shwethacm ▴ 240

Gotcha!

I realized that the mistake I was making was in appending the header separately in the end, which resulted in an ad-hoc mush of a samfile. Your method is so much more elegant!

Thanks, Pierre!

ADD COMMENT

Login before adding your answer.

Traffic: 3926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6