Automatic filling of @SQ tags in SAM/BAM with BWA from properly-formatted reference names
8.9 years ago

The SAM/BAM specification allows the header to record the genome assembly of each reference sequence, its species, and so on. This is done with the tags of the @SQ record type, such as AS for the genome assembly identifier.

Can BWA automatically fill these tags based on the names of the reference sequences, provided that the names are properly formatted?

If yes, how should the reference be formatted?

If not, I guess I should write the header myself and then use "samtools reheader". Or do you have another idea?

sam bam bwa @SQ • 2.8k views
8.9 years ago

BWA should already have written the default @SQ lines into the header.

You can create a custom sequence dictionary with Picard's CreateSequenceDictionary (http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary) and then use "samtools reheader" to replace the header.
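If you would rather patch the existing header than generate a dictionary, here is a minimal sketch in Python. It assumes the header has been dumped as tab-separated text (for example with "samtools view -H"). The function name annotate_sq_lines and the assembly and species values are hypothetical, chosen only for illustration.

```python
def annotate_sq_lines(header_text, assembly, species=None):
    """Return header_text with AS (and optionally SP) added to each @SQ line.

    Hypothetical helper: tags already present on a line are left untouched.
    """
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ\t"):
            fields = line.split("\t")
            # Two-letter tag names already on this @SQ line (SN, LN, ...).
            present = {field[:2] for field in fields[1:]}
            if "AS" not in present:
                fields.append("AS:" + assembly)
            if species is not None and "SP" not in present:
                fields.append("SP:" + species)
            line = "\t".join(fields)
        out.append(line)
    # Re-terminate every line, since splitlines() drops the newlines.
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    import sys

    # Example values only; replace with your own assembly and species.
    sys.stdout.write(annotate_sq_lines(sys.stdin.read(), "GRCh38", "Homo sapiens"))
```

The rewritten header can be saved to a file and applied with "samtools reheader new_header.sam in.bam > out.bam". Lines other than @SQ are passed through unchanged, so the original record order is preserved.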


Should the record types appear in this order: @HD, @SQ, @RG, @PG and @CO? The SAM specification only says "The header line. The first line if present." next to @HD, and nothing about the order of the others.
