Is It Possible To Generate A SAM Header Without Any External Header Information?
8.7 years ago
Anima Mundi ★ 2.9k

Hello, I have some SAM files that lack a header. To convert them to BAM I first need to add a header, but unfortunately I do not have any header file. Recently, Istvan Albert suggested (see "How to extract unaligned sequences from BAM files obtainend from BWA" and "C: Obtaining the consensus sequence from a BAM file in FASTA") using samtools view with the -h flag, but I get:

$ samtools view -S -h my_FILE.bam
[samopen] no @SQ lines in the header.


Maybe I am using the wrong syntax; I got the same error before (again, see "C: Obtaining the consensus sequence from a BAM file in FASTA"). I also tried Picard Tools, but according to the ReplaceSamHeader manual an input header file is strictly required.

In brief, is it possible to generate a SAM header without any external header information?

samtools picard sam bam • 15k views
8.7 years ago
Ryan Thompson ★ 3.5k

The main information you need from the SAM header is the reference sequence names and lengths (the "@SQ" lines). Do you have the reference to which the files were mapped? If so, you can generate a minimal SAM header from that. You could simply map a single sequence to the reference using your mapper of choice and take the header of the resulting output file.
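
If the reference FASTA is available, the @SQ lines can also be written directly from it, without running a mapper. A minimal sketch, assuming every FASTA header line carries a name; the function name sam_header_from_fasta is hypothetical and not part of samtools or Picard:

```python
import io


def sam_header_from_fasta(handle):
    """Return one @SQ line per FASTA record, as a single string."""
    lengths = {}  # dicts keep insertion order, so @SQ order follows the FASTA
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token after '>'
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return "".join(f"@SQ\tSN:{n}\tLN:{length}\n" for n, length in lengths.items())


if __name__ == "__main__":
    demo = io.StringIO(">chr1 demo\nACGT\nAC\n>chr2\nGGG\n")
    print(sam_header_from_fasta(demo), end="")
```

The sequence names must match the RNAME column of the SAM records exactly, so this only helps when the FASTA is the same reference the reads were mapped to.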


I have the same suggestion. You can get a BAM file online that used the same reference genome and extract the header from it. Of course, you would need to change the sample and read group ID information, but if your file is not that complicated this should be easy. Also, since your primary purpose is to visualize the BAM file, it won't really matter if your header doesn't have the exact library or read group information.


Thanks guys. The problem is that the genome is unpublished, so I would have to use similar genomes, and even if I succeeded the final output would not be very clean, I guess. To summarise, if I understood correctly, generating a SAM header unavoidably requires getting the information from somewhere (this makes sense: it would be pointless to add a header to the file format if its information were redundant). I chose this as the accepted answer because it answers my question and because it could help people with very similar issues.

8.7 years ago

You are using the -S option, but your input is BAM, not SAM.


Thanks Pierre, but I realised that despite the .bam extension my files are not binary files. They could be BAM-derived SAMs intended only for visualisation.
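
When the extension cannot be trusted, the file contents can be checked directly: a BAM file is BGZF (gzip-compatible) data whose decompressed payload starts with the four bytes "BAM\1", while a SAM file is plain text. A small sketch of such a check; looks_like_bam is a hypothetical helper name, and the demo file is made up for illustration:

```python
import gzip
import os
import tempfile


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF data whose payload starts with the BAM magic."""
    with open(path, "rb") as fh:
        if fh.read(2) != b"\x1f\x8b":  # gzip magic bytes; plain-text SAM fails here
            return False
    with gzip.open(path, "rb") as fh:
        return fh.read(4) == b"BAM\x01"


if __name__ == "__main__":
    # Demo on a throwaway text file that merely carries a .bam extension
    with tempfile.TemporaryDirectory() as tmp:
        fake = os.path.join(tmp, "my_FILE.bam")
        with open(fake, "w") as out:
            out.write("r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n")
        print(looks_like_bam(fake))
```

If this reports that the file is not BAM, the file is most likely text SAM and should be treated (and ideally renamed) as such.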


If you are sure this is a SAM (you should really change the file extension to avoid confusion), you may run samtools faidx ref.fa; samtools view -ht ref.fa.fai myfile.sam, where ref.fa is the reference genome file you used to generate the SAM.
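
To make the role of the .fai file concrete: its first two tab-separated columns are the sequence name and length, which is exactly what an @SQ line needs. The sketch below mimics that step in plain Python. It is only an illustration under the assumption that the input SAM has no header at all; add_header_from_fai is a hypothetical name, not a samtools function:

```python
def add_header_from_fai(fai_lines, sam_lines):
    """Yield @SQ lines built from .fai lines, then the SAM records unchanged."""
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
            yield f"@SQ\tSN:{fields[0]}\tLN:{int(fields[1])}\n"
    for line in sam_lines:
        yield line  # assumption: the input is headerless, so pass records through


if __name__ == "__main__":
    fai = ["chr1\t6\t6\t4\t5\n"]
    sam = ["r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"]
    print("".join(add_header_from_fai(fai, sam)), end="")
```

In practice the samtools command above is the better choice; the sketch is only meant to show where the header information comes from.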


I will change the extension (I kept it so as not to misrepresent the output I got, though I guess it is not relevant). Unfortunately I do not have the original reference genome file; I received these files from colleagues. Thanks anyway!
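
When no reference is available at all, one possible workaround (not suggested by anyone in this thread, so treat it as an assumption) is to build placeholder @SQ lines from the alignments themselves: take each reference name from the RNAME column and use the largest alignment end coordinate seen as LN. These values are only lower bounds on the true sequence lengths, and references with no mapped reads are missed entirely, so this is at best adequate for conversion and viewing. The function name placeholder_sq_lines is hypothetical:

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification
_REF_OPS = re.compile(r"(\d+)([MDN=X])")


def placeholder_sq_lines(sam_lines):
    """Return @SQ lines whose LN is the largest alignment end seen per reference."""
    ends = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip any existing header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        rname, pos, cigar, seq = fields[2], int(fields[3]), fields[5], fields[9]
        if rname == "*" or pos == 0:
            continue  # unmapped read, nothing to learn from it
        if cigar != "*":
            span = sum(int(n) for n, _ in _REF_OPS.findall(cigar))
        else:
            span = len(seq) if seq != "*" else 1  # fallback guess without a CIGAR
        end = pos + max(span, 1) - 1  # POS is 1-based, end is inclusive
        ends[rname] = max(ends.get(rname, 0), end)
    return [f"@SQ\tSN:{name}\tLN:{end}\n" for name, end in ends.items()]


if __name__ == "__main__":
    records = [
        "r1\t0\tchr1\t5\t60\t4M2D3M\t*\t0\t0\tACGTACG\tIIIIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    print("".join(placeholder_sq_lines(records)), end="")
```

A header built this way should be replaced with the real one as soon as the reference (or its .fai index) can be obtained from whoever produced the files.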