Question: Is It Possible To Generate A Sam Header Without Any External Header Information?
2
Anima Mundi (2.4k), Italy, wrote 6.1 years ago:

Hello, I have some SAM files that lack a header. To convert them to BAM I first need to add a header, but unfortunately I do not have any header file. Recently, Istvan Albert suggested (see here How to extract unaligned sequences from BAM files obtainend from BWA and here C: Obtaining the consensus sequence from a BAM file in FASTA) using samtools view with the -h flag, but I get:

$ samtools view -S -h my_FILE.bam
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

Maybe I am using the wrong syntax; I got the same error before (again, see here C: Obtaining the consensus sequence from a BAM file in FASTA). I also tried Picard Tools, but according to the ReplaceSamHeader manual an input header file is strictly required.

In brief, is it possible to generate a SAM header without any external header information?

picard samtools sam bam • 11k views
modified 6.1 years ago by Ryan Thompson (3.4k) • written 6.1 years ago by Anima Mundi (2.4k)
1
Ryan Thompson (3.4k), TSRI, La Jolla, CA, wrote 6.1 years ago:

The main piece of information that you need that is provided by the SAM header is the information on reference sequence lengths (the "@SQ" lines). Do you have the reference to which the files were mapped? If so, you can generate a minimal SAM header from that. You could simply map a single sequence to the reference using your mapper of choice and take the header of the resulting output file.
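If the reference FASTA is available, the @SQ lines can also be derived directly from it without running a mapper. The following is only a minimal sketch under that assumption: `fasta_to_sq_lines` is a hypothetical helper name, and it takes the sequence name to be the first word after `>` on each FASTA header line.

```shell
# Hypothetical helper: print one @SQ line per sequence in a FASTA file.
# Assumption: "$1" is the exact reference the reads were mapped to.
fasta_to_sq_lines() {
  awk '
    /^>/ {
      # Flush the previous record before starting a new one.
      if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len
      name = substr($1, 2)   # first word after ">"
      len = 0
      next
    }
    {
      sub(/\r$/, "")         # tolerate Windows line endings
      len += length($0)      # count bases on this sequence line
    }
    END { if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len }
  ' "$1"
}
```

The output could then be placed in front of the headerless alignments, for example with `{ fasta_to_sq_lines ref.fa; cat my_FILE.sam; } > with_header.sam`, where `ref.fa` and `with_header.sam` are placeholder file names.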

written 6.1 years ago by Ryan Thompson (3.4k)

I have the same suggestion. You can get a BAM file online that used the same reference genome and extract the header from it. Of course, you need to change the sample and read group ID information, but if your file is not that complicated this should be easy to do. Also, since your primary purpose is to visualize the BAM file, it won't really matter if your header doesn't have the exact library or read group information.

modified 6.1 years ago • written 6.1 years ago by Ashutosh Pandey (11k)

Thanks guys. The problem is that the genome is unpublished, so I would have to use similar genomes, and even if I succeeded the final output would not be very clean, I guess. In summary, if I understood correctly, generating a SAM header unavoidably requires getting the information from somewhere (this makes sense: it would be pointless to include a header in the file format if its information were redundant). I chose this as the accepted answer because it answers my question and because it could help people with very similar issues.

written 6.1 years ago by Anima Mundi (2.4k)
0
Pierre Lindenbaum (119k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote 6.1 years ago:

You are using the -S option, but your input is BAM, not SAM.

written 6.1 years ago by Pierre Lindenbaum (119k)

Thanks Pierre, but I realised that despite the .bam extension my files are not binary files. They could be BAM-derived SAMs intended just for visualisation.
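One quick way to check whether a file with a .bam extension is really binary is to look at its first two bytes: BAM files are BGZF-compressed, which is gzip-compatible, so they begin with the bytes 1f 8b, whereas a text SAM begins with a read name or an @ header line. The sketch below uses a hypothetical helper name, `has_gzip_magic`, and only tests the gzip signature; it does not prove the file is a valid BAM.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes.
# BAM is BGZF (gzip-compatible), so a real BAM passes; plain-text SAM does not.
has_gzip_magic() {
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}
```

For example, `has_gzip_magic my_FILE.bam && echo "compressed (possibly BAM)" || echo "not gzip-compressed (possibly SAM)"`.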

written 6.1 years ago by Anima Mundi (2.4k)
5

If you are sure this is a SAM (you should really change the file extension to avoid confusion), you may run:

$ samtools faidx ref.fa
$ samtools view -ht ref.fa.fai myfile.sam

where ref.fa is the reference genome file you used to generate the SAM.
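To illustrate what the -t option consumes: it expects a tab-delimited file with the reference name in the first column and its length in the second, which is exactly what the first two columns of a .fai index hold. The sketch below imitates the header-adding step with awk and cat only; `prepend_sq_from_fai` is a hypothetical helper name, and unlike samtools it performs no validation of the alignments.

```shell
# Hypothetical helper: write @SQ lines built from a .fai index,
# followed by the unchanged contents of a headerless SAM file.
# $1 = .fai index (column 1 = sequence name, column 2 = length)
# $2 = headerless SAM file
prepend_sq_from_fai() {
  awk -F '\t' '{ printf "@SQ\tSN:%s\tLN:%s\n", $1, $2 }' "$1"
  cat "$2"
}
```

A call such as `prepend_sq_from_fai ref.fa.fai myfile.sam > with_header.sam` would produce a SAM with @SQ lines, with `with_header.sam` being a placeholder output name.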

modified 6.1 years ago • written 6.1 years ago by lh3 (31k)

I will change the extension (I kept it so as not to misrepresent the output I had, although I guess it is not relevant). Unfortunately I do not have the original reference genome file; I received the files from colleagues. Thanks anyway!

written 6.1 years ago by Anima Mundi (2.4k)