Question: "Invalid BAM file header: missing sequence name in file" when trying to visualise .bam file in IGV
0
gravatar for francois
8 months ago by
francois10
francois10 wrote:

I need help with visualising a .bam file.

So, what I did up to now was:

I had one .fastq of 3.9 MB file containing many reads generated by Metrichor (I am sequencing using the Nanopore MinION). I aligned the reads to a reference .fasta sequence using the package ngmlr. I got a .sam file of 4.3 MB. The command was:

ngmlr -t 4 -r ref.fasta -q reads.fastq -o test.sam -x ont

I converted this .sam file to a .bam file using samtools. I got a .bam file of 2 MB. The command I used was:

samtools view -bT ref.fasta test.sam > test.bam

I sorted the .bam file with samtools. The command was:

samtools sort test.bam > test_sorted.bam

I indexed the .bam file using samtools. I got a .bai file of the same name of 96 bytes. The command was:

 samtools index test_sorted.bam

Now I tried opening the .bam file in IGV_2.4.10. But I received the following error message:

Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file

What did I do wrong?

igv bam alignment • 535 views
ADD COMMENTlink modified 8 months ago by WouterDeCoster36k • written 8 months ago by francois10

Ah that would have been great! But same problem... Any other ideas?

The first few lines of my .sam file (just before the conversion to .bam step) looks like that:

@HD VN:1.0 SO:unsorted

@SQ SN: LN:877

@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont

01c372ae-0d41-4ba4-bd0c-c56498801f53 4 * 0 0 * TGGGACGCTCGTTCAGTTA

Does that look OK to you?

ADD REPLYlink modified 8 months ago • written 8 months ago by francois10

Ok so I tried adding something in the @SQ SN: header of the .sam file, as it seems like this is the "sequence name". But it does not look like the conversion to .bam worked fine. I have this message:

[W::sam_parse1] urecognized reference name; treated as unmapped

Exactly the same number of times as my number of aligned reads, which does not sound very good.

If I continue anyway with sorting and indexing, the error message from IGV is:

does not contain any sequence names which match the current genome.

File:      PRNP, # that's the name I added

Genome: chr1, chr2, chr3, chr4, ...

I also tried adding chr20 in the .sam file header, exact same.

So I guess the problem does come from there, but I am not sure how I should name the sequence... Also, to be clear, my reference sequence in .fasta I am aligning to is a very short sequence (around 1kb), not a full genome or anything. Is that OK to do?

Thanks

ADD REPLYlink modified 8 months ago • written 8 months ago by francois10
1

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

You should avoid manual tampering with bams.

ADD REPLYlink modified 8 months ago • written 8 months ago by WouterDeCoster36k
1

I also tried adding chr20 in the .sam file header, exact same.

This won't work, because each mapped read will still have a wrong chromosome name. But i wonder how chromosome names don't match, as apparently you used the same reference at all steps. Did you load the same reference genome on IGV before opening the bam file?

Also, what are the outputs of:

grep ">" reference.fasta | head

And

samtools view test.sam | head
ADD REPLYlink modified 8 months ago • written 8 months ago by h.mon23k

I created this reference myself, so it is whatever I called it:

grep ">" ref.fasta | head

106_46_amplicons_ref

After that, the .fasta reference line is just a one line sequence of ~ 1k characters in lower case.

And

samtools view align.sam | head

37d3e25f-875a-4cac-967a-b26d51342613 4 * 0 0 * * 0 0 CACGAAAAAAAA

And many many lines of alignments. It is not giving the header for some reason, is that what you expected? Just opening align.sam in a text editor, I can see the actual header, which looks like:

@HD VN:1.0 SO:unsorted @SQ SN: LN:877 @PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont

(Notice that there is nothing after SN:).

Just so I summarise the sequence of commands I am doing right now:

  • ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont
  • samtools view align.sam -o align.bam
  • samtools sort align.bam > sort.bam
  • samtools index sort.bam
  • Changing the reference genome in IGV with Genomes > Load genome from file. And I select ref.fasta.
  • Drag and drop sort.bam in IGV.
  • Error message: Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file

Does the name of the reference need to match something in the .sam file by any chance?

ADD REPLYlink modified 8 months ago • written 8 months ago by francois10
2
gravatar for WouterDeCoster
8 months ago by
Belgium
WouterDeCoster36k wrote:

You can simplify your commands substantially and avoid intermediate files:

zcat reads.fastq.gz | ngmlr --presets ont -t 8 -r genome.fa | samtools sort  -o alignment.bam -

Which should work okay in IGV.

ADD COMMENTlink written 8 months ago by WouterDeCoster36k

True. I don't really know how that would help solve my problem though, it is the same as running the commands sequentially.

ADD REPLYlink written 8 months ago by francois10

It's not the same as your commands, since you apparently messed with the header. Anyway, I use that command often and it works fine for me, also in igv.

ADD REPLYlink written 8 months ago by WouterDeCoster36k
1
gravatar for h.mon
8 months ago by
h.mon23k
Brazil
h.mon23k wrote:

I believe you need samtools view -bhT ref.fasta test.sam > test.bam, otherwise the sam header is lost.

-h include header in SAM output

ADD COMMENTlink written 8 months ago by h.mon23k
1

You only need:

samtools view test.sam -o test.bam

(assuming a reasonably recent version of samtools)

ADD REPLYlink written 8 months ago by WouterDeCoster36k

No, -b implies -h, so this cannot be the issue.

ADD REPLYlink written 8 months ago by ATpoint13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 869 users visited in the last hour