I need help with visualising a .bam file.
So, what I did up to now was:
I had one .fastq of 3.9 MB file containing many reads generated by Metrichor (I am sequencing using the Nanopore MinION). I aligned the reads to a reference .fasta sequence using the package ngmlr. I got a .sam file of 4.3 MB. The command was:
ngmlr -t 4 -r ref.fasta -q reads.fastq -o test.sam -x ont
I converted this .sam file to a .bam file using samtools. I got a .bam file of 2 MB. The command I used was:
samtools view -bT ref.fasta test.sam > test.bam
I sorted the .bam file with samtools. The command was:
samtools sort test.bam > test_sorted.bam
I indexed the .bam file using samtools. I got a .bai file of the same name of 96 bytes. The command was:
samtools index test_sorted.bam
Now I tried opening the .bam file in IGV_2.4.10. But I received the following error message:
Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file
What did I do wrong?
Ah that would have been great! But same problem... Any other ideas?
The first few lines of my
.samfile (just before the conversion to.bamstep) looks like that:Does that look OK to you?
Ok so I tried adding something in the
@SQ SN:header of the.samfile, as it seems like this is the "sequence name". But it does not look like the conversion to.bamworked fine. I have this message:Exactly the same number of times as my number of aligned reads, which does not sound very good.
If I continue anyway with sorting and indexing, the error message from IGV is:
I also tried adding
chr20in the.samfile header, exact same.So I guess the problem does come from there, but I am not sure how I should name the sequence... Also, to be clear, my reference sequence in
.fastaI am aligning to is a very short sequence (around 1kb), not a full genome or anything. Is that OK to do?Thanks
Please use
ADD COMMENTorADD REPLYto answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.You should avoid manual tampering with bams.
This won't work, because each mapped read will still have a wrong chromosome name. But i wonder how chromosome names don't match, as apparently you used the same reference at all steps. Did you load the same reference genome on IGV before opening the bam file?
Also, what are the outputs of:
And
I created this reference myself, so it is whatever I called it:
grep ">" ref.fasta | headAfter that, the
.fastareference line is just a one line sequence of ~ 1k characters in lower case.And
samtools view align.sam | headAnd many many lines of alignments. It is not giving the header for some reason, is that what you expected? Just opening
align.samin a text editor, I can see the actual header, which looks like:@HD VN:1.0 SO:unsorted@SQ SN: LN:877@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont(Notice that there is nothing after
SN:).Just so I summarise the sequence of commands I am doing right now:
ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ontsamtools view align.sam -o align.bamsamtools sort align.bam > sort.bamsamtools index sort.bamGenomes>Load genome from file. And I selectref.fasta.sort.bamin IGV.Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in fileDoes the name of the reference need to match something in the
.samfile by any chance?