I generated a bam file following the below steps:
indexed reference genome
bwa index reference.fasta
aligned using bwa mem
bwa mem ref.fasta f1.fq.gz f2.fq.gz > out.sam
sam to bam
samtools view -Sb out.sam > out.bam
sorting bam
samtools sort out.bam sorted
Now, I am looking at bam file:
samtools view sorted.bam | less
output
ST-E00318:201:HWN2CCCXX:6:1104:25134:16867 163 NC_026145.1 1 39 15S59M18D76M = 221 370 CGCCTGTACGAAGAGATCTTTTTCTTCTCCTACTCGCGCCGTTGCCGCCACTCGTCGTGCCATTCCGCCGCCGTCGGACCACCACCACCCGACTCGCGCTGGGATGACGAAGCCGACCATCCGGTCAGATCATGGATCCGTTCAGTCGCC AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA7JFFFJJJJ NM:i:19 AS:i:106 XS:i:116
ST-E00318:201:HWN2CCCXX:6:1111:17787:9660 99 NC_026145.1 1 21 6S144M = 409 558 GAAGAGATCTTTTTCTTCTCCTACTCGCGCCGTTGCCGCCACTCGTCGTGCCATTCCGCCGCCGTCCGGCCACCGTTCGCCGGCGGACCACCACCACCCGACTCGCGCTGGGATGACGAAGCCGACCATCCGGTCAGATCATGGATCCGT AAAFFJJFJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJFJJJJJJFJJJJJJJAJJAJJJJJAFJJJJJFJJJJJJJJJ7JJJJJJJJJJJFJJJJJJJJJJJJFJJJJJJJFFJJ< NM:i:2 AS:i:134 XS:i:140
Problem 1: Where are the headers? i.e. @HD and @SQ sections!!
Now, if I do:
samtools view -H sorted.bam | less
Headers [@SQ] are there, isnt that strange!!
@SQ SN:NC_026145.1 LN:18577331
@SQ SN:NC_026146.1 LN:18500646
@SQ SN:NC_026147.1 LN:24928530
@SQ SN:NC_026148.1 LN:17356267
@SQ SN:NC_026149.1 LN:18898134
@SQ SN:NC_026150.1 LN:25289714
@SQ SN:NC_026151.1 LN:11725536
@SQ SN:NC_026152.1 LN:21523998
@SQ SN:NC_026153.1 LN:12411895
Problem 2: Where is the @HD section where I can see the SO (sorting status)?
My samtools version:
Program: samtools (Tools for alignments in the SAM format) Version: 0.1.18 (r982:295)
My BWA version:
Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.7.5a-r405
Did anybody go through the same problem? Did I miss something?
"Is my software outdated" should be your first reflex/question if you see something odd.