Hi, I have been trying to map over 50 million short (100 bp) reads (referred to as reads.fasta) to 4 reference genes (~1000 bp each) in one file (referred to as reference.fasta) using Bowtie2. These are the commands I ran:
bowtie2-build -f reference.fasta Bowtie.mapping (INDEXING DATABASE, INDEX NAME)
bowtie2 -x Bowtie.mapping -p 16 -f -U reads.fasta -S file.sam (BOWTIE RUN)
samtools view -bS file.sam > file.bam (SAM TO BAM)
samtools sort file.bam file.bam.sorted (SORTING BAM FILE)
samtools index file.bam.sorted.bam (INDEXING BAM FILE)
The .sam file looks like this. I am not sure whether it is correct, and I do not understand a few of the fields below.
HISEQ:205:C4GL1ACXX:1:1101:8328:2446 4 * 0 0 * * 0 0 GCCATTCGCGGTTGCAGGGCCTCCATCATTTGCTGTGGCTGCACCGCAGGCGCTTCCTGGAACGTCAACCCTCGTTGCGCCCTCACTGCATCATGCTCCTC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
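For reference, the columns of a line like the one above can be labelled with a short awk sketch. It assumes the record is tab-separated, as the SAM format requires, and uses a shortened, made-up record of the same shape (FLAG 4, optional tag YT:Z:UU). The function name `describe_sam_line` is hypothetical. In the SAM specification, FLAG bit 0x4 means the read is unmapped, and Bowtie2 uses YT:Z:UU to mark a read that was not part of a pair.

```shell
#!/bin/sh
# Label the 11 mandatory SAM columns of each alignment line read from stdin.
# Input is assumed to be tab-separated, as in a real SAM file.
describe_sam_line() {
    awk -F'\t' '{
        n = split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", name, " ")
        for (i = 1; i <= n; i++) printf "%-5s %s\n", name[i], $i
        # FLAG is a bit field; bit 0x4 set means the read is unmapped.
        if (int($2 / 4) % 2 == 1) print "NOTE  FLAG bit 0x4 is set: read is unmapped"
        # Anything after column 11 is an optional TAG:TYPE:VALUE field.
        for (i = n + 1; i <= NF; i++) printf "TAG   %s\n", $i
    }'
}

# Shortened stand-in record (hypothetical read name, 4-base sequence).
printf 'read1\t4\t*\t0\t0\t*\t*\t0\t0\tGCCA\tIIII\tYT:Z:UU\n' | describe_sam_line
```

To run it on real output, header lines can be skipped first, for example with `grep -v '^@' file.sam | head -n 1 | describe_sam_line`.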
However, the indexed BAM file that was produced seems to be wrong, and I get this message for the .bai file.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "file.sorted.bam.bai".
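The last line of the message names a .bai file as the input being read, so one thing worth ruling out is which kind of file is actually being opened. The following is only a minimal sketch: `file_kind` is a hypothetical helper, and it relies on the magic bytes given in the SAM/BAM specification (a BAI index starts with "BAI\1", while a BAM file is BGZF-compressed and therefore starts with the gzip bytes 1f 8b). It assumes `head -c` is available.

```shell
#!/bin/sh
# Classify a file by its first four bytes.
#   BAI index:        "BAI\1"  -> hex 42 41 49 01
#   BAM (BGZF/gzip):  starts with hex 1f 8b
file_kind() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        42414901) echo "BAI index" ;;
        1f8b*)    echo "gzip/BGZF (possibly BAM)" ;;
        *)        echo "other" ;;
    esac
}

# Demo on a throwaway file that contains only the BAI magic bytes.
tmp=$(mktemp)
printf 'BAI\001' > "$tmp"
file_kind "$tmp"
rm -f "$tmp"
```

Running `file_kind` on both the sorted .bam and its .bai would show whether each file is what its extension suggests.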
As far as I know, "EOF marker is absent" is a bug in Samtools and so not a problem here, but the BAM file seems to have no header. Would adding the -h flag during SAM-to-BAM conversion add the header?
samtools view -bS -h file.sam > file.bam (SAM TO BAM)
UPDATE: I tried the -h flag, but it didn't help.