Question: Mapping short reads using Bowtie
4.7 years ago, bioinfo (740) wrote:

Hi, I have been trying to map over 50 million short (100 bp) reads (reads.fasta) to 4 reference genes of ~1000 bp each, stored in a single file (reference.fasta), using Bowtie2.

bowtie2-build -f reference.fasta Bowtie.mapping   (INDEXING DATABASE, INDEX NAME)
bowtie2 -x Bowtie.mapping -p 16 -f -U reads.fasta -S file.sam   (BOWTIE RUN)
samtools view -bS file.sam > file.bam  (SAM TO BAM)
samtools sort file.bam file.bam.sorted   (SORTING BAM FILE) 
samtools index file.bam.sorted.bam  (INDEXING BAM FILE)

The .sam file looks like this. I am not sure whether it is correct, and I do not understand a few of the fields below.


However, the index produced for the BAM file seems to be wrong; I get this message for the .bai file:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "file.sorted.bam.bai".

As far as I know, "EOF marker is absent" is a bug in Samtools and not a problem here, but the BAM file apparently has no header. Would an extra -h flag during the SAM-to-BAM conversion help to add the header?

samtools view -bS -h file.sam > file.bam        (SAM TO BAM)

UPDATE: I tried the -h flag, but it didn't help.


-h isn't necessary for SAM-to-BAM conversion. But I thought "EOF marker is absent" is only a bug when reading from STDIN. If so, something probably went wrong during one of the previous steps. Can you read the header from the unsorted BAM (samtools view -H)?
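One way to test the "EOF marker" question independently of Samtools is to look at the last bytes of the file. The sketch below assumes the 28-byte empty BGZF block given in the SAM/BAM specification as the end-of-file marker; `has_bgzf_eof` is a hypothetical helper name, not a samtools command, and the check says nothing about whether the rest of the file is valid.

```shell
#!/bin/sh
# Hex of the 28-byte empty BGZF block that ends a complete BAM file
# (value as given in the SAM/BAM specification), written 4 bytes per group.
BGZF_EOF="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# has_bgzf_eof FILE: succeed if the last 28 bytes of FILE are the EOF block.
# (Hypothetical helper for illustration only.)
has_bgzf_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Usage:
#   has_bgzf_eof file.bam && echo "EOF marker present" || echo "EOF marker absent"
```

If the marker is present in the unsorted and sorted BAM files, the truncation warning is more likely to come from the file that was actually passed to samtools than from a damaged BAM.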

modified 4.7 years ago • written 4.7 years ago by thackl (2.8k)

Yes. I can read the header from both sorted and unsorted bam files. Somehow only the indexing of the bam file is not working. 

samtools view -H file.bam
@HD     VN:1.0  SO:unsorted
@SQ     SN:AAK95987     LN:976
@SQ     SN:BAH23420     LN:979
@SQ     SN:CAC35342     LN:963
@SQ     SN:ACI32876     LN:1085
@SQ     SN:ABG21674     LN:1085
@SQ     SN:WP_000071895 LN:1085
@SQ     SN:BAB12601|    LN:1085
@PG     ID:bowtie2      PN:bowtie2      VN:2.2.3        CL:"/usr/local/bin/bowtie2/bowtie2-align-s --wrapper basic-0 -x Bowtie.mapping -p 16 -f -S file.sam -U reference.fasta"


modified 4.7 years ago • written 4.7 years ago by bioinfo (740)

I put your header and the one read line you posted into a file and sorted/indexed it. As expected, no problems there...

Just to be thorough, you posted:

samtools sort file.bam file.bam.sorted

which produces "file.bam.sorted.bam" but you tried to use the index of "file.sorted.bam". That's probably just a posting issue...
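The naming point above can be checked mechanically. This is a minimal sketch, assuming the legacy two-argument `samtools sort in.bam PREFIX` form used in the question (which appends ".bam" to the prefix) and the default behaviour of `samtools index` (which appends ".bai" to the BAM name). `expected_outputs` and `report_missing` are hypothetical helper names, and the loop does not handle paths containing spaces.

```shell
#!/bin/sh
# expected_outputs PREFIX: print the files that legacy
# `samtools sort in.bam PREFIX` followed by `samtools index PREFIX.bam`
# should leave behind.
expected_outputs() {
    echo "$1.bam"        # legacy sort appends ".bam" to the prefix
    echo "$1.bam.bai"    # index appends ".bai" to the BAM file name
}

# report_missing PREFIX: list expected files that do not exist;
# returns non-zero if any of them is missing.
report_missing() {
    status=0
    for f in $(expected_outputs "$1"); do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

# Usage:
#   report_missing file.bam.sorted
```

With the prefix from the question, the names to look for are file.bam.sorted.bam and file.bam.sorted.bam.bai, not file.sorted.bam.bai.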

written 4.7 years ago by thackl (2.8k)

That could be just a typing mistake. I have rerun all the steps and am still struggling with indexing the sorted BAM file, although I can read the headers from the SAM, unsorted BAM and sorted BAM files. I am surprised that you got it to work in a test run with only the header and one read line. I will try that first as well.

written 4.7 years ago by bioinfo (740)