Question

Mapping shorts reads using Bowtie

1

Entering edit mode

9.0 years ago

bioinfo ▴ 840

Hi, I have been trying to map over 50 million short (100 bp) reads (referred to as reads.fasta) to 4 reference genes in a file (~1000 bp each) (referred to as reference.fasta).using Bowtie2.

bowtie2-build -f reference.fasta Bowtie.mapping   (INDEXING DATAABSE, INDEX NAME)
bowtie2 -x Bowtie.mapping -p 16 -f -U reads.fasta -S file.sam   (BOWTIE RUN)
samtools view -bS file.sam > file.bam  (SAM TO BAM)
samtools sort file.bam file.bam.sorted   (SORTING BAM FILE) 
samtools index file.bam.sorted.bam  (INDEXING BAM FILE)

The .sam file looks like this. I am not sure whether it is correct or not and few of those fields below.

HISEQ:205:C4GL1ACXX:1:1101:8328:2446    4       *       0       0       *       *       0       0       GCCATTCGCGGTTGCAGGGCCTCCATCATTTGCTGTGGCTGCACCGCAGGCGCTTCCTGGAACGTCAACCCTCGTTGCGCCCTCACTGCATCATGCTCCTC   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   YT:Z:UU

However, the produced indexed bam file was wrong and shows this message in the .bai file.

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "file.sorted.bam.bai".

So, "EOF marker is absent" is a bug in Samtools so not a problem here, but Bam file has no header. Does an extra -h flag help during SAM to BAM conversion to add the header?

samtools view -bS -h file.sam > file.bam        (SAM TO BAM)

UPDATE: I tried with -h flag but it didn't help!

index bowtie mapping samtools • 3.4k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 9.0 years ago by bioinfo ▴ 840

0

Entering edit mode

-h isn't necessary for SAM-to-BAM. But I thought 'eof marker is absent' is only a bug when reading from STDIN. If so, something probably went wrong during one of the previous steps. Can you read the header from the unsorted bam (samtools view -H)?

ADD REPLY • link updated 20 months ago by Ram 44k • written 9.0 years ago by thackl ★ 3.0k

0

Entering edit mode

Yes. I can read the header from both sorted and unsorted bam files. Somehow only the indexing of the bam file is not working.

samtools view -H file.bam
@HD     VN:1.0  SO:unsorted
@SQ     SN:AAK95987     LN:976
@SQ     SN:BAH23420     LN:979
@SQ     SN:CAC35342     LN:963
@SQ     SN:ACI32876     LN:1085
@SQ     SN:ABG21674     LN:1085
@SQ     SN:WP_000071895 LN:1085
@SQ     SN:BAB12601|    LN:1085
@PG     ID:bowtie2      PN:bowtie2      VN:2.2.3        CL:"/usr/local/bin/bowtie2/bowtie2-align-s --wrapper basic-0 -x Bowtie.mapping -p 16 -f -S file.sam -U reference.fasta"

ADD REPLY • link updated 20 months ago by Ram 44k • written 9.0 years ago by bioinfo ▴ 840

0

Entering edit mode

I put your header and the one read line you posted into a file and sorted/index it - as expected, no problems there...

Just to be thorough, you posted:

samtools sort file.bam file.bam.sorted

which produces "file.bam.sorted.bam" but you tried to use the index of "file.sorted.bam". That's probably just a posting issue...

ADD REPLY • link updated 20 months ago by Ram 44k • written 9.0 years ago by thackl ★ 3.0k

0

Entering edit mode

That could be just a typing mistake. I have rerun all the steps again and am still struggling with indexing the sorted bam file though I can read the headers from sam, unsorted bam and sorted bam files. I am surprised that you made it to work in test run with only header and one read line. I will try that as well first.

ADD REPLY • link updated 20 months ago by Ram 44k • written 9.0 years ago by bioinfo ▴ 840