Mapping shorts reads using Bowtie
0
1
Entering edit mode
7.2 years ago
bioinfo ▴ 830

Hi, I have been trying to map over 50 million short (100 bp) reads (referred to as reads.fasta) to 4 reference genes in a file (~1000 bp each) (referred to as reference.fasta).using Bowtie2. 

bowtie2-build -f reference.fasta Bowtie.mapping   (INDEXING DATAABSE, INDEX NAME)
bowtie2 -x Bowtie.mapping -p 16 -f -U reads.fasta -S file.sam   (BOWTIE RUN)
samtools view -bS file.sam > file.bam  (SAM TO BAM)
samtools sort file.bam file.bam.sorted   (SORTING BAM FILE) 
samtools index file.bam.sorted.bam  (INDEXING BAM FILE)

The .sam file looks like this. I am not sure whether it is correct or not and few of those fields below.

HISEQ:205:C4GL1ACXX:1:1101:8328:2446    4       *       0       0       *       *       0       0       GCCATTCGCGGTTGCAGGGCCTCCATCATTTGCTGTGGCTGCACCGCAGGCGCTTCCTGGAACGTCAACCCTCGTTGCGCCCTCACTGCATCATGCTCCTC   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   YT:Z:UU      

However, the produced indexed bam file was wrong and shows this message in the .bai file.

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "file.sorted.bam.bai".

So, "EOF marker is absent" is a bug in Samtools so not a problem here, but Bam file has no header. Does an extra -h flag help during SAM to BAM conversion toadd the header? 

samtools view -bS -h file.sam > file.bam        (SAM TO BAM)

UPDATE: I tried with -h flag but it didn't help..!! 

bowtie mapping index samtools • 2.7k views
ADD COMMENT
0
Entering edit mode

-h isn't necessary for SAM-to-BAM. But I thought 'eof marker is absent' is only a bug when reading from STDIN. If so, something probably went wrong during one of the previous steps. Can you read the header from the unsorted bam (samtools view -H) ?
 

ADD REPLY
0
Entering edit mode

Yes. I can read the header from both sorted and unsorted bam files. Somehow only the indexing of the bam file is not working. 

samtools view -H file.bam
@HD     VN:1.0  SO:unsorted
@SQ     SN:AAK95987     LN:976
@SQ     SN:BAH23420     LN:979
@SQ     SN:CAC35342     LN:963
@SQ     SN:ACI32876     LN:1085
@SQ     SN:ABG21674     LN:1085
@SQ     SN:WP_000071895 LN:1085
@SQ     SN:BAB12601|    LN:1085
@PG     ID:bowtie2      PN:bowtie2      VN:2.2.3        CL:"/usr/local/bin/bowtie2/bowtie2-align-s --wrapper basic-0 -x Bowtie.mapping -p 16 -f -S file.sam -U reference.fasta"

 

ADD REPLY
0
Entering edit mode

I put your header and the one read line you posted into a file and sorted/index it - as expected, no problems there...

Just to be thorough, you posted:

samtools sort file.bam file.bam.sorted

which produces "file.bam.sorted.bam" but you tried to use the index of "file.sorted.bam". That's probably just a posting issue...

ADD REPLY
0
Entering edit mode

That could be just a typing mistake. I have rerun alI the steps again and am still struggling with indexing the sorted bam file though I can read the headers from sam, unsorted bam and sorted bam files. I am surprised that you made it to work in test run with only header and one read line. I will try that as well first.

ADD REPLY

Login before adding your answer.

Traffic: 1040 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6