Trouble converting Sam to bam after bowtie alignment
0
0
Entering edit mode
8.1 years ago
thefabnab ▴ 10

Hey everyone,

I'm having a bit of trouble converting a sam file I generated from a bowtie alignment. I believe bowtie is outputting the same file in a legacy format which samtools isn't recognizing. Here my bowtie alignment:

> bowtie --wrapper basic-0 NC_008253 -1 out1.fq -2 out2.fq -n 3 -X 600 -q --al aligned.map --un unmapped.map -t -l 24 --sam output.sam

and my samtools conversion from sam to bam:

>samtools view -bS output.sam > output.bam

with my sam file looking as follows:

@HD VN:1.0  SO:unsorted
@SQ SN:gi|110640213|ref|NC_008253.1|    LN:4938920
@PG ID:Bowtie   VN:1.1.0    CL:"bowtie --wrapper basic-0 --wrapper basic-0 NC_008253 -1 out1.fq -2 out2.fq -n 3 -X 600 -q --al aligned.map --un unmapped.map -t -l 24 --sam output.sam"
gi|110640213|ref|NC_008253.1|_97295_97764_1:0:0_5:0:0_4 77  *   0   TACGCGCTTTACAGGAGAATGGGACATGTTAGTTTGGCTGGCCGAACATATGGTCAAATATTATTCCGGC  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_97295_97764_1:0:0_5:0:0_4 141 *   0   ACAGCGCGGAGGCAACACCCAGTGCAACGACCGACATCCAGAAATACTCCCAACGGGCAACCAGGCCCTG  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_4776109_4776620_6:0:0_0:0:0_b 77  *   TGTTATTATCGGTGTAGGATCATAGTCGGGATCAACTAAACCACTAATATCCTCGTCTGTAGGGCAAGTA  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_4776109_4776620_6:0:0_0:0:0_b 141 *   ATCAACCATTTACCCTGTCTTTATAAATCTAATTGATAACGCAATATACTGGCTTGGGAAAACAACTGGA  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_3259478_3260036_2:0:0_5:0:0_e 77  *   TGAACGACAATCTTACGCTGCCGGGCAATATTGGCTGGATGGATAATGATCCGGTTTGTGATTGTCAGGA  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_3259478_3260036_2:0:0_5:0:0_e 141 *   GTTTTCTCCAGGGCACGCATTTCCTCAACGCTTGAAAATAGCGACTCTTTGCCAAAACAGATCGGATTGT  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_1336979_1337478_2:0:0_5:0:1_f 77  *   GTCAACAAAGCGATGGTAATAAATCAGGAAAATCCTGATTCACCGTTAATAGTTATGTCAGCTGATAACG  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0

The error message I'm getting is the following:

[W::sam_read1] parse error at line 4 [main_samview] truncated file.

I'd appreciate any help

bowtie samtools • 4.6k views
ADD COMMENT
2
Entering edit mode

Your SAM file does not look like a standard one (see page 4). If you have standard fastq reads as your query then the first column for each line should start with query name (which would be the fastq ID). I assume you are aligning against NC_008253 which would be the third field in a SAM record but it happens to be the first in your example. So something in not right in the way you ran the bowtie job.

ADD REPLY
1
Entering edit mode

Oh yeah, there's not even 11 columns, so the columns >= 11: check is failing, making it think that its reached the end since the header was ok.

To be honest, this is bowtie's fault. It shouldnt be possible, no matter what the parameters, to write out something that isnt a valid sam using the --sam flag.

ADD REPLY
1
Entering edit mode

That's what I was thinking as well. The reference is a fasta reference which I built with the following command:

bowtie-build -f NC_008253.fa

ADD REPLY
0
Entering edit mode

I think the problem is you added --wrapper basic-0 to your bowtie command (if the information in the original post is correct). Can you run the command like this?

$ bowtie NC_008253 -1 out1.fq -2 out2.fq -n 3 -X 600 -q --al aligned.map --un unmapped.map -t -l 24 --sam output.sam
ADD REPLY
0
Entering edit mode

That was the original command I ran, the posted command is from the .sam file I produced but just to be sure I ran exactly what you typed again after rebuilding the reference with bowtie-build and ran the resulting .sam through samtools and got the same error again.

ADD REPLY
0
Entering edit mode

Can you show us a few reads from your sequence files?

$ head -4 out1.fq
ADD REPLY
1
Entering edit mode

What kind of trouble? Are you getting an error? It may be those pipe "|" characters in the sequence names.

ADD REPLY
0
Entering edit mode

Sorry, the error is:

[W::sam_read1] parse error at line 4 [main_samview] truncated file.

ADD REPLY
0
Entering edit mode

$ head -4 out1.fq

@gi|110640213|ref|NC_008253.1|_3722720_3723213_2:0:0_2:1:0_0/1
TCTCTTCTAACAAGCCACGGCGGAACGGTGCATCGAGAAACACGATATTATGCGGTGTGCCTTTTAGCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
ADD REPLY
0
Entering edit mode

I am going out on a limb here but that looks like simulated fastq data created by breaking E.coli genome into "reads" (if all of your data looks like this). Q-scores for each base has been set to 2 which in sanger fastq sequence format would equate to "not reliable".

So you are trying to align these simulated reads against the same genome from which they were derived?
I think it would be best to go back to the genome fasta file, edit it and change the fasta header to just >NC_008253. Re-make the bowtie index followed by mapping.

Those fastq headers on simulated data look ugly as well (see the examples from wikipedia page for proper Illumina headers) but let us see if this works first.

ADD REPLY

Login before adding your answer.

Traffic: 2937 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6