Question: Trouble converting Sam to bam after bowtie alignment
0
gravatar for thefabnab
3.4 years ago by
thefabnab10
thefabnab10 wrote:

Hey everyone,

I'm having a bit of trouble converting a sam file I generated from a bowtie alignment. I believe bowtie is outputting the same file in a legacy format which samtools isn't recognizing. Here my bowtie alignment:

> bowtie --wrapper basic-0 NC_008253 -1 out1.fq -2 out2.fq -n 3 -X 600 -q --al aligned.map --un unmapped.map -t -l 24 --sam output.sam

and my samtools conversion from sam to bam:

>samtools view -bS output.sam > output.bam

with my sam file looking as follows:

@HD VN:1.0  SO:unsorted
@SQ SN:gi|110640213|ref|NC_008253.1|    LN:4938920
@PG ID:Bowtie   VN:1.1.0    CL:"bowtie --wrapper basic-0 --wrapper basic-0 NC_008253 -1 out1.fq -2 out2.fq -n 3 -X 600 -q --al aligned.map --un unmapped.map -t -l 24 --sam output.sam"
gi|110640213|ref|NC_008253.1|_97295_97764_1:0:0_5:0:0_4 77  *   0   TACGCGCTTTACAGGAGAATGGGACATGTTAGTTTGGCTGGCCGAACATATGGTCAAATATTATTCCGGC  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_97295_97764_1:0:0_5:0:0_4 141 *   0   ACAGCGCGGAGGCAACACCCAGTGCAACGACCGACATCCAGAAATACTCCCAACGGGCAACCAGGCCCTG  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_4776109_4776620_6:0:0_0:0:0_b 77  *   TGTTATTATCGGTGTAGGATCATAGTCGGGATCAACTAAACCACTAATATCCTCGTCTGTAGGGCAAGTA  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_4776109_4776620_6:0:0_0:0:0_b 141 *   ATCAACCATTTACCCTGTCTTTATAAATCTAATTGATAACGCAATATACTGGCTTGGGAAAACAACTGGA  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_3259478_3260036_2:0:0_5:0:0_e 77  *   TGAACGACAATCTTACGCTGCCGGGCAATATTGGCTGGATGGATAATGATCCGGTTTGTGATTGTCAGGA  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_3259478_3260036_2:0:0_5:0:0_e 141 *   GTTTTCTCCAGGGCACGCATTTCCTCAACGCTTGAAAATAGCGACTCTTTGCCAAAACAGATCGGATTGT  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0
gi|110640213|ref|NC_008253.1|_1336979_1337478_2:0:0_5:0:1_f 77  *   GTCAACAAAGCGATGGTAATAAATCAGGAAAATCCTGATTCACCGTTAATAGTTATGTCAGCTGATAACG  2222222222222222222222222222222222222222222222222222222222222222222222  XM:i:0

The error message I'm getting is the following:

[W::sam_read1] parse error at line 4 [main_samview] truncated file.

I'd appreciate any help

bowtie samtools • 2.0k views
ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by thefabnab10
2

Your SAM file does not look like a standard one (see page 4). If you have standard fastq reads as your query then the first column for each line should start with query name (which would be the fastq ID). I assume you are aligning against NC_008253 which would be the third field in a SAM record but it happens to be the first in your example. So something in not right in the way you ran the bowtie job.

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by genomax70k
1

Oh yeah, there's not even 11 columns, so the columns >= 11: check is failing, making it think that its reached the end since the header was ok.

To be honest, this is bowtie's fault. It shouldnt be possible, no matter what the parameters, to write out something that isnt a valid sam using the --sam flag.

ADD REPLYlink written 3.4 years ago by John12k
1

That's what I was thinking as well. The reference is a fasta reference which I built with the following command:

bowtie-build -f NC_008253.fa

ADD REPLYlink written 3.4 years ago by thefabnab10

I think the problem is you added --wrapper basic-0 to your bowtie command (if the information in the original post is correct). Can you run the command like this?

$ bowtie NC_008253 -1 out1.fq -2 out2.fq -n 3 -X 600 -q --al aligned.map --un unmapped.map -t -l 24 --sam output.sam
ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by genomax70k

That was the original command I ran, the posted command is from the .sam file I produced but just to be sure I ran exactly what you typed again after rebuilding the reference with bowtie-build and ran the resulting .sam through samtools and got the same error again.

ADD REPLYlink written 3.4 years ago by thefabnab10

Can you show us a few reads from your sequence files?

$ head -4 out1.fq
ADD REPLYlink written 3.4 years ago by genomax70k
1

What kind of trouble? Are you getting an error? It may be those pipe "|" characters in the sequence names.

ADD REPLYlink written 3.4 years ago by genomax70k

Sorry, the error is:

[W::sam_read1] parse error at line 4 [main_samview] truncated file.

ADD REPLYlink written 3.4 years ago by thefabnab10

$ head -4 out1.fq

@gi|110640213|ref|NC_008253.1|_3722720_3723213_2:0:0_2:1:0_0/1
TCTCTTCTAACAAGCCACGGCGGAACGGTGCATCGAGAAACACGATATTATGCGGTGTGCCTTTTAGCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
ADD REPLYlink modified 3.4 years ago by genomax70k • written 3.4 years ago by thefabnab10

I am going out on a limb here but that looks like simulated fastq data created by breaking E.coli genome into "reads" (if all of your data looks like this). Q-scores for each base has been set to 2 which in sanger fastq sequence format would equate to "not reliable".

So you are trying to align these simulated reads against the same genome from which they were derived?
I think it would be best to go back to the genome fasta file, edit it and change the fasta header to just >NC_008253. Re-make the bowtie index followed by mapping.

Those fastq headers on simulated data look ugly as well (see the examples from wikipedia page for proper Illumina headers) but let us see if this works first.

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by genomax70k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1337 users visited in the last hour