Truncated SAM file - odd characters
9.2 years ago
st.ph.n ★ 2.7k

Hi all,

I used bwa to align paired-end reads to a reference. I'm trying to convert the SAM file to a BAM file, but am getting the following error:

[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 25199
[main_samview] truncated file.

It seems the qual line is split across two lines. Before this error I had the same one for line 25197. I found that read in the fastq files, and its qual line started with three '@' symbols in read 1. I removed that read from both fastq files, but it appears there are other offending reads. In the fastq files the sequence and qual lines are the same length, but in the SAM file they are not.

This is the sequence from read 1 fastq:

@SALLY:355:C2JMJACXX:2:1101:2330:1996 1:N:0:GGCTAC
GTTGTGAAATAATTAAAATGTTGGCATTGATTGTGCATGTTTGTCACGTGCAAGAGGCATGCA
+
:11A===BB,2CF>BFBGC@CFHEC3ACC+<2A21:*:?G@@GD<?CG?F#0?DGHF1-)=CG

This is the sequence from read 2 fastq:

@SALLY:355:C2JMJACXX:2:1101:2330:1996 2:N:0:GGCTAC
GAAGACACCCGGGGTCATCATGGGATCATTCTGGTACTTTTTATGGGACACACGTGAACATCATGTGATCACATGCTGTGCATGCCTCTTGCACGTGACAA
+
BC<DDADDCDCDFB?1?9::?FBDG@?FFFIGIG9BGEFGIIECGF2FCF==BB@AA1?@@??CBDCCCDAAC@;-;;-5:>@A>@ACDACCCDD288?:>
sampe samtools bwa SAM • 11k views
9.2 years ago

Your raw data (fastq files) might be the problem. Check them with FastQValidator (http://genome.sph.umich.edu/wiki/FastQValidator).

Unfortunately it's not the fastq files. They check out with 0 errors from FastQValidator.

Do you have bioawk? Please also try this code:

bioawk -c fastx '{if (length($seq)!=length($qual)) print "Offending: " NR}' file.fastq
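
If bioawk is not available, the same check can be sketched in plain Python. This is only a rough equivalent, not what bioawk does internally: it assumes the usual four-lines-per-record FASTQ layout (no wrapped sequences), and the function name is made up for illustration.

```python
import io


def mismatched_fastq_records(handle):
    """Yield (record_number, header) for records whose SEQ and QUAL lengths differ.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        record += 1
        if len(seq) != len(qual):
            yield record, header.rstrip("\r\n")


# Tiny in-memory demo: the second record has a quality string that is too short.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIII\n")
for number, name in mismatched_fastq_records(demo):
    print("Offending:", number, name)
```

For a real file, pass `open("file.fastq")` instead of the in-memory demo handle.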

I had to compile it quickly. There was no output for either fastq file, so they look good. I'm using the most recent bwa and samtools. The trouble is that this process worked for other samples, but not for these data. I know there are more offending lines in the SAM file.

I tried the same if statement on the SAM file, changing the -c argument to sam. The first offending line number is still 25199, and the offending lines continue to the end of the file.

I tried Picard's ValidateSamFile, and it reports invalid fastq characters. These are the characters I first thought looked odd when I looked at the SAM file: they show up as boxes with letters and numbers in them.
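
A small script can report both symptoms described here at once: SEQ/QUAL length mismatches and QUAL characters outside the printable range '!' to '~' that SAM allows. This is a minimal sketch, not a replacement for ValidateSamFile; the function name is hypothetical, and it only looks at the column count and columns 10 and 11.

```python
import re

# Header lines look like "@HD<TAB>...", "@SQ<TAB>...", "@CO<TAB>..." and so on.
HEADER = re.compile(r"@[A-Za-z][A-Za-z0-9]\t")


def check_sam_lines(lines):
    """Return a list of (line_number, problem) for suspicious alignment lines."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if HEADER.match(line):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            problems.append((lineno, "fewer than 11 fields"))
            continue
        seq, qual = fields[9], fields[10]
        # QUAL must be '*' or consist only of characters '!' (33) to '~' (126).
        if qual != "*" and any(not 33 <= ord(c) <= 126 for c in qual):
            problems.append((lineno, "QUAL has characters outside '!'..'~'"))
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            problems.append((lineno, "SEQ and QUAL lengths differ"))
    return problems


def sam_line(seq, qual):
    """Build a toy alignment line (made-up read for the demo only)."""
    return "\t".join(["r1", "0", "chr1", "1", "60", "4M", "*", "0", "0", seq, qual])


demo = ["@SQ\tSN:chr1\tLN:100", sam_line("ACGT", "IIII"), sam_line("ACGT", "II\x05I")]
for lineno, problem in check_sam_lines(demo):
    print(lineno, problem)
```

On a real file, opening it with `open(path, encoding="latin-1")` avoids decode errors on the odd bytes, since every byte maps to a character in that encoding.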

Maybe the problem was on the mapping step. Did you try running bwa again? Or a different mapper, e.g., bowtie2?

I think I narrowed the problem down to bwa aln. The command in my bash script still had the -I flag from previous data. That flag tells bwa aln that the qualities are in Illumina 1.3+ (Phred+64) encoding, but these data are Phred+33. I'll know for sure once the alignments and sampe are complete. The other samples, however, did not raise an error, and they are from the same Illumina run.
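
To illustrate why a wrong offset could matter, here is a rough sketch that guesses the encoding from the lowest quality character and shows what happens when Phred+33 characters are decoded with an offset of 64. The threshold is a common heuristic rather than anything bwa itself does, and the function names are made up. The demo string is the first 13 quality characters of the read 1 record in the question.

```python
def guess_phred_offset(quality_strings):
    """Heuristic guess of the quality offset from the lowest character seen.

    Characters below ';' (ASCII 59) cannot occur in the older +64 encodings,
    so their presence points to Phred+33. Otherwise 64 is returned, although
    uniformly high-quality Phred+33 data would look the same.
    """
    lowest = min(ord(c) for q in quality_strings for c in q)
    return 33 if lowest < 59 else 64


def decode(quality, offset):
    """Convert a quality string to integer scores using the given offset."""
    return [ord(c) - offset for c in quality]


qual = ":11A===BB,2CF"  # start of the read 1 quality line from the question
print("guessed offset:", guess_phred_offset([qual]))
print("decoded as +33:", decode(qual, 33))
print("decoded as +64:", decode(qual, 64))  # negative scores: wrong offset
```

Negative scores under the +64 interpretation show that the offset does not fit these reads, which is consistent with the -I flag being the culprit, though this sketch does not prove that is how the odd characters ended up in the SAM file.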

It seems your data are 100bp paired-end reads. According to the BWA FAQ, bwa mem is preferred over bwa aln for reads of that length:

"There are three algorithms, which one should I choose?

For 70bp or longer Illumina, 454, Ion Torrent and Sanger reads, assembly contigs and BAC sequences, BWA-MEM is usually the preferred algorithm. For short sequences, BWA-backtrack may be better. BWA-SW may have better sensitivity when alignment gaps are frequent."

Is SAM output the default for bwa mem?

Yes, SAM is the output format of bwa mem.
