Question: Truncated SAM file - odd characters
st.ph.n (Philadelphia, PA) wrote 2.2 years ago:

Hi all,

I used bwa to align paired-end reads to a reference. I'm trying to convert the SAM file to a BAM file, but am getting the following error:

[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 25199
[main_samview] truncated file.

It seems the QUAL field is split across two lines. I had a previous error before this one, at line 25197. I found that read in the fastq files, and its quality line started with three '@' symbols in read 1. I removed the read from both fastq files, but there appear to be other offending reads. The quality and sequence lines in the fastq files are the same length, but in the SAM file they are not.

This is the sequence from read 1 fastq:

@SALLY:355:C2JMJACXX:2:1101:2330:1996 1:N:0:GGCTAC
GTTGTGAAATAATTAAAATGTTGGCATTGATTGTGCATGTTTGTCACGTGCAAGAGGCATGCA
+
:11A===BB,2CF>BFBGC@CFHEC3ACC+<2A21:*:?G@@GD<?CG?F#0?DGHF1-)=CG

This is the sequence from read 2 fastq:

@SALLY:355:C2JMJACXX:2:1101:2330:1996 2:N:0:GGCTAC
GAAGACACCCGGGGTCATCATGGGATCATTCTGGTACTTTTTATGGGACACACGTGAACATCATGTGATCACATGCTGTGCATGCCTCTTGCACGTGACAA
+
BC<DDADDCDCDFB?1?9::?FBDG@?FFFIGIG9BGEFGIIECGF2FCF==BB@AA1?@@??CBDCCCDAAC@;-;;-5:>@A>@ACDACCCDD288?:>

 

Biomonika (Noolean) (State College, PA, USA) wrote 2.2 years ago:

Your raw data (fastq files) might be the problem. Check them with FastQValidator (http://genome.sph.umich.edu/wiki/FastQValidator).


Unfortunately it's not the fastq files. They check out with 0 errors from FastQValidator.

(written 2.2 years ago by st.ph.n)

Do you have bioawk? Please also try this command:

bioawk -c fastx '{if (length($seq)!=length($qual)) print "Offending: " NR}' file.fastq

(written 2.2 years ago by Biomonika (Noolean); modified 2.2 years ago)

Had to compile it quickly. No output for either fastq file, so they look good. I'm using the most recent bwa and samtools. The trouble is that this process worked for other samples, but not for these data. I know there are more offending lines in the SAM file.

I tried the same if statement on the SAM file, changing the -c argument to sam. The first offending line number is still 25199, and the offending lines continue to the end of the file.
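For readers without bioawk, the same kind of check on a SAM file can be sketched in plain Python. This is only an illustration, not what was run in this thread: the function name `find_bad_sam_records` is made up, and it assumes a tab-delimited SAM file in which SEQ and QUAL are columns 10 and 11 and `*` marks an absent field.

```python
import re

# Header lines look like "@HD<TAB>...", "@SQ<TAB>...", and so on.
HEADER = re.compile(r"^@[A-Z][A-Z]\t")

def find_bad_sam_records(lines):
    """Return (line_number, reason) for alignment lines that look malformed.

    Hypothetical helper for illustration only.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if HEADER.match(line):
            continue  # skip header lines
        fields = line.split("\t")
        if len(fields) < 11:
            # A record broken by a stray newline or tab usually ends up here.
            bad.append((number, "fewer than 11 fields"))
            continue
        seq, qual = fields[9], fields[10]
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            bad.append((number, "SEQ and QUAL differ in length"))
    return bad

# Tiny in-memory example; with a real file, pass the open file handle instead.
example = [
    "@SQ\tSN:chr1\tLN:100\n",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tII\n",
    "II\n",
]
for number, reason in find_bad_sam_records(example):
    print(number, reason)
```

A QUAL field that has been split over two lines would typically show up as a length mismatch on one line followed by a line with too few fields.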

I tried Picard tools ValidateSamFile, and it reports invalid fastq characters. These are the characters that first looked odd to me when I looked at the SAM file: they appear as boxes with letters and numbers in them.
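Boxes like these are usually how an editor or terminal renders bytes outside the printable ASCII range. A rough sketch for locating them in a quality string is below. It assumes that valid QUAL characters run from `!` (ASCII 33) to `~` (ASCII 126), and the function name `odd_quality_chars` is hypothetical.

```python
def odd_quality_chars(qual):
    """Return (position, ASCII code) for characters outside '!'..'~'.

    Hypothetical helper for illustration only.
    """
    return [(i, ord(c)) for i, c in enumerate(qual) if not 33 <= ord(c) <= 126]

# A quality string with an escape character (ASCII 27) at position 2.
print(odd_quality_chars("II\x1bI"))
```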

(written 2.2 years ago by st.ph.n; modified 2.2 years ago)

Maybe the problem was on the mapping step. Did you try running bwa again? Or a different mapper, e.g., bowtie2?

(written 2.2 years ago by h.mon)

I think I narrowed down the problem to bwa aln. I had the -I flag on the command in a bash script written for previous data. The -I flag indicates Phred+64 quality encoding, but these data are Phred+33. I'll know for sure once the alignments and sampe are complete. The other samples, however, did not raise an error, and they are from the same Illumina run.
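To illustrate why a wrong -I could produce exactly these symptoms, here is a minimal sketch. It rests on an assumption that is not confirmed in this thread: that -I makes bwa aln convert Phred+64 to Phred+33 by shifting every quality byte down by 31 (64 minus 33). Under that assumption, Phred+33 input lands below the printable range, and `)` (ASCII 41) becomes a newline (ASCII 10), which would match a QUAL field split over two lines. The function names are made up for illustration.

```python
def looks_like_phred33(qual):
    """True if any character is below ';' (ASCII 59).

    Such characters cannot occur in +64 encodings, so their presence
    points to Phred+33. A False result is inconclusive.
    """
    return any(ord(c) < 59 for c in qual)

def shift_as_phred64(qual):
    """Subtract 31 from each byte, as an assumed +64 to +33 conversion would."""
    return "".join(chr(ord(c) - 31) for c in qual)

# Quality line of read 1 from the question.
read1_qual = ":11A===BB,2CF>BFBGC@CFHEC3ACC+<2A21:*:?G@@GD<?CG?F#0?DGHF1-)=CG"

print(looks_like_phred33(read1_qual))
shifted = shift_as_phred64(read1_qual)
# repr() makes the control characters visible, including any newline.
print(repr(shifted))
print("newline introduced:", "\n" in shifted)
```

If this reading is right, only reads whose quality strings happen to contain characters such as `)` or `(` would break the line or field structure, which might explain why other samples from the same run went through without an error.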

(written 2.2 years ago by st.ph.n)

It seems your data are 100 bp paired-end reads. According to the BWA FAQ, bwa mem is preferred over bwa aln:

"There are three algorithms, which one should I choose?

For 70bp or longer Illumina, 454, Ion Torrent and Sanger reads, assembly contigs and BAC sequences, BWA-MEM is usually the preferred algorithm. For short sequences, BWA-backtrack may be better. BWA-SW may have better sensitivity when alignment gaps are frequent."

(written 2.2 years ago by h.mon)

Is SAM output the default for bwa mem?

(written 2.2 years ago by st.ph.n)

Yes, SAM is the output from bwa mem.

(written 2.2 years ago by h.mon; modified 2.2 years ago)