Question: BWA Error While Trimming Color Space Data
8.7 years ago, Farhat (Pune, India) wrote:

I am using the following pipeline to align some paired-end SOLiD data.

bwa aln -c -n 0.06 -o 2 -t 8 -q 10 ~/genomes/hydra/ACZUJGI/color/hydra ~/hydra/solid/hsamp_F3.fastq.gz > /scratch/hydra/hsamp_F3.sai
bwa aln -c -n 0.06 -o 2 -t 8 -q 10 ~/genomes/hydra/ACZUJGI/color/hydra ~/hydra/solid/hsamp_R3.fastq.gz > /scratch/hydra/hsamp_R3.sai

bwa sampe -P ~/genomes/hydra/ACZUJGI/color/hydra /scratch/hydra/hsamp_F3.sai /scratch/hydra/hsamp_R3.sai ~/hydra/solid/hsamp_F3.fastq.gz ~/hydra/solid/hsamp_R3.fastq.gz | samtools view -bS -|samtools sort - /scratch/hydra/hsamp_solid

Running this produces the following error (only the last few lines of the output are shown). The error occurs only when the -q parameter is nonzero.

[bwa_paired_sw] 91 out of 33101 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 74.36 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 1.53 sec
[bwa_sai2sam_pe_core] print alignments... [samopen] SAM header is present: 20914 sequences.
Parse error at line 20916: sequence and quality are inconsistent

The error happens at the SAM-to-BAM conversion step of the pipeline. Line 20916 of the SAM output shows:

1_29_54 141     *       0       0       *       *       0       0       NNCANGNAANANATCNNCCGGNTANANTTGANTTANNTTN        !!@;!9!:?!;!:>>!!8?57!66!7!8=<9!<<9!!-?!!!&!<!!!!!      XC:i:40

So the read sequence is truncated (40 characters, consistent with the XC:i:40 tag) but the quality string is not (50 characters). Is there a workaround for this?

8.7 years ago, brentp (Salt Lake City, UT) wrote:

I think the proper answer is: don't use -q with colorspace data, as it's designed for base-space. If you disregard that, you can pipe the bwa output through this command (and then on to samtools):

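$BWA_COMMAND \
| awk 'BEGIN{FS=OFS="\t"} \
      ($1 ~ /^@/){ print $0} \
      ($1 !~ /^@/){ $11 = substr($11, 1, length($10)); print $0}' \
| $SAMTOOLS_COMMAND > $OUT

That makes sure the quality string is the same length as the sequence. Note that awk's substr is 1-indexed, so the start position is 1; with a start of 0, some awk implementations return one character fewer than requested.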
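To see how many records are affected, or to confirm that none remain after filtering, a small check along these lines may help. This is only a sketch: the function name count_len_mismatch is made up for illustration, and it assumes tab-separated SAM on standard input with SEQ in column 10 and QUAL in column 11.

```shell
# Hypothetical helper (not part of bwa or samtools): count SAM records whose
# SEQ (column 10) and QUAL (column 11) lengths differ.
# Header lines (starting with "@") and records with QUAL "*" are skipped.
count_len_mismatch() {
  awk 'BEGIN { FS = "\t"; n = 0 }
       /^@/ { next }
       $11 != "*" && length($10) != length($11) { n++ }
       END { print n }'
}
```

For example, `$BWA_COMMAND | count_len_mismatch` should print a nonzero count for the failing run, and adding the awk filter before it should bring the count down to 0.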

You could also trim your reads with this: https://github.com/brentp/bio-playground/blob/master/solidstuff/solid-trimmer.py


Thanks! I thought there may be some switch I am missing or something like that. The solid_trimmer looks useful!

(reply written 8.7 years ago by Farhat)