Bowtie2 character encoding error
9.5 years ago
Medhat 9.7k

I am running bowtie2 on mate-pair reads with these parameters:

bowtie2 --local --phred64  --threads 25 -x "rest of the command"

It gives me this error:

Saw ASCII character -126 but expected 64-based Phred qual.
Try not specifying --solexa1.3-quals/--phred64-quals.
terminate called after throwing an instance of 'int'
bowtie2-align died with signal 6 (ABRT)

I removed the parameter --phred64, but it gives another error:

Saw ASCII character -126 but expected 33-based Phred qual.
terminate called after throwing an instance of 'int'
bowtie2-align died with signal 6 (ABRT)

Any idea?

Thanks

alignment software-error next-gen sequence • 5.4k views

Sounds like you have a flipped bit somewhere in the fastq file. Try to subset it until you find the approximate line number. You can probably then look through the file and quickly identify the weird character (there's probably a quick awk command to do all of this, but I don't know it off-hand).


ASCII 126 is the tilde (~).

Will this work?

awk ' $0 ~ /~/ { print }' <in.fastq

Is that even a good idea?


It's -126, which I guess would be byte value 130 (0x82) if one assumed char to be unsigned. That is outside the 7-bit ASCII range.
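For reference, the arithmetic behind reading a negative value as an unsigned byte can be checked directly in the shell. This is only a sketch of the conversion; that bowtie2 prints the offending byte as a signed 8-bit char is an assumption based on the error message, not something confirmed in this thread.

```shell
# Reinterpret the signed value from the error message as an unsigned byte.
# Assumption: the value was printed from a signed 8-bit char.
signed=-126
unsigned=$(( signed & 0xFF ))   # two's complement: -126 + 256
printf 'signed %d = unsigned %d = 0x%X\n' "$signed" "$unsigned" "$unsigned"
```

Any byte above 127 falls outside ASCII, so a value like this points at a corrupted or non-ASCII byte rather than at a wrong Phred offset.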


I am dealing with 8 paired fastq files, so do I really need to search all 18 fastq files I have?!


You should be able to tell which fastq file is the problem by when the error occurs and what the last alignment written was. Just grep for that last alignment's read name and you'll know the file. The problem read should be within ~128K of that (I think that's the buffer size bowtie2 uses).


OK, I grepped for the last aligned read like this:

grep -r "FCC4U5HACXX:1:1101:2896:51676#AGTCAACA" ../

It suggested two files

../s79757-11kb_1/s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq:@FCC4U5HACXX:1:1101:2896:51676#AGTCAACA/2
../s79757-11kb_1/s79757-11kb_1_good_reads_1.trimmed-L18-pair1.fastq:@FCC4U5HACXX:1:1101:2896:51676#AGTCAACA/1

using the command

awk ' $0 ~ /~/ { print }' < s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq

But no result!


Get the line number with grep -n and then use awk to extract the next 100,000 or so reads (these are estimated numbers!) from each file into a new file. One of those new files should then cause the error.


And then you can more easily use bowtie2's --upto and -s options to narrow down the region (in fact, you can use that to avoid using awk to subset the file).
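The grep -n subsetting suggested above could look roughly like this. It is a sketch on a tiny made-up file: the read names, the temp file and the variable names are all hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequences).

```shell
# Hypothetical demo input: three 4-line FASTQ records in a temp file.
fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' > "$fq"

last_read='@r2'   # name of the last read that was aligned (assumed known)
n_reads=1         # how many reads after it to keep; ~100000 on real data

# Line number of the first line containing the read name (its header line).
start=$(grep -n -m 1 -F -- "$last_read" "$fq" | cut -d: -f1)
# Keep that record plus the next n_reads records (4 lines each).
end=$(( start + 4 * (n_reads + 1) - 1 ))
# Print the range and quit at its end so sed does not read the rest of the file.
subset=$(sed -n "${start},${end}p;${end}q" "$fq")
printf '%s\n' "$subset"
rm -f "$fq"
```

For paired files, the same start and end line numbers would be used on both mates so that the subsets stay in sync.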


That's because I made a mistake. ~ is not the character you are looking for :-)

We should look for a pattern that matches non-Phred64 characters.
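One way to express such a pattern is to check only the quality lines for bytes outside the printable range used by Phred encodings. The sketch below uses a made-up two-record file with a stray 0x82 byte; it assumes 4-line records and relies on LC_ALL=C so that awk compares raw bytes.

```shell
# Hypothetical demo input: the second record has a stray 0x82 byte (octal 202)
# in its quality line.
fq=$(mktemp)
printf '@r1\nACGT\n+\nhhhh\n@r2\nACGT\n+\nhh\202h\n' > "$fq"

# Every 4th line is a quality string; report any that contains a byte
# outside '!'..'~' (33..126). LC_ALL=C makes awk treat input as raw bytes.
bad=$(LC_ALL=C awk 'NR % 4 == 0 && /[^!-~]/ { print FILENAME ":" NR }' "$fq")
printf '%s\n' "$bad"
rm -f "$fq"
```

To be stricter for 64-based qualities, the bracket expression could be narrowed to `[^@-~]`, which also flags characters below '@' (64).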


OP, did you find a solution to this?

7.9 years ago
c.kraus ▴ 30

I have also struggled with this kind of problem several times. Somehow (I don't know why) a non-ASCII character can end up in your fastq file.

And for some reason (which I also don't know) bowtie2 reports it as ASCII code "-126", which does not exist, since ASCII codes range from 0 to 127.

Most likely the non-ASCII character is in one of the quality lines of your fastq file.

You can search for lines containing a non-ASCII character with the following commands. If the file is gzipped, use this:
zcat FILE.fastq.gz | perl -e '$line = 1; while (<>) {if(/[^[:ascii:]]/) {print "LINE: $line\n$_";} $line++;}'

Otherwise this:
perl -e '$line = 1; while (<>) {if(/[^[:ascii:]]/) {print "LINE: $line\n$_";} $line++;}' FILE.fastq
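With many fastq files, it may be quicker to first find out which files contain a suspicious byte at all and only then run the line-level search on those. A possible sketch with grep follows; the directory and file names are made up for the demo.

```shell
# Hypothetical demo inputs: one clean file and one with a stray 0x82 byte.
dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$dir/clean.fastq"
printf '@r1\nACGT\n+\nII\202I\n' > "$dir/broken.fastq"

# -l lists only the names of files with a match; -a treats every file as text
# even if grep would otherwise classify it as binary. Under LC_ALL=C the
# bracket expression matches any byte that is neither printable ASCII nor
# whitespace.
suspects=$(LC_ALL=C grep -a -l '[^[:print:][:space:]]' "$dir"/*.fastq)
printf '%s\n' "$suspects"
rm -rf "$dir"
```

Note that this pattern is slightly broader than the perl one: it also flags ASCII control characters, which should not appear in a fastq file either.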

Let's say you received such an output:
LINE: 21410096
CCCCCGGGGGGGGGG�GGGG

Your non-ASCII character will be displayed as this question-mark symbol.

After receiving the responsible line you can "enlarge" the "surrounding area" by using sed: sed -n '21410093,21410096p' FILE.fastq

So we got something like this as an output:
@sequence HEADER
CTGGCTGGGAAGGGGCTGGCT
+quality HEADER
CCCCCGGGGGGGGGG�GGGG

Such a corrupted entry can easily be removed with sed. In this case the corrupted read was located at lines 21410093 to 21410096, so the following command will work (without -n, so that all remaining lines are still printed):
sed '21410093,21410096d' FILE.fastq > repaired.fastq

This would be problematic if you have paired-end reads. Either remove the read entry at the same position in the corresponding fastq file, or replace the corrupted character in your initial fastq file. I would prefer the latter:

This can be done with sed:
sed '21410096s/CCCCCGGGGGGGGGG�GGGG/CCCCCGGGGGGGGGGIGGGGG/' FILE.fastq > FILE.repaired.fastq

(actually I'm surprised that this one worked).
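After a repair like this, it may be worth checking that every record still has a quality line as long as its sequence line. The following is a minimal sketch on a made-up file whose second record is deliberately inconsistent; it assumes 4-line records and counts bytes, hence LC_ALL=C.

```shell
# Hypothetical demo input: the second record has 5 bases but only 4 qualities.
fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIII\n' > "$fq"

# Remember the sequence length (line 2 of each record) and compare it with
# the quality length (line 4 of each record).
mismatch=$(LC_ALL=C awk '
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 0 && length($0) != seqlen {
        print "record ending at line " NR ": seq=" seqlen " qual=" length($0)
    }
' "$fq")
printf '%s\n' "$mismatch"
rm -f "$fq"
```

An empty result means the lengths agree for all records; for paired-end data, comparing the line counts of both mate files (for example with wc -l) is a cheap additional check.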

I hope this helps.
