Bowtie2 character encoding error
6.4 years ago
Medhat 8.9k

I am running bowtie2 on mate-pair reads with these parameters:

bowtie2 --local --phred64  --threads 25 -x "rest of the command"

It gives me this error:

Saw ASCII character -126 but expected 64-based Phred qual.
Try not specifying --solexa1.3-quals/--phred64-quals.
terminate called after throwing an instance of 'int'
bowtie2-align died with signal 6 (ABRT)

I removed the --phred64 parameter, but then it gives a different error:

Saw ASCII character -126 but expected 33-based Phred qual.
terminate called after throwing an instance of 'int'
bowtie2-align died with signal 6 (ABRT)

Any idea?

Thanks,

alignment sequence next-gen software error • 3.9k views

Sounds like you have a flipped bit somewhere in the fastq file. Try to subset it until you find the approximate line number. You can probably then look through the file and quickly identify the weird character (there's probably a quick awk command to do all of this, but I don't know it off-hand).


ASCII 126 is the tilde (~).

Will this work?

awk '$0 ~ /~/ { print }' < in.fastq

Is that even a good idea?

It's -126, which I guess would be byte 130 (0x82) if one assumed char to be unsigned.

I am dealing with 8 paired fastq files, so do I need to search for it in all the 18 fastq files I have?!

You should be able to tell which fastq file is the problem by when the error occurs and what the last alignment written was. Just grep for that last alignment's read name and you'll know the file. The problem read should be within ~128K of that (I think that's the buffer size bowtie2 uses).

OK, I grepped for the last sequence like this:

grep -r "FCC4U5HACXX:1:1101:2896:51676#AGTCAACA" ../

It found two files:

../s79757-11kb_1/s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq:@FCC4U5HACXX:1:1101:2896:51676#AGTCAACA/2
../s79757-11kb_1/s79757-11kb_1_good_reads_1.trimmed-L18-pair1.fastq:@FCC4U5HACXX:1:1101:2896:51676#AGTCAACA/1

Then I ran:

awk '$0 ~ /~/ { print }' < s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq

But no result!


Get the line number with grep -n, then use awk to extract the next 100,000 or so reads (a rough estimate!) from each file into a new file. One of those files should then cause the error.
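A minimal sketch of that subsetting step, using sed instead of awk for brevity. The function name `extract_reads` and the file names in the usage comment are made up for illustration, and the start line is assumed to be a header line (which is what grep -n on a read name reports).

```shell
# extract_reads FILE START_LINE N_READS
# Print N_READS four-line FASTQ records starting at START_LINE.
# START_LINE is assumed to be a header line, e.g. as reported by grep -n.
extract_reads() {
    file=$1
    start=$2
    n_reads=$3
    end=$(( start + 4 * n_reads - 1 ))
    # Quit right after the last wanted line so a large file is not read to the end.
    sed -n "${start},${end}p;${end}q" "$file"
}

# Usage (hypothetical file names and line number):
#   grep -n 'FCC4U5HACXX:1:1101:2896:51676#AGTCAACA' pair1.fastq
#   extract_reads pair1.fastq 4001 100000 > subset_1.fastq
```

For paired files, run it with the same start line and read count on both mates so the subsets stay in sync.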


And then you can more easily use bowtie2's --upto and -s options to narrow down the region (in fact, you can use that to avoid using awk to subset the file).
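A rough sketch of how that narrowing could be automated as a binary search. Everything here is an assumption for illustration: the helper names `find_bad_read` and `bt2_check`, the `INDEX` and FASTQ placeholders, and in particular the premise that bowtie2 exits non-zero only when the bad read falls inside the window selected by -s (reads to skip) and -u (reads to align after skipping). It also assumes there is exactly one bad read.

```shell
# find_bad_read TOTAL CHECK
# Binary-search for the 1-based position of the single read that makes CHECK fail.
# CHECK is called as "CHECK SKIP UPTO" and must return non-zero
# when the window SKIP+1 .. SKIP+UPTO contains the bad read.
find_bad_read() {
    total=$1
    check=$2
    lo=0          # invariant: the bad read lies in positions lo+1 .. hi
    hi=$total
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "$lo" $(( mid - lo )); then
            lo=$mid    # window lo+1..mid is clean, so look later
        else
            hi=$mid    # window lo+1..mid fails, so look earlier
        fi
    done
    echo "$hi"
}

# Hypothetical checker: INDEX and the FASTQ names are placeholders.
bt2_check() {
    bowtie2 --local --phred64 -s "$1" -u "$2" -x INDEX \
        -1 pair1.fastq -2 pair2.fastq -S /dev/null > /dev/null 2>&1
}

# Usage (not run here): find_bad_read 25000000 bt2_check
```

The reported position is a read (or read-pair) index, so the offending record starts at line 4 * (position - 1) + 1 of the FASTQ file.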


That's because I made a mistake. ~ is not the character you are looking for :-)

We should look for a pattern that matches non-Phred64 characters.
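One way such a pattern could look, as a sketch rather than a tested recipe for this particular data set. It assumes strict four-line FASTQ records and treats any byte outside '@' (ASCII 64) to '~' (ASCII 126) on a quality line as invalid for Phred+64. The function name `find_bad_quals` is made up here.

```shell
# find_bad_quals FILE [MIN_CHAR]
# Report FASTQ quality lines (every 4th line) that contain a byte outside
# MIN_CHAR .. '~'. MIN_CHAR defaults to '@' (Phred+64); pass '!' for Phred+33.
find_bad_quals() {
    file=$1
    min=${2:-@}
    # LC_ALL=C makes awk compare raw bytes, so bytes above 127 are caught too.
    LC_ALL=C awk -v re="[^${min}-~]" \
        'NR % 4 == 0 && $0 ~ re { print FILENAME ":" NR ": " $0 }' "$file"
}

# Usage (hypothetical file name):
#   find_bad_quals s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq
```

Note that Windows line endings would also be reported, since the carriage return is outside the allowed range.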


OP, did you find a solution to this?

4.8 years ago
c.kraus ▴ 30

I have also struggled with this kind of problem several times. Somehow (I don't know why) you may have a non-ASCII character in your fastq file.

And for some reason (which I also don't know) bowtie2 reports it as ASCII code "-126", which is not a valid code, since all ASCII codes are non-negative.
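A plausible explanation, echoing the comment above about signed versus unsigned char (this is an assumption about how the byte is read, not something confirmed for bowtie2's source): a byte above 127 stored in a signed 8-bit char comes out negative. The helper name `to_unsigned_byte` is made up; it just shows the arithmetic.

```shell
# to_unsigned_byte N
# Map a signed 8-bit value (-128..127) to the corresponding byte value (0..255).
to_unsigned_byte() {
    echo $(( $1 & 255 ))
}

to_unsigned_byte -126    # prints 130 (0x82), which is outside the ASCII range 0..127
```

So under that assumption the offending byte in the file is 0x82, not a tilde.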

Most likely the non-ASCII character is in one of the quality lines of your fastq file.

You can search for lines containing a non-ASCII character with the following commands. If the file is gzipped, use this:

zcat FILE.fastq.gz | perl -e '$line = 1; while (<>) {if(/[^[:ascii:]]/) {print "LINE:$line\n$_";}$line++;}'

Otherwise this:

perl -e '$line = 1; while (<>) {if(/[^[:ascii:]]/) {print "LINE:$line\n$_";}$line++;}' FILE.fastq

Let's say you get output like this:

LINE:21410096
CCCCCGGGGGGGGGG�GGGG

Your non-ASCII character will be shown as this question-mark symbol.

Once you have the offending line, you can look at the whole read entry around it using sed:

sed -n '21410093,21410096p' FILE.fastq

So we get something like this as output:

@sequence HEADER
CTGGCTGGGAAGGGGCTGGCT
+quality HEADER
CCCCCGGGGGGGGGG�GGGG

Such a corrupted entry can easily be removed using sed. In this case the corrupted read occupies lines 21410093 to 21410096, so the following command will work (note: no -n here, otherwise sed prints nothing at all):

sed '21410093,21410096d' FILE.fastq > repaired.fastq

This would be problematic if you have paired-end reads, because the two files would no longer be in sync. Either remove the read entry at the same position in the corresponding fastq file, or replace the corrupting character in your initial fastq file. I would prefer the latter:

This can be done using sed:

sed '21410096s/CCCCCGGGGGGGGGG�GGGG/CCCCCGGGGGGGGGGIGGGGG/' FILE.fastq > FILE.repaired.fastq

(actually I'm surprised that this one worked).
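For the other option, removing the entry from both mates, here is a small sketch. It assumes both files use strict four-line records and list the mates in the same order. The function name `drop_record_pair` and the file names in the usage comment are hypothetical.

```shell
# drop_record_pair LINE MATE1 MATE2 OUT1 OUT2
# Delete the four-line FASTQ record that contains line number LINE
# from both mate files, writing the results to OUT1 and OUT2.
drop_record_pair() {
    line=$1
    start=$(( (line - 1) / 4 * 4 + 1 ))   # first line of the record containing LINE
    end=$(( start + 3 ))
    # LC_ALL=C so sed does not trip over the invalid byte.
    LC_ALL=C sed "${start},${end}d" "$2" > "$4"
    LC_ALL=C sed "${start},${end}d" "$3" > "$5"
}

# Usage (hypothetical file names):
#   drop_record_pair 21410096 pair1.fastq pair2.fastq pair1.repaired.fastq pair2.repaired.fastq
```

Passing 21410096 here removes lines 21410093 to 21410096 from both files, so the pairing is preserved at the cost of one read pair.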

I hope this helps.