Question: Bowtie2 character encoding error
Medhat (Texas) wrote, 4.5 years ago:

I am running bowtie2 on mate pairs, specifying the --phred64 parameter:

bowtie2 --local --phred64 --threads 25 -x "rest of the command"

It gives me this error:

Saw ASCII character -126 but expected 64-based Phred qual.
Try not specifying --solexa1.3-quals/--phred64-quals.
terminate called after throwing an instance of 'int'
bowtie2-align died with signal 6 (ABRT)

I removed the --phred64 parameter, but then it gives another error:

Saw ASCII character -126 but expected 33-based Phred qual.
terminate called after throwing an instance of 'int'
bowtie2-align died with signal 6 (ABRT)

Any idea?

Thanks,

(question last modified 2.9 years ago by c.kraus)

Sounds like you have a flipped bit somewhere in the fastq file. Try to subset it until you find the approximate line number. You can probably then look through the file and quickly identify the weird character (there's probably a quick awk command to do all of this, but I don't know it off-hand).

(reply by Devon Ryan, 4.5 years ago)
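Devon mentions a quick command for locating the odd character. As a rough sketch (in Python rather than awk), the scan below reads the file as raw bytes, so stray bytes cannot cause decoding errors, and reports the line number and read number of every line containing a byte outside printable ASCII (32 to 126). It assumes plain 4-line records with no wrapped lines; the names `find_bad_bytes` and `scan_file` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def find_bad_bytes(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (line_number, read_number, line) for each line that contains a
    byte outside printable ASCII (32..126). Numbers are 1-based; read_number
    assumes strict 4-line FASTQ records."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")  # the line terminator itself is not suspicious
        if any(b < 32 or b > 126 for b in line):
            yield line_no, (line_no - 1) // 4 + 1, line


def scan_file(path: str) -> None:
    """Print every suspicious line of a plain (uncompressed) FASTQ file."""
    with open(path, "rb") as fh:  # binary mode: no decoding errors on bad bytes
        for line_no, read_no, line in find_bad_bytes(fh):
            print(f"line {line_no} (read {read_no}): {line!r}")
```

For example, `scan_file("in.fastq")`. Tab characters (byte 9) in header lines would also be reported, which is harmless.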

ASCII 126 is the tilde (~).

Will this work?

awk ' $0 ~ /~/ { print }' <in.fastq

Is that even a good idea?

(reply by RamRS, 4.5 years ago)

It's -126, which would be byte value 130 (0x82, i.e. 256 - 126) if one reinterprets the signed char as unsigned; that is outside the ASCII range entirely.

(reply by Devon Ryan, 4.5 years ago)
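The conversion can be checked directly. This assumes, as the negative number suggests, that bowtie2 stores the offending byte in a signed 8-bit `char`; masking with 0xFF recovers the raw byte value (the helper name is hypothetical):

```python
def signed_char_to_byte(value: int) -> int:
    """Map a signed 8-bit char value (-128..127) to its raw byte value (0..255)."""
    if not -128 <= value <= 127:
        raise ValueError(f"{value} is not a signed 8-bit value")
    return value & 0xFF  # two's complement: negative values wrap to value + 256


raw = signed_char_to_byte(-126)
print(raw, hex(raw))  # prints: 130 0x82
```

Any reported value between -128 and -1 therefore means a byte of 128 or above, which no ASCII-based quality encoding uses.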

I am dealing with 8 paired fastq files, so do I need to search all 18 fastq files I have?!

(reply by Medhat, 4.5 years ago)

You should be able to tell which fastq file is the problem by when the error occurs and what the last alignment written was. Just grep for that last alignment's read name and you'll know the file. The problem read should be within ~128K of that (I think that's the buffer size bowtie2 uses).

(reply by Devon Ryan, 4.5 years ago)

OK, I grepped for the last read name like this:

grep -r "FCC4U5HACXX:1:1101:2896:51676#AGTCAACA" ../

It found two files:

../s79757-11kb_1/s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq:@FCC4U5HACXX:1:1101:2896:51676#AGTCAACA/2
../s79757-11kb_1/s79757-11kb_1_good_reads_1.trimmed-L18-pair1.fastq:@FCC4U5HACXX:1:1101:2896:51676#AGTCAACA/1

Then I searched the pair2 file using the command:

awk ' $0 ~ /~/ { print }' < s79757-11kb_1_good_reads_1.trimmed-L18-pair2.fastq

But it returned nothing!

(reply by Medhat, 4.5 years ago; edited by RamRS)

Get the line number with grep -n, then use awk to extract the next 100,000 or so reads (these are estimated numbers!) from each file into a new file. One of those files should then reproduce the error.

(reply by Devon Ryan, 4.5 years ago)
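The extract step can be sketched in Python instead of awk. This assumes strict 4-line records; the start line is the header line number reported by `grep -n`, and the 100,000 default is Devon's rough estimate, not a bowtie2 setting. `extract_window` and `write_window` are hypothetical names.

```python
import itertools
from typing import Iterable, Iterator


def extract_window(lines: Iterable[bytes], start_line: int, n_reads: int) -> Iterator[bytes]:
    """Return an iterator over n_reads 4-line records, beginning with the
    record that contains 1-based line number start_line."""
    first = (start_line - 1) // 4 * 4  # 0-based index of that record's header line
    return itertools.islice(lines, first, first + 4 * n_reads)


def write_window(src_path: str, dst_path: str, start_line: int, n_reads: int = 100_000) -> None:
    """Copy the window into a new file that can be given to bowtie2 on its own."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(extract_window(src, start_line, n_reads))
```

For paired files, assuming both mates list reads in the same order, running it on each mate with the same start line keeps the pairs matched.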

And then you can more easily use bowtie2's -u/--upto and -s/--skip options to narrow down the region (in fact, you can use them to avoid subsetting the file with awk at all).

(reply by Devon Ryan, 4.5 years ago)
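Narrowing the region down with `-u` amounts to a binary search. A minimal sketch, assuming (1) the failure is deterministic and shows up as a non-zero exit status, (2) aligning more reads never makes it go away, and (3) bowtie2 does not parse far past the `-u` limit. Its batched input reading (Devon's ~128K guess) may blur the answer, so treat the result as the region to inspect, not an exact read. `first_failing_read`, `bowtie2_fails`, and the example command are hypothetical.

```python
import subprocess
from typing import Callable, List


def first_failing_read(fails_upto: Callable[[int], bool], total_reads: int) -> int:
    """Binary-search the smallest n in 1..total_reads with fails_upto(n) True.

    Assumes monotonicity: if the first n reads fail, more reads also fail.
    Returns a 1-based read number, or 0 if even total_reads does not fail."""
    if not fails_upto(total_reads):
        return 0
    lo, hi = 1, total_reads  # invariant: fails_upto(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if fails_upto(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def bowtie2_fails(base_cmd: List[str]) -> Callable[[int], bool]:
    """Build a predicate that runs base_cmd plus `-u n` and reports failure.

    base_cmd is a hypothetical placeholder, for example
    ["bowtie2", "--local", "-x", "INDEX", "-1", "pair1.fastq",
     "-2", "pair2.fastq", "-S", "/dev/null"]."""
    def run(n: int) -> bool:
        proc = subprocess.run(base_cmd + ["-u", str(n)],
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return proc.returncode != 0
    return run
```

`total_reads` can be taken as the file's line count divided by 4. Each probe reruns bowtie2, so roughly log2(total_reads) runs are needed.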

That's because I made a mistake. ~ is not the character you are looking for :-)

We should instead look for a pattern that matches characters that are not valid Phred64 quality characters.

(reply by RamRS, 4.5 years ago)
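RamRS's idea, made encoding-aware: a sketch that checks only the quality lines (every 4th line, assuming unwrapped records) and flags bytes below the encoding's offset (64 for Phred64, 33 for Phred33) or above '~' (126). The upper bound is deliberately permissive, since real data normally stays well below it. The function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def bad_quality_chars(lines: Iterable[bytes], offset: int = 33,
                      max_char: int = 126) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, column, byte) for each quality byte below `offset`
    or above `max_char`. Line and column numbers are 1-based."""
    for line_no, raw in enumerate(lines, start=1):
        if line_no % 4 != 0:
            continue  # only the 4th line of each record holds qualities
        for col, b in enumerate(raw.rstrip(b"\r\n"), start=1):
            if b < offset or b > max_char:
                yield line_no, col, b
```

Running it with `offset=64` on Phred33 data flags many ordinary quality characters, which is itself a quick way to tell the two encodings apart.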

OP, did you find a solution to this?

(reply by RamRS, 4.4 years ago)
c.kraus wrote, 2.9 years ago:

I have also struggled with this kind of problem several times. Somehow (I don't know why) your fastq file may contain a non-ASCII character.

bowtie2 reports it as ASCII code "-126", which does not exist, since ASCII codes run from 0 to 127. The likely reason is that bowtie2 prints the raw byte as a signed char, so any byte value from 128 to 255 appears as a negative number (-126 is byte 130, 0x82).

It is relatively likely that the non-ASCII character is in one of the quality lines of your fastq file.

You can search for lines containing a non-ASCII character with the following commands. If the file is gzipped, use this: zcat FILE.fastq.gz | perl -ne 'print "LINE: $.\n$_" if /[^[:ascii:]]/'

Otherwise, use this: perl -ne 'print "LINE: $.\n$_" if /[^[:ascii:]]/' FILE.fastq

Let's say you get output like this:
LINE: 21410096
CCCCCGGGGGGGGGG�GGGG

Your non-ASCII character will be displayed as this question-mark symbol (the Unicode replacement character).

Once you have the responsible line, you can print the surrounding four-line record with sed: sed -n '21410093,21410096p' FILE.fastq

This gives output like the following:
@sequence HEADER
CTGGCTGGGAAGGGGCTGGCT
+quality HEADER
CCCCCGGGGGGGGGG�GGGG
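The range 21410093 to 21410096 is simply the 4-line record containing line 21410096. Assuming strict 4-line records, it can be computed rather than guessed (hypothetical helper):

```python
from typing import Tuple


def record_lines(line_no: int) -> Tuple[int, int]:
    """Return the (first, last) 1-based line numbers of the 4-line FASTQ
    record containing line_no."""
    first = (line_no - 1) // 4 * 4 + 1
    return first, first + 3


first, last = record_lines(21410096)
print(f"sed -n '{first},{last}p' FILE.fastq")  # prints: sed -n '21410093,21410096p' FILE.fastq
```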

Such a corrupted entry can easily be removed with sed. In this case the corrupted read occupies lines 21410093 to 21410096, so the following command works:
sed '21410093,21410096d' FILE.fastq > repaired.fastq
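Deleting lines from one file only is fine for single-end data. For paired-end files, a sketch that drops the same record from both mates keeps them in step. It assumes both files hold strict 4-line records in the same order (worth confirming by checking that the read names at that position match). `drop_record` and `drop_pair` are hypothetical names.

```python
from typing import Iterable, Iterator


def drop_record(lines: Iterable[bytes], record_no: int) -> Iterator[bytes]:
    """Yield every line except the 4 lines of 1-based record record_no."""
    first = (record_no - 1) * 4 + 1  # 1-based first line of that record
    for line_no, line in enumerate(lines, start=1):
        if not first <= line_no < first + 4:
            yield line


def drop_pair(r1_in: str, r2_in: str, r1_out: str, r2_out: str, record_no: int) -> None:
    """Remove the same record from both mate files so they stay in sync."""
    for src_path, dst_path in ((r1_in, r1_out), (r2_in, r2_out)):
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            dst.writelines(drop_record(src, record_no))
```

Lines 21410093 to 21410096 correspond to record (21410093 - 1) / 4 + 1 = 5352524.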

This is problematic for paired-end reads, because the two mate files must stay in the same order. Either remove the read entry at the same position in the corresponding mate file, or replace the corrupt character in your original fastq file. I would prefer the latter:

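This can be done with sed. Because the terminal shows � in place of the actual byte, pasting the displayed quality string into the pattern is unreliable; matching any byte outside printable ASCII under the C locale avoids that:
LC_ALL=C sed '21410096s/[^ -~]/I/' FILE.fastq >FILE.repaired.fastq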

(actually I'm surprised that this one worked).
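The replace-in-place route can also be done at the byte level in Python, which does not depend on what the terminal displays. A sketch, assuming strict 4-line records and that each corrupted position is a single byte (as with a flipped bit): every quality byte outside 33 to 126 becomes the replacement character, and a record whose quality and sequence lengths still differ raises an error instead of being written. 'I' follows c.kraus's example and assigns that base an arbitrary quality score; `repair_quality` is a hypothetical name.

```python
from typing import Iterable, Iterator


def repair_quality(lines: Iterable[bytes], replacement: bytes = b"I") -> Iterator[bytes]:
    """Replace quality bytes outside 33..126 with `replacement`, checking that
    each repaired quality line is as long as its sequence line."""
    seq = b""
    for line_no, raw in enumerate(lines, start=1):
        body = raw.rstrip(b"\r\n")
        ending = raw[len(body):]  # keep the original line terminator
        if line_no % 4 == 2:
            seq = body  # remember the sequence to compare lengths later
        elif line_no % 4 == 0:
            body = bytes(b if 33 <= b <= 126 else replacement[0] for b in body)
            if len(body) != len(seq):
                raise ValueError(f"line {line_no}: quality length {len(body)} "
                                 f"!= sequence length {len(seq)}")
        yield body + ending
```

Writing `b"".join(repair_quality(src))` to a new file gives the repaired copy; a `ValueError` points at records that need manual inspection or removal.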

I hope this helps.

(answer by c.kraus, 2.9 years ago)