Question: FASTQ files with integer instead of ASCII quality scores
bgbrink wrote, 3 months ago:

I tried to align a bunch of old FASTQ files with bwa and got no results. When I looked into the files, I saw that the base qualities are reported as space-separated integers rather than ASCII characters:

@1_21_9:1:2:1565:591
GTGTTGTTTAGAAGCTGAACTACCTTTTTCGCTGAG
+1_21_9:1:2:1565:591
 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 31 5 40 40 1 40 15 40 40 40 40 40 4 2 40 40 15 1 39
@1_21_9:1:2:1307:745
GATCGGAAGAGCTCGTCTGCCGTCTTCTGCTTTGCT
+1_21_9:1:2:1307:745
 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 4 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 -2 1 1 1

Has anyone seen this encoding before, and does anyone know a tool that can convert it into proper FASTQ?

Note that there are negative values as well. Could this be old Solexa quality scores?


That file does not meet the FASTQ format definition. Where did you get this data, BTW? Do you know what technology it is from?

written 3 months ago by genomax

I have seen GAIIx data that came as separate sequence and score files, with the scores stored as integers. Maybe somebody just mashed them together without knowing that the scores need to be encoded...

written 3 months ago by cschu181

That could be it. I don't have any hard proof of what technology this data is from, though. Does it still make sense to try and convert the scores manually?

written 3 months ago by bgbrink

If you have a clue which encoding/Phred scale is used, you could convert it to a sane FASTQ with some scripting. Alternatively, you could just convert it to a FASTA file and forget about the quality scores...

written 3 months ago by WouterDeCoster
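
The FASTA fallback mentioned above could look roughly like this in Python. It is only a sketch, assuming every record spans exactly four lines as in the excerpt from the question; `fastq_to_fasta` is a hypothetical helper name, not an existing tool.

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from 4-line FASTQ-like records, ignoring qualities."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        # '@name' in FASTQ becomes '>name' in FASTA; lines 3 and 4 are dropped
        yield ">" + header[1:]
        yield seq


# First record from the question (quality line shortened, it is ignored anyway)
record = [
    "@1_21_9:1:2:1565:591",
    "GTGTTGTTTAGAAGCTGAACTACCTTTTTCGCTGAG",
    "+1_21_9:1:2:1565:591",
    " 40 40 40 31 5",
]
fasta = list(fastq_to_fasta(record))
print("\n".join(fasta))
```

For a real file, `fastq_to_fasta(open("myfile.fastq"))` should work the same way, since the function only needs an iterable of lines.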
sacha (France) wrote, 3 months ago:

It seems you are using Solexa+64 encoding (scores from -5 to 40). You can easily convert to ASCII with the help of the following picture.

[Image]
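
To illustrate the +64 mapping, here is a minimal Python sketch. It assumes each integer is a raw score and that the ASCII character is simply the one with code `score + 64`; `encode_offset64` is a hypothetical helper name.

```python
def encode_offset64(scores):
    """Turn integer scores into one ASCII character each (score + 64)."""
    return "".join(chr(s + 64) for s in scores)


# A few of the values that occur in the question, including a negative one
scores = [40, 31, 5, 1, -2]
encoded = encode_offset64(scores)
print(encoded)
```

With this offset, the lowest Solexa score (-5) maps to `;` and the highest (40) maps to `h`, so every score in the stated range ends up as a printable character.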

sacha (France) wrote, 3 months ago:

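I did it for you with awk:

awk -f convert.awk myfile.fastq

# convert.awk
# Rewrites integer quality scores as ASCII characters (score + 64).

function toascii(score)
{
    return sprintf("%c", score + 64)
}

# Line 1 of each record: header, printed unchanged
(NR-1) % 4 == 0 {
    print $0
}

# Line 2: sequence, printed unchanged
(NR-1) % 4 == 1 {
    print $0
}

# Line 3: separator, reduced to a bare "+"
(NR-1) % 4 == 2 {
    print "+"
}

# Line 4: one ASCII character per integer score
(NR-1) % 4 == 3 {
    # i <= NF so that the last score is not dropped
    for (i = 1; i <= NF; i += 1)
    {
        printf("%s", toascii($i))
    }
    printf("\n")
}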
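
The awk script writes +64-offset characters. Many current tools expect Phred scores with a +33 offset instead, so here is a hedged Python sketch of that variant. It assumes the integers really are Solexa-scaled scores and converts them with the standard relation Q_phred = 10 * log10(10^(Q_solexa/10) + 1); if the values are already Phred scores, pass `solexa=False`. The function names and the toy record are hypothetical and only serve as illustration.

```python
import math


def solexa_to_phred(q):
    """Convert one Solexa-scaled score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (q / 10) + 1))


def convert_record(header, seq, plus, qual_ints, solexa=True):
    """Return a 4-line record whose quality line is Phred+33 ASCII."""
    scores = [int(tok) for tok in qual_ints.split()]
    if len(scores) != len(seq):
        raise ValueError("score count differs from read length: " + header)
    if solexa:
        scores = [solexa_to_phred(s) for s in scores]
    qual = "".join(chr(s + 33) for s in scores)
    # The separator is reduced to a bare "+", as in the awk script
    return [header, seq, "+", qual]


def convert(lines):
    """Convert an iterable of lines made of 4-line records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    out = []
    for i in range(0, len(lines) - 3, 4):
        out.extend(convert_record(*lines[i:i + 4]))
    return out


# Toy record (made up for illustration, not taken from the question's file)
toy = ["@read1", "GATC", "+read1", " 40 -2 1 5"]
converted = convert(toy)
print("\n".join(converted))
```

To run it on a file, something like `print("\n".join(convert(open("myfile.fastq"))))` should do. The length check is there because a dropped or extra score would otherwise silently produce a record that aligners reject.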