Converting Quality Scores To Sanger
12.8 years ago
Haiping ▴ 110

My data were generated on a HiSeq 2000, so I used -F ILMFQ when running novoalign. Do I still need to convert the quality scores to Sanger before using samtools pileup for SNP calling? Thanks for all the comments.

quality scoring • 9.2k views

I just found this on the Novoalign website:

Question: Does Novoalign support Sanger and Illumina FASTQ? Answer: Yes. Sanger and Illumina FASTQ formats are both supported. The quality values are converted to phred values using the Sanger method and used in subsequent alignment routines.

Does this mean that we don't need to worry about it?

12.8 years ago

I believe that the HiSeq 2000 already uses Sanger encoding; what is called Illumina mode is now obsolete. If that is the case for your data, you would need to rerun your mapping.

Check the post below on how to detect the encoding from your data:

A: Write Script For Selection Of Fastq File With Sanger Format


To comment on Istvan's answer: you can still find Illumina 1.3 (phred+64) quality scores. It depends on the version of the Illumina software installed on the machine, so even if the data come from a HiSeq 2000, you have to be careful and check. It is true, though, that the latest version generates Illumina 1.9 quality scores, which are phred+33 based (like Sanger).
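If you want to check this yourself, here is a minimal Python sketch of the usual range-based heuristic. The function names (`fastq_quality_lines`, `guess_offset`) are made up for illustration, and it assumes a plain four-line-per-record FASTQ file. The thresholds come from the standard ranges: phred+33 data normally uses characters from `!` (33) up to about `J` (74), while the +64 formats never go below `;` (59).

```python
from itertools import islice


def fastq_quality_lines(handle, max_records=10000):
    """Yield the quality line of each record, assuming 4 lines per record."""
    # Lines 3, 7, 11, ... (0-based) are the quality strings.
    return islice(handle, 3, 4 * max_records, 4)


def guess_offset(quality_lines):
    """Return 33, 64, or None (ambiguous) based on the observed ASCII range."""
    lo, hi = 255, 0
    for line in quality_lines:
        codes = [ord(c) for c in line.strip()]
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    if hi == 0:
        return None  # no quality characters seen at all
    if lo < 59:
        return 33  # characters this low cannot occur in a +64 encoding
    if hi > 74:
        return 64  # characters this high are not expected in phred+33 data
    return None  # range fits both encodings; sample more reads


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:
        print(guess_offset(fastq_quality_lines(fh)))
```

This is only a heuristic: a small sample of uniformly high-quality reads can fall in the overlapping range, in which case the sketch returns `None` rather than guessing.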


I got my data nearly a year ago, and I am sure that it is phred+64. I tried the command from the linked post, but it failed because we do not have guess-encoding.py. Anyway, there seems to be no problem for SNP calling. Thanks for the comments.

12.8 years ago
Docroberson ▴ 30

It doesn't depend so much on HiSeq versus GAIIx as on which version of the pipeline you're using. HiSeq SHOULD be 1.3+, which does encode phred quality scores with an offset of 64. The best thing you can do is to confirm which pipeline version was used by the core that generated your sequence.

If it is 1.3+, the option you used in Novoalign is correct, as it internally converts to phred, since samtools requires phred scaling. If it were the old format that used the log of the probability ratios (Solexa scaling), you would need to use SLXFQ instead.
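To make the two conversions concrete, here is a rough Python sketch. It is not Novoalign's code, and the function names are invented for illustration. It assumes the standard definitions: phred+64 and phred+33 differ only in the ASCII offset, while the old Solexa scale stores -10·log10(p/(1-p)) instead of -10·log10(p), which gives Q_phred = 10·log10(10^(Q_solexa/10) + 1).

```python
import math


def phred64_to_phred33(qual):
    """Re-encode an Illumina 1.3+ (phred+64) quality string as Sanger (phred+33)."""
    out = []
    for c in qual:
        q = ord(c) - 64  # decode with the +64 offset
        if q < 0:
            raise ValueError(f"{c!r} is not a valid phred+64 character")
        out.append(chr(q + 33))  # re-encode with the +33 offset
    return "".join(out)


def solexa_to_phred(q_solexa):
    """Convert a Solexa log-odds score to a phred score (as a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


if __name__ == "__main__":
    print(phred64_to_phred33("hhhhBB"))
    print(round(solexa_to_phred(0), 2))
```

The two scales agree closely at high quality and diverge only for low scores, which is why the distinction matters mostly for poor-quality bases.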


I am sure my data are 1.3+, since the worst quality character is B, which corresponds to a quality score of 2. So I don't need to worry about the reliability of the SNP calling. Thanks for your comments.
