I've recently downloaded the simple somatic mutation (SSM) file for clear cell renal cell carcinoma (ccRCC) from the ICGC Data Repository, but I've been having some trouble interpreting the quality score column.
Below is a snippet of my data (.tsv file):
chromosome	chromosome_start	chromosome_end	chromosome_strand	mutation_type	reference_genome_allele	mutated_from_allele	mutated_to_allele	quality_score	probability	total_read_count
1	224822287	224822287	1	single base substitution	T	T	G	223	46	26
1	224822287	224822287	1	single base substitution	T	T	G	223	46	26
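To check the spread of quality_score values across the whole file rather than just this snippet, a minimal sketch using Python's standard csv module could look like the following. It assumes the file is tab-separated with a header row containing a quality_score column, as in the snippet; the function name and the file name "ssm.tsv" are hypothetical, and rows with an empty quality_score are skipped.

```python
import csv
import io


def quality_score_range(lines):
    """Return (min, max) of the quality_score column, or None if no scores.

    Hypothetical helper: assumes tab-separated input with a header row that
    contains a 'quality_score' column; empty values are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    scores = [float(row["quality_score"]) for row in reader if row["quality_score"]]
    if not scores:
        return None
    return min(scores), max(scores)


# Usage with an in-memory copy of the example rows (only a few columns kept).
# For the real download, pass a file handle instead, e.g.
# open("ssm.tsv", newline="")  # hypothetical file name
example = (
    "chromosome\tchromosome_start\tquality_score\n"
    "1\t224822287\t223\n"
    "1\t224822287\t223\n"
)
print(quality_score_range(io.StringIO(example)))
```

Reading through csv.DictReader keeps the lookup tied to the column name, so the sketch does not depend on quality_score staying at a fixed column position.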
However, I'm not sure why the quality scores are so high: for every entry the quality score is between 100 and 223. Some have said that Phred-scaled scores can in fact range from 0 to infinity (http://gatkforums.broadinstitute.org/discussion/4260/how-should-i-interpret-phred-scaled-quality-scores), while others say that scores in the 200 range probably mean that the signal was too low (http://seqanswers.com/forums/showthread.php?t=23770).
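For reference, a Phred-scaled quality Q encodes an error probability P through Q = -10 * log10(P), so P = 10^(-Q/10). The sketch below converts in both directions, assuming (not confirmed by the ICGC description) that the quality_score column is Phred-scaled. Taken literally, Q = 100 corresponds to P = 1e-10 and Q = 223 to roughly 5e-23, far smaller than any error rate that could be empirically calibrated, so scores in this range are better read as "very confident under the caller's model" than as literal error probabilities.

```python
import math


def phred_to_error_probability(q):
    """Error probability encoded by a Phred-scaled quality: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred-scaled quality for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print the error probabilities implied by a few example scores,
# including the bounds of the range seen in this file (100 and 223).
for q in (20, 100, 223):
    print(q, phred_to_error_probability(q))
```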
The ICGC website describes the quality score column as the quality of the mutation call itself, not of the alignment or other processing steps (http://docs.icgc.org/simple-somatic-mutations-ssm-primary-analysis-file-p).
The remaining columns indicate that samtools pileup was used for the raw variant calls, alongside other analysis tools such as GATK, Picard and VCFtools. None of the calls were verified on an orthogonal platform or biologically validated.
Can anyone confirm whether these scores do in fact indicate high-quality calls, or whether I should be looking out for something else?
Thanks in advance,