Question: Phred score check for BAM files
gravatar for Shicheng Guo
2.3 years ago by
Shicheng Guo7.7k
Shicheng Guo7.7k wrote:

Hi All,

I finish a perl script to check the phred score for bam files. I just want to make sure there is no bugs in it.


Usage: perl input.bam

use strict;

if ($ARGV[0] =~ /^-[h?]/) {
    print "Usage: determine-phred FILE
Reads a bam, possibly gzipped and returns the phred-scale, 
  either 64 or 33, based on a quick scan of the data in the file.
    exit 0;
my $cnt;
my $dphred = 64;
my $input=$ARGV[0];
if ($ARGV[0] =~ /\.bam$/) {
    $ARGV[0] = "samtools view '$ARGV[0]'|";
my $qual;
my $comm;
my $fmt;
if (@ARGV > 1) {
    my @mult = @ARGV;
    for my $f (@mult) {
        @ARGV = ($f);
        print "$f\t$dphred\n";
} else {
    print "$input: $dphred\n";

sub determine {
    $_ = <>;
    # bam
    $fmt = 'bam';
    $qual = (split(/\s+/, $_))[10];
    if (!$qual) {
    die "$input: Unknown file format\n";

    my $ssiz=7000;          # sample size
    while($qual) {
        for (my $i =length($qual)/2; $i < length($qual); ++$i) {
            if (ord(substr($qual,$i,1)) < 64) {
                $dphred = 33;
                $cnt=$ssiz; # last
        $qual = '';
        last if ++$cnt >= $ssiz;    # got enough
        if ($fmt eq 'fq') {
            # fastq
            scalar <>;      # id
            scalar <>;      # read
            scalar <>;      # comment
            $qual = <>;
            chomp $qual;
        } else {
            # sam
            $qual = (split(/\t/, $_))[10];
64 phred bam score 33 • 1.4k views
ADD COMMENTlink modified 2.3 years ago by Biostar ♦♦ 20 • written 2.3 years ago by Shicheng Guo7.7k

I am wondering, isn't this step (know the scoring method) necessary before aligning the reads/alignment ? Since some aligners expect us to provide this information.

ADD REPLYlink written 2.3 years ago by EagleEye6.4k

Yes, the score needs to be known prior to alignment. Also, the sam format specifies that all quality scores be in ASCII-33.

ADD REPLYlink written 2.3 years ago by Brian Bushnell16k

Hi Shicheng Guo, what is the main purpose of this code ? In what kind of situation, this can be used. I assume this can be only used if you want to realign BAM (alignment) file.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by EagleEye6.4k

I hope to achieve the phred score from bam files which download from public database with this script.

ADD REPLYlink written 2.2 years ago by Shicheng Guo7.7k

But what is the use of getting the phred score information after alignment has been done (because BAM/SAM will already have phred converted to +33 base quality). Please correct me if I am wrong.

Please check this SAM specification.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by EagleEye6.4k

The results are not consistent with determine-phred. I'm not sure which one is correct. if there is '?' or numbers, it should be Q33. With this case, determine-phred is right. It seems that the determine-phred accepts a revised sam format ( deleting the lines start with @ ). A phred score reference is here and here. Thank you.

The code is used on condition that you have gotten bam or sam files after mapping. Just change Q63 (if your fastq is) to Q33 for the bam files. @EagleEye .


  1. ~/local/ test.bam will output test.bam: 64

  2. samtools view test.bam | gawk 'BEGIN{FS="\t"}!/^@/{split($11, aa, ""); for (i in aa) data[aa[i]]}END{ asorti(data); for (j in data) printf data[j]; print ""}' will output #&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ, which is Q33.

I cannot read perl so far, it seems there should be bugs in your script or the similar one determine-phred.

ADD REPLYlink modified 23 months ago • written 23 months ago by Zhilong Jia1.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 536 users visited in the last hour