Question: Calculation Of Average Mapping Quality Over A Given Genomic Region
1
gravatar for Bioscientist
7.3 years ago by
Bioscientist1.6k
Bioscientist1.6k wrote:

For a given genomic region,how can I calculate:

1.average mapping quality(phred-like) for all mapped reads in the bam file over the region.

2.average sequencing quality(also phred score) for all bases in all mapped reads over the region

Any softwares or packages? I know GATK may do such work, but their website is so confusing and I cannot find the information I need.

Thanks

gatk quality mapping • 6.7k views
ADD COMMENTlink written 7.3 years ago by Bioscientist1.6k
7
gravatar for Aaronquinlan
7.3 years ago by
Aaronquinlan10k
United States
Aaronquinlan10k wrote:

As for the mean MAPQ, you could combine samtools view and awk as follows (I chose an arbitrary 1Mb region on chr22):

samtools view aln.bam chr22:20000000-21000000 | \
    awk '{sum+=$5} END { print "Mean MAPQ =",sum/NR}'

Also, I've hacked (type 'make' in the nawk dir) an elegant change that Heng Li made of awk to allow named columns for common genomiccs formats such as SAM. You can name the mapq column instead of $5 (the idea being that this makes for more readable and better documented pipelines):

samtools view aln.bam chr22:20000000-21000000 | \
    awk -c sam '{sum+=$mapq} END { print "Mean MAPQ =",sum/NR}'

Note that this version of awk will allow the first line of the file to serve as a header for naming the columns (like an R data frame). Just use:

awk -c header

As for the mean BQ, the same concept would work, but you just need a little tool to unwind the ASCII BQ values into Phred scores.

samtools view aln.bam chr22:20000000-21000000 | \ 
   perl -lane '@quals = split(undef, $F[10]); foreach $qual (@quals) {print ord($qual)-33}' | \
   awk '{sum+=$0} END { print "Mean BASEQUAL =",sum/NR}'

Or something like that.

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by Aaronquinlan10k
2

+1 for this hack for awk . I love it ! I need it ! I want it in the standard distribution !

ADD REPLYlink written 7.3 years ago by Pierre Lindenbaum119k
2

@lh3 gets all the credit for that. I just added a bit of laziness to it.

ADD REPLYlink written 7.3 years ago by Aaronquinlan10k

thanks! very helpful

ADD REPLYlink written 7.3 years ago by Bioscientist1.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1567 users visited in the last hour