I want to compute the MAPQ distribution of a BAM file (exome sequencing, paired-end reads, aligned with BWA). To extract only the field of interest, "mapq", from the BAM file, I am using the R package Rsamtools. I have a couple of questions about the output:
a) Retrieving header information in the BAM file.
First, I used scanBamHeader(bamfile)[[1]][["targets"]] to obtain the names of all reference sequences listed in the BAM file header.
In the retrieved list, in addition to chromosome names (1, 2, 3, ..., X, Y), I also found labels such as GL000207.1, NC_007605 and MT. What do these labels mean? Is it meaningful to include them when the MAPQ distribution is calculated?
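To make the header step concrete, here is a minimal sketch of how the reference names can be listed. It assumes the small example file ex1.bam that ships with Rsamtools as a stand-in for my own BAM file, so the names it prints are not the ones from my data.

```r
library(Rsamtools)

# Assumption: the example BAM bundled with Rsamtools replaces the real file.
bamfile <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# scanBamHeader() returns one list element per input file;
# "targets" holds the reference sequence lengths, named by reference.
targets <- scanBamHeader(bamfile)[[1]][["targets"]]

# Names of all reference sequences declared in the header
names(targets)

# Reference lengths alongside their names
targets
```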
b) MAPQ scores.
After setting the parameters with: param <- ScanBamParam(what="mapq"),
the BAM file is imported into R with: bam <- scanBam(file=bamfile, param=param).
The vector bam[[1]]$mapq then contains the MAPQ scores of the reads in the BAM file.
I do not understand what a MAPQ value of "NA" means.
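The steps in b) can be put together as the following sketch, which also tabulates the scores so that the NA entries show up as their own category. As an assumption for illustration, it reads the example file ex1.bam bundled with Rsamtools rather than my exome BAM, so the counts will differ from my data.

```r
library(Rsamtools)

# Assumption: the example BAM bundled with Rsamtools replaces the real file.
bamfile <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Request only the MAPQ field for every record
param <- ScanBamParam(what = "mapq")
bam <- scanBam(file = bamfile, param = param)

# One list element per requested range; with no ranges there is a single element
mapq <- bam[[1]]$mapq

# MAPQ distribution; useNA = "ifany" keeps NA as a separate category if present
mapq_counts <- table(mapq, useNA = "ifany")
mapq_counts

# Number of records whose MAPQ is reported as NA
sum(is.na(mapq))
```

Adding "flag" to the what argument of ScanBamParam would return the SAM flag of each record as well, which may help to see which kind of reads carry the NA values.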
Many thanks for the help.