Question

How to compute MAPQ distribution with Rsamtools

0

Entering edit mode

10.0 years ago

Nicola Casiraghi ▴ 500

Hi everybody,

I want to compute the MAPQ distribution in a BAM file (ExomeSeq, paired-end reads, aligned with BWA). In order to extract only the field of interest, "mapq", from the BAM file I am using the R package Rsamtools. I have a couple of doubts about outputs:

a) Retrieving header information in the BAM file.

First of all, I used scanBamHeader(bamfile)[[1]][["targets"]] to obtain the list of names of all references in the bam file header information.

In the retrieved list, in addition to chromosomes names (1,2,3,...X,Y), I found also labels such as GL000207.1, NC_007605 and MT. What is the meaning of these labels? Is it significative to include them when MAPQ distribution is calculated?

b) MAPQ scores.

Once setting params with: param<-ScanBamParam(what="mapq"), the bamfile is imported in R with: bam<-ScanBam(file=bamfile,param=param).

the list bam[[1]]$mapq contains MAPQ scores of reads in the BAM file.

I cannot understand the meaning of MAPQ when its value is "NA".

Many thanks for the help.

MAPQ Rsamtools BAM • 4.3k views

ADD COMMENT • link updated 2.6 years ago by Ram 43k • written 10.0 years ago by Nicola Casiraghi ▴ 500

Ram · Accepted Answer · 2014-04-25

2

Entering edit mode

10.0 years ago

Devon Ryan 104k

a) I see you're using the 1000 genomes reference. The first one you mentioned is an unplaced contig from chromosome 18, the second is herpes, and the last one the mitochondrial DNA. Yes, you should keep these.

b) A MAPQ of NA will occur when the read has a MAPQ of 255, which technically means "not available". What this actually means is aligner dependent. For example, bismark produces this MAPQ score for all alignments and older versions of tophat for "unique" alignments.

ADD COMMENT • link 10.0 years ago by Devon Ryan 104k

0

Entering edit mode

Dear Devon Ryan, thank you for your precise and very useful replies. Last thing about MAPQ of NA: As you correctly wrote, in BWA "a value 255 indicates that the mapping quality is not available". Do you know the possible read problems that cause this unavailability? Many thanks again.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Nicola Casiraghi ▴ 500

0

Entering edit mode

That I don't know. I'm not overly familiar with the internals of BWA since I rarely use it (don't read that as a critique of BWA, which is a great aligner, I'm just more familiar with tweaking other aligners so I use them more frequently).

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Devon Ryan 104k

0

Entering edit mode

OK no problem, thank you.

ADD REPLY • link 10.0 years ago by Nicola Casiraghi ▴ 500