Question: How to compute MAPQ distribution with Rsamtools
0
gravatar for Nicola Casiraghi
5.9 years ago by
Germany, Heidelberg, DKFZ EMBL
Nicola Casiraghi450 wrote:

Hi everybody, 

I want to compute the MAPQ distribution in a BAM file (ExomeSeq, paired-end reads, aligned with BWA). In order to extract only the field of interest, "mapq", from the BAM file I am using the R package Rsamtools. I have a couple of doubts about outputs:

a) Retrieving header information in the BAM file. 

First of all, I used scanBamHeader(bamfile)[[1]][["targets"]] to obtain the list of names of all references in the bam file header information.

In the retrieved list, in addition to chromosomes names (1,2,3,...X,Y), I found also labels such as GL000207.1NC_007605 and MT. What is the meaning of these labels? Is it significative to include them when MAPQ distribution is calculated? 

b) MAPQ scores.

Once setting params with: param<-ScanBamParam(what="mapq")

the bamfile is imported in R with: bam<-ScanBam(file=bamfile,param=param).

the list bam[[1]]$mapq contains MAPQ scores of reads in the BAM file.

I cannot understand the meaning of MAPQ when its value is "NA". 

Many thanks for the help.

bam file rsamtools mapq • 2.4k views
ADD COMMENTlink modified 5.9 years ago by Devon Ryan94k • written 5.9 years ago by Nicola Casiraghi450
2
gravatar for Devon Ryan
5.9 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

a) I see you're using the 1000 genomes reference. The first one you mentioned is an unplaced contig from chromosome 18, the second is herpes, and the last one the mitochondrial DNA. Yes, you should keep these.

b) A MAPQ of NA will occur when the read has a MAPQ of 255, which technically means "not available". What this actually means is aligner dependent. For example, bismark produces this MAPQ score for all alignments and older versions of tophat for "unique" alignments.

ADD COMMENTlink written 5.9 years ago by Devon Ryan94k

Dear Devon Ryan, thank you for your precise and very useful replies. Last thing about MAPQ of NA: As you correctly wrote, in BWA "a value 255 indicates that the mapping quality is not available". Do you know the possible read problems that cause this unavailability? Many thanks again.

ADD REPLYlink modified 3 months ago by RamRS26k • written 5.9 years ago by Nicola Casiraghi450

That I don't know. I'm not overly familiar with the internals of BWA since I rarely use it (don't read that as a critique of BWA, which is a great aligner, I'm just more familiar with tweaking other aligners so I use them more frequently).

ADD REPLYlink modified 3 months ago by RamRS26k • written 5.9 years ago by Devon Ryan94k

OK no problem, thank you.

ADD REPLYlink written 5.9 years ago by Nicola Casiraghi450
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1965 users visited in the last hour