Question: BWA: "XT:A:U" and MAPQ of 0 at the same time
7.8 years ago by
Steffi570 wrote:

I map RNA-Seq data to the genome with BWA. The SAM output files from BWA contain reads that carry the tag "XT:A:U" but at the same time have a mapping quality of 0.

What does this mean? I thought that "XT:A:U" means uniquely best hit?! How does this then go together with a MAPQ of 0?

Best, Stefanie

mapping bwa • 8.7k views
modified 6.3 years ago by Biostar ♦♦ 20 • written 7.8 years ago by Steffi570

I don't think BWA calculates MAPQ at all; use the "XT:A:U" tag to fetch uniquely mapped reads. Moreover, you have probably noticed that reads which failed to map also "have" MAPQ 0.

written 7.8 years ago by Zhidkov570

There are also other MAPQ values, like 10, 13, 17, ... So something is being calculated.

written 7.8 years ago by Steffi570

So, as far as I understand, the MAPQ value is also 0 if there are other possible alignments, even ones with a lower score. A read might therefore carry the "XT:A:U" tag and at the same time have a MAPQ of 0, meaning that there are many other possible alignments with a slightly worse score.
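To see how often this combination actually occurs in a given file, one option is to tally the MAPQ values of all records tagged "XT:A:U". The following is only a minimal sketch in plain Python that reads SAM text lines; the function name and the toy records are made up for illustration and are not real BWA output.

```python
from collections import Counter

def mapq_of_unique_reads(sam_lines):
    """Tally the MAPQ values of SAM records that carry the XT:A:U tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        mapq = int(fields[4])             # column 5 holds MAPQ
        tags = fields[11:]                # optional tags start at column 12
        if "XT:A:U" in tags:
            counts[mapq] += 1
    return counts

# Hypothetical toy records for illustration only
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0",
    "r2\t0\tchr1\t200\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0",
    "r3\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tNM:i:0",
]

print(mapq_of_unique_reads(example))
```

For a real file, the same function could be fed the lines of a SAM file opened with open(), or the text output of samtools view -h file.bam.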

written 7.8 years ago by Steffi570
7.6 years ago by
Sukhdeep Singh9.9k wrote:

From the bwa manual page:

Note that XO and XG are generated by BWT search while the CIGAR string by Smith-Waterman alignment. These two tags may be inconsistent with the CIGAR string. This is not a bug.

So I assume BWA tags a read as uniquely aligned even though the probability that it is aligned correctly is very low. This might be an effect of allowing mismatches: the read could not be mapped exactly, but with the allowed number of mismatches it maps uniquely to one position, with a very high error rate. When I filter my data, I use a MAPQ threshold of 1, so that I keep reads that are uniquely aligned and also not of the worst quality.

You can use samtools view -bq 1 file.bam > file_unique.bam for this.

Someone observed the same scenario posted here in the case of paired-end sequencing data.


written 7.6 years ago by Sukhdeep Singh9.9k

Do you by any chance know how to easily get only the reads that have a mapping quality of zero? Or alternatively, how to subtract one BAM file from another? :) I am currently parsing the SAM file and filtering it on the MAPQ column, but I'd rather use some tool for this.

modified 6.1 years ago • written 6.1 years ago by Biomonika (Noolean)3.1k

For the first question, even if there is a tool for it, I don't think you would gain any speed; grep or awk would be best. Subtracting one BAM from another is a different matter: you can use bedtools (subtractBed) for that, or try bamtools; its filter command might work for you :)

Available bamtools commands:
    convert         Converts between BAM and a number of other formats
    count           Prints number of alignments in BAM file(s)
    coverage        Prints coverage statistics from the input BAM file
    filter          Filters BAM file(s) by user-specified criteria
    header          Prints BAM header information
    index           Generates index for BAM file
    merge           Merge multiple BAM files into single file
    random          Select random alignments from existing BAM file(s), intended more as a testing tool.
    resolve         Resolves paired-end reads (marking the IsProperPair flag as needed)
    revert          Removes duplicate marks and restores original base qualities
    sort            Sorts the BAM file according to some criteria
    split           Splits a BAM file on user-specified property, creating a new BAM output file for each value found
    stats           Prints some basic statistics from input BAM file(s)
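
If you end up parsing the SAM text yourself anyway, both steps (keeping only MAPQ 0 records, and subtracting one set of records from another) fit in a few lines. This is just a rough sketch in plain Python: the function names and toy records are hypothetical, and matching records on (QNAME, FLAG, RNAME, POS) is an assumption that may need adjusting for your data.

```python
def is_header(line):
    """SAM header lines start with '@'."""
    return line.startswith("@")

def record_key(line):
    """Assumed identity of a record: (QNAME, FLAG, RNAME, POS)."""
    return tuple(line.rstrip("\n").split("\t")[:4])

def keep_mapq_zero(sam_lines, mapped_only=True):
    """Keep header lines and records whose MAPQ (column 5) is 0.

    With mapped_only=True, records with the unmapped flag (0x4) are dropped,
    since unmapped reads are usually reported with MAPQ 0 as well.
    """
    kept = []
    for line in sam_lines:
        if is_header(line):
            kept.append(line)
            continue
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if mapq == 0 and not (mapped_only and flag & 0x4):
            kept.append(line)
    return kept

def subtract(sam_a, sam_b):
    """Return headers of A plus the records of A that do not occur in B."""
    seen = {record_key(l) for l in sam_b if not is_header(l)}
    return [l for l in sam_a if is_header(l) or record_key(l) not in seen]

# Hypothetical toy records for illustration only
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U",
    "r2\t0\tchr1\t200\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U",
    "r3\t16\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

zero = keep_mapq_zero(records)      # mapped reads with MAPQ 0
rest = subtract(records, zero)      # everything else from the original set
```

On BAM input the lines could come from samtools view -h file.bam; for large files a streaming awk filter on column 5 will likely be faster than this.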
modified 6.1 years ago • written 6.1 years ago by Sukhdeep Singh9.9k