Question

BWA: "XT:A:U" and MAPQ of 0 at the same time

6

12.2 years ago

Steffi ▴ 580

I map RNA-Seq data to the genome with BWA. The SAM output from BWA contains reads that carry the tag "XT:A:U" and, at the same time, have a mapping quality of 0.

What does this mean? I thought that "XT:A:U" means a uniquely best hit?! How does this go together with a MAPQ of 0?

Best, Stefanie

bwa mapping • 12k views

ADD COMMENT • link updated 10.7 years ago by Biostar 20 • written 12.2 years ago by Steffi ▴ 580

0

I don't think BWA calculates MAPQ at all; use the "XT:A:U" tag to fetch uniquely mapped reads. Moreover, you probably noticed that reads which failed to map "have" MAPQ 0.

ADD REPLY • link 12.2 years ago by Zhidkov ▴ 600

0

There are also other MAPQ values, like 10, 13, 17, ... So something is calculated.

ADD REPLY • link 12.2 years ago by Steffi ▴ 580

0

So, as far as I understand, the MAPQ value is also 0 if there are other possible alignments, even ones with a lower score. So a read might have the "XT:A:U" tag but at the same time a MAPQ of 0, meaning that there are many other possible alignments with a slightly worse score.
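To see how common this combination is in a given file, something like the following could be used. It is only a rough sketch: the function name count_unique_mapq0 is made up for illustration, and it assumes plain SAM text on stdin with MAPQ in column 5 and the optional tags starting at column 12.

```shell
# Count reads that are tagged XT:A:U but reported with MAPQ 0.
# Hypothetical helper: reads SAM text on stdin, prints a single number.
count_unique_mapq0() {
    awk -F'\t' '
        /^@/ { next }                          # skip SAM header lines
        {
            unique = 0
            # optional tags start at column 12
            for (i = 12; i <= NF; i++) if ($i == "XT:A:U") unique = 1
            if (unique && $5 == 0) n++         # column 5 is MAPQ
        }
        END { print n + 0 }                    # prints 0 if nothing matched
    '
}
```

It could be fed from a BAM file with samtools view file.bam | count_unique_mapq0.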

ADD REPLY • link 12.2 years ago by Steffi ▴ 580

score 2 · Answer 1 · 2012-05-05

2

12.0 years ago

Sukhi Singh 11k

From the bwa manual page:

Note that XO and XG are generated by BWT search while the CIGAR string by Smith-Waterman alignment. These two tags may be inconsistent with the CIGAR string. This is not a bug.

So, I assume BWA gives a read the uniquely-aligned tag even though the probability that it is aligned correctly is very low. This might be a consequence of allowing mismatches: the read could not be mapped exactly, but with the allowed number of mismatches it could be mapped uniquely at a certain position, with a very high error rate. When I filter my data, I use a MAPQ threshold of 1, so that I keep reads that are uniquely aligned and also not of the worst quality.

You can use samtools view -bq 1 file.bam > file_unique.bam for this.
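If you want to require both conditions explicitly (the XT:A:U tag and a minimum MAPQ), a small awk filter on the SAM text might look like the sketch below. The function name keep_unique_min_mapq and its default threshold of 1 are assumptions for illustration, not part of samtools or BWA.

```shell
# Keep header lines plus alignments that are tagged XT:A:U and have MAPQ >= min.
# Hypothetical helper; usage: keep_unique_min_mapq [min] < in.sam > out.sam
keep_unique_min_mapq() {
    awk -F'\t' -v min="${1:-1}" '
        /^@/ { print; next }                   # pass header lines through unchanged
        $5 + 0 >= min + 0 {                    # column 5 is MAPQ
            # optional tags start at column 12
            for (i = 12; i <= NF; i++)
                if ($i == "XT:A:U") { print; break }
        }
    '
}
```

To go from BAM to BAM, it can be placed in a pipe: samtools view -h file.bam | keep_unique_min_mapq 1 | samtools view -b - > file_unique.bam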

Someone posted the same observation here for paired-end sequencing data.

Cheers

ADD COMMENT • link 12.0 years ago by Sukhi Singh 11k

0

Do you by any chance know how to easily get only those reads that have a mapping quality of zero? Or alternatively how to subtract one BAM file from another? :) I am now parsing the SAM file and filtering it on the MAPQ column, but I'd rather use some tool for this.

ADD REPLY • link 10.5 years ago by Biomonika (Noolean) 3.2k

1

For the first question, even if there is some tool, I don't think you would gain any speed; grep or awk would be best. Subtracting one BAM from another is a different thing: you can use bedtools (subtractBed) for that, or try bamtools; its filter command might work for you :)

Available bamtools commands:
    convert         Converts between BAM and a number of other formats
    count           Prints number of alignments in BAM file(s)
    coverage        Prints coverage statistics from the input BAM file
    filter          Filters BAM file(s) by user-specified criteria
    header          Prints BAM header information
    index           Generates index for BAM file
    merge           Merge multiple BAM files into single file
    random          Select random alignments from existing BAM file(s), intended more as a testing tool.
    resolve         Resolves paired-end reads (marking the IsProperPair flag as needed)
    revert          Removes duplicate marks and restores original base qualities
    sort            Sorts the BAM file according to some criteria
    split           Splits a BAM file on user-specified property, creating a new BAM output file for each value found
    stats           Prints some basic statistics from input BAM file(s)
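
The awk route mentioned above could look roughly like this. It is a sketch under a few assumptions: the function name only_mapq0 is made up, the input is SAM text with header, and unmapped reads (FLAG bit 0x4) are dropped on purpose, since they also carry MAPQ 0 and are probably not what you are after.

```shell
# Keep header lines plus mapped alignments whose MAPQ is exactly 0.
# Hypothetical helper: reads SAM text on stdin, writes SAM text to stdout.
only_mapq0() {
    awk -F'\t' '
        /^@/ { print; next }                   # pass header lines through unchanged
        # column 5 is MAPQ; column 2 is FLAG, and bit 0x4 marks an unmapped read
        $5 == 0 && int($2 / 4) % 2 == 0 { print }
    '
}
```

Used in a pipe: samtools view -h in.bam | only_mapq0 | samtools view -b - > mapq0.bam. If your samtools version has the -U option of samtools view (write the reads not selected by the filters to a second file), samtools view -b -q 1 -U mapq0.bam in.bam > rest.bam should do the split in one pass, although that variant would keep the unmapped reads in mapq0.bam; check the help output of your version first.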

ADD REPLY • link 10.5 years ago by Sukhi Singh 11k