BWA mem how to know if a read is mapped uniquely?
2
1
Entering edit mode
6.5 years ago

I have read many posts regarding this question. The XT, XA tag doesn't seem to be used anymore by BWA mem, so I can't use them to identify uniquely mapped reads.

Some posts said that multiple hits are to be assigned a quality score of 0. However, I don't have any links to confirm if this is still true. Does anyone know where I can confirm this?

Does anyone know how to tell if a read is uniquely map from the result of BWA-mem alignment? Thank you.

bwa alignment • 12k views
3
Entering edit mode
6.5 years ago

You can filter out the secondary alignments in your file by excluding reads with the "secondary" flag (0x100 binary or 256 decimal) set. You can use samtools for this:

samtools view -F 256 file.bam
0
Entering edit mode

If you want to ask for reads that have no secondary alignments, then you might need a script that runs through a read-name sorted file and looks for singleton read names.

0
Entering edit mode

Thank you, I'll try that.

I'm just wondering why hasn't there been a tool to perform such a task. This seems to be a common task when handling bam/sam files.

0
Entering edit mode

N.B., this isn't a common task. You'd be well advised to drop the concept of a "unique alignment" and just filter by MAPQ (and the flags that Matt Shirley indicated).

1
Entering edit mode

I am with the OP though that one should be able to select uniquely mapping reads. Even considering that I also agree that most people that want to make use of these "uniquely mapping reads" may not fully understand all the implications of it.

After all the concept of secondary alignment is not any better defined than that of a unique read. What exactly is a secondary alignment? At what point does an alignment become secondary and under what conditions won't get showns at all? FWIW I only have a superficial understanding of how it all works.

As for the original question I believe that bwa mem will produce a mapping quality of 0 if a read aligns in multiple locations. It used to work that way for a long time, even though technically there is no requirement or standard that would require that, it is a bwa specific convention. To filter for uniquely mapped reads pass the -q 1 filtering flag to samtools view.

0
Entering edit mode

It boils down to which definition of "uniquely mapping" you want to go by. I saw one on the samtools-help email list in the last week that would have meant filtering with samtools view -q 40.

0
Entering edit mode

This won't work with output from bwa because it (always?) sticks secondary alignments into the XA tag of the primary alignment instead of recording them as secondary alignments in the samfile.

0
Entering edit mode
6.5 years ago
Lemire ▴ 910

I am using version 0.7.12 and bwa mem reports the XA tag (fake data):

% bwa mem fake.fa  fake.fq
@SQ     SN:fakechr1     LN:1500
@SQ     SN:fakechr2     LN:1500
@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem fake.fa fake.fq
[M::process] read 1 sequences (126 bp)...
[M::mem_process_seqs] Processed 1 reads in 0.000 CPU sec, 0.000 real sec
HISEQ-MFG:110:C75JDANXX:2:1101:13263:2218       0       fakechr2        251     0       126M    *       0       0       ATGGGTCATCCCCTTCACTCCCAGCTCAGAGCCCAGGCCAGGGGCCCCCAAGAAAGGCTCTGGTGGAGAACCTGTGCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGG      #==BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBGGGGGGGGGG      NM:i:0  MD:Z:126        AS:i:126   XS:i:126 XA:Z:fakechr1,+251,126M,0;
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem fake.fa fake.fq
[main] Real time: 0.004 sec; CPU: 0.000 sec

0
Entering edit mode

Sorry, it turns out that BWA does still give XA tag..... I was reading the latest SAM file spec and I can't find the XA tag anymore. I thought this means that the XA tag isn't available anymore.

I'm a bit confused about the output though. If there is an XA tag, doesn't mean that it is a non-unique hit and should have a XT:A:R tag as well? Also can you please tell me if you can find any 256 flag in the bam file?

1
Entering edit mode

X, Y and Z tags are not specified by the SAM spec, these tags are reserved for the aligners to define them. See the bwa manual, it has description for XA etc.

http://bio-bwa.sourceforge.net/bwa.shtml

as for what tags are filled in and what are not that again is at the discretion of the aligner, there is no requirement for what should be filled and what should not