Question: BWASW and _alt alleles
0
gravatar for Maximilian Haeussler
3 months ago by
UCSC
Maximilian Haeussler1.3k wrote:

BWA-SW does not find all matches for me, it seems to pick only one random best match. I have no idea how to modify the arguments to have BWA find the secondary matches.

Example:

temp.fa:

>hg38-chr6-test
AACACAGGCCGGACAGAAGCTTGGAAGGTCCTGTCTCCCCAGGGAGGAGGCCCCTGGGACAGTGTGGCTCGTGTCCTTCC
CAACGGCTCCCTCTTCCTTCCGGCTGTCGGGATCCAGGATGAGGGGATTTTCCGGTGCCAGGCAATGAACAGGAATGGAA
AGGAGACCAAGTCCAACTACCGAGTCCGTGTCTACC

This input sequence has five identical matches according to BLAT:

YourSeq   196     1   196   196   100.0%  chr6_ssto_hap7  -     3499060   3499255    196
YourSeq   196     1   196   196   100.0%  chr6_qbl_hap6   -     3412352   3412547    196
YourSeq   196     1   196   196   100.0%  chr6_mann_hap4  -     3494156   3494351    196
YourSeq   196     1   196   196   100.0%  chr6_cox_hap2   -     3622013   3622208    196
YourSeq   196     1   196   196   100.0%  chr6            -    32151332  32151527    196

but just a single match with bwa-sw when I run this command:

bwa bwasw hg38.fa temp.fa

Output:

hg38_dna        16      chr6_GL000253v2_alt     3488536 0       196M    *       0       0       GGTAGACACGGACTCGGTAGTTGGACTTGGTCTCCTTTCCATTCCTGTTCATTGCCTGGCACCGGAAAATCCCCTCATCCTGGATCCCGACAGCCGGAAGGAAGAGGGAGCCGTTGGGAAGGACACGAGCCACACTGTCCCAGGGGCCTCCTCCCTGGGGAGACAGGACCTTCCAAGCTTCTGTCCGGCCTGTGTT    *       AS:i:196        XS:i:196        XF:i:1  XE:i:0  NM:i:0

BWA versions tested: 0.7.5a-r405, 0.7.17-r1194-dirty

I tried with the "-c 0.0001" argument, makes no difference.

blat bwa alignment • 183 views
ADD COMMENTlink modified 3 months ago • written 3 months ago by Maximilian Haeussler1.3k

It has a MAPQ of 0, indicating a multimapper, which fits the BLAT result. This is normal behaviour that multimappers get a MAPQ of 0 and a random (out of all found matches) location being send to output. Is there a specific reason why you need these multimappers?

ADD REPLYlink written 3 months ago by ATpoint13k

Oh wow, yes, it's indeed there, just not documented under bwasw but bwa backtrack: "Repetitive hits will be randomly chosen." manytThanks!!

Well, yeah, I need to know the best match for the input sequence and the _alt match is definitely not a very useful match. I thought that BWA knew better these days how to deal with the _alt chroms.

Non-unique matches are very common as soon as you run into chrom regions that have an _alt... I am surprised that this has not come up before somewhere else. I wonder what the normal way to deal with this is, for bwa mem...

ADD REPLYlink written 3 months ago by Maximilian Haeussler1.3k

I think you should use bwa mem: 1) I am not sure bwa sw is alt-aware, but I am sure bwa mem is, and 2) bwa mem is the recommended algorithm for reads longer than 70bp. Also check this post:

Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?

ADD REPLYlink written 3 months ago by h.mon23k

In this case, my input sequence can be <= 3000 bp.... but you're right, maybe I should use BWA-mem for this? I used bwasw because the input is pretty long, by BWAMEM standards.

ADD REPLYlink written 3 months ago by Maximilian Haeussler1.3k

bwa mem can align long sequences, but minimap2 (also developed by Heng Li) is preferred for such sequences, as it is faster and more sensitive.

edit: the post Minimap2 and the future of BWA is an interesting read

ADD REPLYlink modified 3 months ago • written 3 months ago by h.mon23k
1
gravatar for Maximilian Haeussler
3 months ago by
UCSC
Maximilian Haeussler1.3k wrote:

So the answer seems to be (see comments above):

bwa-sw does not support the XA tag. As such, you cannot get the alternative locations for multi-mapping sequences. This makes bwasw quite useless for most real-world applications that I can imagine, tasks that do not involve mapping reads to a genome.

It does have an option to get alternative locations (-z) but that option seems to be broken, see Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?

I was using it to get the best location of a sequence that the user inputs on a website. No, BLAT is not the best tool here, BLAT is relatively slow to start and no, BLAT's gfServer is not an option either, because that requires a lot of RAM. BWA is quite ideal, because I have the indexes of that already on disk. minimap2 is not an option for me, I try to stay away from very recent algorithms and also I have the bwa indexes already and don't want to re-index all my genomes. I'll try to move to BWA-MEM.

Thanks everyone for the comments! I learned a lot!

ADD COMMENTlink modified 3 months ago • written 3 months ago by Maximilian Haeussler1.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1356 users visited in the last hour