Question: bwa mem multiple alignments doesn't work
0
gravatar for chongchu.cs
4.0 years ago by
chongchu.cs10
United States
chongchu.cs10 wrote:

Hi, 

I am using bwa mem to do alignments, and I want to get all the alignments in the output. So I use "-a" option of "bwa mem" to do this. But from the output, it seems not all the alignments are reported.

To validate, I run a simple test, I put one query A with length 600bp in a file "read.fq", and then I put three string A1, A2, B in the subject file "ref.fa", where A1 and A2 are exactly same as A, and B (length 2800bp) contained A (not exactly, with some mismatches).  Then I run "bwa mem -a ref.fa read.fq" (ref.fa is indexed firstly), but only two alignments(A and A1 as primary, A and A2 as supplementary) are reported. 

Any one can help to explain why the alignment with A and B is not reported? Does "bwa mem -a" only reports the higher scored alignments? Or I did some thing wrong? Thank you.

Best,

Chong 

sequencing bwa alignment • 2.6k views
ADD COMMENTlink modified 4.0 years ago by Brian Bushnell16k • written 4.0 years ago by chongchu.cs10
1

Would be good if you can provide the example. EDIT: actually no need. That is because the hit to B has been removed at the seeding phase. Try "-D.1".

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by lh331k

Hi @lh3, 

Thank you for the reply. It works when using "-D 0.1". 

ADD REPLYlink written 4.0 years ago by chongchu.cs10

Will it be that the number of mismatch cause the alignment score lower than the default threshold?

The default threshold is 30 (-T) and the mismatch score is defined using -A and -B options. So maybe because of the long length of the query, the alignment score of the A2 are lower than the threshold, which will causes the alignment to not be reported

ADD REPLYlink written 4.0 years ago by Sam2.3k
3
gravatar for Brian Bushnell
4.0 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

Unfortunately, short-read aligners are in a speed race.  So if an aligner can identify some site as strictly worse than the currently best-known site, the old site can be discarded, which saves a huge amount of wasted computation on considering sites that are known to be inferior to the most likely source of the read.  This may involve probability, and thus may not be strictly deterministic.

To determine empirically how likely it is that a read came from some specific location requires exponential compute time, meaning that in practice, it is impossible.  So, people rely on heuristics which are not completely accurate, but allow the computations to finish in finite time.

If you run some aligner and it does not report inferior location X as a possible alignment, don't be surprised.  In the general case, it is impossible to report all possible alignments (in finite time).  It is similarly impossible to report all alignments with scores above X.

No matter what aligner you run, it cannot not give you what you are asking for in general, though in special cases, it may.  I suggest you ask a different question.

ADD COMMENTlink written 4.0 years ago by Brian Bushnell16k
1

P.S.  Blast is very slow but does a pretty good job of showing inferior secondary alignments.  Also, bbmapskimmer (in BBTools) is designed for that, and is much faster than Blast.

ADD REPLYlink written 4.0 years ago by Brian Bushnell16k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 755 users visited in the last hour