Does bwa-mem allow query sequence mapping to two or more same references at the same time?
1
0
Entering edit mode
5.3 years ago
Junmh • 0

I have a thought that if exist 2 conserved reference sequence, even are the same sequence. I guess, the pair-end read should mapping to both of them, and with the same mapping status. So I made up a reference to test it. But in practice, it failed, only got one primary alignment, and the other is second alignment. The forward read and reverse read are mapping to the same reference in the primary alignment. But in the second alignment, cross-mate occurs, forward read and reverse read are mapping to different reference. Why would produce this strange result? If I wanna mapping to multiple conserved reference, if bwa-mem can overcome it? How does bwa-mem deal with this situation? Or there is other method can do that? Below are the reference and result.

1) Reference Sequence:

>2B
GTGCCAGTTTCGAATTATCACAACCCAGAGCTGGATGTGGCGGCCTTTGACCAGCAGGGCAGGAAGTTCGATAACTTCAGCTCTTTGAGTATGATCTGGGAGTCCTCCAAAGTTTCCCTGGCGAGCATCGAACCAACCATGCCCATGCAGCTACATGTACACGAGGATGACAACAAACAGAAAAAACT
>1A
GTGCCAGTTTCGAATTATCACAACCCAGAGCTGGATGTGGCGGCCTTTGACCAGCAGGGCAGGAAGTTCGATAACTTCAGCTCTTTGAGTATGATCTGGGAGTCCTCCAAAGTTTCCCTGGCGAGCATCGAACCAACCATGCCCATGCAGCTACATGTACACGAGGATGACAACAAACAGAAAAAACT

2) Mapping Result:

@SQ SN:2B   LN:188
@SQ SN:1A   LN:188
@PG ID:bwa  PN:bwa  VN:0.7.16a-r1181    CL:bwa mem -a -t 10 same_ref.fas 1_R1.fq 1_R2.fq
E00477:505:H3F5MCCX2:1:1102:4970:41849  97  **1A**  1   0   1S143M  **=**   152 188 GGTGCCGGTTTC... ...
E00477:505:H3F5MCCX2:1:1102:4970:41849  353 **2B**  1   0   1H143M  **1A**  152 0   *   *   NM:i:2  MD:Z:5A35G101   MC:Z:37M107H    AS:i:133
E00477:505:H3F5MCCX2:1:1102:4970:41849  145 **1A**  152 0   37M107S **=**   1   -188    TACATGTACACGAGGATGACAA... ...
E00477:505:H3F5MCCX2:1:1102:4970:41849  401 **2B**  152 0   37M107H **1A**  1   0   *   *   NM:i:0  MD:Z:37 MC:Z:1H143M AS:i:37
alignment • 2.5k views
ADD COMMENT
0
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

Thank you!

ADD REPLY
0
Entering edit mode

Thanks. But I still don't know how to do that. Is it correct now?

ADD REPLY
0
Entering edit mode

If you use a numbered list like that then the code option does not work. I think we can go with what you have for now.

While this question is about bwa mem let me post an alternative. If you use bbmap.sh from BBMap suite you can choose to align multi-mapping reads to all positions that they align to equally well using ambig=all option. A guide for BBMap is available.

ADD REPLY
0
Entering edit mode

Thank you! You give a good alternative. As for formatting, I am appreciate that you give me a lot of tips.

ADD REPLY
1
Entering edit mode
5.3 years ago

the pair-end read should mapping to both of them, and with the same mapping status.

No. There should only be a single primary alignment. If they map equally well, I believe bwa picks which is primary randomly, so if you had a thousand reads like this, half would have 1A as their primary alignment, half 2B.

Here is the sam format specs. Emphasis mine

https://samtools.github.io/hts-specs/SAMv1.pdf

Multiple mapping The correct placement of a read may be ambiguous, e.g., due to repeats. In this case, there may be multiple read alignments for the same read. One of these alignments is considered primary. All the other alignments have the secondary alignment flag set in the SAM records that represent them. All the SAM records have the same QNAME and the same values for 0x40 and 0x80 flags. Typically the alignment designated primary is the best alignment, but the decision may be arbitrary

But I think I see the problem here, which may or may not be a problme in a real set-up.

bwa takes all your reference contigs and makes them into one long string, and remembers which coordinates belong to which reference. Your references start with 1 G, but your reads seem to start with 2. This is causing some differences between alignments that overlap the first sequence a bit, and reads which align to the first sequence with hard clipping . Put some n's as buffers at the start and end of your sequences, and bwa might behave a little more like you'd expect.

ADD COMMENT
0
Entering edit mode

Yes, It was better when I add some 'N' on the both sides. The CIGAR strings were the same at least in the forward read mapping.

Thanks, I also read this sentence in the SAMv1.pdf

Typically the alignment designated primary is the best alignment, but the decision may be arbitrary

That helps me a lot. But why said 'the decision may be arbitrary'? Is there has any special case?

Because of the repeat region, the mapping may be ambiguous. But if repeat had been masked before, could I say it will be more accurate to map? Oh, I have another question. Would you mind explaining what is difference between secondary alignment and supplementary alignment? Thanks in advance.

ADD REPLY

Login before adding your answer.

Traffic: 3205 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6