bwa: Produces poor mapping, reports it as good?
1
1
Entering edit mode
8.6 years ago
Marvin ▴ 220

This is my read file (containing 1 read):

@my_read
AACGCCGATCTCAGGACCAAAAAGGGGCATACCGAGACTACAGCACGAGATCTTACAACAATGGTAGTGTTCTGCGTAGATTCGTAAATTAAGATGATAACCTCTCGCATCCCTGTTTTATCTATGAATCGCTTCATACAGCCAAATGGCAGCTGCTCTGGATTTTGGTC
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

These are the first 50 lines from my reference file:

>my_ref
TAACTCTCGAAAACACGATGCAGACCAAAATCCAGAGCAGCTGCCATTTGGCTGTATGAA
GCGATTCATAGATAAAACAGGGATGCGAGAGGTTACCATCTTAATTTACGGATCTACGCA
GAACACTACCATTGTTGTAAGATCTCGTGCTGTAGTCTCGGTATGCCCCTTTTTGGTCCT
GAGATCGGCGTCCCAGACCCCCTCCCTGATGTCGTCAATATGGTTTTTATTTGGACCGGC
TGGGAATGCCAGCGAGGACAAGTGTACTCAGCTCATCAAATAATACTCAATGCACACTCG
TGATGCTAGGCCGCGCAGGGTACACTACATCGTCGGCGTGCACAATCGCTTCGGGATAGC
CGCTGAGATATATCGAGACGTCGGTGGGATGTAGGTTGCTCATAGGCGGAGCTTGGTCAA
TCCCGGTATTAAAGATCGTAATTTCTCGTCGAAGCAGTCAAGGGCCTTATGTCCTGTACG
GAGATCGATAGCAGACAAAAGCGTGGCGCAGCCTGGATACCGTCGACGTCGAGAAGCTTA
GATTTTGTAGTGGGTTTTAACATACATACCATAGGAAACGTTGTTTTGCGGTGAGGTCTA
GAATAGGGCTCCTGTGTGTCACCGCGACTATGTTTGCACGGCTTATCATGACCTTGTACG
CTGCCGAGTTCTTGTCCTGAGCGTCGGACCCATGCTGTGCTAGTCGGCACATATCAGTTC
GATTCGGTGATGGACTGGAATTTGCGGTGGCGTGCAATAAGTCCATCACACTGACCGTTA
TTTTTGCAGGCTCAGATAGGGCCTGGGTGCGTAGCAAAGCTTTTCGCCAGCATCGGGTGT
GTCTAGCAAGTGAAGAGGAGCTGCTATCACTTTCTCTTAGGTAATCGCATGTAGGAGATA
TACTGCTATACCCAGTGCTCTTGTACCGGATGTACTCGGGTGGTTCCTGCTCACGTTGGT
GTTAAACCCGTTGGTCTGGTCGGATTACCCACGAATCCGACAACCCGCTTCTGACGTGAT
GTGTGGATGCTACTCATCCAAAGCTATGATCCGTGATCTCAAAATGTAGGCCTCTGCACT
CTAGTTACGGTGTTATGTAGCGGACACAGCCTAAGAATGGTACGCAGATCGACGTTGTCG
TCTTGGTGACTCGTTTCGGGTGACGCTAGCAAATCGCGATGAAACGATGAATACGATACG
TAGTCCTAAGCGACCTACAACGGTGGATTTGACTGCCACAACGCGAGAAGACTCGGGGCA
TTTCCCCCCGGCGGCCTCCTTGAGATAGCCGTGGGGAGCTAATTTTAGTCTTCGTAAATC
GTTACAAAAATGACAATCCAAGGACATGTGGTTCATTGGAGCGATCATGATCTATATGAG
TAGTTACTCGGCGCACGATCCAGCTATGAGCTACCGGCTTAGGGGGTACTACCCGGACTT
AGGCATGAATACCAAACACCTGTCGTTTGACATTCTTTACCACGGACTGAGGGGGGTCTG
AGGGACCCATCTGCCCATCTGCGCGCTCACTCAAGCTCCAAGAGTTTCTTGTACGTACGG
CGTCATTTACTGAACGCTTCCTAATTGAAGATGAAGATTTGCGGATTCCTCGCTCGGTGA
AATCCTAACCAGTGTATGCTTTGTAGCGTACTGACCCGAAGAAATAAATGAGGTTAGCAA
CAGCAGCAACCAAGACACCCTTAAGTGGATCCCTCCCGGAACTCTGATCCAGTCACTGCA
GCAGACACTTTGTGCAGCCTGGCCCGTCATATATCCATAGAAGCCTGCAACATCACAAGC
CGGAATCAGCTCGCTGGCTCCTACGTCATGTGAGATCACACGTTCGCTACATGGATTGGG
CATTAAGCTCAAACTGAAGTACACTGCGGCTCTTTTCGTTAACTCGTGGTGCAATCACGC
ACTAACACGAGAGATGGACGGTACAGATACCGCTTAGTTCAAGAACATTCCACGCGTGCC
ACAAATCTCTTACATTATCCAAGGTAGCTTCTGCGGAATCACAAATTACGAGCTTGAATC
AACTGTCGCCCTATCAGCGGTTCCTCCGTATGACCCCGTTGTACGCTACATGTCTGAGTT
CCTGGGGTGAAAGCTCTCAATAGGATGAAGGTACGTACCCCGAGGGCGTGCACTTCTATC
TCCAGGGAAGGCCGCCGCTTCCCCGGAGTGAATCCATTATTCTGGCGGGGTCCAGGTTAA
CTCAGTCTTGAAAAGGTGCCTGGGAATGGTCGACTAGAGTTGCCCGATCGTTCCGCCTAT
CATTGGAACACTTATCCGTACATTAGCTTTAGGTTAACGGGGTAGATTGGGGGTCCCGTA
TCCTGAAGTTTAGGGGGAATTTTCAACCTATTCAACCGCTTGGTATATTTACGTAGGTCT
TGCGACCGCCGGGACGCCCTCAGGCTCCACATACTTCCCGCCACGTTGATGGGAGATTGT
TAACCAGGATAGATACAGAACCGTCGTCTCGGACATTCCAAACCTTTTCTGAGCCTCGCG
AGCAATACATACCGCAAGCACATACTGTTTAGTATTGTCTTAAGCGGAATGGATATACAC
TCCCAAGCCTTGCACAGGCGGTTGGTGTTATCGTCCGACGACGCAAATTATAAAACTTGA
GCCTTACCGAAGGCTCGTCTAGAGCAAATAGCCAGCTGATTTTAACTGGTCACGACGGTT
GCTTATGCACGGGATGTAGCTCCAGACCCTATTGCTTGGCGGACAACAATCGGCACCCTG
AGTCAACAAGTTGGGCTCTCATGCACAATCTATCGAGGCTACGACGATATTGTAAGGCAT
TTATCTCTAATAGTGAACATTAAAAGGCAAGTCTGGGTACGGTCGAAGGTCACAAAAGAA
CGGTAGATGCAGGGGCGCCGGGCCAGGGACTCGGTCACTAGTGGACCGATATTGTCGATT

If you now map that one read against the reference with this command:

bwa mem <ref_file> <read_file>

Then it will map at position 23.

The CIGAR string will be 170M.

Comparing the sequence of the reference with the read sequence I notice that the identity between the 2 sequences is only 31%. But bwa says it maps at position 23 and the mapping quality is 60. Why does it do that?

bwa • 1.4k views
ADD COMMENT
1
Entering edit mode
8.6 years ago

BWA mapping quality (and short read mapping quality, in general) is not actually a "quality of the mapping", but a measure of the likelihood that the read maps to a given location. See How Mapq In Bwa Is Calculated? for a discussion of what is actually represents. In other words, for your read, it is highly likely that it maps to position 23 and not somewhere else. The MAPQ does not say that the mapping to position 23 is a really good alignment, though, as your question points out.

ADD COMMENT
0
Entering edit mode

Hm okay but how can I force bwa mem to only output "really good" alignments? I cannot see a parameter like "minimum base identity"?

ADD REPLY
0
Entering edit mode

I have found this thread from 2015: http://sourceforge.net/p/bio-bwa/mailman/message/34275912/
The developer suggests to use the -T parameter (alignment score parameter).

But that's not the same as saying "I want reported alignments to have at least 90% identity" ...

ADD REPLY
1
Entering edit mode

You definition of "really good" is very different from the standard. Most people want, "likely to be correct", whereas you're asking for "low edit distance" (if such an alignment existed, it would have been reported instead). I suspect you'll have to post-process the alignments for what you want.

ADD REPLY

Login before adding your answer.

Traffic: 2548 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6