Question: BBMAP fails to find secondary alignment
0
gravatar for mark.rose
4 months ago by
mark.rose30
United States
mark.rose30 wrote:

I am using BBMAP to align to a reference genome. I am allowing secondary hits because it is my intention to evaluate the mapping quality of those hits to identify and discard query sequences that produce multiple hits with MAPQs above a certain threshold compared to the top hit (i.e. I only want to keep reads with top hits sufficiently dissimilar to secondary hits). I've tried a number of variations on this cmdline:

bbmap.sh  trimq=13 mintrimlength=100 minaveragequality=15 outm=test.aligned.sam outu=test.unaligned.sam  in=test.fq ref=MAIZE_B73_REF_5_GENOME secondary=t maxsites=2 k=8 ambiguous=all

test.fq contains two sequences for illustration.

the SAM file produced (minus header)

K00333:82:HTWHKBBXX:1:1101:22597:1543 1:N:0:TGCAACCGGTCC        16      MAIZE_B73_REF_5_GENOME|3        87398937        44      150=
    *       0       0       AAAAGGCGAAGAAGCTCCTAAGGGAGGCATACAGCGAAAAATAAACGCTGATACCGCCGAAATAGAAGGCGAAGAAGCTCCTAAGGGAGGCTTGCATCAAAAAGTCAGCGCTGATCTTCGGCACAAATATCATTTTGCATAACATCACAT  JJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFAFAA  NM:i:0  AM:i:44 NH:i:1

K00333:82:HTWHKBBXX:1:1101:15057:1806 1:N:0:TGCAACCGGTCC        0       MAIZE_B73_REF_5_GENOME|4        45893970        40      150=
    *       0       0       ATCCGGGGGCTCGGAAGGTGGAAAGACACGATGCATAAGGGAGCACGAAGACGTGGTCGCCTTTCAAAGGGGTCGCCCTCCTTTTAAAGGCGACTCTCCCTACTCGCGTCCCCAACCGTCGTGGGCTGAGTCTTCTCCAACACGCTCCAA  AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ  NM:i:0  AM:i:40 NH:i:2

K00333:82:HTWHKBBXX:1:1101:15057:1806 1:N:0:TGCAACCGGTCC        256     MAIZE_B73_REF_5_GENOME|3        115620153       38      52=1X14=1X82=   *       0       0       *       *       NM:i:2  AM:i:38 NH:i:2

This seems to indicate that secondary hits are being found (K00333:82:HTWHKBBXX:1:1101:15057:1806) but not for K00333:82:HTWHKBBXX:1:1101:22597:1543

If I then take K00333:82:HTWHKBBXX:1:1101:22597:1543 and blast it against this reference I get many hits. For example

Top blast hit

> MAIZE_B73_REF_5_GENOME|3 SRS link 
Length=235667834

 Score = 278 bits (150),  Expect = 5e-73
 Identities = 150/150 (100%), Gaps = 0/150 (0%)
 Strand=Plus/Minus

Query  1         ATGTGATGTTATGCAAAATGATATTTGTGCCGAAGATCAGCGCTGACTTTTTGATGCAAG  60
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  87399086  ATGTGATGTTATGCAAAATGATATTTGTGCCGAAGATCAGCGCTGACTTTTTGATGCAAG  87399027

Query  61        CCTCCCTTAGGAGCTTCTTCGCCTTCTATTTCGGCGGTATCAGCGTTTATTTTTCGCTGT  120
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  87399026  CCTCCCTTAGGAGCTTCTTCGCCTTCTATTTCGGCGGTATCAGCGTTTATTTTTCGCTGT  87398967

Query  121       ATGCCTCCCTTAGGAGCTTCTTCGCCTTTT  150
                 ||||||||||||||||||||||||||||||
Sbjct  87398966  ATGCCTCCCTTAGGAGCTTCTTCGCCTTTT  87398937

Second best blast hit

> MAIZE_B73_REF_5_GENOME|10 SRS link 
Length=150982314

 Score = 257 bits (139),  Expect = 6e-67
 Identities = 145/148 (98%), Gaps = 0/148 (0%)
 Strand=Plus/Plus

Query  1         ATGTGATGTTATGCAAAATGATATTTGTGCCGAAGATCAGCGCTGACTTTTTGATGCAAG  60
                 ||||||||||||||||||||||||||||||||||||||||||||||||||| | ||||||
Sbjct  15816931  ATGTGATGTTATGCAAAATGATATTTGTGCCGAAGATCAGCGCTGACTTTTCGCTGCAAG  15816990

Query  61        CCTCCCTTAGGAGCTTCTTCGCCTTCTATTTCGGCGGTATCAGCGTTTATTTTTCGCTGT  120
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  15816991  CCTCCCTTAGGAGCTTCTTCGCCTTCTATTTCGGCGGTATCAGCGTTTATTTTTCGCTGT  15817050

Query  121       ATGCCTCCCTTAGGAGCTTCTTCGCCTT  148
                 | ||||||||||||||||||||||||||
Sbjct  15817051  AAGCCTCCCTTAGGAGCTTCTTCGCCTT  15817078

So my question is, is there someway to parameterize BBMAP to allow it to find such other sequences in the reference or, if not, is there another short read aligner that would and produce meaningful MAPQ scores that I could use for my filtering?

Thanks

secondary bbmap • 200 views
ADD COMMENTlink modified 3 months ago by Biostar ♦♦ 20 • written 4 months ago by mark.rose30

Generally mapq scores are set to zero when a read multi-maps. Since there is no standard of how these scores are assigned it is hard to generalize how each aligner uses them.

I suggest that you just map without additional parameters like trimq=13 mintrimlength=100 minaveragequality=15 maxsites=2 (I don't use them personally) and see what happens. I recall that bbmap does not report every alignment if a read multi-maps (beyond a certain limit). @Brian may have done that conciously to keep the size of the alignment files in check.

ADD REPLYlink written 4 months ago by genomax63k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1052 users visited in the last hour