This is my first post here.
I am using BWA-mem in a rather unusual situation: I am aligning short sequences of around 150 bp to short reference sequences of variable length. When I supply a reference file containing multiple similar reference sequences, my overall alignment rate increases considerably.
ref1.fasta:
>seq1
CAGGCTCTGCTCTTCATAATCATACCTTTGTGACTCAGGATGCTGT
>seq2
CAGGCTCTGCTCTTAATATCTGGCCGTCGTATTCCACCTCTGCGACTCATGATGCTGT (100,000 aligned)
>seq3
CAGGCTCTGCTCTTCATAATTTCTATCTTGCCCACCCTACTCGACACAGAGCAAAAATCCAACACTCCCAATATTGCCGTGGCTTCGACCTCTTGCTCAGATTTTCTTGTTACCTTTGTGACTCAGGATGCTGT
>seq4
CAGGCTCTGCTCTTCATAACCCTCCCTGCGAGTCCTTAAGTCTGACTCGGATCCTTAAACAACCTTTTCTTACCTTTGTGACTCAGGATGCTGT

ref2.fasta:
>seq2
CAGGCTCTGCTCTTAATATCTGGCCGTCGTATTCCACCTCTGCGACTCATGATGCTGT (25,000 aligned)
More of my FASTQ reads align to ref1.fasta than to ref2.fasta, but the alignments to ref1.fasta contain far more deletions and mismatches.
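To put numbers on that difference, a minimal sketch along the following lines could tally, per reference sequence, the number of primary mapped reads, the deleted bases (CIGAR `D` operations) and the summed edit distance (the `NM` tag, which counts mismatches plus inserted and deleted bases). It assumes the alignments are available as plain-text SAM; the function name `summarize_sam` and the file name in the usage note are hypothetical and not part of BWA.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "20M" or "2D"
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_sam(lines):
    """Tally mapped reads, deleted bases and NM edit distance per reference."""
    stats = {}
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        ref, cigar = fields[2], fields[5]
        entry = stats.setdefault(ref, Counter())
        entry["reads"] += 1
        entry["deleted_bases"] += sum(
            int(n) for n, op in CIGAR_OP.findall(cigar) if op == "D"
        )
        # Optional tags start at column 12; NM:i:<n> is the edit distance
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                entry["edit_distance"] += int(tag[5:])
    return stats
```

Running it as `with open("ref1.sam") as fh: print(summarize_sam(fh))` on the output of each run would show whether the extra reads aligned against ref1.fasta are also the ones carrying most of the deletions and mismatches.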
I realize this is not what BWA-mem was designed for, but I would be really grateful if you could help explain this behaviour. Could it be something to do with the initial seeding of the alignment?
Many thanks, Steve W.