Question: BWA mapping with selected mismatch allowed?
3.0 years ago by
lim24m0 wrote:

This problem was presented to me by a biologist: I have a .fastq file with about 1 million short DNA reads. The DNA templates that I sent for sequencing were designed something like: ... ...GGTATNNNNNNNNNATGT... ... where the N's are a randomized sequence of 9 nucleotide bases (A, T, G or C).

I have to align them now, either de novo or to a reference genome (we have the reference genome), while ignoring the 9 randomized bases.

How should I go about this? What tools can I use? Are there any existing DNA/RNA alignment tools out there that can do this for me?

Thank you!

Do these randomized nucleotides have a meaning? Something UMI-like (unique molecular identifiers)? Do you still need them downstream?

  • Are the nine nucleotides barcodes to identify samples?
  • Maybe you want to trim or split your sequences to remove those nucleotides?
  • Concerning the title of the question: bwa aln has a parameter that limits how many differences are allowed (see -n, the maximum edit distance), but bwa mem has no such parameter.
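
If you go the trim/split route, something like the following could be a starting point. This is only a minimal sketch in Python, assuming the flanks are exactly GGTAT and ATGT as written in the question and that the random stretch is always 9 bases long; `PATTERN` and `split_read` are made-up names for illustration, not part of any existing tool. It pulls out the 9-mer (in case it is needed downstream) and returns the sequence and quality strings on either side of it.

```python
import re

# Flanks copied from the question; the 9 random bases sit between them.
# Assumption: flanks match exactly (no sequencing errors inside GGTAT / ATGT).
PATTERN = re.compile(r"GGTAT([ACGTN]{9})ATGT")


def split_read(seq, qual):
    """Split one read around the 9-base random region.

    Returns a dict with the random tag and the (sequence, quality) pairs
    to the left and right of it, or None if the flanks are not found.
    """
    match = PATTERN.search(seq)
    if match is None:
        return None
    start, end = match.span(1)  # coordinates of the 9 random bases only
    return {
        "tag": seq[start:end],
        "left": (seq[:start], qual[:start]),
        "right": (seq[end:], qual[end:]),
    }


if __name__ == "__main__":
    # Toy read: 2 bases + GGTAT + 9 random bases + ATGT + 2 bases.
    read = "ACGGTATACGTACGTAATGTCC"
    quality = "I" * len(read)
    print(split_read(read, quality))
```

Reads where the flanks are not found come back as None, so they can be counted or inspected separately. Whether to map the left and right parts separately or to keep only one side depends on how long each part is in your real reads.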
