I have a set of amplicon sequences for antimicrobial resistance determinants, ranging in length from 125 to 275 bp.
I also have a .fastq file of metagenomic reads, ranging in length from 25 to 473 bp.
I would like to map the metagenomic reads to the amplicon sequences and count how many reads in that .fastq sample map to each amplicon. However, some of the metagenomic reads are longer than the amplicon sequences. I believe this means that if a read contains an antimicrobial resistance gene matching a particular amplicon sequence, part of the read will overhang: that part is not in the resistance gene of interest, only adjacent to it.
I am concerned that the mapper I choose might penalize this overhang (i.e. the part of the read extending beyond the resistance gene) so heavily that it does not map these reads at all, or that it might handle the overhang in some other unexpected way.
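To make the concern concrete, here is a minimal sketch in plain Python comparing two ways of scoring the same read against an amplicon: a global (Needleman-Wunsch) score, where every base of both sequences must be aligned, and a local (Smith-Waterman) score, where unaligned flanks cost nothing. The toy sequences and the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions for illustration, not any mapper's defaults.

```python
def global_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score: every base of both sequences must be aligned."""
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]  # aligning a prefix of b against nothing
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman score: cells are floored at 0, so unaligned flanks are free."""
    n, m = len(a), len(b)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical toy data: a 12 bp "amplicon" embedded in a longer "read"
amplicon = "ACGTACGTTGCA"
read = "TTTTT" + amplicon + "GGGGG"  # 5 bp of overhang on each side

print("global:", global_score(amplicon, read))
print("local: ", local_score(amplicon, read))
```

With these settings the local score is 24, the same as a perfect match of the amplicon alone, while the global score falls to 4 because the ten overhanging bases are charged as gaps. As I understand it, the local behaviour (where the overhang is soft-clipped rather than penalized) is what I am hoping the mapper will do.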
My question is: which free mapper can I use to map long metagenomic reads (which may contain antimicrobial resistance genes) to the amplicon sequences without the overhangs causing problems?
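Once the reads are mapped, the counting step I have in mind is roughly the sketch below. It assumes single-end reads and that the mapper writes standard SAM output; it counts each read's primary alignment once and skips unmapped (flag 0x4), secondary (0x100) and supplementary (0x800) records. The reference names amp_1 and amp_2 are hypothetical.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits for records that should not be counted
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_reads_per_amplicon(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference (amplicon) name."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1  # column 3 (RNAME) holds the amplicon name
    return counts


# Hypothetical SAM records (SEQ and QUAL left as "*")
example_sam = [
    "@SQ\tSN:amp_1\tLN:200",
    "@SQ\tSN:amp_2\tLN:150",
    "read1\t0\tamp_1\t1\t60\t5S100M\t*\t0\t0\t*\t*",
    "read2\t16\tamp_1\t20\t60\t100M10S\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read4\t256\tamp_2\t1\t0\t50M\t*\t0\t0\t*\t*",
    "read5\t0\tamp_2\t3\t60\t80M\t*\t0\t0\t*\t*",
]

counts = count_reads_per_amplicon(example_sam)
print(dict(counts))
```

For a real run, the same function could be passed an open SAM file handle, since it only iterates over lines.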
Thank you for sharing any of your advice!