Question

Mapping metagenome information to amplicon sequences

Asked 5.7 years ago by LRStar

Hello all,

I have a set of amplicon sequences for antimicrobial resistance determinants. Each amplicon sequence varies in length between 125 and 275 bp.

I also have a .fastq file with metagenomic reads. Each metagenomic read varies in length between 25 and 473 bp.
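A quick way to confirm read-length ranges like these is to scan the FASTQ file directly. This is a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function names are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from an uncompressed, 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().strip()
        plus = handle.readline()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        yield header[1:].strip(), seq, qual


def length_summary(handle: TextIO) -> Tuple[int, int, int]:
    """Return (number of reads, shortest read length, longest read length)."""
    lengths = [len(seq) for _, seq, _ in fastq_records(handle)]
    if not lengths:
        raise ValueError("no reads found")
    return len(lengths), min(lengths), max(lengths)


demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAA\n+\nIIIIIIIIII\n")
print(length_summary(demo))  # (2, 4, 10)
```

For a real sample, pass an open file handle instead, e.g. `with open("sample.fastq") as fh: print(length_summary(fh))` (the file name is a placeholder).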

I would like to map the metagenomic reads to the amplicon sequences and count how often each amplicon sequence occurs in that metagenomic .fastq sample. However, some of the metagenomic reads are longer than the amplicon sequences. I believe this means that if a particular metagenomic read contains an antimicrobial resistance gene that matches a particular amplicon sequence, the read will still have an overhang that is not part of (but simply adjacent to) the antimicrobial resistance gene of interest.
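Once the reads are aligned with the amplicons as the reference, a per-amplicon count can be taken from the SAM output. This is a minimal sketch, assuming single-end reads in a plain-text SAM file; the function name is hypothetical. It counts only primary, mapped records, so each read contributes at most one count. When a read aligns equally well to several amplicons, the aligner reports only one of them as primary, so counts for near-identical amplicons should be interpreted with care.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (see the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_amplicon(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference (amplicon) name."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag, ref_name = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep exactly one record per aligned read
        counts[ref_name] += 1
    return counts
```

Usage: `with open("sample.sam") as fh: counts = count_reads_per_amplicon(fh)` (placeholder file name), then `counts.most_common(10)` lists the most frequently hit amplicons.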

I am concerned that I might choose a mapper that penalizes the overhang too heavily (i.e. the fact that the metagenomic reads contain more than just the antimicrobial resistance gene), so that it either fails to map these reads at all or handles the overhang in some unexpected way.

My question is: What (free) mapper can I appropriately use in this situation to map long metagenomic reads (that might contain antimicrobial resistance genes) to amplicon sequences, without having to worry about any negative results from the possible overhangs?

Thank you for sharing any of your advice!

Tags: ampliseq, mapper

Answer 1 (2018-08-17)

Answered 5.7 years ago by btsui

Q: My question is: What (free) mapper can I appropriately use in this situation to map long metagenomic reads (that might contain antimicrobial resistance genes) to amplicon sequences, without having to worry about any negative results from the possible overhangs?

A: Recommended solution: use cutadapt to trim the overhang if you know its sequence, then align with bowtie2 using --sensitive. If you want to keep the overhang in the alignment file, use local alignment with soft clipping instead; it scores only the region that maps and marks the rest of the read as clipped.
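To see why local alignment tolerates the overhang, here is a small Smith-Waterman sketch that reports unaligned read ends as soft clips (`S`) in a SAM-style CIGAR string. It is an illustration only, not Bowtie 2's implementation: the match, mismatch and gap scores here are arbitrary, and Bowtie 2 uses affine gap penalties and quality-aware mismatch penalties.

```python
from itertools import groupby
from typing import Tuple


def local_align(read: str, ref: str, match: int = 2, mismatch: int = -3,
                gap: int = -5) -> Tuple[int, int, str]:
    """Smith-Waterman local alignment of a read against a reference.

    Returns (score, 0-based reference start, CIGAR). Read bases outside the
    best-scoring local region are reported as soft clips, as in SAM.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until the score drops to zero.
    i, j, ops = best_i, best_j, []
    while H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ops.append("M")  # aligned base (match or mismatch)
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            ops.append("I")  # base present in the read only
            i -= 1
        else:
            ops.append("D")  # base present in the reference only
            j -= 1
    ops.reverse()

    cigar = f"{i}S" if i else ""  # leading overhang
    cigar += "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
    tail = len(read) - best_i
    cigar += f"{tail}S" if tail else ""  # trailing overhang
    return best, j, cigar


# A 20 bp read carrying a 10 bp "amplicon" with 5 bp of flanking sequence on each side:
print(local_align("CCCCCACGTTGCAAGTTTTT", "ACGTTGCAAG"))  # (20, 0, '5S10M5S')
```

The overhang costs nothing: the score equals that of the amplicon region alone, and the flanks are kept in the record as `5S`. In end-to-end mode, by contrast, every read base must be aligned, so the same flanks would have to be forced into gaps or mismatches.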

Thanks for your advice! I have now run bowtie2 --sensitive-local on my samples (n=35).
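For 35 samples it helps to script the runs: build the Bowtie 2 index of the amplicons once, then align each sample against it. This is a minimal sketch; the file names, output directory and index prefix are placeholders, and it assumes `bowtie2` and `bowtie2-build` are on the `PATH` and that the output directory already exists. By default it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def plan_bowtie2(amplicons_fa: str, fastqs: Sequence[str], out_dir: str) -> List[List[str]]:
    """One bowtie2-build command for the amplicons, then one bowtie2 command per sample."""
    out = Path(out_dir)
    index_prefix = out / "amplicons"  # placeholder index name
    commands = [["bowtie2-build", amplicons_fa, str(index_prefix)]]
    for fq in fastqs:
        sam = out / (Path(fq).stem + ".sam")  # "s1.fastq" -> "out/s1.sam"
        commands.append(["bowtie2", "--sensitive-local", "-x", str(index_prefix),
                         "-U", fq, "-S", str(sam)])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # bowtie2 writes its alignment-rate summary to stderr
            subprocess.run(cmd, check=True)


run(plan_bowtie2("amplicons.fa", ["s1.fastq", "s2.fastq"], "out"))
```

Keeping command construction separate from execution makes it easy to inspect or log exactly what was run for each sample before committing to the alignments.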

The 35 samples varied widely in overall alignment rate, from 0.09% to 94%! Many of the reads that aligned did so to more than one location; the rate of reads aligning exactly once ranged from 0.01% to 40.45%. The samples fall fairly uniformly between these extremes. There are about 1,300 amplicon sequences, and each metagenomic sample has about 400,000 reads.
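Bowtie 2's end-of-run summary splits reads into those aligned 0 times, exactly 1 time and more than once. The same breakdown can be recomputed from each SAM file, which makes it easier to tabulate all 35 samples side by side. This is a minimal sketch, assuming single-end reads and relying on the Bowtie 2 manual's description of the `XS:i` tag (present on an aligned record only when more than one alignment was found for the read); the function name is hypothetical. With about 1,300 resistance amplicons, reads in the last bucket may simply match several closely related amplicons.

```python
from typing import Dict, Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits


def alignment_breakdown(sam_lines: Iterable[str]) -> Dict[str, float]:
    """Percent of reads aligned 0 times, exactly once, and more than once."""
    zero = once = multi = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            zero += 1
        elif any(tag.startswith("XS:i:") for tag in fields[11:]):
            multi += 1  # a second alignment was found for this read
        else:
            once += 1
    total = zero + once + multi

    def pct(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {"0 times": pct(zero), "exactly 1 time": pct(once), ">1 times": pct(multi)}
```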

Is it surprising that both the overall alignment rate and the exactly-once alignment rate vary so widely? Is there other software or preprocessing I should look into? I do want to keep the overhang sequences.

I tried Trim Galore!, but for most reads it trimmed off only 1-2 bases. Cutadapt would not work because I do not know the overhang sequence. I do not believe the amplicon sequences require trimming, but I could be wrong.
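Trim Galore! wraps cutadapt for adapter and quality trimming. Per its documentation, it trims 3' ends whose quality falls below Phred 20 by default, and it also removes 3' ends that overlap the adapter by as little as 1 bp (`--stringency 1`), which on its own can produce many 1-2 base trims. The quality step follows the BWA-style algorithm described in the cutadapt documentation; the sketch below re-implements it for illustration (it is not cutadapt's code) and assumes Phred+33 quality encoding.

```python
from typing import Tuple


def quality_trim_index(qual: str, cutoff: int = 20, base: int = 33) -> int:
    """Return the 3' cut position using the BWA-style quality-trimming rule.

    Walking in from the 3' end, accumulate (cutoff - quality); the read is cut
    where this running sum is largest, just before the low-quality tail.
    """
    running, best, stop = 0, 0, len(qual)
    for i in reversed(range(len(qual))):
        running += cutoff - (ord(qual[i]) - base)
        if running < 0:
            break  # good bases now outweigh the tail; stop scanning
        if running > best:
            best, stop = running, i
    return stop


def trim_3prime(seq: str, qual: str, cutoff: int = 20) -> Tuple[str, str]:
    """Trim a read and its qualities at the position chosen above."""
    k = quality_trim_index(qual, cutoff)
    return seq[:k], qual[:k]


print(trim_3prime("ACGTACG", "IIIII##"))  # ('ACGTA', 'IIIII')
```

A read that is already high quality at its 3' end (here `I` is Phred 40, `#` is Phred 2) loses nothing to this step, which is consistent with seeing only short trims on good-quality data.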

Any advice on additional measures to try would be greatly appreciated! Thanks again!

(Reply by LRStar, 5.7 years ago)