Analyzing non-overlapping ITS reads
9 weeks ago


I have a dataset obtained on an Illumina MiSeq with 2×250 bp paired-end reads covering the full-length fungal ITS region. As the amplicon is longer than 600 bp, my R1 and R2 reads do not overlap. My objective is to obtain an OTU/ASV table with taxonomic classification.

I have already performed "classical" OTU picking and taxonomy assignment (USEARCH) independently for the R1 and R2 reads, with overall comparable results. The taxonomic classification, however, is not particularly good, as lots of OTUs were classified in SHs at the sp. level.

I would like to improve the taxonomic classification (if possible) over what I get using only R1, so I have been looking for methods that map non-overlapping reads to the UNITE database while retaining taxonomic information (for now, the candidates could be CENTRIFUGE or KRAKEN, as far as I understand).

Before I really begin testing, I was wondering whether this approach would actually give me better taxonomic classification results, or whether I would end up with results very similar to what I already obtained for R1 alone.

Does anyone have experience with something similar? Did anyone obtain better taxonomic classification (i.e. a higher number of species-level identifications) with read alignment compared to R1 alone?

Thanks, Francesco.

merging amplicon sequencing fungi • 250 views

You may be able to use some of the tools mentioned in this thread. Be careful, though, not to over-extend the reads: How to extend contigs from single-end reads?

8 weeks ago
patrickdm ▴ 40

Hello, you could run DADA2 on both R1 and R2, using the mergePairs(..., justConcatenate=TRUE) option. From the documentation:

(it) allows the paired reads to be joined without any overlap, but with 10Ns inserted in between the forward and reverse reads. The chimera removal and assignTaxonomy functions will handle such merged reads, although some other functions may fail (eg. addSpecies)
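To make the concatenation concrete, here is a minimal Python sketch of what the joined sequence looks like. It is only an illustration and not DADA2's code: the function names `reverse_complement` and `concatenate_pair` are made up for this example, and it assumes (as I understand the option) that the reverse read is reverse-complemented before being appended after the 10 Ns.

```python
# Toy illustration of a "just concatenate" join for non-overlapping read pairs.
# Assumption: forward read + 10 Ns + reverse complement of the reverse read.
# Function names are hypothetical; this is not part of DADA2.

N_SPACER = "N" * 10

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def concatenate_pair(r1: str, r2: str, spacer: str = N_SPACER) -> str:
    """Join R1 and the reverse-complemented R2 with a spacer in between."""
    return r1 + spacer + reverse_complement(r2)


if __name__ == "__main__":
    # Toy amplicon: R1 reads its start, R2 reads its end from the other strand.
    amplicon = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
    r1 = amplicon[:8]
    r2 = reverse_complement(amplicon[-8:])
    joined = concatenate_pair(r1, r2)
    # The joined read carries both ends of the amplicon in the same orientation.
    print(joined)
```

The point of the sketch is that both ends of the amplicon end up in one sequence in the same orientation, so a k-mer based classifier can use information from R2 as well as R1; the unsequenced middle of the ITS region is simply replaced by the spacer.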

I've been using it on a recent project. I was obtaining very few mergers, so I started with an R1-only analysis; then I tried the R1-R2 concatenation and obtained classification improvements compared to the analysis of R1 alone.


