Question

Trouble preparing paired-end (MiSeq 2x250), double-tagged and dual-indexed reads from a metabarcoding study for the DADA2 pipeline

1


6.4 years ago

jack1120 ▴ 30

I have paired-end reads (file_name_R1.fastq & file_name_R2.fastq) from a metabarcoding study that were originally amplified with forward and reverse primers that each had tags (6-8 bp with unique combinations for each sample) on the 5' ends, and which were also dual-indexed prior to sequencing. Thus, I had amplicons from multiple samples that were tagged on both ends, pooled together and then indexed when preparing the libraries. I then pooled the libraries and sent them for sequencing.

For example, say I had 50 soil samples that I extracted eDNA from, each of which was used to amplify fungal (ITS2) and bacterial (V3-V4) DNA. Fungal ITS2 DNA from sample 1 was amplified using Tag1a--forward_fungal_primer and Tag1a--reverse_fungal_primer, sample 2 with Tag2a--forward_fungal_primer and Tag2a--reverse_fungal_primer, etc. All 50 fungal PCR products were pooled together and dual-indexed with Index A to make Library A. Likewise, bacterial DNA from sample 1 was amplified using Tag1b--forward_bacterial_primer and Tag1b--reverse_bacterial_primer, sample 2 with Tag2b--forward_bacterial_primer and Tag2b--reverse_bacterial_primer, etc. All 50 bacterial PCR products were pooled together and dual-indexed with Index B to make Library B. Library A and Library B were then pooled together and sequenced on one lane (MiSeq V3 2x250, paired-end).

The sequencing center did one round of demultiplexing based on the indices, so I received Fungal_R1.fastq and Fungal_R2.fastq files, as well as Bacterial_R1.fastq and Bacterial_R2.fastq files. I need to do a second round of demultiplexing based on the tags to split each fastq file by sample, so that I have individual R1.fastq and R2.fastq files for each sample, i.e. fungal_sample1_R1.fastq, fungal_sample1_R2.fastq, fungal_sample2_R1.fastq, fungal_sample2_R2.fastq, etc. However, I have only been able to demultiplex and split the original fastq files after they have been merged. DADA2 requires an R1.fastq and an R2.fastq for each sample, but I have repeatedly failed to achieve this and don't understand why.
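To make the target of this second round concrete, here is a minimal sketch (not a tested pipeline) of tag-based demultiplexing that keeps the two mates of each pair in sync. It assumes that R1 and R2 list the pairs in the same order, that every pair is in the forward orientation (forward tag at the start of R1, reverse tag at the start of R2), and that tags must match exactly. The function names, the `tag_pairs` mapping and the `prefix` argument are hypothetical names chosen for illustration.

```python
from pathlib import Path


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # empty header line means end of file
                return
            yield tuple(record)


def demultiplex_pairs(r1_path, r2_path, tag_pairs, out_dir, prefix="fungal"):
    """Write one R1/R2 file pair per sample, based on the tags at the 5' end of each mate.

    tag_pairs maps (forward_tag, reverse_tag) -> sample name (assumed layout).
    Pairs that match no tag combination are skipped.
    Returns a dict of sample name -> number of read pairs written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    counts = {}
    try:
        # zip() walks both files in lockstep, so mates stay paired.
        for rec1, rec2 in zip(read_fastq(r1_path), read_fastq(r2_path)):
            sample = None
            for (fwd_tag, rev_tag), name in tag_pairs.items():
                if rec1[1].startswith(fwd_tag) and rec2[1].startswith(rev_tag):
                    sample = name
                    break
            if sample is None:
                continue
            if sample not in handles:
                handles[sample] = (
                    open(out_dir / f"{prefix}_{sample}_R1.fastq", "w"),
                    open(out_dir / f"{prefix}_{sample}_R2.fastq", "w"),
                )
            for handle, rec in zip(handles[sample], (rec1, rec2)):
                handle.write("\n".join(rec) + "\n")
            counts[sample] = counts.get(sample, 0) + 1
    finally:
        for h1, h2 in handles.values():
            h1.close()
            h2.close()
    return counts
```

Because the tags are 6-8 bp long, a shorter tag could be a prefix of a longer one; this sketch takes the first matching combination and does not resolve that ambiguity. It also leaves the tags and primers on the reads, so they would still need to be trimmed before DADA2.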

I have tried to demultiplex the respective R1 and R2 files based on (1) the forward tag only, (2) the reverse tag only and (3) the forward and reverse tags together. I have tried to do this in their original orientation (assumed mixed) and after orienting everything 5'-3'. It makes sense that I can only demultiplex the merged files based on both the forward and reverse tags, because they are on either end of the R1 and R2 files, but I don't see why I can't demultiplex the R1 file just based on the forward tag or the R2 file just based on the reverse tag.
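One way to test the mixed-orientation assumption is to tally, for a given sample's tag pair, how many read pairs start with the forward tag in R1 and how many start with it in R2. The sketch below is only an illustration under the assumption of exact tag matches at the very start of each read; `classify_pair` and `tally_orientations` are hypothetical helper names. If a sizeable share of pairs comes back as "reverse", then searching R1 for the forward tag alone would miss those pairs, which could be one reason single-tag demultiplexing falls short.

```python
from collections import Counter


def classify_pair(r1_seq, r2_seq, fwd_tag, rev_tag):
    """Return the orientation of a read pair relative to one sample's tag pair.

    "forward": forward tag starts R1 and reverse tag starts R2.
    "reverse": reverse tag starts R1 and forward tag starts R2.
    None:      neither layout matches exactly.
    """
    if r1_seq.startswith(fwd_tag) and r2_seq.startswith(rev_tag):
        return "forward"
    if r1_seq.startswith(rev_tag) and r2_seq.startswith(fwd_tag):
        return "reverse"
    return None


def tally_orientations(seq_pairs, fwd_tag, rev_tag):
    """Count orientations over an iterable of (r1_sequence, r2_sequence) tuples."""
    return Counter(classify_pair(r1, r2, fwd_tag, rev_tag) for r1, r2 in seq_pairs)
```

For example, `tally_orientations(pairs, "ACGTAC", "TGCATG")` (made-up tags) returns a `Counter` keyed by "forward", "reverse" and `None`.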

Any recommendations on how to demultiplex these properly so that they can be funneled into the DADA2 pipeline? I'll continue smashing my head into the wall while waiting, which seems to be just as effective as my current bioinformatics approach. Thanks in advance!

sequencing next-gen metabarcoding DADA2 amplicon • 2.2k views

updated 5.2 years ago by magda.wutkowska • 0 • written 6.4 years ago by jack1120 ▴ 30

score 0 · Answer 1 · 2019-02-11

0


5.2 years ago

magda.wutkowska • 0

Did you find an answer to your question?
