We used the Illumina '16S Metagenomic Sequencing Library Preparation' guide to prepare 16S rRNA gene amplicon (metagenomic) sequencing libraries from DNA and cDNA extracts derived from water samples.
We sent the libraries to a company for paired-end sequencing on Illumina MiSeq with these primers, which target the V3-V4 region of the 16S gene ...
16S Amplicon PCR Forward Primer:
16S Amplicon PCR Reverse Primer:
The 5' part of each primer is a Nextera adapter, which doesn't bind to the 16S gene. The 3' part binds to the V3-V4 region of the 16S gene.
Some of the libraries we sent were replicates (i.e. the same library sent to be sequenced twice) and some were controls. We got two demultiplexed fastq.gz files back for each library (forward and reverse reads).
The company's raw data report says 'Adapters are not trimmed away from the reads.', but as far as I can tell from looking at the fastq files, the adapters HAVE been removed, because the sequences in the fastq files seem to start with the 3' part of the above primers.
Sample 1: Forward CCACGGGGGGCTGCAGAAGATACGGCAAGACTAGAGAGTGCAAGAGGCTTATGGCACGAACGGTGTAGGGGTGAAATCCGTTGATATCGTTCGGAACACCAAAGGCGAAGGCAGTAAGCTGGTGCATTTCTGACCCTGAGGCACGAAAGCGTGGGGAGCAAAAAGGATTAGATACCCCGGTAGTCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGGCGAAGCTCGTATGCCGTCTTCTGCTTGAAAAAAAACAAATATACAAAGAAACGAGTTACGGGGTGGACGAGAAGGAGA
Sample 1: Reverse ACTACCGGGGTATCTAATCCTTTTTGCTCCCCACGCTTTCGTGCCTCAGGGTCAGAAATGCACCAGCTTACTGCCTTCGCCTTTGGTGTTCCGAACGATATCAACGGATTTCACCCCTACACCGTTCGTTCCATAAGCCTCTTGCACTCTCTAGTCTTGCCGTATCTTCTGCAGCCCCCCGTGGCTGTCTCTTAACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGGCGCCGGAGCTCTTAAAAAGAAGCAACAGAGTACTGGTGGTACAGACAGCTAGCTAGTCGA
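The "reads seem to start with the primer" observation can be checked across a whole file rather than by eye. The following is a minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines); the function name `read_prefix_counts` and its defaults are made up for illustration. It tallies the first few bases of each read so the most common read starts can be compared with the 3' part of the primers.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def read_prefix_counts(fastq_gz_path, prefix_len=17, max_reads=10000):
    """Count the first `prefix_len` bases of up to `max_reads` reads.

    Assumes 4-line FASTQ records, as produced by Illumina demultiplexing.
    """
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        seq_lines = islice(handle, 1, None, 4)
        for seq in islice(seq_lines, max_reads):
            counts[seq.strip()[:prefix_len].upper()] += 1
    return counts


if __name__ == "__main__":
    # Usage: python read_prefixes.py forward.fastq.gz reverse.fastq.gz
    for path in sys.argv[1:]:
        print(path)
        for prefix, n in read_prefix_counts(path).most_common(5):
            print(f"  {n}\t{prefix}")
```

If most reads share a start that matches the gene-specific 3' part of the primer (allowing for degenerate positions), the reads begin at the primer. This only looks at the 5' end of each read and says nothing about adapter sequence further along.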
In the analysis, I trimmed the primer portions from the start of the reads during the DADA2 denoising step (in QIIME2).
We're hoping to submit a manuscript soon, and I want to submit the raw sequencing data to the ENA. I know it's possible to submit two fastq.gz files for each library, but I still have questions about the details.
- What data should I upload to the ENA exactly? Do I include replicates and controls? What's the norm?
- How do I prepare the raw data for upload to the ENA? Should I trim the reads, and if so, with which tool (Trimmomatic, Cutadapt?), in what way, and by how much? Is it enough to just remove the primer sequence at the start of each read?
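To make the last question concrete, this is roughly what "removing the primer sequence at the start of each read" amounts to. It is a minimal sketch, not a recommendation: the function name `strip_5prime_primer`, the `max_mismatches` default, and the primer string used in the example are hypothetical, and it only counts substitutions, whereas a dedicated tool such as Cutadapt can also tolerate insertions and deletions.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def strip_5prime_primer(seq, qual, primer, max_mismatches=1):
    """Return (seq, qual) without a leading primer, or unchanged if absent.

    `primer` may contain IUPAC degenerate codes. Only substitutions are
    counted as mismatches; indels are not handled.
    """
    if len(seq) < len(primer):
        return seq, qual
    mismatches = sum(
        base not in IUPAC[code]
        for base, code in zip(seq.upper(), primer.upper())
    )
    if mismatches > max_mismatches:
        return seq, qual
    # Trim the quality string by the same amount so the record stays valid.
    return seq[len(primer):], qual[len(primer):]


if __name__ == "__main__":
    # "ACGTNW" is a made-up primer, used only to show the behaviour.
    print(strip_5prime_primer("ACGTGATTTT", "IIIIIIIIII", "ACGTNW"))
```

Whatever is trimmed, the sequence and quality strings must be shortened together, and forward and reverse files must keep the same reads in the same order.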
I've already submitted this query to EMBL-EBI support, but I haven't had a response yet.