Question

Preparing 16S gene amplicon (metagenomic) sequencing data for submission to ENA

3.5 years ago

KQUB • 0

Background:

We used the Illumina '16S Metagenomic Sequencing Library Preparation' guide to prepare 16S gene amplicon (metagenomic) sequencing libraries from DNA and cDNA extracts derived from water samples.

We sent the libraries to a company for paired-end sequencing on Illumina MiSeq with these primers, which target the V3-V4 region of the 16S gene ...

16S Amplicon PCR Forward Primer:

5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG CCTACGGGNGGCWGCAG-3'

16S Amplicon PCR Reverse Primer:

5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG GACTACHVGGGTATCTAATCC-3'

The 5' part of these primers is a Nextera adaptor which doesn't bind to the 16S gene. The 3' part binds to the V3-V4 region of the 16S gene.
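As a rough illustration of that two-part structure, the sketch below splits each primer at the space into the Nextera overhang and the 16S-specific part, then reports their lengths and how many concrete sequences the degenerate (IUPAC) bases encode. The function name `describe_primer` and the dictionary layout are made up for this example; only the primer sequences come from the post.

```python
from math import prod

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primers as written in the post: "<Nextera overhang> <16S-specific part>".
PRIMERS = {
    "forward": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG CCTACGGGNGGCWGCAG",
    "reverse": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG GACTACHVGGGTATCTAATCC",
}


def describe_primer(primer):
    """Split a primer into overhang and locus-specific part (hypothetical helper)."""
    overhang, locus = primer.split()
    return {
        "overhang_len": len(overhang),
        "locus": locus,
        "locus_len": len(locus),
        # Number of distinct sequences the degenerate positions expand to.
        "variants": prod(len(IUPAC[base]) for base in locus),
    }


for name, primer in PRIMERS.items():
    print(name, describe_primer(primer))
```

The locus-specific lengths are the number of bases a fixed-length trim at the start of each read would need to remove, provided the reads really do begin with the complete primer.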

Some of the libraries we sent were replicates (i.e. the same library sent to be sequenced twice) and some were controls. We got two demultiplexed fastq.gz files back for each library (forward and reverse reads).

The company's raw data report says 'Adapters are not trimmed away from the reads.', but as far as I can tell from looking at the fastq files, the adapters HAVE been removed, because the sequences in the fastq files seem to start with the 3' part of the above primers.
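One way to check this observation without eyeballing the reads is to align each 16S-specific primer against the start of the read, treating degenerate bases as wildcards and allowing a few edits. The sketch below is a plain edit-distance calculation written for illustration, not what cutadapt or DADA2 do internally; `anchored_primer_match` is a hypothetical helper, and the read prefixes are copied from the Sample 1 reads quoted below.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FWD_PRIMER = "CCTACGGGNGGCWGCAG"
REV_PRIMER = "GACTACHVGGGTATCTAATCC"

# First bases of the Sample 1 forward and reverse reads from the post.
FWD_READ_START = "CCACGGGGGGCTGCAGAAGATACGGCAAGACTAG"
REV_READ_START = "ACTACCGGGGTATCTAATCCTTTTTGCTCCCCAC"


def anchored_primer_match(primer, read):
    """Return (edits, end): the fewest edits needed to align the whole primer
    to a prefix of the read, and the length of that prefix."""
    m = len(primer)
    window = read[: m + m // 2]  # leave room for a few inserted bases
    n = len(window)
    prev = list(range(n + 1))  # aligning an empty primer to j read bases costs j
    for i in range(1, m + 1):
        allowed = IUPAC[primer[i - 1]]
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if window[j - 1] in allowed else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or mismatch
                prev[j] + 1,         # primer base missing from the read
                cur[j - 1] + 1,      # extra base in the read
            )
        prev = cur
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], end


for label, primer, read in [
    ("forward", FWD_PRIMER, FWD_READ_START),
    ("reverse", REV_PRIMER, REV_READ_START),
]:
    edits, end = anchored_primer_match(primer, read)
    print(f"{label}: {edits} edit(s), primer ends at read position {end}")
```

For the two prefixes above, each read matches its primer with a single edit (one primer base is missing from the read), so a fixed-length trim of the full primer length would remove one base too many on these particular reads.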

Sample 1: Forward CCACGGGGGGCTGCAGAAGATACGGCAAGACTAGAGAGTGCAAGAGGCTTATGGCACGAACGGTGTAGGGGTGAAATCCGTTGATATCGTTCGGAACACCAAAGGCGAAGGCAGTAAGCTGGTGCATTTCTGACCCTGAGGCACGAAAGCGTGGGGAGCAAAAAGGATTAGATACCCCGGTAGTCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGGCGAAGCTCGTATGCCGTCTTCTGCTTGAAAAAAAACAAATATACAAAGAAACGAGTTACGGGGTGGACGAGAAGGAGA

Sample 1: Reverse ACTACCGGGGTATCTAATCCTTTTTGCTCCCCACGCTTTCGTGCCTCAGGGTCAGAAATGCACCAGCTTACTGCCTTCGCCTTTGGTGTTCCGAACGATATCAACGGATTTCACCCCTACACCGTTCGTTCCATAAGCCTCTTGCACTCTCTAGTCTTGCCGTATCTTCTGCAGCCCCCCGTGGCTGTCTCTTAACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGGCGCCGGAGCTCTTAAAAAGAAGCAACAGAGTACTGGTGGTACAGACAGCTAGCTAGTCGA
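Adapter sequence can also appear towards the 3' end of a read when the insert is shorter than the read length, so it may be worth looking beyond the first bases. A minimal sketch, assuming an exact string search is good enough for a first look: it searches an excerpt of the Sample 1 forward read for the reverse complement of the reverse primer's overhang. The variable names are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Nextera overhang (5' part) of the reverse primer from the post.
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Excerpt from the second half of the Sample 1 forward read quoted above.
FWD_READ_EXCERPT = (
    "GGATTAGATACCCCGGTAGTCTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
    "TAAGGCGAAGCTCGTATGCC"
)

# A forward read that runs past the end of the insert continues into the
# reverse complement of the reverse primer's overhang.
read_through = reverse_complement(REV_OVERHANG)
position = FWD_READ_EXCERPT.find(read_through)

if position == -1:
    print("No exact copy of the overhang found in this excerpt.")
else:
    print(f"Overhang read-through starts at excerpt position {position}:")
    print(FWD_READ_EXCERPT[:position], "|", FWD_READ_EXCERPT[position:])
```

On this excerpt the search finds the overhang's reverse complement near the end of the 16S-derived sequence, which would be consistent with the company's statement that adapters were not trimmed. An exact search like this misses copies that contain sequencing errors, so a dedicated trimmer such as cutadapt is the safer choice for real data.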

During the analysis, I trimmed these primer parts away during denoising with DADA2 (in QIIME2).

We're hoping to submit a manuscript soon, and I want to submit the raw sequencing data to the ENA. I know it's possible to submit two fastq.gz files for each library, but I still have questions about the details.

Questions:

What data should I upload to the ENA exactly? Do I include replicates and controls? What's the norm?
How do I prepare the raw data for upload to the ENA? Should I trim the reads, and if so, how (trimmomatic, cutadapt?), in what way, and by how much? Is it enough to just remove the primer sequence at the start of each read?

Note:

I've submitted this query to EMBL-EBI support already, but no response yet.

16S metagenomics sequencing ENA • 1.5k views

ADD COMMENT • link updated 3.5 years ago by antonioggsousa 3.2k • written 3.5 years ago by KQUB • 0

score 0 · Answer 1 · 2020-11-05

3.5 years ago

antonioggsousa 3.2k

Hi,

You should upload all 4 demultiplexed fastq files: the forward and reverse files for each of the 2 replicates.
The data should be demultiplexed (and so without barcodes) and as raw as possible. That said, I usually remove adapters and primers if possible, because the primers contain degenerate bases (and therefore errors), and unless you provide the primer sequences in your submission, e.g. in the project description, it may be difficult for others to know exactly which sequences you used. You could also leave them in, because "raw reads" means unprocessed reads, i.e. reads that have not been trimmed.

António

ADD COMMENT • link 3.5 years ago by antonioggsousa 3.2k

Thanks for the response. Do you also upload controls? I've walked through much of the submission process, and I didn't see any mention of uploading controls. Is it even possible? ENA says all technical sequences should be removed (adaptors, barcodes, index sequences, primers), so I'm planning to remove my primer remnants. How do you usually remove your adapters and primers? Do you use trimmomatic or cutadapt or something else?

ADD REPLY • link 3.5 years ago by KQUB • 0

I usually use cutadapt. I don't think there is anything wrong with submitting controls, but to be honest I don't know the ENA guidelines on that. Suppose you submit the data and say in your paper "I did not find any contaminants in my control" (or something like that): if someone wants to check this and goes to ENA, they can't if you didn't submit the controls, even if it is a negative control. Therefore, unless the ENA guidelines say that you shouldn't, I think you should submit your controls, because they are part of your sequencing project/study.
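For the cutadapt route mentioned in this reply, a minimal sketch of a paired-end primer-trimming call might look like the following. It only builds and prints the command. The file names are placeholders, and the choice to anchor the primers at the read start with `^` is an assumption based on the reads shown in the question, not something stated in the thread.

```python
import shlex

# 16S-specific primer parts from the question; cutadapt treats IUPAC
# codes in adapter sequences as wildcards.
FWD_PRIMER = "CCTACGGGNGGCWGCAG"
REV_PRIMER = "GACTACHVGGGTATCTAATCC"


def cutadapt_command(r1_in, r2_in, r1_out, r2_out):
    """Build a cutadapt call that trims anchored 5' primers from read pairs."""
    return [
        "cutadapt",
        "-g", "^" + FWD_PRIMER,  # 5' primer on forward reads, anchored at the start
        "-G", "^" + REV_PRIMER,  # 5' primer on reverse reads, anchored at the start
        "-o", r1_out,            # trimmed forward reads
        "-p", r2_out,            # trimmed reverse reads
        r1_in,
        r2_in,
    ]


# Placeholder file names, for illustration only.
cmd = cutadapt_command(
    "sample1_R1.fastq.gz",
    "sample1_R2.fastq.gz",
    "sample1_R1.trimmed.fastq.gz",
    "sample1_R2.trimmed.fastq.gz",
)
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with cutadapt installed. Whichever trimming is applied, it is worth recording the exact command and primer sequences in the submission metadata so others can tell what was removed.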

ADD REPLY • link 3.5 years ago by antonioggsousa 3.2k

I agree, and I think ENA should allow submission of controls. I just don't know if they do. I'll have to look into it.

ADD REPLY • link 3.5 years ago by KQUB • 0