Question about 16S rRNA reads we got from a sequencing company
10 months ago
KQUB • 0

Hi there,

We recently had some DNA samples sequenced at Novogene UK: amplicon sequencing of the V3–V4 subregions of the 16S rRNA gene.

The final sequencing data was provided to us in 'RawData' and 'CleanData' formats.

The main thing I’m confused about is that the ‘RawData’ doesn’t actually seem to be raw, and I couldn’t find any information about exactly what processing has been done on the ‘RawData’. I’ve sent Novogene an email, but no reply yet.

For 16S amplicon data, Novogene does 2 x 250 bp paired-end sequencing on the Illumina NovaSeq 6000 system.

However, the ‘RawData’ reads they sent (at least for my samples) aren’t 250 nt long: the forward reads are 227 nt and the reverse reads are 224 nt.
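One way to check that every read, not just the ones inspected by eye, has these lengths is to tally read lengths over each whole file. Below is a minimal Python sketch. It assumes standard 4-line FASTQ records, either plain or gzip-compressed. The script and file names in the usage comment are hypothetical.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:  # skip blank lines, e.g. a trailing newline
            continue
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def length_distribution(lines: Iterable[str]) -> Counter:
    """Map read length -> number of reads with that length."""
    return Counter(len(seq) for _, seq, _ in parse_fastq(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (names are hypothetical): python read_lengths.py sample_1.fq.gz
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for length, count in sorted(length_distribution(handle).items()):
            print(f"{length} nt: {count} reads")
```

A single peak at 227 nt (forward) or 224 nt (reverse) would mean every read was cut to a fixed length. A spread of lengths would instead suggest variable trimming, such as quality-based trimming.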

Based on these lengths, I assume they have already trimmed the primers from the reads, plus an extra 6 nt in each case (see calculations below):

F primer: CCTAYGGGRBGCASCAG (17 nt)

R primer: GGACTACNNGGGTATCTAAT (20 nt)

Forward: 250 – 17 = 233 (primer removed); 233 – 6 = 227

Reverse: 250 – 20 = 230 (primer removed); 230 – 6 = 224
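Running the same arithmetic backwards from the observed lengths gives the number of bases removed beyond each primer. This small Python sketch uses only the primer sequences and read lengths quoted above.

```python
READ_LENGTH = 250  # nominal NovaSeq 2 x 250 bp read length

# Primer sequences and observed 'RawData' read lengths from the post.
READS = {
    "forward": ("CCTAYGGGRBGCASCAG", 227),
    "reverse": ("GGACTACNNGGGTATCTAAT", 224),
}


def inferred_extra_trim(read_length: int, primer: str, observed_length: int) -> int:
    """Bases missing from the read beyond the primer length itself."""
    return read_length - len(primer) - observed_length


for direction, (primer, observed) in READS.items():
    without_primer = READ_LENGTH - len(primer)
    extra = inferred_extra_trim(READ_LENGTH, primer, observed)
    print(f"{direction}: {READ_LENGTH} - {len(primer)} = {without_primer}; "
          f"observed {observed}, so {extra} nt extra")
```

Both directions come out at 6 nt. This count says how many bases are missing, not which end of the read they were taken from.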

The extra 6 nt could have been taken from the 5’ or 3’ end of the read, or a combination of both. We don’t know.

When I ran DADA2 with no extra trimming, most of the forward and reverse read pairs merged successfully. I take that as further evidence that the initial trimming was done (at least mostly) at the beginning of the reads, since there is apparently still enough overlap for merging.
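The remaining overlap can be estimated from the read lengths alone. For a pair reading inwards from both ends of the primer-free insert, overlap = forward length + reverse length − insert length. Below is a rough Python sketch. The insert lengths are placeholders, because V3–V4 length varies between taxa, so substitute your own estimate. The 12 nt threshold is the default minOverlap of DADA2's mergePairs().

```python
# dada2::mergePairs() requires at least this many overlapping bases by default.
DADA2_DEFAULT_MIN_OVERLAP = 12


def expected_overlap(fwd_len: int, rev_len: int, insert_len: int) -> int:
    """Bases shared by a read pair reading inwards from both ends of an insert.

    Assumes the reads start exactly at the two ends of the primer-free insert;
    a negative result means the reads do not meet.
    """
    return fwd_len + rev_len - insert_len


def can_merge(fwd_len: int, rev_len: int, insert_len: int,
              min_overlap: int = DADA2_DEFAULT_MIN_OVERLAP) -> bool:
    """True if the estimated overlap reaches the merging threshold."""
    return expected_overlap(fwd_len, rev_len, insert_len) >= min_overlap


# Observed read lengths from the post; insert lengths are placeholders only.
for insert_len in (400, 420, 440):
    overlap = expected_overlap(227, 224, insert_len)
    print(f"insert {insert_len} nt: overlap {overlap} nt, "
          f"mergeable: {can_merge(227, 224, insert_len)}")
```

With these placeholder inserts the pairs overlap by 51, 31 and 11 nt. Only the longest hypothetical insert would fall below the default threshold.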

Even though the primers appear to have been trimmed away already, I found primer sequences in some of the reads, but not right at the start, which seems odd to me (see below). When we did 16S sequencing with a different company (Macrogen) earlier, the primers were still in the reads when we received the data. However, they were right at the beginning of each read, so they could easily be removed by trimming the first 20 or so bases. The situation here seems different.

For example, the primer sequence CCTAYGGGRBGCASCAG is found in these forward reads:

@A01426:481:HFYHFDRX2:1:2103:18222:23938 1:N:0:TACGACGT+CCATGAAC
GGCGACGATCCTTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATCTTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCGTTAGGGGGGTAAAGCTCTTTTGGCCGGGAGGATGATGGCAGTGCCGGGCGGGTCTGGTCGGGGGGCGGCGGGGGCGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF,FFFFFFFFFFFFFFFF,FF:FFFFFF,:FFFF:FFFFF,F,FFFFFF::FFFFFFFFFFFFFF,FF:FFF:FFF,:FFF,F:F:F,,F::FF,,,,:FFF,,F,,,:F,,,,,F,,,:,FFF,,F:,,,:F,,,F:,,,F:F,F,,,,:F,F,F:F,,,,,:,,FFF:,:FF,F,,

@A01426:481:HFYHFDRX2:1:2115:12843:7294 1:N:0:TACGACGT+CCATGAAC
ATACCCCCGTAGTCCCGAAACAGATACCCATGTAGTCCGCTCATAGAGAGGGATGCTCTTCCGATGTCCTATGGGAGGCAGCAGTGGGGAATCTTGCACAATGGAGGAAACTCTGATGCAGCGATGCCGCGTGAGTGAAGACGGCCTTTGGGTTGTAAAGCTCTTTTGTAGGGGAAGATAATGACTGTAACCTAAGAATAAGGTCCGGCTAACTTCGTGCCCGCAGC
+
FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:FFFFFF,FFFFFF:FF:FFFFFFFFFFF,FF:,F,F,F:FF:FF:FFFFFFFFFFFFFFFF::FFFFFFF,FFFFFF:FF,FF:F,,FFFFF,,FFFF:FFFF,FFFFFFF:,FFFFF

And the primer sequence GGACTACNNGGGTATCTAAT is found in these reverse reads:

@A01426:481:HFYHFDRX2:1:2128:10022:6543 2:N:0:TACGACGT+CCATGAAC
GTTTCGGGACTACACGGGTATCTAATCCTGTTTGATCCCCACGCTTTCGTGTCTCAGCGTCAGTTACAATTTAGCAAGCTGCCTTCGCAATCGGTGTTCTGTGTGATCTCTAAGCATTTCACCGCTACACCACACATTCCGCCTACTTCAATTGTACTCAAGAATATCAGTTTCTATGGCAGTTCTACAGTTAAGCTGTAGGCTTTCACCACTGACTTAATACC
+
FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:F:FFF,FFFFFFF:FFFFFFFFFFF:FFFF:FFFF

@A01426:481:HFYHFDRX2:1:2134:2871:31203 2:N:0:TACGACGT+CCATGAAC
TCTGCTGCATCCAGGAGGCTGATCGAGTTGTTTAGGGACTACGAGGGTATCTAATACTGTTTGCTCCCCACGCTTTCGTGCATGAGCGTCATTGTTATCCCAGGGGGCTGCCTTCGCCAATGGTATTCCTCCACAGATCTACGCATTTCACTGCTAAACGTGGAATTCTACAACCCTATGACACACACTAGATATACAGTCACATGCGCAATACCCAGGTTAAG
+
,FFF:FFFFF:F::,F,,,FF:FFFF:,:,,F,,FF:FF,F:FF,,FFFFF,F:F,:F,F,F:,:FFFFF:,,FF,,,F::F:,F,FF,FF,,::FFF:FFF:F,FFF:,::F,,FF,F,FF:,FFFF,FFF,FF,FF,F:FFFF,FFF:,F,FFF,F,FFFF,FF,F,FF,,FFF,,F,F:FFF,,,F:,::FFFFF,F,FFF:FF,F,FF,,F,FF,F:F:F
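The primers contain IUPAC degenerate bases (Y, R, B, S and N), so a literal text search for the primer will not match any read. Translating each code into a regex character class finds the primer wherever it sits. The Python sketch below does that. It also tallies, over a whole FASTQ file, how many reads carry the primer at each offset, where offset 0 means right at the start of the read. It allows exact matches only, with no mismatches or indels, so a dedicated trimming tool such as cutadapt is more robust for actual trimming. The script and file names are hypothetical.

```python
import gzip
import re
import sys
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# IUPAC nucleotide codes -> regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex matching any of its variants."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def find_primer(read: str, primer: str) -> Optional[Tuple[int, str]]:
    """Return (0-based offset, matched bases) of the first exact match, or None."""
    match = primer_regex(primer).search(read.upper())
    return (match.start(), match.group()) if match else None


def offset_counts(sequences: Iterable[str], primer: str) -> Counter:
    """Count reads by primer offset; reads without a match are keyed None."""
    pattern = primer_regex(primer)
    counts: Counter = Counter()
    for seq in sequences:
        match = pattern.search(seq.upper())
        counts[match.start() if match else None] += 1
    return counts


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) of a plain or gzipped FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script and file names are hypothetical):
    #   python primer_offsets.py sample_1.fq.gz CCTAYGGGRBGCASCAG
    counts = offset_counts(fastq_sequences(sys.argv[1]), sys.argv[2])
    print("no exact match:", counts.pop(None, 0))
    for offset, n in sorted(counts.items()):
        print(f"primer at offset {offset}: {n} reads")
```

Consider the first 30 nt of the first reverse read above, GTTTCGGGACTACACGGGTATCTAATCCTG. For that fragment, find_primer with GGACTACNNGGGTATCTAAT returns (6, 'GGACTACACGGGTATCTAAT'), which is the concrete primer variant with its degenerate positions resolved.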

Does anyone know what's going on here? All help appreciated!

Cheers,

Kevin


I can't imagine that strangers on the internet could give a better answer to this question than the company itself. If this were a 10-year-old sample, I would understand why the company's records might not be available.


I've asked the company, but they haven't replied yet, so I posted here too because I'd like to get moving on this project ASAP and would like to avoid unnecessary waiting around if possible.
