Question

Size Variation Of The 16S Gene


10.7 years ago

xapple ▴ 230

I have a bunch of sequences obtained from a metagenomic experiment targeting the 16S gene of bacteria. The primers are designed to span from V3 to V4. The 16S rDNA gene is relatively well conserved; nonetheless, I expect some variation in the size of this region. However, I'm seeing pretty large variations in the size of these fragments, as you can see in the graph below. I would have thought that the distribution would be more compact. Where could I check the sanity of this data? I can't seem to find any sources on the extent of the naturally occurring length variation of the bacterial 16S gene.

[Figure: histogram of fragment lengths]
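A length distribution like the one in the graph can be recomputed directly from the reads, which helps rule out a plotting or binning artifact. The following is a minimal sketch, assuming the sequences are in plain FASTA format; the function names and the `bin_width` parameter are made up for illustration and do not come from any particular toolkit.

```python
from collections import Counter
import io


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def length_histogram(handle, bin_width=10):
    """Count sequences per length bin, keyed by the start of each bin."""
    counts = Counter()
    for n in read_fasta_lengths(handle):
        counts[(n // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Toy input standing in for a real FASTA file (hypothetical data).
demo = io.StringIO(">a\nACGTACGT\nACGT\n>b\nACGTA\n>c\nACGTACGTACGTACG\n")
hist = length_histogram(demo, bin_width=5)
```

On real data, passing an open file handle instead of `demo` and using a bin width of 1 would show whether the peaks are sharp (distinct groups of organisms) or smeared.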

EDIT - A few clarifications to answer questions from @Mabeuf:

The source sample is a lake sediment core (highly phosphorus-saturated sediments) from which DNA was extracted.
The primers used are 341F 5'-CCTACGGGNGGCWGCAG-3' and 805R 5'-GACTACHVGGGTATCTAATCC-3', and the library was run with no other sample on an Illumina MiSeq.
Subtracting those two numbers gives 805 - 341 = 464 base pairs, which would fall right between the two rightmost peaks on the graph.
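The 805 - 341 = 464 figure is only a nominal size. A complementary check is to locate both primers in each read and measure what lies between them. The sketch below expands the IUPAC degenerate bases of the two primers quoted above into regular expressions. It assumes the reads are already merged and in forward orientation, so the reverse primer shows up as its reverse complement; the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FWD = "CCTACGGGNGGCWGCAG"      # 341F, as given in the question
REV = "GACTACHVGGGTATCTAATCC"  # 805R, as given in the question


def primer_regex(primer):
    """Turn a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer)


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping IUPAC codes valid."""
    return seq.translate(COMPLEMENT)[::-1]


# Forward primer, shortest possible insert, then the reverse primer
# as it appears on the forward strand.
AMPLICON = re.compile(
    primer_regex(FWD) + "(.*?)" + primer_regex(reverse_complement(REV))
)


def insert_length(read):
    """Length between the two primers, or None if either is missing."""
    match = AMPLICON.search(read)
    return len(match.group(1)) if match else None
```

Reads for which `insert_length` returns `None` are worth a closer look, since they may be off-target products or reads in the opposite orientation.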


ADD COMMENT • link updated 7.1 years ago by fanli.gcb ▴ 730 • written 10.7 years ago by xapple ▴ 230

score 1 · Answer 1 · 2013-08-05

My initial thought is that you are picking up multiple taxonomic kingdoms, i.e. Bacteria, Archaea, and Eukaryotes.

A few questions which might point in the right direction:

What were your source samples? Could they include some eukaryotic 18S? (I've seen this happen in my own data.)
Which primers did you use, and which peak is the one you would be expecting? Do they have a history of cross-amplification? 357f-518r do (although that's too small for this data).
Is there any chance that the sequencing provider (if an external company) ran your samples on different machines or runs? I had a dataset come back with two peaks because one part was run on 454 and one on 454+. Unlikely, but just mentioning it.

score 0 · Answer 2 · 2017-04-05


7.1 years ago

fanli.gcb ▴ 730

An old question, but this recently came up in some of our work using the V1V2 (27F-338R) primer set. There seems to be a lot of length variation in our amplicons: [image not available]

This paper also reports substantial length variation in the V3V4 amplicon (341F-534R): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643231/



I think such variation is completely normal. For example, in the current RDP (large file) the longest bacterial 16S rRNA is 2,487 bp. I imagine the majority of the length difference between that and the 1,541 bp reference E. coli 16S is due to hypervariable regions. If it's important, you could align a random subset of RDP to the E. coli reference to see for yourself.
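Drawing the random subset is the easy part and can be done in one pass without loading the whole file. The following is a rough sketch using reservoir sampling, assuming the RDP download is a plain FASTA file; the function names are invented for illustration, and the alignment step itself is left to whatever aligner you prefer.

```python
import io
import random


def fasta_records(handle):
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Pick k records uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


# Toy input standing in for the real database file (hypothetical data).
demo = io.StringIO(">s1\nACGT\n>s2\nACGTAC\n>s3\nAC\n>s4\nACGTACGT\n>s5\nA\n")
all_records = list(fasta_records(demo))
subset = reservoir_sample(all_records, k=2)

# Length difference of each sampled sequence from the 1,541 bp reference.
REFERENCE_LENGTH = 1541
deltas = [len(seq) - REFERENCE_LENGTH for _, seq in subset]
```

Even before aligning, the spread of `deltas` over a few thousand sampled sequences gives a quick feel for how much full-length 16S varies.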

ADD REPLY • link 7.1 years ago by 5heikki 11k