Question: Size Variation Of The 16S Gene
6.3 years ago by
xapple230
UU
xapple230 wrote:

I have a set of sequences from a metagenomic experiment targeting the bacterial 16S gene. The primers are designed to span the V3 to V4 regions. The 16S rDNA gene is relatively well conserved; nonetheless, I expect some variation in the size of this region. However, I'm seeing fairly large variation in fragment size, as you can see in the graph below. I would have expected a more compact distribution. Where could I check whether this data is sane? I can't seem to find any sources on the extent of naturally occurring length variation in the bacterial 16S gene.

[Figure: histogram of fragment lengths]
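For anyone wanting to reproduce this kind of length distribution, here is a minimal Python sketch. It assumes the fragments are available as FASTA text (the question does not say which format was used), and the function names `fasta_lengths` and `binned_counts` are made up for illustration.

```python
from collections import Counter


def fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # running length of the record being read
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def binned_counts(lengths, bin_width=10):
    """Count sequences per length bin; keys are the lower edge of each bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def print_histogram(lengths, bin_width=10):
    """Print a crude text histogram, one row per non-empty bin."""
    counts = binned_counts(lengths, bin_width)
    for start in sorted(counts):
        print(f"{start:>5}-{start + bin_width - 1:<5} {'#' * counts[start]}")
```

With a real file, something like `print_histogram(fasta_lengths(open("fragments.fasta")))` would do, where the file name is a placeholder.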

EDIT - A few clarifications to answer the questions from @Mabeuf:

  • The source sample is a lake sediment core (highly phosphorus-saturated sediments) from which DNA was extracted.
  • The primers used are 341F 5'-CCTACGGGNGGCWGCAG-3' and 805R 5'-GACTACHVGGGTATCTAATCC-3', and the sample was run on its own (no other samples) on an Illumina MiSeq.
  • Subtracting the two primer positions gives 805 - 341 = 464 base pairs, which would fall right between the two rightmost peaks on the graph.
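The subtraction above is only a rough estimate, since primer names are usually taken to refer to approximate positions in a reference sequence. One way to check the expected amplicon length more directly is a small in-silico PCR against a reference 16S sequence. The sketch below is an illustration, not an established tool: it expands the IUPAC degenerate bases in the two primers into regular expressions, requires exact matches (no mismatches allowed, which real PCR would tolerate), and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex matching all its variants."""
    return "".join(IUPAC[base] for base in primer.upper())


def amplicon_length(template, fwd, rev):
    """Length of the first amplicon, primers included, or None if absent."""
    template = template.upper()
    f = re.search(primer_regex(fwd), template)
    if f is None:
        return None
    # The reverse primer anneals to the other strand, so search for its
    # reverse complement downstream of the forward primer site.
    downstream = template[f.end():]
    r = re.search(primer_regex(revcomp(rev)), downstream)
    if r is None:
        return None
    return (f.end() + r.end()) - f.start()


FWD_341F = "CCTACGGGNGGCWGCAG"
REV_805R = "GACTACHVGGGTATCTAATCC"
```

To use it, pass a reference 16S sequence (for example one exported from a database of your choice) as `template`. Running it over many references gives an expected length distribution to compare against the observed one.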
6.3 years ago by
Daniel3.7k
Cardiff University
Daniel3.7k wrote:

My initial thought is that you are picking up multiple taxonomic domains, i.e. Bacteria, Archaea and Eukaryotes.

A few questions which might point in the right direction:

  • What were your source samples? Could they include some eukaryotic 18S? (I've seen this happen in my own data.)
  • Which primers did you use, and which peak is the one you would be expecting? Do they have a history of cross-amplification? 357f-518r do (although that amplicon is too small to explain this data).
  • Is there any chance that the sequencing provider (if an external company) ran your samples on different machines or runs? I had a dataset come back with two peaks because one part was run on 454 and the other on 454+. Unlikely, but just mentioning it.

I edited my question to add more details! So you do think that the spread is too large to have a natural cause?

Reply written 6.3 years ago by xapple
2.6 years ago by
fanli.gcb690
Los Angeles, CA
fanli.gcb690 wrote:

An old question, but this recently came up in some of our work using the V1V2 (27F-338R) primer set. There seems to be a lot of length variation in our amplicons:

[image]

This paper also reports substantial length variation in the V3V4 amplicon (341F-534R): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643231/


I think such variation is completely normal. For example, in the current RDP release (large file) the longest bacterial 16S rRNA is 2,487 bp. I imagine most of the length difference between that and the 1,541 bp reference E. coli 16S is due to the hypervariable regions. If it's important, you could align a random subset of RDP to the E. coli reference to see for yourself.

Reply written 2.6 years ago by 5heikki
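The "random subset" step suggested in the reply above can be sketched in Python with reservoir sampling, which draws a fixed-size uniform sample in one pass without loading a large file into memory. This is only an illustration under assumptions: the input is taken to be plain FASTA text, the function names are invented here, and the alignment to the E. coli reference itself is left to whatever aligner you prefer.

```python
import random


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Uniform random sample of up to k records in a single pass."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


def length_range(records):
    """Return (shortest, longest) sequence length among the records."""
    lengths = [len(seq) for _, seq in records]
    return min(lengths), max(lengths)
```

For example, `reservoir_sample(iter_fasta(open("rdp_bacteria.fa")), 1000)` would pick 1,000 records, where the file name is a placeholder for wherever the database download was saved.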