Question: Normalization of certain gene abundance in a metagenome
5.9 years ago by
bioinfo (790) wrote:

Hi, I have analysed two whole-genome shotgun metagenomes (5 and 8 million reads respectively, 101 bp reads) to look for antibiotic resistance genes, and I want to compare the results. I have the raw resistance gene counts from each metagenome (e.g. 10 gotA, 20 fotG, etc.). Now I want to normalize the data sets so that they can be compared.

I planned to normalize by the number of 16S rRNA genes in each metagenome, so I extracted the 16S rRNA sequences from the metagenomes and assigned them to taxonomic classes (based on the SILVA database). Then I realised that to normalize the resistance gene counts I need the number of full-length 16S rRNA genes, but without assembling the reads, the short 101 bp reads can only tell me how many reads belong to 16S sequences (e.g. for metagenome 1, 40,000 of the 5 million reads are 16S sequences, and for metagenome 2, 45,000 of the 8 million reads are 16S sequences). That means I can't simply normalize the gene counts by the 16S read counts, because multiple 101 bp reads can come from the same full-length 16S gene (~1582 bp). The 40,000 16S reads could, for example, come from only 10,000 16S rRNA genes, but I don't have that full-length information. How do you normalize in this situation? Do you take gene length into account as well?
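To make the arithmetic in the question concrete, here is a minimal Python sketch of one common way to account for gene length: convert read counts into approximate full-length "gene equivalents" (reads x read length / gene length) and then express each resistance gene per 16S equivalent. This is an illustration under stated assumptions, not a method endorsed anywhere in this thread. The function names and the 1000 bp resistance gene length are hypothetical; the read counts, the 101 bp read length and the ~1582 bp 16S length are taken from the question.

```python
def gene_equivalents(read_count, read_len_bp, gene_len_bp):
    """Approximate number of full-length gene copies represented by the reads.

    Assumes reads are spread evenly along the gene, so the total sequenced
    bases divided by the gene length gives the number of copies covered.
    """
    return read_count * read_len_bp / gene_len_bp


def arg_per_16s(arg_reads, arg_len_bp, rrna_reads,
                read_len_bp=101, rrna_len_bp=1582):
    """Resistance gene copies per 16S rRNA gene copy (length-normalized)."""
    arg_copies = gene_equivalents(arg_reads, read_len_bp, arg_len_bp)
    rrna_copies = gene_equivalents(rrna_reads, read_len_bp, rrna_len_bp)
    return arg_copies / rrna_copies


if __name__ == "__main__":
    # 16S read counts from the question; the gene length of 1000 bp for
    # gotA is a made-up placeholder, to be replaced by the real length.
    for name, rrna_reads in (("metagenome 1", 40_000), ("metagenome 2", 45_000)):
        ratio = arg_per_16s(arg_reads=10, arg_len_bp=1000, rrna_reads=rrna_reads)
        print(f"{name}: {ratio:.3e} gotA copies per 16S copy")
```

Note that the read length cancels out of the ratio when both genes are counted from the same read set, so only the two gene lengths and the two read counts matter. Because the sequencing depth is absorbed by the 16S count, the resulting ratios can be compared between the 5 million and 8 million read data sets.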

metagenome 16s rrna • 3.0k views
5.9 years ago by
Walnut Creek, USA
Brian Bushnell (17k) wrote:

You may be able to improve the situation by looking at only the V4 region of 16S, which is much shorter: roughly 250 bp, or a bit longer. If your reads are paired and the insert size is largely under 200 bp, you can merge them into single longer reads (up to ~190 bp for 2x101 bp) with BBMerge, which will greatly increase specificity.

V4 is not as good as the full 16S, of course, but it is commonly used as a proxy when full-length information is unavailable.  For metagenomes, if you want full-length 16S, you have to go with PacBio rather than Illumina.  And for V4 on Illumina, it is MUCH better to go with longer read lengths of 2x150, 2x250, or ideally 2x300bp, which you can do on MiSeq.
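The read-length figures above follow from simple overlap arithmetic, sketched here in Python. A pair can only be merged by overlap if the two reads share enough bases, so the longest merged read is twice the read length minus the minimum overlap. The 12 bp minimum overlap is an assumption chosen because it reproduces the ~190 bp figure for 2x101 bp; it is not claimed to be BBMerge's actual setting, and the function names are hypothetical.

```python
def max_merged_length(read_len_bp, min_overlap_bp=12):
    """Longest fragment that a read pair can span and still overlap enough to merge."""
    return 2 * read_len_bp - min_overlap_bp


def merged_length(insert_size_bp, read_len_bp=101, min_overlap_bp=12):
    """Length of the merged read, or None if the pair overlaps too little to merge.

    When the pair can be merged, the merged read covers the whole insert,
    so its length equals the insert size.
    """
    if insert_size_bp > max_merged_length(read_len_bp, min_overlap_bp):
        return None
    return insert_size_bp


if __name__ == "__main__":
    # Read lengths mentioned in the answer: 2x101, 2x150, 2x250, 2x300.
    for read_len in (101, 150, 250, 300):
        print(f"2x{read_len} bp: merged reads up to {max_merged_length(read_len)} bp")
```

Under this assumption, 2x101 bp pairs top out below the roughly 250 bp quoted for V4, while 2x150 bp and longer pairs can span it, which is consistent with the recommendation to use longer reads.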
