Question: Shotgun rarefactions - metagenomics (microbiome), MetaPhlAn2
2.9 years ago by
United States/New York
robert.kwapich30 wrote:

Hi there community!

For some time I was working on 16S rRNA gene survey data. For this type of analysis one can use rarefaction to bring every sample to the same depth. Comparing samples of different depths is sometimes likened to searching 1 square meter of Amazon jungle and 1 square kilometer of the Mojave Desert and then comparing OTUs, taxa, etc. Rarefaction is relatively easy to apply, as it is implemented in many software packages, e.g. QIIME and mothur.
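For readers unfamiliar with the idea, here is a minimal sketch of rarefaction as random subsampling without replacement to a common depth. The taxon names, counts, and the `rarefy` helper are made up for illustration; this is not the QIIME or mothur implementation.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Draw `depth` observations without replacement from a taxon -> count dict."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of observations in the sample")
    # Expand the counts into one label per observation, then subsample.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))

# Hypothetical OTU counts for two samples sequenced to different depths.
deep = {"otu_a": 500, "otu_b": 300, "otu_c": 200}
shallow = {"otu_a": 40, "otu_b": 10}

# Rarefy both samples to the depth of the shallowest one.
depth = min(sum(deep.values()), sum(shallow.values()))
rarefied = {"deep": rarefy(deep, depth), "shallow": rarefy(shallow, depth)}
```

After this step both samples contain the same number of observations, so richness comparisons are no longer driven by depth alone.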

I now have a shotgun dataset: whole-genome sequencing of a microbiome. To start, I am using the Microbiome Helper SOP. For taxonomic assignment I use the MetaPhlAn2 approach. The MetaPhlAn2 wiki doesn't even mention rarefaction. Since this step might be crucial for comparative analyses, where I have two groups/categories each containing around 30 samples, I want each sample to be as "standardized" as possible. Are there any approaches to rarefy WGS data? Is there a reason why it has not yet been implemented in, for example, MetaPhlAn2?

I'd be grateful for any insight, comments and suggestions.

modified 7 weeks ago by sapuizait0 • written 2.9 years ago by robert.kwapich30

Hi, did you find a solution to this problem? Any suggestions on how to compute diversity from MetaPhlAn2 output?

written 6 months ago by biobiu110

Thanks for this post robert.kwapich, this is a critical step if you want to compare groups of samples that have been shotgun-metagenome sequenced! My initial instinct was to rarefy based on single-copy bacterial housekeeping genes or on the eukaryotic contamination, but I don't want to reinvent the wheel if there is already a method available! Cheers!

written 7 weeks ago by sapuizait0
6 months ago by
United States/New York
robert.kwapich30 wrote:

I followed some methods from the paper "Unexplored diversity and strain-level structure of the skin microbiome associated with psoriasis".

I also remember using the "Nonpareil" software to estimate the saturation/redundancy of my samples. All of them reached a high percentage except one or two, which were discarded.

See Nonpareil:

What I did later was convert the relative abundances (i.e. percentages) to pseudo-counts, i.e. multiply each percentage by the number of reads in that sample.
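A minimal sketch of that conversion, assuming the profile is given as percentages summing to 100 and that `n_reads` is whatever per-sample read total you choose to use (total or mapped reads). The taxon names, the numbers, and the `to_pseudo_counts` helper are hypothetical.

```python
def to_pseudo_counts(rel_abundance_pct, n_reads):
    """Convert a taxon -> percentage profile to integer pseudo-counts."""
    # Percentages are on a 0-100 scale, so divide by 100 before scaling.
    return {
        taxon: round(pct / 100.0 * n_reads)
        for taxon, pct in rel_abundance_pct.items()
    }

# Hypothetical species-level profile and read total for one sample.
profile = {"s__Taxon_A": 60.0, "s__Taxon_B": 30.0, "s__Taxon_C": 10.0}
n_reads = 2_000_000

pseudo_counts = to_pseudo_counts(profile, n_reads)
```

Because of rounding, the pseudo-counts of a real profile may not sum exactly to `n_reads`.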

This produces microbial profiles with different numbers of observations (i.e. counts), reflecting the different sequencing depths. For taxonomic abundance analysis you could then use the edgeR implementation of GLMs, which can account for different numbers of observations.

For alpha and beta diversity I normalized the counts/observations to the same total number of observations, such as the maximum. Since all my samples had comparable numbers of sequences and reached comparable saturation, perhaps this didn't introduce many errors.
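A rough sketch of this normalization, assuming per-sample count dicts and using the largest sample total as the target. The sample names, counts, and the `scale_to_total` helper are made up for illustration.

```python
def scale_to_total(counts, target_total):
    """Rescale a taxon -> count dict so that its values sum to target_total."""
    total = sum(counts.values())
    return {taxon: n * target_total / total for taxon, n in counts.items()}

# Hypothetical pseudo-counts for two samples of different depth.
samples = {
    "sample_1": {"taxon_a": 1200, "taxon_b": 800},
    "sample_2": {"taxon_a": 300, "taxon_b": 700},
}

# Use the largest per-sample total as the common target.
target = max(sum(counts.values()) for counts in samples.values())
scaled = {name: scale_to_total(counts, target) for name, counts in samples.items()}
```

Note that this is proportional scaling, not random subsampling: it keeps every taxon that was detected, so it does not equalize the chance of detecting rare taxa the way rarefaction does.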

Nevertheless, the nature paper above uses the number of unique species in each sample as a measure of richness, and if every sample has reached similar, high saturation, we would not expect depth to make much difference to it. Evenness measures such as inverse Simpson, however, need these normalized pseudo-counts, stratified at some level, e.g. species.
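As a small illustration of the two measures (not the paper's code), richness is the number of taxa with a non-zero count, and inverse Simpson is 1 / sum(p_i^2) over the taxon proportions p_i. The species names and counts below are invented.

```python
def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)

def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum of squared taxon proportions."""
    total = sum(counts.values())
    return 1.0 / sum((n / total) ** 2 for n in counts.values() if n > 0)

# Two hypothetical samples with the same richness but different evenness.
even = {"sp1": 25, "sp2": 25, "sp3": 25, "sp4": 25}
skewed = {"sp1": 97, "sp2": 1, "sp3": 1, "sp4": 1}

even_stats = (richness(even), inverse_simpson(even))
skewed_stats = (richness(skewed), inverse_simpson(skewed))
```

Both samples have four species, but the inverse Simpson index is much lower for the skewed one, which is why the abundances matter for evenness and not only presence/absence.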

But it has been some time, and many papers have been published since then that I haven't followed. So that is it. If you find something better, please let me know.

modified 6 months ago • written 6 months ago by robert.kwapich30

Thanks! I'll update... I'm really stuck with rarefying MetaPhlAn2 results. I can subsample reads, but subsampling at different cutoffs with repeats means re-running MetaPhlAn2 every time, which will take forever. As for diversity, observed OTUs (species) and Shannon can be calculated easily from relative abundance, and so can Jaccard distance, so I believe I'll start with them... Did you use MetaPhlAn2 to get the relative abundance? The problem with pseudo-counts is that the number of reads differs between samples (the main reason I want to rarefy). Thanks again for your detailed answer.
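A minimal sketch of those three measures computed directly from relative abundances, assuming each profile is a taxon -> abundance dict (percentages or fractions both work, since Shannon is computed on renormalized proportions). The species names, values, and helper functions are hypothetical.

```python
import math

def observed_species(profile):
    """Number of taxa with non-zero relative abundance."""
    return sum(1 for v in profile.values() if v > 0)

def shannon(profile):
    """Shannon index (natural log) on renormalized proportions."""
    total = sum(profile.values())
    return -sum((v / total) * math.log(v / total) for v in profile.values() if v > 0)

def jaccard_distance(a, b):
    """1 - |shared taxa| / |all taxa|, using presence/absence only."""
    present_a = {t for t, v in a.items() if v > 0}
    present_b = {t for t, v in b.items() if v > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)

# Hypothetical relative abundances (percent) for two samples.
s1 = {"sp1": 50.0, "sp2": 50.0}
s2 = {"sp2": 70.0, "sp3": 30.0, "sp4": 0.0}

h1 = shannon(s1)
d12 = jaccard_distance(s1, s2)
```

Observed species and Jaccard depend only on which taxa are detected, so they remain sensitive to sequencing depth even though no counts are involved.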

written 6 months ago by biobiu110