Question: Differential Abundance Analysis wih DESeq2 and metabolites as continuous predictor variables
gravatar for greenm11
4 weeks ago by
greenm110 wrote:

Hi all,

I'm currently working with 16S data from an experiment involving two different strains of mice, at 4 different time points between P17 and P84. I'm attempting to analyze the differentially abundant taxa between genotypes at each postnatal age sampled. In addition, I'm hoping to make use of some previously acquired metabolite data to extract some differentially abundant taxa using SCFA levels as a continuous predictor variable. I have a small sample size due to the pilot nature of the study, amounting to 12 fecal and 12 cecal samples for each sampling group. I've worked through my data using both ALDEx2 and DESeq pipelines, although I am uncertain that my analyses are optimized for the experimental questions I want to answer.

My metabolite data is derived from NMR spectroscopy, so all of the values are relative intensities ranging from 0-1. I'm concerned that this is confounding my results, given the log fold change values that DESeq outputs are per unit of a continuous predictor.

I've provided an example of my code for the fecal samples from the P28 timepoint below, looking at differential taxa in relation to butyrate levels. This has been repeated for all timepoints independently after subsetting data by age.

#dat_pr is an un-normalized sequence count table of ASVs for each sample 
dat_pr_fecal_28_ap = subset_samples(dat_pr_fecal_clean_met, Age == 28)
dds_fecal_28 = phyloseq_to_deseq2(dat_pr_fecal_28_ap, ~butyrate_levels)
dds_fecal_28 = DESeq(dds_fecal_28)

res_fecal_28 = results(dds_fecal_28, name = 'butyrate_levels', independentFiltering = FALSE)

res_df_fecal_28 = data.frame(res_fecal_28)
res_df_fecal_28 = (res_df_fecal_28
          %>% rownames_to_column('ASV'))

The output of this code not only returns no significantly differentially abundant taxa (no adjusted p-values < 0.01), but the volcano plot reveals some odd behavior of the -log p-values that seem to plateau out at a certain ceiling below this threshold (see here)

Please let me know if my model is constructed correctly, and if there is anything I'm missing that may be impinging upon my results!

Thanks so much in advance :)

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by greenm110


ADD REPLYlink written 4 weeks ago by Kevin Blighe66k
gravatar for Kevin Blighe
4 weeks ago by
Kevin Blighe66k
Kevin Blighe66k wrote:

Hi, you should use nominal / un-adjusted p-values for the volcano plot. Then, it will look more typical. The reason why many metabolites are grouped at the same horizontal line is that their adjusted p-values were set to the same value.

Any reason why you set independentFiltering = FALSE?

Can you elaborate on what is butyrate_levels, I mean, I know what is butyrate, but is this a continuous or categorical variable?

What is the output of resultsNames(dds_fecal_28)?


ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe66k

Hi, Thanks for your rapid reply! The independent filtering variable being set to false is the default (I believe, but could be mistaken) and the argument was present in a differential abundance analysis template provided to my lab by a statistician in a enighbouring lab. Should this argument be changed to TRUE? As I mentioned the butyrate_levels variable is a vector of relative intensities between 0 and 1 that reflect the butyrate levels in the sample as measured via NMR. resultsNames(dd_fecal_28) simply outputs the following: [1] "Intercept" "butyrate_levels" which I then used to determine the correct argument to use for name in the results function. It does not provide any actual results itself.


ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by greenm110

Thank you! I think that the default for independentFiltering is TRUE, but this may depend on the version of DESeq2 that you are using.

I'm attempting to analyze the differentially abundant taxa between genotypes at each postnatal age sampled

I am not sure that your current code permits that you do this. In your current code, you just subset your data to include the 'age 28' samples and then, via DESeq2, you are effectively testing butyrate levels against gene expression for just this 'age 28' group. So, there is no information about genotype here.

If you want to include genotype information, then you may need to modify the formula in the following command:

dds_fecal_28 = phyloseq_to_deseq2(dat_pr_fecal_28_ap, ~butyrate_levels)

, for example:

dds_fecal_28 = phyloseq_to_deseq2(dat_pr_fecal_28_ap, ~ genotype:butyrate_levels)
ADD REPLYlink written 4 weeks ago by Kevin Blighe66k


Sorry I should have made it more clear: I repeat this process for each timepoint that has its own subset of the data- this code chunk was just to provide an example. In addition, here I'm not necessarily concerned about the genotype of the mice in this analysis, given we already know from the metabolite data that butyrate levels are different between strains and thus the taxa driving this difference should appear in a differential abundance analysis that looks only at butyrate levels. If I did, however, want to include genotype information in the model, what would be the difference between using ~ genotype:butyrate_levels and ~ genotype+butyrate_levels?

ADD REPLYlink written 4 weeks ago by greenm110
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1520 users visited in the last hour