Question: METAPHLAN: Relative abundance to read count
gravatar for dpc
6 months ago by
dpc150 wrote:

Hi there!!! I have imported metaphlan output which contains relative abundance of taxa into phyloseq. Now, when I tried to do alpha diversity, it is reversing an error. It seems that it needs only read count not relative abundance. Is there any way (any code or something) I can convert the relative abundance into read count from and then import into phyloseq? Or is there any other way I can follow for the analysis within phyloseq?

Thanks dpc

phyloseq metaphlan • 781 views
ADD COMMENTlink modified 6 months ago by antonioggsousa1.9k • written 6 months ago by dpc150
gravatar for antonioggsousa
6 months ago by
antonioggsousa1.9k wrote:


To calculate alpha-diversity within phyloseq you need a table of absolute abundance. The only way to reverse from relative to absolute abundances is if you have the total no. of reads per sample. I guess that you do not have this information. If you have you can multiply the relative abundance of each taxa/OTU/ASV by the total no. of reads per sample. This should yield a absolute abundance table (it may not, because relative abundances could have been rounded).

Still you can just calculate the basic alpha-diversity metric, the distinct no. of observed taxa/OTUs/ASVs/features per sample, that are those higher than zero. Though you can't do it in phyloseq, the R code to do it is quite simple. Let me know if you want to do this.

I believe that you can do beta-diversity analysis. You might need to transform to absolute counts by multiplying each taxa abundance by one million - counts per million: GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x)). In your case since you have relative abundance, you just need: GP1 = transform_sample_counts(GP1, function(x) 1E6 * x) (have a look at phyloseq).

I hope this helps,


ADD COMMENTlink written 6 months ago by antonioggsousa1.9k

Hi Antonio !!!

  1. Let me know if you want to do this.

Yes. I want to do that. please tell me.

  1. Is it possible to do other statistical analyses e.g. wilcoxon ranksum test, spearman's rank correlation analysis, random forest regression etc with the relative abundance data (not read counts) in phyloseq? I just want to know this because if I proceed and after sometime I see it is not possible to analyse these things, it would be waste of time.

Thanks DC7

ADD REPLYlink modified 6 months ago • written 6 months ago by dpc150

Hi @dpc,

I'll add the code tomorrow, because I need to write a few lines of code in order to make it work and right now I do not have the time to do it (sorry, if someone has, please help).

As far as I know is not possible to do any of the analyses that you mentioned using phyloseq functions. Though it is possible to do it inside R, but you don't need to have a phyloseq-class object. Please read the phyloseq tutorials, since they provide the type of analyses that you can do with phyloseq: mostly alpha- and beta-diversity analyses:


ADD REPLYlink written 6 months ago by antonioggsousa1.9k

ok. No problem. I'll wait for your code. And, basically you are saying to use phyloseq only for diversity analysis and rest of the statistical analysis only inside R without phyloseq. Right?

Thanks for your constant help. I'm grateful. DC7

ADD REPLYlink written 6 months ago by dpc150

Hi @dpc,

So, I created a new function to calculate the no. of distinct OTUs/taxa/genes per sample, i.e., the sum of features that have a occurrence different from zero. This does the same as the phyloseq function, but in this case it works with relative abundance. Here is the function and a comparison with phyloseq function. The result is the same. You give as input the phyloseq object that you have to this new function.

## new function to determine no. of distinct OTUs/genes/features occurrence per sample: 
# it returns a data frame and a ggplot2 plot
count_no_features <- function(physeq) {

  # physeq: a phyloseq-class object


  # check if taxa are rows; if not transpose 
  if ( taxa_are_rows(physeq) ) { #returns mtx
    otu_tbl <- as( otu_table(physeq), "matrix" )
  } else { # transpose; returns mtx
    otu_tbl <- t( otu_table(physeq) ) 

  # sum no. of distinct OTUs observed by sample (different than 0)
  no_otus <- apply( otu_tbl, 2, function(x) sum(x != 0) )

  # save the result into a df
  no_otus_df <- data.frame("Samples" = names(no_otus),
                           "Type_alpha" = rep("Observed", length(no_otus)),
                           "Alpha_div" = no_otus, 
                           row.names = names(no_otus))
  colnames(no_otus_df) <- c("Samples", "Type of alpha-diversity", "Alpha-diversity measure") # change column names

  ## plot results
  plot_rich <- ggplot(data = no_otus_df, aes(x = Samples, y = `Alpha-diversity measure`)) + 
    geom_point() + 
    facet_wrap(~ `Type of alpha-diversity`) +
    theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) 

  ## return a list 
  list_results <- list("data" = no_otus_df, 
                       "plot" = plot_rich)


## Testing:

## phyloseq
no_of_otus_phyloseq <- phyloseq::estimate_richness(physeq = GlobalPatterns, measures = c("Observed"))

## built-in function
no_of_otus_new_fun <- count_no_features(physeq = GlobalPatterns)

## Cecking if it returns the same values for all samples: 
all(no_of_otus_phyloseq$Observed == no_of_otus_new_fun$data$`Alpha-diversity measure`) # returns TRUE

The function requires the R packages phyloseq and ggplot2 and it returns a list: (1) with one data frame with the observed no. of different taxa/OTUs per sample, and (2) a ggplot2 similar to phyloseq. To access to the data (using the example above): no_of_otus_new_fun$data, to plot the plot: no_of_otus_new_fun$plot

I hope this helps,


ADD REPLYlink modified 6 months ago • written 6 months ago by antonioggsousa1.9k

Many many thanks Antonio. I will check this and let you know how it works... dpc

ADD REPLYlink written 6 months ago by dpc150

Hi Antonio!!! I have just run your script with my data and it seems it works really good except that the phyloseq is not running in the richness estimation step. And thus "##Testing:" part did not run. Here's the command and error:

> no_of_otus_phyloseq <- phyloseq::estimate_richness(physeq, measures = c("Observed"))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': function accepts only integers (counts)
In addition: Warning message:
In phyloseq::estimate_richness(physeq, measures = c("Observed")) :
  The data you have provided does not have
any singletons. This is highly suspicious. Results of richness
estimates (for example) are probably unreliable, or wrong, if you have already
trimmed low-abundance taxa from the data.

We recommended that you find the un-trimmed data and retry.
ADD REPLYlink written 6 months ago by dpc150

What is the physeq file? Is your relative abundance physeq file? If it's of course that does not work, because this function is from phyloseq. In the testing what I did was to test a physeq object using the phyloseq function estimate_richness() (that only works with absolute counts) with the function that I wrote, just to see if my function would yield the same result. But in the testing I've used a absolute abundance phyloseq data (that you can import, because it comes with the phyloseq package, by running the code - data(GlobalPatterns) - I don't remember if GlobalPatterns needs quotes or not - try with and witouth).


ADD REPLYlink modified 6 months ago • written 6 months ago by antonioggsousa1.9k

Here Physeq is phyloseq object that has been formed from imported relative abundance file. Thus, it contains only relative abundance information. I think that's why it's not running. However, as it seems your script works fine with relative abundance value there is no need of thinking about phyloseq for this purpose. Also, if you please don't mind, I will like to draw your attention for this question.

Many thanks, dpc

ADD REPLYlink modified 6 months ago • written 6 months ago by dpc150

Hi @dpc,

I received a message at this post from you, but I can't find it.

The message was:

Sir, can we measure "shannon" diversity index from here and plot it?

To calculate Shannon diversity you need absolute counts.


ADD REPLYlink written 5 months ago by antonioggsousa1.9k

Yes. I got the answer. Actually I asked if we can calculate the Shannon diversity index from this relative abundance data. But eventually I realised that for that, we need read count data. So, your script calculates the "observed" alpha diversity. Right?

ADD REPLYlink written 5 months ago by dpc150

Exactly, my script determines the "observed" alpha-diversity, i.e., the distinct number of taxa/OTUs/ASVs/features that you have in each sample. That can be done with absolute or relative count data. Shannon only with absolute counts.


ADD REPLYlink written 5 months ago by antonioggsousa1.9k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1001 users visited in the last hour