Question: METAPHLAN: Relative abundance to read count
1
dpc140 wrote:

Hi there!!! I have imported metaphlan output which contains relative abundance of taxa into phyloseq. Now, when I tried to do alpha diversity, it is reversing an error. It seems that it needs only read count not relative abundance. Is there any way (any code or something) I can convert the relative abundance into read count from and then import into phyloseq? Or is there any other way I can follow for the analysis within phyloseq?

Thanks dpc

phyloseq metaphlan • 357 views
modified 8 weeks ago by antonioggsousa1.3k • written 8 weeks ago by dpc140
1
antonioggsousa1.3k wrote:

Hi,

To calculate alpha-diversity within `phyloseq` you need a table of absolute abundance. The only way to reverse from relative to absolute abundances is if you have the total no. of reads per sample. I guess that you do not have this information. If you have you can multiply the relative abundance of each taxa/OTU/ASV by the total no. of reads per sample. This should yield a absolute abundance table (it may not, because relative abundances could have been rounded).

Still you can just calculate the basic alpha-diversity metric, the distinct no. of observed taxa/OTUs/ASVs/features per sample, that are those higher than zero. Though you can't do it in `phyloseq`, the R code to do it is quite simple. Let me know if you want to do this.

I believe that you can do beta-diversity analysis. You might need to transform to absolute counts by multiplying each taxa abundance by one million - counts per million: `GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x))`. In your case since you have relative abundance, you just need: `GP1 = transform_sample_counts(GP1, function(x) 1E6 * x)` (have a look at phyloseq).

I hope this helps,

António

Hi Antonio !!!

1. Let me know if you want to do this.

Yes. I want to do that. please tell me.

1. Is it possible to do other statistical analyses e.g. wilcoxon ranksum test, spearman's rank correlation analysis, random forest regression etc with the relative abundance data (not read counts) in phyloseq? I just want to know this because if I proceed and after sometime I see it is not possible to analyse these things, it would be waste of time.

Thanks DC7

1

Hi @dpc,

I'll add the code tomorrow, because I need to write a few lines of code in order to make it work and right now I do not have the time to do it (sorry, if someone has, please help).

As far as I know is not possible to do any of the analyses that you mentioned using `phyloseq` functions. Though it is possible to do it inside R, but you don't need to have a phyloseq-class object. Please read the phyloseq tutorials, since they provide the type of analyses that you can do with phyloseq: mostly alpha- and beta-diversity analyses: https://joey711.github.io/phyloseq/

António

ok. No problem. I'll wait for your code. And, basically you are saying to use phyloseq only for diversity analysis and rest of the statistical analysis only inside R without phyloseq. Right?

Thanks for your constant help. I'm grateful. DC7

1

Hi @dpc,

So, I created a new function to calculate the no. of distinct OTUs/taxa/genes per sample, i.e., the sum of features that have a occurrence different from zero. This does the same as the phyloseq function, but in this case it works with relative abundance. Here is the function and a comparison with phyloseq function. The result is the same. You give as input the phyloseq object that you have to this new function.

``````## new function to determine no. of distinct OTUs/genes/features occurrence per sample:
# it returns a data frame and a ggplot2 plot
count_no_features <- function(physeq) {

# physeq: a phyloseq-class object

require("phyloseq")
require("ggplot2")

# check if taxa are rows; if not transpose
if ( taxa_are_rows(physeq) ) { #returns mtx
otu_tbl <- as( otu_table(physeq), "matrix" )
} else { # transpose; returns mtx
otu_tbl <- t( otu_table(physeq) )
}

# sum no. of distinct OTUs observed by sample (different than 0)
no_otus <- apply( otu_tbl, 2, function(x) sum(x != 0) )

# save the result into a df
no_otus_df <- data.frame("Samples" = names(no_otus),
"Type_alpha" = rep("Observed", length(no_otus)),
"Alpha_div" = no_otus,
row.names = names(no_otus))
colnames(no_otus_df) <- c("Samples", "Type of alpha-diversity", "Alpha-diversity measure") # change column names

## plot results
plot_rich <- ggplot(data = no_otus_df, aes(x = Samples, y = `Alpha-diversity measure`)) +
geom_point() +
facet_wrap(~ `Type of alpha-diversity`) +
theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1))

## return a list
list_results <- list("data" = no_otus_df,
"plot" = plot_rich)

return(list_results)
}

## Testing:

## phyloseq
no_of_otus_phyloseq <- phyloseq::estimate_richness(physeq = GlobalPatterns, measures = c("Observed"))

## built-in function
no_of_otus_new_fun <- count_no_features(physeq = GlobalPatterns)

## Cecking if it returns the same values for all samples:
all(no_of_otus_phyloseq\$Observed == no_of_otus_new_fun\$data\$`Alpha-diversity measure`) # returns TRUE
``````

The function requires the R packages phyloseq and ggplot2 and it returns a list: (1) with one data frame with the observed no. of different taxa/OTUs per sample, and (2) a ggplot2 similar to phyloseq. To access to the data (using the example above): `no_of_otus_new_fun\$data`, to plot the plot: `no_of_otus_new_fun\$plot`

I hope this helps,

António

Many many thanks Antonio. I will check this and let you know how it works... dpc

Hi Antonio!!! I have just run your script with my data and it seems it works really good except that the phyloseq is not running in the richness estimation step. And thus `"##Testing:"` part did not run. Here's the command and error:

``````> no_of_otus_phyloseq <- phyloseq::estimate_richness(physeq, measures = c("Observed"))
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': function accepts only integers (counts)
In phyloseq::estimate_richness(physeq, measures = c("Observed")) :
The data you have provided does not have
any singletons. This is highly suspicious. Results of richness
estimates (for example) are probably unreliable, or wrong, if you have already
trimmed low-abundance taxa from the data.

We recommended that you find the un-trimmed data and retry.
``````

What is the `physeq` file? Is your relative abundance `physeq` file? If it's of course that does not work, because this function is from phyloseq. In the testing what I did was to test a physeq object using the phyloseq function `estimate_richness()` (that only works with absolute counts) with the function that I wrote, just to see if my function would yield the same result. But in the testing I've used a absolute abundance phyloseq data (that you can import, because it comes with the phyloseq package, by running the code - `data(GlobalPatterns)` - I don't remember if GlobalPatterns needs quotes or not - try with and witouth).

António

Here `Physeq` is phyloseq object that has been formed from imported relative abundance file. Thus, it contains only relative abundance information. I think that's why it's not running. However, as it seems your script works fine with relative abundance value there is no need of thinking about phyloseq for this purpose. Also, if you please don't mind, I will like to draw your attention for this question.

Many thanks, dpc

Hi @dpc,

I received a message at this post from you, but I can't find it.

The message was:

Sir, can we measure "shannon" diversity index from here and plot it?

To calculate Shannon diversity you need absolute counts.

António

Yes. I got the answer. Actually I asked if we can calculate the Shannon diversity index from this relative abundance data. But eventually I realised that for that, we need read count data. So, your script calculates the "observed" alpha diversity. Right?

1

Exactly, my script determines the "observed" alpha-diversity, i.e., the distinct number of taxa/OTUs/ASVs/features that you have in each sample. That can be done with absolute or relative count data. Shannon only with absolute counts.

António