METAPHLAN: Relative abundance to read count
1
1
Entering edit mode
2.9 years ago
dpc ▴ 240

Hi there!!! I have imported metaphlan output which contains relative abundance of taxa into phyloseq. Now, when I tried to do alpha diversity, it is reversing an error. It seems that it needs only read count not relative abundance. Is there any way (any code or something) I can convert the relative abundance into read count from and then import into phyloseq? Or is there any other way I can follow for the analysis within phyloseq?

Thanks dpc

metaphlan phyloseq • 4.5k views
1
Entering edit mode
2.9 years ago

Hi,

To calculate alpha-diversity within phyloseq you need a table of absolute abundance. The only way to reverse from relative to absolute abundances is if you have the total no. of reads per sample. I guess that you do not have this information. If you have you can multiply the relative abundance of each taxa/OTU/ASV by the total no. of reads per sample. This should yield a absolute abundance table (it may not, because relative abundances could have been rounded).

Still you can just calculate the basic alpha-diversity metric, the distinct no. of observed taxa/OTUs/ASVs/features per sample, that are those higher than zero. Though you can't do it in phyloseq, the R code to do it is quite simple. Let me know if you want to do this.

I believe that you can do beta-diversity analysis. You might need to transform to absolute counts by multiplying each taxa abundance by one million - counts per million: GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x)). In your case since you have relative abundance, you just need: GP1 = transform_sample_counts(GP1, function(x) 1E6 * x) (have a look at phyloseq).

I hope this helps,

António

0
Entering edit mode

Hi Antonio !!!

1. Let me know if you want to do this.

Yes. I want to do that. please tell me.

1. Is it possible to do other statistical analyses e.g. wilcoxon ranksum test, spearman's rank correlation analysis, random forest regression etc with the relative abundance data (not read counts) in phyloseq? I just want to know this because if I proceed and after sometime I see it is not possible to analyse these things, it would be waste of time.

Thanks DC7

1
Entering edit mode

Hi @dpc,

I'll add the code tomorrow, because I need to write a few lines of code in order to make it work and right now I do not have the time to do it (sorry, if someone has, please help).

As far as I know is not possible to do any of the analyses that you mentioned using phyloseq functions. Though it is possible to do it inside R, but you don't need to have a phyloseq-class object. Please read the phyloseq tutorials, since they provide the type of analyses that you can do with phyloseq: mostly alpha- and beta-diversity analyses: https://joey711.github.io/phyloseq/

António

0
Entering edit mode

ok. No problem. I'll wait for your code. And, basically you are saying to use phyloseq only for diversity analysis and rest of the statistical analysis only inside R without phyloseq. Right?

Thanks for your constant help. I'm grateful. DC7

1
Entering edit mode

Hi @dpc,

So, I created a new function to calculate the no. of distinct OTUs/taxa/genes per sample, i.e., the sum of features that have a occurrence different from zero. This does the same as the phyloseq function, but in this case it works with relative abundance. Here is the function and a comparison with phyloseq function. The result is the same. You give as input the phyloseq object that you have to this new function.

## new function to determine no. of distinct OTUs/genes/features occurrence per sample:
# it returns a data frame and a ggplot2 plot
count_no_features <- function(physeq) {

# physeq: a phyloseq-class object

require("phyloseq")
require("ggplot2")

# check if taxa are rows; if not transpose
if ( taxa_are_rows(physeq) ) { #returns mtx
otu_tbl <- as( otu_table(physeq), "matrix" )
} else { # transpose; returns mtx
otu_tbl <- t( otu_table(physeq) )
}

# sum no. of distinct OTUs observed by sample (different than 0)
no_otus <- apply( otu_tbl, 2, function(x) sum(x != 0) )

# save the result into a df
no_otus_df <- data.frame("Samples" = names(no_otus),
"Type_alpha" = rep("Observed", length(no_otus)),
"Alpha_div" = no_otus,
row.names = names(no_otus))
colnames(no_otus_df) <- c("Samples", "Type of alpha-diversity", "Alpha-diversity measure") # change column names

## plot results
plot_rich <- ggplot(data = no_otus_df, aes(x = Samples, y = Alpha-diversity measure)) +
geom_point() +
facet_wrap(~ Type of alpha-diversity) +
theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1))

## return a list
list_results <- list("data" = no_otus_df,
"plot" = plot_rich)

return(list_results)
}

## Testing:

## phyloseq
no_of_otus_phyloseq <- phyloseq::estimate_richness(physeq = GlobalPatterns, measures = c("Observed"))

## built-in function
no_of_otus_new_fun <- count_no_features(physeq = GlobalPatterns)

## Cecking if it returns the same values for all samples:
all(no_of_otus_phyloseq$Observed == no_of_otus_new_fun$data$Alpha-diversity measure) # returns TRUE  The function requires the R packages phyloseq and ggplot2 and it returns a list: (1) with one data frame with the observed no. of different taxa/OTUs per sample, and (2) a ggplot2 similar to phyloseq. To access to the data (using the example above): no_of_otus_new_fun$data, to plot the plot: no_of_otus_new_fun\$plot

I hope this helps,

António

0
Entering edit mode

Many many thanks Antonio. I will check this and let you know how it works... dpc

0
Entering edit mode

Hi Antonio!!! I have just run your script with my data and it seems it works really good except that the phyloseq is not running in the richness estimation step. And thus "##Testing:" part did not run. Here's the command and error:

> no_of_otus_phyloseq <- phyloseq::estimate_richness(physeq, measures = c("Observed"))
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': function accepts only integers (counts)
In phyloseq::estimate_richness(physeq, measures = c("Observed")) :
The data you have provided does not have
any singletons. This is highly suspicious. Results of richness
estimates (for example) are probably unreliable, or wrong, if you have already
trimmed low-abundance taxa from the data.

We recommended that you find the un-trimmed data and retry.

0
Entering edit mode

What is the physeq file? Is your relative abundance physeq file? If it's of course that does not work, because this function is from phyloseq. In the testing what I did was to test a physeq object using the phyloseq function estimate_richness() (that only works with absolute counts) with the function that I wrote, just to see if my function would yield the same result. But in the testing I've used a absolute abundance phyloseq data (that you can import, because it comes with the phyloseq package, by running the code - data(GlobalPatterns) - I don't remember if GlobalPatterns needs quotes or not - try with and witouth).

António

0
Entering edit mode

Here Physeq is phyloseq object that has been formed from imported relative abundance file. Thus, it contains only relative abundance information. I think that's why it's not running. However, as it seems your script works fine with relative abundance value there is no need of thinking about phyloseq for this purpose. Also, if you please don't mind, I will like to draw your attention for this question.

Many thanks, dpc

0
Entering edit mode

Hi @dpc,

I received a message at this post from you, but I can't find it.

The message was:

Sir, can we measure "shannon" diversity index from here and plot it?

To calculate Shannon diversity you need absolute counts.

António

0
Entering edit mode

Yes. I got the answer. Actually I asked if we can calculate the Shannon diversity index from this relative abundance data. But eventually I realised that for that, we need read count data. So, your script calculates the "observed" alpha diversity. Right?

1
Entering edit mode

Exactly, my script determines the "observed" alpha-diversity, i.e., the distinct number of taxa/OTUs/ASVs/features that you have in each sample. That can be done with absolute or relative count data. Shannon only with absolute counts.

António

0
Entering edit mode

Hi Antonio, @antonioggsousa I have my total read count but have a slightly different question. Would the total read count here mean only R1 values or a merged R1 R2 file values?

Basically would we consider the paired end value or from a single file?

0
Entering edit mode

Hi @pdhrati02,

It depends on what you gave as input to metaphlan. As far as I know the metaphlan software does not support paired-end mode as input, meaning that if you have paired-end fastq files, you just provide your forward or you concatenate your forward and reverse.

If you merge the forward and reverse, and by merging I mean the 1st read on the forward and the 1st read in the reverse will be merged in a single read/sequence and so on along the file depending if the match or not, and provide that as input the absolute counts refers to the number of sequences merged. If you provide only the forward, the absolute counts refer to that, if you concatenate forward and reverse and use that, the number of reads refer to the fragments of forward plus reverse.

I hope this helps, António