Hi there!!! I have imported metaphlan output which contains relative abundance of taxa into phyloseq. Now, when I tried to do alpha diversity, it is reversing an error. It seems that it needs only read count not relative abundance. Is there any way (any code or something) I can convert the relative abundance into read count from and then import into phyloseq? Or is there any other way I can follow for the analysis within phyloseq?
Hi Antonio !!!
Yes. I want to do that. please tell me.
Is it possible to do other statistical analyses e.g. wilcoxon ranksum test, spearman's rank correlation analysis, random forest regression etc with the relative abundance data (not read counts) in phyloseq? I just want to know this because if I proceed and after sometime I see it is not possible to analyse these things, it would be waste of time.
I'll add the code tomorrow, because I need to write a few lines of code in order to make it work and right now I do not have the time to do it (sorry, if someone has, please help).
As far as I know is not possible to do any of the analyses that you mentioned using
phyloseqfunctions. Though it is possible to do it inside R, but you don't need to have a phyloseq-class object. Please read the phyloseq tutorials, since they provide the type of analyses that you can do with phyloseq: mostly alpha- and beta-diversity analyses: https://joey711.github.io/phyloseq/
ok. No problem. I'll wait for your code. And, basically you are saying to use phyloseq only for diversity analysis and rest of the statistical analysis only inside R without phyloseq. Right?
Thanks for your constant help. I'm grateful. DC7
So, I created a new function to calculate the no. of distinct OTUs/taxa/genes per sample, i.e., the sum of features that have a occurrence different from zero. This does the same as the phyloseq function, but in this case it works with relative abundance. Here is the function and a comparison with phyloseq function. The result is the same. You give as input the phyloseq object that you have to this new function.
The function requires the R packages phyloseq and ggplot2 and it returns a list: (1) with one data frame with the observed no. of different taxa/OTUs per sample, and (2) a ggplot2 similar to phyloseq. To access to the data (using the example above):
no_of_otus_new_fun$data, to plot the plot:
I hope this helps,
Many many thanks Antonio. I will check this and let you know how it works... dpc
Hi Antonio!!! I have just run your script with my data and it seems it works really good except that the phyloseq is not running in the richness estimation step. And thus
"##Testing:"part did not run. Here's the command and error:
What is the
physeqfile? Is your relative abundance
physeqfile? If it's of course that does not work, because this function is from phyloseq. In the testing what I did was to test a physeq object using the phyloseq function
estimate_richness()(that only works with absolute counts) with the function that I wrote, just to see if my function would yield the same result. But in the testing I've used a absolute abundance phyloseq data (that you can import, because it comes with the phyloseq package, by running the code -
data(GlobalPatterns)- I don't remember if GlobalPatterns needs quotes or not - try with and witouth).
Physeqis phyloseq object that has been formed from imported relative abundance file. Thus, it contains only relative abundance information. I think that's why it's not running. However, as it seems your script works fine with relative abundance value there is no need of thinking about phyloseq for this purpose. Also, if you please don't mind, I will like to draw your attention for this question.
Many thanks, dpc
I received a message at this post from you, but I can't find it.
The message was:
To calculate Shannon diversity you need absolute counts.
Yes. I got the answer. Actually I asked if we can calculate the Shannon diversity index from this relative abundance data. But eventually I realised that for that, we need read count data. So, your script calculates the "observed" alpha diversity. Right?
Exactly, my script determines the "observed" alpha-diversity, i.e., the distinct number of taxa/OTUs/ASVs/features that you have in each sample. That can be done with absolute or relative count data. Shannon only with absolute counts.
Hi Antonio, @antonioggsousa I have my total read count but have a slightly different question. Would the total read count here mean only R1 values or a merged R1 R2 file values?
Basically would we consider the paired end value or from a single file?
It depends on what you gave as input to metaphlan. As far as I know the metaphlan software does not support paired-end mode as input, meaning that if you have paired-end fastq files, you just provide your forward or you concatenate your forward and reverse.
If you merge the forward and reverse, and by merging I mean the 1st read on the forward and the 1st read in the reverse will be merged in a single read/sequence and so on along the file depending if the match or not, and provide that as input the absolute counts refers to the number of sequences merged. If you provide only the forward, the absolute counts refer to that, if you concatenate forward and reverse and use that, the number of reads refer to the fragments of forward plus reverse.
I hope this helps, António