Question: Importing MetaPhlAn3 profile table into phyloseq to use decontam
plicht0 wrote:

Hi there,

I am new to R and would like to import the taxonomy profile table of MetaPhlAn3 into the R package phyloseq to make use of the package decontam.

To do this, I merged several MetaPhlAn analyses with MetaPhlAn's bundled merge utility (merge_metaphlan_tables.py). Then I imported the data into R using the read.table command:

merged_metaphlan <- read.table("/media/sf_projects/microbiome/Analysis_of_microbiome/WiP/KneadData/firsttry/Validation_Samples_PL018/PL0183103_5/subsamples/visualization/merged_subsamples_samples_1,2,5.txt", header = TRUE)

After that, I wanted to convert this to an otu_table and then load it into a phyloseq-class object:

otu_table(merged_metaphlan, taxa_are_rows = TRUE)

But it seems that phyloseq expects a numeric matrix. Since the MetaPhlAn table contains character data (the clade names) alongside the corresponding relative abundances, I receive the following error:

Error in validObject(.Object) : invalid class “otu_table” object: 
 Non-numeric matrix provided as OTU table.
Abundance is expected to be numeric.

Is there a way to directly import MetaPhlAn tables into phyloseq? Or do I need a workaround, and if so, how can I do it?

R software error • 430 views
modified 4 months ago • written 5 months ago by plicht0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Using the "software error" tag is pretty meaningless. MetaPhlAn3 would be a more useful tag to add.

written 5 months ago by GenoMax94k

metaphlanToPhyloseq.R

Refer to the "Import the metaphlan data to phyloseq" section.

modified 5 months ago • written 5 months ago by cpad011214k

Hi, did you manage to figure this out? I'm trying to do the same thing and I'm getting this error!

written 5 months ago by c.e.chong40

@c.e.chong, please add some demo/example data to your post.

written 5 months ago by cpad011214k

I have run MetaPhlAn3 and have a merged abundance output file that looks like this:

clade_name  healthy_mphlan  dandruff_mphlan dandruffhealthy_mphlan
k__Bacteria 91.40268    71.86512    89.7509
k__Bacteria|p__Actinobacteria   86.36566    49.51296    77.30806
k__Bacteria|p__Actinobacteria|c__Actinobacteria 86.36566    49.51296    77.30806
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales  0.1044  0.11737 0.62909
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae  0.1044  0.11737 0.62909
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinobaculum 0.01359 0.03944 0.09785

I want to input this into phyloseq. I used the command

metaphlan <- read.table("statemerged_abundance_table_reformatted.txt", header = TRUE)
otu_table(metaphlan, taxa_are_rows = TRUE)

This gave me the error:

Error in validObject(.Object) : invalid class "otu_table" object:  Non-numeric matrix provided as OTU table. Abundance is expected to be numeric.

I want to get this data into phyloseq so I can analyse it with DESeq2 afterwards. I'm not sure how to create phyloseq objects from the MetaPhlAn table. Do you have any experience with this?

Thanks in advance!

modified 5 months ago • written 5 months ago by c.e.chong40
library(phyloseq)
df <- read.csv("test.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE, row.names = 1)
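
For readers hitting the same "Non-numeric matrix" error, here is a minimal sketch of why `row.names = 1` matters. It uses the first rows of the example table posted above as inline text instead of a file. Once the clade names are moved into the row names, every remaining column is numeric, which is what `otu_table()` expects. The object names (`txt`, `bad`, `mat`) are illustrative only, and the phyloseq call is guarded in case the package is not installed. A merged table that still carries other non-abundance columns (for example a taxonomy ID column) would presumably need those dropped first.

```r
# First rows of the merged MetaPhlAn table from the post above, tab-separated.
txt <- paste(
  "clade_name\thealthy_mphlan\tdandruff_mphlan\tdandruffhealthy_mphlan",
  "k__Bacteria\t91.40268\t71.86512\t89.7509",
  "k__Bacteria|p__Actinobacteria\t86.36566\t49.51296\t77.30806",
  "k__Bacteria|p__Actinobacteria|c__Actinobacteria\t86.36566\t49.51296\t77.30806",
  sep = "\n"
)

# Without row.names = 1 the clade names stay in the table as a character
# column, so converting to a matrix gives a character matrix.
bad <- read.table(text = txt, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
print(sapply(bad, class))

# With row.names = 1 the clade names become row names and only the
# numeric abundance columns remain.
df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1,
                 stringsAsFactors = FALSE)
mat <- as.matrix(df)
stopifnot(is.numeric(mat))

# Build the OTU table only if phyloseq is available.
if (requireNamespace("phyloseq", quietly = TRUE)) {
  otu <- phyloseq::otu_table(mat, taxa_are_rows = TRUE)
  print(otu)
}
```

The same idea applies to `read.table()` on a file: add `sep = "\t"` and `row.names = 1` to the original call.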

test

copy/pasted from https://github.com/wipperman/wipperman/blob/master/R/microbiota.R:
##########################################################################################
metaphlanToPhyloseq <- function(
    tax,
    metadat=NULL,
    simplenames=TRUE,
    roundtointeger=FALSE,
    split="|"){
    ## tax is a matrix or data.frame with the table of taxonomic abundances, rows are taxa, columns are samples
    ## metadat is an optional data.frame of specimen metadata, rows are samples, columns are variables
    ## if simplenames=TRUE, use only the most detailed level of taxa names in the final object
    ## if roundtointeger=TRUE, values will be multiplied by 1e4 and rounded to the nearest integer
    xnames = rownames(tax)
    shortnames = gsub(paste0(".+\\", split), "", xnames)
    if(simplenames){
        rownames(tax) = shortnames
    }
    if(roundtointeger){
        tax = round(tax * 1e4)
    }
    x2 = strsplit(xnames, split=split, fixed=TRUE)
    taxmat = matrix(NA, ncol=max(sapply(x2, length)), nrow=length(x2))
    colnames(taxmat) = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Strain")[1:ncol(taxmat)]
    rownames(taxmat) = rownames(tax)
    for (i in 1:nrow(taxmat)){
        taxmat[i, 1:length(x2[[i]])] <- x2[[i]]
    }
    taxmat = gsub("[a-z]__", "", taxmat)
    taxmat = phyloseq::tax_table(taxmat)
    otutab = phyloseq::otu_table(tax, taxa_are_rows=TRUE)
    if(is.null(metadat)){
        res = phyloseq::phyloseq(taxmat, otutab)
    }else{
        res = phyloseq::phyloseq(taxmat, otutab, phyloseq::sample_data(metadat))
    }
    return(res)
}
##########################################################
> metaphlanToPhyloseq(df)
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 6 taxa and 3 samples ]
tax_table()   Taxonomy Table:    [ 6 taxa by 6 taxonomic ranks ]
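
To see what the function does with the clade names, the name-splitting step can be traced on its own in base R. This sketch reuses three clade names from the example table and mirrors the `strsplit()` and `gsub()` lines of the function quoted above. It is an illustration of that step under those assumptions, not a replacement for the function.

```r
# Three clade names taken from the example table in this thread.
xnames <- c(
  "k__Bacteria",
  "k__Bacteria|p__Actinobacteria",
  "k__Bacteria|p__Actinobacteria|c__Actinobacteria"
)

# fixed = TRUE treats "|" as a literal character, not as regex alternation.
x2 <- strsplit(xnames, split = "|", fixed = TRUE)

# One row per taxon, one column per rank; ranks a taxon lacks stay NA.
ranks <- c("Kingdom", "Phylum", "Class", "Order",
           "Family", "Genus", "Species", "Strain")
taxmat <- matrix(NA_character_, nrow = length(x2), ncol = max(lengths(x2)))
colnames(taxmat) <- ranks[seq_len(ncol(taxmat))]
for (i in seq_along(x2)) {
  taxmat[i, seq_along(x2[[i]])] <- x2[[i]]
}

# Strip the rank prefixes ("k__", "p__", ...); gsub() keeps the matrix shape.
taxmat <- gsub("[a-z]__", "", taxmat)

# Short names: everything up to and including the last "|" is removed.
shortnames <- gsub(".+\\|", "", xnames)

print(taxmat)
print(shortnames)
```

Because the short names become the row names when `simplenames=TRUE`, they need to be unique across the table.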
modified 5 months ago • written 5 months ago by cpad011214k

Thank you so much for your help. I have managed to reproduce what you did with this function. However, the function from the wipperman GitHub repository creates the tax table and OTU table but not the sample_data() that I also need. The function from the Waldron lab that you linked first in this thread does, but I cannot get it to work. Do you have any experience with that function? I have described my issues in this post (https://www.biostars.org/p/456397/#456575).
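
A note for readers with the same need: in the function as quoted above, `sample_data()` is only added when a data frame is passed through the `metadat` argument. A minimal sketch of such a data frame follows. The `condition` column and its values are hypothetical placeholders, and `check_samples()` is a made-up helper. What matters is that the row names match the sample column names of the abundance table, since phyloseq matches samples by name. The final call is guarded and assumes `metaphlanToPhyloseq()` and the data frame `df` from the answer above already exist in the session.

```r
# Sample names as they appear in the header of the example table.
samples <- c("healthy_mphlan", "dandruff_mphlan", "dandruffhealthy_mphlan")

# Hypothetical metadata: one row per sample, row names equal to sample names.
metadat <- data.frame(
  condition = c("healthy", "dandruff", "mixed"),  # placeholder values
  row.names = samples,
  stringsAsFactors = FALSE
)

# Hypothetical helper: do the table's columns and the metadata rows agree?
check_samples <- function(tax, metadat) {
  setequal(colnames(tax), rownames(metadat))
}

# Only runs if the function and data frame from the answer are defined
# and phyloseq is installed.
if (exists("metaphlanToPhyloseq") && is.data.frame(df) &&
    requireNamespace("phyloseq", quietly = TRUE)) {
  stopifnot(check_samples(df, metadat))
  ps <- metaphlanToPhyloseq(df, metadat = metadat)
  print(ps)
}
```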

I'm very grateful for your help!

written 5 months ago by c.e.chong40

@c.e.chong, unless input/example files and the expected output are added to the post, it is difficult to address it.

modified 5 months ago • written 5 months ago by cpad011214k

Hey c.e.chong,

sorry for getting back to you so late. Did you manage to get MetaPhlAn output into phyloseq? Do you use total read counts (-t rel_ab_w_read_stats) or relative abundances (-t rel_ab) for the calculations in that package? Do you also plan to make use of decontam?

I guess we are working on very similar topics, so maybe we should join forces here?

Best, Philipp

written 4 months ago by plicht0