Issues removing unwanted taxa from phyloseq object
11 weeks ago

Hi everyone,

I am currently in the process of removing unwanted taxa (Kingdom="Eukaryota", Family="Mitochondria", and Order="Chloroplast") from a phyloseq object I created. This phyloseq object was created using my outputs from DADA2 (OTU table, taxonomy table, and metadata file).

I have saved a taxonomy table in CSV format at every step where I create a new phyloseq object (i.e. after removing (1) eukaryotes, (2) mitochondria, and (3) chloroplasts, so 3 new taxonomy tables in total) to verify for myself that the correct number of ASVs was being removed. When I removed the eukaryotes (first step) and printed the new phyloseq object, the total number of taxa dropped from 2551 to 2506 (2551 - 2506 = 45 ASVs removed). However, when I use the "ctrl + f" feature in Excel to count the cells with the value "Eukaryota" in the taxonomy table, only 40 cells/taxa are found. Likewise, when I create a phyloseq object that contains only the ASVs that are eukaryotes, only 40 are found.

Why are 45 ASVs removed when apparently only 40 should be? Am I missing something?

I'm very new to this platform, so I've attached my code and outputs (outputs are marked with a # sign) below, but do let me know if I can supply anything else. I am using R (4.1.1 - Kick Things) and phyloseq (version 1.38.0).

original phyloseq object (ps):

ps <- phyloseq(otu_table(st, taxa_are_rows=FALSE), sample_data(samdf), tax_table(taxtab))
ps

# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 2551 taxa and 95 samples ]
# sample_data() Sample Data:       [ 95 samples by 5 sample variables ]
# tax_table()   Taxonomy Table:    [ 2551 taxa by 6 taxonomic ranks ]
# refseq()      DNAStringSet:      [ 2551 reference sequences ]

removing eukaryotes (ps.euk):

ps.euk <- subset_taxa(ps, Kingdom !="Eukaryota")
ps.euk

# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 2506 taxa and 95 samples ]
# sample_data() Sample Data:       [ 95 samples by 5 sample variables ]
# tax_table()   Taxonomy Table:    [ 2506 taxa by 6 taxonomic ranks ]
# refseq()      DNAStringSet:      [ 2506 reference sequences ]

keeping only eukaryotes (ps.euk2):

ps.euk2 <- subset_taxa(ps, Kingdom == "Eukaryota")
ps.euk2

# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 40 taxa and 95 samples ]
# sample_data() Sample Data:       [ 95 samples by 5 sample variables ]
# tax_table()   Taxonomy Table:    [ 40 taxa by 6 taxonomic ranks ]
# refseq()      DNAStringSet:      [ 40 reference sequences ]

[Image: Excel screenshot showing that a search for cells containing the value "Eukaryota" finds only 40.]

Any help would be greatly appreciated; thanks

-H

ASV phyloseq taxonomy dada2 • 291 views
12 days ago
Christina • 0

This is indeed really strange. Maybe you should submit it as an issue on the phyloseq GitHub page; you are far more likely to get a reply there.
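One thing that may be worth checking first: the 5-ASV gap (45 removed versus 40 labeled "Eukaryota") would be consistent with five ASVs having no Kingdom assignment (NA). In R, NA != "Eukaryota" evaluates to NA rather than TRUE, and subset(), which subset_taxa() relies on, keeps only rows where the condition is TRUE. The sketch below uses a small made-up data frame (not the poster's data) to show the behavior in base R. The commented-out phyloseq lines at the end are assumptions about how the same check would look on the `ps` object from the question and have not been run against it.

```r
# Toy taxonomy table (hypothetical values, not the poster's data).
tax <- data.frame(
  Kingdom = c("Bacteria", "Eukaryota", NA, "Archaea", NA),
  stringsAsFactors = FALSE
)

# Tally every Kingdom value, including missing ones.
print(table(tax$Kingdom, useNA = "ifany"))

# NA != "Eukaryota" evaluates to NA, and subset() keeps only TRUE rows,
# so the unassigned rows are dropped together with the eukaryotes.
kept <- subset(tax, Kingdom != "Eukaryota")
nrow(kept)     # 2

# Keeping NA explicitly retains the unassigned rows.
kept_na <- subset(tax, is.na(Kingdom) | Kingdom != "Eukaryota")
nrow(kept_na)  # 4

# Assumed equivalents on the poster's object `ps` (untested here):
# table(as(tax_table(ps), "matrix")[, "Kingdom"], useNA = "ifany")
# ps.euk <- subset_taxa(ps, is.na(Kingdom) | Kingdom != "Eukaryota")
```

If the tally on the real taxonomy table shows five NA entries in the Kingdom column, that would account for the difference. Whether those unassigned ASVs should be kept or removed is a separate decision.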
