I've been analysing 8 datasets with MetaPhlAn and have merged the results into a single file using the merge_metaphlan_tables.py script.
I want to make a stacked barplot using ggplot2. Could anyone here please advise me on how to proceed? I'm fairly new to this field.
I tried the first few code snippets from the link below, but I got confused partway through.
The ID column of the merged table includes multiple annotation levels. For example, the row where ID is k__Bacteria gives the proportion of reads for each sample that can be assigned to the kingdom Bacteria, and the row where ID is k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales gives the proportion of reads for each sample that can be assigned to the order Bifidobacteriales. Stacked barplots only make sense if you are considering a single taxonomic level, because rows from different levels count the same reads more than once. This is why the first step after data import is filtering for rows whose ID ends in a g__ entry (the last |-separated field starts with g__), which gives all genus-level annotations in your dataset. You can repeat this process with the other prefixes (p__, c__, o__ and so on) to subset the table for any taxonomic level.
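To make the filtering step concrete, here is a minimal sketch in pandas. The toy table, its abundance values, the helper name filter_rank and the column name "ID" are assumptions for illustration only; check the header of your own merged file, since the annotation column may be named differently.

```python
import io
import pandas as pd

# Toy stand-in for a merged MetaPhlAn table; the abundances are invented.
order = "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales"
genus_a = order + "|f__Bifidobacteriaceae|g__Bifidobacterium"
species_a = genus_a + "|s__Bifidobacterium_longum"
genus_b = ("k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales"
           "|f__Lactobacillaceae|g__Lactobacillus")
toy = io.StringIO(
    "ID\tsample1\tsample2\n"
    "k__Bacteria\t100.0\t100.0\n"
    f"{order}\t60.0\t40.0\n"
    f"{genus_a}\t60.0\t40.0\n"
    f"{species_a}\t60.0\t40.0\n"
    f"{genus_b}\t40.0\t60.0\n"
)
table = pd.read_csv(toy, sep="\t")


def filter_rank(df, rank_letter, id_col="ID"):
    """Keep rows whose last |-separated field starts with '<rank_letter>__'."""
    # The field must sit at the start of the ID or right after a '|',
    # and nothing but non-'|' characters may follow it.
    pattern = rf"(?:^|\|){rank_letter}__[^|]*$"
    return df[df[id_col].str.contains(pattern)].reset_index(drop=True)


genus = filter_rank(table, "g")
# Shorten the full lineage to its last field for use as a plot label.
genus = genus.assign(label=genus["ID"].str.split("|").str[-1])
print(genus[["label", "sample1", "sample2"]])
```

Calling filter_rank(table, "o") or filter_rank(table, "p") gives the order-level or phylum-level subset in the same way. The species row is excluded from the genus subset because its ID continues past the g__ field.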
You've used MetaPhlAn to do the taxonomic profiling of your metagenomic data. If you ever decide to try Kraken instead, I made a Shiny app that goes directly from merged abundance tables to stacked barplots.
Hi, yes, thank you for the clarification!
I tried the pre-processing step and received an error. Could you please tell me what's wrong here? I'm pasting the output below:
Traceback (most recent call last):
  File "/home/axv183/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'NCBI_tax_id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "python2.py", line 9, in <module>
    df_k_genus = df_k[df_k['NCBI_tax_id'].str.contains(r'\g__[^|]*$')]
  File "/home/axv183/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 3506, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/axv183/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'NCBI_tax_id'
The commands I ran were:
df_k = pd.read_table("merged_output.txt", sep='\t')
print(df_k.head())
df_k_genus = df_k[df_k['NCBI_tax_id'].str.contains(r'\g__[^|]*$')]
df_k_genus.reset.index(drop=True,inplace=True)
print(df_k_genus.head())
Thanks
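A possible reading of this error, offered as a sketch rather than a confirmed diagnosis: the KeyError means the loaded table has no column literally named NCBI_tax_id, so the first thing to check is what pandas actually parsed as the header. One way this can happen is a leading comment line (starting with #) being read as the header row; whether your file has one is an assumption here. Two other issues in the commands would surface once the column lookup works: \g is not a valid escape in Python regular expressions (plain g__ is enough), and the method is reset_index, not reset.index. The example below uses an invented in-memory table, and the helper name load_merged_table and the candidate column names "clade_name" and "ID" are illustrative guesses.

```python
import io
import pandas as pd


def load_merged_table(source, id_candidates=("clade_name", "ID")):
    """Read a tab-separated table, skipping '#' comment lines, and
    report which of the candidate annotation columns is present."""
    # comment="#" drops fully commented lines before the header is chosen.
    df = pd.read_csv(source, sep="\t", comment="#")
    for name in id_candidates:
        if name in df.columns:
            return df, name
    raise KeyError(
        f"none of {id_candidates} found; available columns: {list(df.columns)}"
    )


# Invented stand-in for a merged table with a leading comment line.
raw = (
    "#example_comment_line\n"
    "clade_name\tsample1\tsample2\n"
    "k__Bacteria\t100.0\t100.0\n"
    "k__Bacteria|p__Actinobacteria\t100.0\t100.0\n"
    "k__Bacteria|p__Actinobacteria|g__Bifidobacterium\t100.0\t100.0\n"
)

df_k, id_col = load_merged_table(io.StringIO(raw))
print(df_k.columns.tolist())  # inspect the real column names first

# Plain 'g__' instead of '\g__', and reset_index instead of reset.index.
df_k_genus = df_k[df_k[id_col].str.contains(r"g__[^|]*$")].reset_index(drop=True)
print(df_k_genus.head())
```

With a real file, pass the path (for example "merged_output.txt") instead of the StringIO object. Note that comment="#" also truncates any line at a # that appears mid-line, so it is only suitable if sample names and taxon names contain no # characters.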