Last step of metagenome analysis before visualization
4 weeks ago
Ayda Ecem • 0

I am trying to run a metagenome analysis for plant species. Since QIIME 2 uses the SILVA database, which is mostly used for bacteria, I wrote custom code for all of my steps. Right now I have approximately 11k rows of taxon IDs that I got from the NCBI database, but I am having trouble matching taxonomy names to those IDs. I need to match the taxonomy, filter for the plant species, and plot a pie chart of those plant species. I have been told that NCBI does not have an API for getting taxonomy names.

How can I solve my problem? Also, my code can be found below:

import pandas as pd
import bs4

from Bio import Entrez, SeqIO

# Initialize the NCBI email account (NCBI asks for a contact address on every request)
Entrez.email = "email_address"


def get_taxonomic_info(accession_number):
    """
    Queries the NCBI database for taxonomic information of a given accession number.

    Parameters:
    - accession_number (str): The NCBI accession number.

    Returns:
    - str: The NCBI taxon ID from the record's source feature ("" if none is found).
    """
    handle = Entrez.efetch(db="nuccore", id=accession_number, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()

    # Extracting the taxonomic information
    taxonomic_lineage = ""
    for feature in record.features:
        if feature.type == "source":
            # db_xref can hold several cross-references; keep only the "taxon:<id>" entry
            for xref in feature.qualifiers.get("db_xref", []):
                if xref.startswith("taxon:"):
                    taxonomic_lineage = xref.split(":")[1]

    return taxonomic_lineage


def main():
    # Load the Excel file
    df = pd.read_excel(r"file_path")

    # Extract the accession numbers
    accession_numbers = df.iloc[:, 1].tolist()  # Assuming the accession numbers are in the second column

    # Prepare the output file
    with open(r"output_path", "w") as outfile:
        for accession_number in accession_numbers:
            taxonomic_info = get_taxonomic_info(accession_number)
            # Write one tab-separated line per accession (the output format is an assumption)
            outfile.write(f"{accession_number}\t{taxonomic_info}\n")


if __name__ == "__main__":
    main()
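
One possible way forward, as a sketch under stated assumptions rather than a tested pipeline: NCBI's E-utilities do expose the taxonomy database, and Biopython's Entrez.efetch can query it with db="taxonomy". The example below fetches taxonomy records in batches, keeps the taxa whose lineage contains Viridiplantae (taxon ID 33090, used here as the working definition of "plant"), counts hits per scientific name, and draws a pie chart. The function names, batch_size, top_n, out_path, and the "Other" slice are all hypothetical choices made for illustration.

```python
from collections import Counter

# NCBI Taxonomy ID of Viridiplantae (green plants); used as the "is a plant" test.
VIRIDIPLANTAE_TAXID = "33090"


def fetch_taxonomy_records(taxids, email, batch_size=200):
    """Fetch taxonomy records for many taxon IDs in batches (needs network access)."""
    from Bio import Entrez  # imported here so the helpers below also work offline

    Entrez.email = email
    unique_ids = sorted({str(t) for t in taxids})  # query each ID only once
    records = []
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        handle = Entrez.efetch(db="taxonomy", id=",".join(batch), retmode="xml")
        records.extend(Entrez.read(handle))
        handle.close()
    return records


def plant_names_by_taxid(records):
    """Map taxon ID -> scientific name for records whose lineage includes Viridiplantae."""
    names = {}
    for record in records:
        lineage_ids = {str(node["TaxId"]) for node in record.get("LineageEx", [])}
        if VIRIDIPLANTAE_TAXID in lineage_ids:
            names[str(record["TaxId"])] = str(record["ScientificName"])
    return names


def count_plant_hits(taxids, names, top_n=10):
    """Count hits per plant name, folding everything past the top_n into 'Other'."""
    counts = Counter(names[str(t)] for t in taxids if str(t) in names)
    top = counts.most_common(top_n)
    other = sum(counts.values()) - sum(n for _, n in top)
    if other:
        top.append(("Other", other))
    return top


def plot_pie(label_counts, out_path="plant_pie.png"):
    """Draw a pie chart from (label, count) pairs and save it (requires matplotlib)."""
    import matplotlib.pyplot as plt

    if not label_counts:
        raise ValueError("no plant hits to plot")
    labels, sizes = zip(*label_counts)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.pie(sizes, labels=labels, autopct="%1.1f%%")
    ax.set_title("Plant taxa among matched hits")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

With taxids holding the full list of roughly 11k taxon IDs (duplicates included), the pieces could be chained as records = fetch_taxonomy_records(taxids, "email_address"), then names = plant_names_by_taxid(records), then plot_pie(count_plant_hits(taxids, names)). Counting is done against the original ID list so that repeated hits contribute to slice sizes, while the network requests are made only for the unique IDs.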

metagenome python analysis • 111 views

