9 months ago
mirwa.zidi93
Hi,
I have an Excel table with gene symbols listed under the column "Symbol" (e.g., AKR1A1, adh) and their corresponding functions listed under the column "Function" (e.g., alcohol dehydrogenase (NADP+) [EC:1.1.1.2]). These symbols represent genes from various organisms. I would like to use the KEGG API to retrieve the cellular communities associated with these gene symbols and add the results back to the Excel table, but I encounter this error:
IndexError Traceback (most recent call last)
<ipython-input-17-1bf6ed03288a> in <cell line: 43>()
41
42 # Add a new column for cellular community information
---> 43 df['Cellular Community'] = df['Symbol'].apply(get_cellular_community)
44
45 # Save the updated DataFrame to a new Excel file
4 frames
<ipython-input-17-1bf6ed03288a> in get_cellular_community(symbol)
24 if result:
25 if len(result) > 0: # Check if the result list is not empty
---> 26 kegg_id = result[0].split(':')[1]
27 # Get the pathways associated with the gene from KEGG
28 pathways = k.get_pathway_by_gene(kegg_id)
IndexError: list index out of range
Here is the script I used:
import pandas as pd
from bioservices import KEGG

# Initialize the KEGG object
k = KEGG()

# Read the Excel file
try:
    df = pd.read_excel('/content/sample_data/function 1-5.xlsx')
except FileNotFoundError:
    print("Error: Excel file not found.")
    exit()

# Check if the 'Symbol' column exists in the DataFrame
if 'Symbol' not in df.columns:
    print("Error: 'Symbol' column not found in the Excel file.")
    exit()

# Function to get cellular community information for a gene symbol
def get_cellular_community(symbol):
    cellular_community = ""
    # Search for the gene symbol in KEGG
    result = k.find('genes', symbol)
    if result:
        if len(result) > 0:  # Check if the result list is not empty
            kegg_id = result[0].split(':')[1]
            # Get the pathways associated with the gene from KEGG
            pathways = k.get_pathway_by_gene(kegg_id)
            # Extract cellular community information from pathways
            for pathway_id, pathway_info in pathways.items():
                if 'Categories' in pathway_info:
                    categories = pathway_info['Categories']
                    if 'Cellular community - eukaryotes' in categories:
                        cellular_community = 'eukaryotes'
                        break
                    elif 'Cellular community - prokaryotes' in categories:
                        cellular_community = 'prokaryotes'
                        break
    return cellular_community

# Add a new column for cellular community information
df['Cellular Community'] = df['Symbol'].apply(get_cellular_community)

# Save the updated DataFrame to a new Excel file
output_file = 'output_with_cellular_community.xlsx'
df.to_excel(output_file, index=False)
print(f"Updated Excel file saved as: {output_file}")
It might be because the gene symbol is not found in the KEGG database. Add some extra error handling before parsing the results, or check manually whether the symbol exists in KEGG.
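As a rough sketch of that extra error handling, the helper below parses a lookup result defensively instead of indexing into it directly. It assumes the lookup returns KEGG's tab-delimited text (one "org:gene<TAB>description" entry per line) and treats anything else as "no hit". The name first_kegg_id and the sample string are hypothetical and for illustration only; the sample is not captured from a live query. It is also worth printing type(result) in your own session: if the result is a string rather than a list, result[0] is a single character, so result[0].split(':')[1] would raise this IndexError even for symbols that do exist.

```python
from typing import Optional


def first_kegg_id(result) -> Optional[str]:
    """Return the first 'org:gene' identifier from a find-style response, or None.

    Assumption: `result` is tab-delimited text, one entry per line, with the
    identifier in the first column. An empty string, None, or a numeric error
    code is treated as "symbol not found".
    """
    # Reject anything that is not non-empty text.
    if not isinstance(result, str) or not result.strip():
        return None
    # Take the first entry and keep only its identifier column.
    first_line = result.strip().splitlines()[0]
    entry_id = first_line.split("\t")[0].strip()
    # A usable identifier has the form 'organism:gene'.
    if ":" not in entry_id:
        return None
    return entry_id


# Illustrative input only (hypothetical response text).
sample = "hsa:10327\tAKR1A1; example description\n"
print(first_kegg_id(sample))
print(first_kegg_id(""))
print(first_kegg_id(404))
```

Inside get_cellular_community, you could call this on the lookup result and return an empty string when it gives None, so that a missing symbol leaves a blank cell in the new column instead of stopping the whole apply.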