5 months ago
Luqman
I am trying to fetch protein sequences for a list of gene IDs from the BioMart API using the following Python code:
from pybiomart import Dataset

def fetch_protein_sequences(gene_ids):
    dataset = Dataset(name='taestivum_eg_gene', host='http://plants.ensembl.org')
    # Fetch protein sequences
    response = dataset.query(
        attributes=[
            'ensembl_gene_id',
            'peptide'
        ],
        filters={'gene_id': gene_ids}
    )
    return response

# Read gene IDs from file, skipping blank lines
with open('gene_ids.txt') as f:
    gene_ids = [line.strip() for line in f if line.strip()]

# Fetch protein sequences
protein_sequences = fetch_protein_sequences(gene_ids)
But I keep getting this error message:
BiomartException: Query ERROR: caught BioMart::Exception::Usage: WITHIN Virtual Schema : default, Dataset taestivum_eg_gene NOT FOUND
I verified that the dataset taestivum_eg_gene exists, and checked its attributes and filters, using:
from pybiomart import Server

server = Server(host='http://plants.ensembl.org')
for dataset in server.marts['plants_mart'].datasets:
    print(dataset)
Could someone please help me understand why I am seeing this error?
Thank you in advance!
The dataset you want (taestivum_eg_gene) lives in the "plants_mart" virtual schema, not the default schema, but your current code assumes the default schema.
The most straightforward fix is to tell the Dataset object explicitly to use the "plants_mart" schema.
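As a minimal sketch of the same idea without pybiomart, the code below uses only the standard library to send a BioMart XML query directly to the martservice endpoint. The part that matters is `virtualSchemaName="plants_mart"`, because the "Virtual Schema : default ... NOT FOUND" error refers to exactly that value. The following are assumptions, not confirmed in this thread: the endpoint URL, the `FASTA` formatter, and the default filter name `ensembl_gene_id` (the question used `gene_id`, so check the dataset's filter list for the exact name). In pybiomart, the equivalent should be passing `virtual_schema='plants_mart'` to `Dataset(...)`.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: BioMart REST endpoint for Ensembl Plants.
MARTSERVICE_URL = "https://plants.ensembl.org/biomart/martservice"


def build_query_xml(gene_ids, dataset="taestivum_eg_gene",
                    virtual_schema="plants_mart",
                    filter_name="ensembl_gene_id"):
    """Build a BioMart XML query for peptide sequences of the given genes.

    filter_name is an assumption; check the dataset's filters for the real name.
    """
    query = ET.Element("Query", {
        # The schema the "NOT FOUND" error complains about.
        "virtualSchemaName": virtual_schema,
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # BioMart takes multiple filter values as one comma-separated string.
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": ",".join(gene_ids)})
    for attr in ("ensembl_gene_id", "peptide"):
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


def fetch_protein_sequences(gene_ids, **kwargs):
    """POST the query as a form field and return the raw response text.

    Requires network access.
    """
    xml_query = build_query_xml(gene_ids, **kwargs)
    data = urllib.parse.urlencode({"query": xml_query}).encode()
    with urllib.request.urlopen(MARTSERVICE_URL, data=data) as resp:
        return resp.read().decode()
```

After reading `gene_ids` as in the question, `fasta_text = fetch_protein_sequences(gene_ids)` would return all sequences as one FASTA string. For very long ID lists, sending the IDs in smaller batches may be safer.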
I am using pybiomart, which has Server in place of BiomartServer. I used that as above, but I still get the same error. Also, when I check the filters for the dataset, it works and prints the available filters, and the same happens when I check the attributes. Only the fetching function returns the error.
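Listing filters and attributes can succeed even with the wrong schema. A plausible reason, which I have not confirmed against pybiomart's source, is that the configuration request names only the dataset, while the actual query XML also carries `virtualSchemaName`, and pybiomart's `Dataset` defaults that to "default". The martservice registry (`?type=registry`, the same document pybiomart's `Server` reads) lists each mart together with its `serverVirtualSchema`. The sketch below pulls that mapping out with the standard library. The registry URL is an assumption. In pybiomart, taking the dataset from `server.marts['plants_mart'].datasets['taestivum_eg_gene']` instead of constructing `Dataset(...)` by hand should carry the correct schema over.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: registry URL for Ensembl Plants BioMart.
REGISTRY_URL = "https://plants.ensembl.org/biomart/martservice?type=registry"


def virtual_schemas(registry_xml):
    """Map each mart name to the virtual schema its queries must use.

    Accepts the registry document as str or bytes.
    """
    root = ET.fromstring(registry_xml)
    return {
        loc.get("name"): loc.get("serverVirtualSchema")
        for loc in root.iter("MartURLLocation")
    }


def fetch_registry(url=REGISTRY_URL):
    """Download the raw registry XML as bytes (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

For example, `virtual_schemas(fetch_registry()).get("plants_mart")` should return the schema name to pass as `virtual_schema`.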
Could you share some example gene IDs?
Yes, please see below: