I am using this Python script to fetch human protein sequences, but it's not fetching any sequence:
import requests

def fetch_protein_sequence_from_uniprot(protein_name):
    # Declaring UniProt API
    uniport_api_url = f"https://www.uniprot.org/uniprot/?query={protein_name}&format=fasta&organism:9606"
    response = requests.get(uniport_api_url)
    # Parse the response to extract sequence
    sequence = ""
    if response.ok:
        lines = response.text.split("\n")
        for line in lines:
            if not line.startswith(">"):  # Exclude header lines
                sequence += line
    return sequence

# Read protein names from file into a list
with open("prot.txt", "r") as file:
    protein_names = file.read().splitlines()

# Example usage:
protein_names = ["BRCA1", "TP53"]  # Replace with your list of protein names
for name in protein_names:
    sequence = fetch_protein_sequence_from_uniprot(name)
    print(f"Protein Name: {name}")
    print(f"Human Protein Sequence: {sequence}\n")
Is there any problem with the API URL?
What kind of UniProt ID are you using?
I am using just gene names to fetch the sequences.
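So it just won't work. Look at what your URL actually asks for:
https://www.uniprot.org/uniprot/?query=KCNH2&format=fasta&organism:9606
The trailing "&organism:9606" is not a key=value parameter, so it does not restrict anything to human, and a bare gene name in "query" is a free-text search rather than a gene-name match. Both restrictions have to go inside the query string itself. Read the API documentation.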
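As a rough sketch of what that could look like against the current REST endpoint (rest.uniprot.org), assuming the `gene_exact`, `organism_id`, and `reviewed` query fields behave as described in the UniProt documentation. The function names here are made up for illustration, and the sketch uses only the standard library; `requests.get(url, params=...)` would work the same way. It also keeps each FASTA record separate instead of concatenating every hit into one string:

```python
import urllib.parse
import urllib.request

# Search endpoint of the current UniProt REST API.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query(gene_name, taxon_id=9606, reviewed_only=True):
    """Put the gene and organism filters inside the query string itself."""
    parts = [f"gene_exact:{gene_name}", f"organism_id:{taxon_id}"]
    if reviewed_only:
        # Restrict to reviewed (Swiss-Prot) entries; drop this to include TrEMBL.
        parts.append("reviewed:true")
    return " AND ".join(parts)


def parse_fasta(text):
    """Return {header: sequence}, one item per FASTA record."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(chunks) for h, chunks in records.items()}


def fetch_human_sequences(gene_name):
    """Query UniProt for one gene name and return its FASTA records."""
    params = urllib.parse.urlencode(
        {"query": build_query(gene_name), "format": "fasta"}
    )
    url = f"{UNIPROT_SEARCH_URL}?{params}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_human_sequences("KCNH2")` for each name in your list should then give you a dictionary of header to sequence. A gene name can still match more than one entry, so check the headers rather than assuming a single hit.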
Get all human proteins as a single file: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640/UP000005640_9606.fasta.gz
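If you go the single-file route, a minimal sketch for looking sequences up by gene name could be the following. It assumes the headers follow the usual UniProt FASTA layout with a `GN=` tag for the gene name, and that the file has been downloaded locally; `index_by_gene` and `load_proteome` are hypothetical helper names, not part of any library:

```python
import gzip
import re
from collections import defaultdict

# UniProt FASTA headers carry the gene name as "GN=<name>" (assumed layout).
GENE_TAG = re.compile(r"\bGN=(\S+)")


def index_by_gene(fasta_lines):
    """Group FASTA records by gene name: {gene: {header: sequence}}."""
    by_gene = defaultdict(dict)
    gene = header = None
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:]
            match = GENE_TAG.search(header)
            gene = match.group(1) if match else None
            if gene is not None:
                by_gene[gene][header] = ""
        elif line and gene is not None:
            # Sequence lines belong to the most recent header.
            by_gene[gene][header] += line
    return dict(by_gene)


def load_proteome(path):
    """Read a gzipped FASTA file and index it by gene name."""
    with gzip.open(path, "rt") as handle:
        return index_by_gene(handle)
```

With the file saved as UP000005640_9606.fasta.gz, `load_proteome("UP000005640_9606.fasta.gz").get("BRCA1", {})` would give the records tagged with that gene. Entries without a `GN=` tag are skipped in this sketch.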