UniprotKB BLAST pipeline troubles
1
2
Entering edit mode
3 months ago

I have been trying to make a BLAST pipeline that takes a fasta sequence and puts it through BLAST and returns the Uniprot ID of the matching sequence and puts that into a text file.

I have been testing it only on one sequence for now (Glycine cleavage protein P in Hydra Vulgaris), its taken at time of writing over an hour and it outputs that it cannot find the sequence. If i search for it manually on Uniprot BLAST under the UniprotKB reference proteomes+Swissprot database it can find the sequence in under 2 mins.

So it definitely is a problem with my code. Am I searching in the wrong database? Thank you in advance for any help.

from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

def run_blast(sequence, output_file):
    # Perform BLAST search against UniProtKB
    result_handle_uniprotkb = NCBIWWW.qblast("blastp", "uniprotkb", sequence.seq)
    blast_records_uniprotkb = NCBIXML.read(result_handle_uniprotkb)

    # Extract UniProt IDs with the lowest E-value
    best_hit_uniprotkb = min(blast_records_uniprotkb.alignments, key=lambda x: x.hsps[0].expect, default=None)

    if best_hit_uniprotkb:
        accession_id = best_hit_uniprotkb.accession
        e_value = best_hit_uniprotkb.hsps[0].expect
        database_used = "UniProtKB"
    else:
        accession_id = "N/A"
        e_value = "N/A"
        database_used = "No hits"

    # Write the result to the output file
    with open(output_file, "a") as f:
        f.write(f">{sequence.id}\t{accession_id}\t{database_used}\t{e_value}\n")

    result_handle_uniprotkb.close()

# Read the input FASTA file
input_file = "un.fasta"
output_file = "blast_results.txt"

with open(output_file, "w") as f:
    f.write("")

for record in SeqIO.parse(input_file, "fasta"):
    run_blast(record, output_file)

print("BLAST search completed. Results written to", output_file)
biopython BLAST uniprot • 426 views
ADD COMMENT
3
Entering edit mode
3 months ago
Mensur Dlakic ★ 27k

I think you are searching the wrong database. To the best of my knowledge, remote BLAST doesn't have UniProtKB as an option. When I change your code and use swissprot instead of uniprotkb and SwissProt instead of UniProtKB, it works without a problem. The largest database you can search using this approach appears to be nr.

You may want to download the UniProtKB database for local searching, after which you can parse the output files using a similar strategy to what you already have. That's assuming you have disk and CPU resources.

ADD COMMENT
4
Entering edit mode

I addition to this,

I have been testing it only on one sequence for now

if you were planning to use the web blast for many (hundreds) sequences then don't. Web blasting via command line is not meant to be used as a replacement for an install of stand alone local blast+.

ADD REPLY
0
Entering edit mode

Ok thank you, I didn't know this either.

ADD REPLY
0
Entering edit mode

Ah I see now, nr could be an option, Uniprot's ID mapping tool could convert the NCBI ascension numbers to Uniprot IDs. I wont have the computational resources for a local blast of that scale so I'll have to figure something else out regarding that. Thank you for your help

ADD REPLY

Login before adding your answer.

Traffic: 1443 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6