Hi,
I am new to Biopython and am writing a program to BLAST human amino acid sequences. My search query is:
NCBIWWW.qblast(
    program="blastp",
    database="nr",
    sequence=BLASTSeq,
    entrez_query="txid9606[ORGN]",
    matrix_name='BLOSUM62',
    word_size='2',
    expect='10',
    gapcosts='11 1',
    composition_based_statistics='no adjustment')
but the alignment title of the BLAST result looks like:
('sequence:', u'gi|964750848|ref|NP_001304891.1| periodic tryptophan protein 1 homolog isoform 2 [Homo sapiens] >gi|332241720|ref|XP_003270028.1| PREDICTED: periodic tryptophan protein 1 homolog isoform X2 [Nomascus leucogenys] >gi|194382424|dbj|BAG58967.1| unnamed protein product [Homo sapiens]')
I was wondering why the titles include multiple protein names and organisms, and how I can change my code so that my program only returns one human protein.
My full BLAST method is:
def callBLAST(BLASTSeq):
    from Bio.Blast import NCBIWWW
    from Bio.Blast import NCBIXML
    result_handle = NCBIWWW.qblast(
        program="blastp",
        database="nr",
        sequence=BLASTSeq,
        entrez_query="txid9606[ORGN]",
        matrix_name='BLOSUM62',
        word_size='2',
        expect='10',
        gapcosts='11 1',
        composition_based_statistics='no adjustment')
    blast_record = NCBIXML.read(result_handle)
    E_VALUE_THRESH = 0.04
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print('****Alignment****')
                print('sequence:', alignment.title)
                # Leave the HSP loop so each alignment title is printed once.
                break
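To make the goal concrete, here is a minimal sketch of the kind of post-processing I have in mind. It assumes that the merged titles are always separated by " >" and that each one ends with the organism in square brackets, as in the output above. pick_human_title is a hypothetical helper of my own, not part of Biopython, and I am not sure this is the recommended approach.

```python
def pick_human_title(title, organism="Homo sapiens"):
    """Return the first entry of a merged title that names the organism.

    Assumption: entries are joined with " >" and each ends with
    "[Organism name]", as in the example title shown above.
    """
    tag = "[{}]".format(organism)
    for entry in title.split(" >"):
        entry = entry.strip()
        if entry.endswith(tag):
            return entry
    # No entry for the requested organism was found.
    return None


# The title copied from the BLAST result above.
title = (
    "gi|964750848|ref|NP_001304891.1| periodic tryptophan protein 1 homolog "
    "isoform 2 [Homo sapiens] >gi|332241720|ref|XP_003270028.1| PREDICTED: "
    "periodic tryptophan protein 1 homolog isoform X2 [Nomascus leucogenys] "
    ">gi|194382424|dbj|BAG58967.1| unnamed protein product [Homo sapiens]"
)
print(pick_human_title(title))
```

In callBLAST this would be applied to alignment.title before printing, but it only trims the text and does not change what the search itself returns.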
Thanks!