How to use python blast to search for protein homologs and retrieve 10 of it?
1
1
Entering edit mode
2.5 years ago
Wong • 0

I am new for bioinformatics, and I want to use python to blast for protein homolog. After I save the result in xml file, I don't know how to retrieve the nucleotide sequence from it. I am using the blast on python through internet.

retrieve sequence homologs Python • 2.2k views
ADD COMMENT
0
Entering edit mode
2.5 years ago

I will be using the GitHub Copilot to answer this question:

I started out by writing the following comment:

# parse XML blast output with BioPython

at that point the Copilot wrote me the following code:

def parse_blast(blast_file):
    """
    Parse XML BLAST output with BioPython
    """
    from Bio.Blast import NCBIXML
    blast_records = NCBIXML.parse(open(blast_file))
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print('****Alignment****')
                print('sequence:', alignment.title)
                print('length:', alignment.length)
                print('e value:', hsp.expect)
                print('identities:', hsp.identities)
                print('positives:', hsp.positives)
                print('query:', hsp.query)
                print('match:', hsp.match)

I think it is a really good start and can be adapted to your needs.

A new era has dawned upon us in computing.

See also: how to split RNA-seq into codons by using python

ADD COMMENT
0
Entering edit mode

I want to save the retrieved homologs in the fasta file but when I run it, my fasta file is empty.

ADD REPLY
0
Entering edit mode
result_handle = open("myblast.xml")
blast_records = NCBIXML.parse(result_handle)
i=0
with open("result.fasta", "w") as savefasta_handle:
    for blast_record in blast_records:
        for i in range(10):
             for alignment in blast_record.alignments:
                 for hsp in alignment.hsps:
                     savefasta_handle.write('>%s\n' % (hsp.query))
savefasta_handle.close()
ADD REPLY
0
Entering edit mode

Can you help me check which step did I get it wrong as I have tried many ways and still couldn't work.

ADD REPLY
0
Entering edit mode

show some sections of your XML file, first we need to establish that you have what you want to get out, print each alignment in the loop, see what is inside them

do also note that blast does not find homologs, it finds aligned regions using a local alignment

ADD REPLY
0
Entering edit mode
record = open("Escherichia coli.fasta").read()
result_handle = NCBIWWW.qblast("blastx", "nt", record)
with open("my_blast.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
out_handle.close()
ADD REPLY
0
Entering edit mode

you seem to be opening a FASTA file and parsing it as XML. You need to open the blast result file

ADD REPLY

Login before adding your answer.

Traffic: 2929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6