Question

Finding Pdb Codes Of Sequences Homologous To An Input Sequences

0

Entering edit mode

10.5 years ago

amenabanu • 0

I need to write a script that can do a blastp with an input xml file, then find and display the PDB codes of sequences that are homologous. The definition I'm using for homology is: 1. the HSP in the alignment with the highest e-value is less than .001; and 2. the total alignment length (not counting gaps) is 60% or more of the query length.

I'm still new to programming, and I'm quite stumped as to how to do this.

I'd really appreciate your help. Thanks!

python blast pdb • 2.9k views

ADD COMMENT • link updated 10.5 years ago by Hamish ★ 3.2k • written 10.5 years ago by amenabanu • 0

score 0 · Answer 1 · 2013-10-25

0

Entering edit mode

10.5 years ago

João Rodrigues ★ 2.5k

You can use Biopython to this end, have a look at the tutorial and browse the web for examples, there are plenty around.

ADD COMMENT • link 10.5 years ago by João Rodrigues ★ 2.5k

0

Entering edit mode

Thanks for responding! I am looking through Biopython. Would you be able to point me to something specific?

ADD REPLY • link 10.5 years ago by amenabanu • 0

0

Entering edit mode

I just did :) I linked you to the tutorial on the section to parse BLAST XML results. Here are two more links, google search for 'parse blast xml biopython':

http://stackoverflow.com/questions/13835912/parse-only-top-3-hits-from-blast-output-with-ncbixml http://www.biotnet.org/sites/biotnet.org/files/documents/25/biopython_blast.pdf

ADD REPLY • link 10.5 years ago by João Rodrigues ★ 2.5k

score 0 · Answer 2 · 2013-10-25

Due to the PDB chain identifiers being case sensitive, and NCBI BLAST+ not supporting case sensitive identifiers, I would tend to use one of the programs from the FASTA suite for this. Since the PDB sequence database is small, this also means that more rigorous search methods can be used, for example:

SSEARCH: for local/local alignment (Smith-Waterman)
GGSEARCH: for global/global alignment (Needleman-Wunsch)
GLSEARCH: for global/local alignment

These programs and the PDB sequence databases are available in the EMBL-EBI Sequence Similarity Search services. Since these are also available as Web Services and EMBL-EBI provide sample clients for these services, you can download a suitable client from the EMBL-EBI Web Services pages and run the search providing the appropriate parameters, for example:

wget http://www.ebi.ac.uk/Tools/webservices/download_clients/perl/lwp/fasta_lwp.pl
./fasta_lwp.pl --email email@example.org --program ssearch --database pdb --stype protein --expupperlim 0.01 --sequence mysequence.txt

Once you have the results then you can use libraries such as BioPerl, BioPython or BioRuby to parse the FASTA format results, or use the other formats provided with your own parsing/extraction code to implement your coverage filter. Typically I would use the "EBI Application XML" result with some XSLT and/or XPath to extract the required information from the result.