Easy Retrieval of Mutant PDB Structures
12.2 years ago
Chris ★ 1.6k

I'm looking for an automatic way to find all mutant structures of a given structure. More specifically, the task is to return all PDB IDs whose sequence differs from that of a given entry by exactly one amino acid. I already know about the MutaProt server, but it seems to be outdated (last update in 2006). Are there other, more recent servers? I'm sure I could get the task done by parsing the whole PDB, but I'd rather avoid that. Thanks.

12.2 years ago
Neilfws 49k

*EDIT: for anyone interested, I expanded on this answer in a blog post*

I don't know of a server or database for which this information has been pre-computed or can be retrieved by a search.

However, I think it is not too much work to craft a solution using a few tools. I'd do something like this:

(1) First, retrieve sequences of PDB chains in FASTA format:

wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz
gunzip pdbaa.gz

(2) Next, cluster the sequences using CD-HIT, choosing a high value for -c so as to obtain clusters of highly similar sequences:

cd-hit -i pdbaa -o pdb99 -c 0.99 -n 5

Example cluster:

>Cluster 13
0       1676aa, >gi|319443753|pdb|3PRX|A... *
1       1676aa, >gi|190016356|pdb|3CU7|A... at 99.94%

(3) Parse the cluster file written by CD-HIT (pdb99.clstr) to extract the GIs or PDB IDs from each cluster, and create a new FASTA file containing the sequences for each cluster.
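As a rough illustration of the parsing part of step (3), here is a minimal Python sketch. It assumes the .clstr layout shown in the example cluster above and FASTA headers of the form gi|...|pdb|XXXX|C; the function name parse_clstr is made up for this example and is not part of CD-HIT.

```python
import re
from collections import defaultdict

# Header text sits between ">" and the "..." that CD-HIT appends.
MEMBER = re.compile(r">(\S+?)\.\.\.")
# Assumed header layout: gi|<number>|pdb|<4-character ID>|<chain>
PDB_CHAIN = re.compile(r"pdb\|(\w{4})\|(\w*)")


def parse_clstr(lines):
    """Map cluster number -> list of (pdb_id, chain) tuples."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.rstrip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif current is not None:
            member = MEMBER.search(line)
            if member:
                hit = PDB_CHAIN.search(member.group(1))
                if hit:
                    clusters[current].append((hit.group(1), hit.group(2)))
    return dict(clusters)


if __name__ == "__main__":
    example = [
        ">Cluster 13",
        "0       1676aa, >gi|319443753|pdb|3PRX|A... *",
        "1       1676aa, >gi|190016356|pdb|3CU7|A... at 99.94%",
    ]
    for number, members in parse_clstr(example).items():
        # Only clusters with 2 or more members can contain a mutant pair.
        if len(members) > 1:
            print(number, members)
```

Writing the per-cluster FASTA files would then be a matter of looking up each (PDB ID, chain) in pdbaa; that part is left out here.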

(4) Then, do an all-versus-all global alignment for each new FASTA file, using something like needleall from the EMBOSS suite.

needleall -aformat3 pair -stdout -auto -asequence cluster1.fa \
          -bsequence cluster1.fa > cluster1.needleall

(This is a bit dumb, since for two sequences A and B needleall will generate 4 alignments: AA, AB, BA and BB, when only AB is of interest - but you get the general idea)
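To put a number on the redundancy: a cluster of n chains gives n*n alignments in the all-versus-all run, but only n*(n-1)/2 distinct pairs. A small Python sketch of enumerating just those pairs follows; the helper name unique_pairs is invented for this example, and feeding the pairs to an aligner one at a time is left open.

```python
from itertools import combinations


def unique_pairs(ids):
    """Return each unordered pair of distinct IDs exactly once."""
    return list(combinations(sorted(set(ids)), 2))


if __name__ == "__main__":
    # Chain labels taken from the example cluster above.
    chains = ["3PRX_A", "3CU7_A"]
    print(len(chains) ** 2, "all-versus-all alignments")
    print(len(unique_pairs(chains)), "distinct pair(s):", unique_pairs(chains))
```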

A portion of the alignment file (for the first case of chain aligned to self):

# Aligned_sequences: 2
# 1: 1VS5R
# 2: 1VS5R
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 75
# Identity:      75/75 (100.0%)
# Similarity:    75/75 (100.0%)
# Gaps:           0/75 ( 0.0%)
# Score: 385.0

(5) Parse the alignment output to extract the PDB IDs of sequence 1 and sequence 2 for every pair where, for example:

Length = 100
Identity = 99/100
# therefore differ by 1 amino acid
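Step (5) could be sketched in Python along these lines. This is only a sketch: it assumes every alignment header looks like the excerpt above, with "# Score:" as the last header field, and the function name single_difference_pairs is invented for this example. Self-alignments and the repeated BA copy of each AB pair are skipped, and each hit is labelled according to whether the single difference is a gap column or a substitution.

```python
import re

# Header fields of interest in the EMBOSS "pair" format excerpt above.
FIELD = re.compile(r"^# (1|2|Identity|Gaps|Score):\s*(\S+)")


def single_difference_pairs(lines):
    """Return (id_a, id_b, kind) for pairs with Length - Identity == 1."""
    seen = set()
    record = {}
    results = []
    for line in lines:
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        record[key] = value
        if key != "Score":  # assumed to close each header block
            continue
        a, b = record.get("1"), record.get("2")
        identity, gaps = record.get("Identity"), record.get("Gaps")
        record = {}
        if None in (a, b, identity, gaps) or a == b:
            continue  # incomplete header or chain aligned to itself
        matched, length = (int(x) for x in identity.split("/"))
        n_gaps = int(gaps.split("/")[0])
        pair = tuple(sorted((a, b)))
        if length - matched == 1 and pair not in seen:
            seen.add(pair)
            kind = "gap" if n_gaps == 1 else "substitution"
            results.append((pair[0], pair[1], kind))
    return results


if __name__ == "__main__":
    # File name taken from the needleall command above.
    with open("cluster1.needleall") as handle:
        for hit in single_difference_pairs(handle):
            print(*hit, sep="\t")
```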

Awesome answer and blog post. I'm curious, how many mutant pairs does this find?

Added that info at end of blog post. There are 12 912 pairs of PDB chains that differ by 1 residue. Of those, 1 914 pairs differ due to one gap in the alignment; the other 10 998 are due to 1 amino acid change.

Thanks neilfws for that nice answer, and for the blog post. It looks like a good start for approaching the problem. I should have mentioned in my initial post that I'm specifically interested in the structural perspective, i.e. the deviations between two structures caused by a single amino acid exchange. Since the SEQRES sequence might not be equal to the sequence in the ATOM records, there might be more work involved.

True. In which case I guess you'd want to extract sequence from the PDB record (or something derived from it). So, parsing the whole PDB it is!
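For what it's worth, pulling the per-chain sequence out of the ATOM records does not need much code. The Python sketch below assumes the fixed-column legacy PDB format (residue name in columns 18-20, chain ID in column 22, residue number plus insertion code in columns 23-27), reads only the first model, and ignores HETATM records, so modified residues such as MSE are dropped. The function name atom_sequences and the file name in the usage lines are made up for this example.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from ATOM records."""
    chains = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use the first model only (relevant for NMR entries)
        if not line.startswith("ATOM"):
            continue
        chain = line[21]
        residue_key = (chain, line[22:27])  # residue number + insertion code
        if residue_key in seen:
            continue  # further atoms (or alternate locations) of the same residue
        seen.add(residue_key)
        # Residues outside the 20 standard amino acids become "X".
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(seq) for chain, seq in chains.items()}


if __name__ == "__main__":
    # "example.pdb" is a placeholder for a locally downloaded entry.
    with open("example.pdb") as handle:
        for chain, sequence in atom_sequences(handle).items():
            print(chain, sequence)
```

Residues missing from the coordinates simply do not appear, so these sequences can be shorter than the SEQRES ones; that is exactly the discrepancy mentioned above.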
