Question: PDB "Remove Similar"
Stef wrote:

Hello everyone.

I have noticed that pdb.org has a "remove similar" option. Can someone explain how it works? Until now I assumed that it performs an all-against-all pairwise comparison and, whenever a pair shares more than X% identity, removes one of the two.

Or does it remove both?

If it removes only one, which one does it choose?
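For illustration, the all-against-all idea assumed above can be sketched as a keep-first greedy filter. This is only the assumption from the question, not what RCSB does (the answers below describe a cluster-based method). The function name, the toy chain IDs and the identity values are all hypothetical. The sketch shows why "which one is removed" matters: with this approach the surviving set depends on the input order.

```python
def greedy_remove_similar(ids, identity, threshold):
    """Keep-first greedy filter (hypothetical, illustrating the question's assumption).

    Walk the chains in the given order and keep a chain only if its identity
    to every chain kept so far is at most `threshold` percent.
    `identity(a, b)` must return the pairwise % identity of two chains.
    """
    kept = []
    for chain in ids:
        if all(identity(chain, other) <= threshold for other in kept):
            kept.append(chain)
    return kept


# Toy pairwise identities in percent (made-up values).
pairs = {("A", "B"): 95.0, ("B", "C"): 95.0, ("A", "C"): 60.0}


def identity(a, b):
    # Symmetric lookup; unknown pairs count as 0% identical.
    return pairs.get((a, b), pairs.get((b, a), 0.0))


print(greedy_remove_similar(["A", "B", "C"], identity, 90))  # ['A', 'C']
print(greedy_remove_similar(["B", "A", "C"], identity, 90))  # ['B']
```

Starting from B removes both A and C, while starting from A keeps two chains, so a pairwise "remove one of the two" rule is not well defined without a rule for choosing which member survives.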

Also, how does it calculate pairwise % identity: with a local or a global alignment?
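The local-versus-global question matters because a local alignment can report very high identity over a short region. That is why a coverage requirement is usually applied alongside the identity threshold. The sketch below computes both from a single pairwise alignment given as two equal-length gapped strings. It uses one common convention: identity over aligned columns, and coverage as the fraction of each full sequence included in the alignment. Tools differ on these definitions, and the function name is hypothetical.

```python
def alignment_stats(aln_a, aln_b, len_a, len_b):
    """Return (% identity, % coverage of A, % coverage of B) for one alignment.

    aln_a / aln_b: equal-length aligned strings, '-' marks a gap.
    len_a / len_b: full (unaligned) lengths of the two sequences, needed
    because a local alignment may cover only part of each sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    # Ignore columns where both sides are gaps.
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    identity = 100.0 * matches / len(columns)
    cov_a = 100.0 * sum(1 for x in aln_a if x != "-") / len_a
    cov_b = 100.0 * sum(1 for y in aln_b if y != "-") / len_b
    return identity, cov_a, cov_b


# A perfect local hit of 10 residues inside a 100-residue chain and a 12-residue chain:
stats = alignment_stats("ACDEFGHIKL", "ACDEFGHIKL", 100, 12)
print(tuple(round(v, 1) for v in stats))  # (100.0, 10.0, 83.3)
```

Here identity is 100% but only 10% of the first sequence is covered, so a two-sided coverage cutoff would reject this pair even though its identity is perfect.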

written 8.2 years ago by Stef
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

The "Help" button links to this page: http://www.rcsb.org/pdb/statistics/clusterStatistics.do

Algorithm for Removing Similar Sequences

The query implementation for removing similar sequences is based on pre-calculated clusters of protein chains. All protein chains of at least 20 amino acids are clustered by blastclust at 100%, 95%, 90%, 70%, 50%, 40%, and 30% sequence similarity....

(....)
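The quoted page says chains are pre-clustered with blastclust at fixed thresholds. blastclust is generally described as single-linkage clustering of pairwise BLAST hits; treat that as an assumption here. The sketch takes precomputed pairwise hits and links two chains when a hit meets the identity threshold and covers both sequences. It then returns the connected components, using union-find. The function name, the toy chain IDs, the hit values and the use of `>=` at the boundaries are all hypothetical choices.

```python
def single_linkage_clusters(chains, hits, min_identity, min_coverage=90.0):
    """Single-linkage clustering sketch (hypothetical).

    hits maps (chain_a, chain_b) -> (identity %, coverage of a %, coverage of b %).
    Two chains are linked if the hit passes the identity and two-sided coverage
    cutoffs; clusters are the connected components of the resulting graph.
    """
    parent = {c: c for c in chains}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for (a, b), (ident, cov_a, cov_b) in hits.items():
        if ident >= min_identity and cov_a >= min_coverage and cov_b >= min_coverage:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for c in chains:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


# Placeholder chain IDs and made-up hit statistics.
chains = ["1ABC_A", "1ABC_B", "2XYZ_A", "3DEF_A"]
hits = {
    ("1ABC_A", "2XYZ_A"): (96.0, 98.0, 97.0),
    ("2XYZ_A", "3DEF_A"): (92.0, 95.0, 99.0),
    ("1ABC_A", "3DEF_A"): (85.0, 97.0, 96.0),
}

print(single_linkage_clusters(chains, hits, 95))  # [['1ABC_A', '2XYZ_A'], ['1ABC_B'], ['3DEF_A']]
print(single_linkage_clusters(chains, hits, 90))  # [['1ABC_A', '2XYZ_A', '3DEF_A'], ['1ABC_B']]
```

At 90% the chains 1ABC_A and 3DEF_A end up in the same cluster although their own identity is only 85%. Single linkage only needs a chain of links through 2XYZ_A, so members of one cluster are not guaranteed to be pairwise above the threshold.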

written 8.2 years ago by Pierre Lindenbaum

Ah, thanks a lot. But we should be careful about this:

"Sequence similarity is defined on a chain basis, but results are returned on a structure basis."

This means that between steps A.3 and A.4 of the algorithm on that help page, we can, and will, end up with returned structures that contain chains above our similarity threshold.
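The chain-versus-structure caveat above can be checked mechanically. Given which structures were returned and which cluster each chain belongs to, the sketch lists every chain cluster that still appears in more than one returned structure. The data layout (a dict keyed by `(structure_id, chain_id)`), the function name and the placeholder IDs are hypothetical.

```python
from collections import defaultdict


def redundant_clusters(selected_structures, chain_cluster):
    """Return chain clusters that occur in more than one selected structure.

    selected_structures: set of structure IDs returned by the filter.
    chain_cluster: maps (structure_id, chain_id) -> cluster ID (hypothetical layout).
    """
    structures_per_cluster = defaultdict(set)
    for (structure, _chain), cluster in chain_cluster.items():
        if structure in selected_structures:
            structures_per_cluster[cluster].add(structure)
    return {c: sorted(s) for c, s in structures_per_cluster.items() if len(s) > 1}


# 1ABC is a two-chain complex; its chain B falls in the same cluster as 2XYZ chain A.
chain_cluster = {("1ABC", "A"): 7, ("1ABC", "B"): 3, ("2XYZ", "A"): 3}

# If 1ABC is kept to represent cluster 7 and 2XYZ to represent cluster 3,
# cluster 3 is still present twice in the structure-level result.
print(redundant_clusters({"1ABC", "2XYZ"}, chain_cluster))  # {3: ['1ABC', '2XYZ']}
```

Each cluster got exactly one representative chain, yet returning whole structures reintroduces redundancy through the extra chains of multi-chain entries.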

So the answer is: it picks the highest-quality protein from each cluster, where the clusters are calculated from local alignments with >90% coverage of both sequences and >X% identity.
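The thread does not say how "highest quality" is ranked. The sketch below assumes, purely for illustration, that quality means the lowest (best) resolution in Å. Ties are broken by chain ID so the choice is deterministic, and chains without a resolution value (for example NMR entries) are ranked last. The function name, IDs and resolution values are hypothetical.

```python
def pick_representatives(clusters, resolution):
    """Keep one chain per cluster: the one with the best (lowest) resolution.

    clusters: list of lists of chain IDs.
    resolution: maps chain ID -> resolution in angstroms (assumed quality metric).
    Missing resolutions rank last; ties are broken by chain ID.
    """
    reps = []
    for members in clusters:
        best = min(members, key=lambda c: (resolution.get(c, float("inf")), c))
        reps.append(best)
    return reps


clusters = [["1ABC_A", "2XYZ_A", "3DEF_A"], ["1ABC_B"]]
resolution = {"1ABC_A": 2.5, "2XYZ_A": 1.8, "3DEF_A": 1.8, "1ABC_B": 2.5}

print(pick_representatives(clusters, resolution))  # ['2XYZ_A', '1ABC_B']
```

Choosing a representative per precomputed cluster avoids the order dependence of a greedy pairwise filter: the result depends only on the clusters and the ranking key.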

written 8.2 years ago by Stef