Tool for finding unique sets of proteins
9.9 years ago
Woa ★ 2.9k

I have two large sets of proteins (on the order of 100K each) and want to find the proteins that are unique to each set.

Is there a tool that can do this quickly?

Thanks

set sequence • 2.1k views
9.9 years ago
Prakki Rama ★ 2.7k

Try running BLAST. If a sequence in FILE_A matches a sequence in FILE_B at 100% identity from one end to the other, the two are exact matches. You can ignore those that matched; any protein that found no hit in the other file must be unique.
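As a rough illustration of the filtering step, here is a minimal Python sketch. It assumes BLAST was run with a custom tabular format such as `-outfmt "6 qseqid sseqid pident length qlen slen"`, so each line carries the query and subject lengths; that column layout and the function name `ids_without_exact_hit` are choices made for this example, not something prescribed above.

```python
def ids_without_exact_hit(query_ids, blast_lines):
    """Return query IDs that have no 100%-identity, full-length hit.

    Assumes tab-separated columns: qseqid sseqid pident length qlen slen.
    """
    matched = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, pident, length, qlen, slen = line.split("\t")[:6]
        # "End to end" means the alignment spans the whole query and subject.
        full_length = int(length) == int(qlen) == int(slen)
        if float(pident) == 100.0 and full_length:
            matched.add(qseqid)
    # Queries never seen with an exact hit are treated as unique to their file.
    return [q for q in query_ids if q not in matched]


if __name__ == "__main__":
    # Hypothetical IDs and hits, for illustration only.
    ids = ["a1", "a2", "a3"]
    rows = [
        "a1\tb7\t100.000\t250\t250\t250",  # exact, full-length match
        "a2\tb9\t98.500\t300\t300\t300",   # full length but not identical
        "a3\tb2\t100.000\t120\t200\t120",  # identical but only partial
    ]
    print(ids_without_exact_hit(ids, rows))
```

The list of query IDs has to come from the FASTA file itself, because sequences with no hit at all do not appear in the BLAST output.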

9.9 years ago
Adrian ▴ 700

BLAST would be doing much more work than is necessary to solve the problem you've posed.

The most popular tools for clustering sequences to find the unique ones are probably CD-HIT and USEARCH.


The reason I would not use clustering tools is that they cluster according to the input parameters and report as unique whatever fails to meet those criteria. When one does not know how similar the other organism is, it is hard to choose a similarity cutoff. BLAST, in contrast, computes the similarity and tabulates the results, so we can pick the sequences that had no hit as unique to that particular file. If the user wants, they can still apply cutoffs to the BLAST output and select the hits they need. That said, I would be happy to hear your reasons for preferring clustering.


It depends on what one wants. If, as the question states, one wants to find the unique proteins in each set, then the problem is one of exact clustering, and that will be very much faster than an NxN BLAST. I agree that if the questions are more subtle, having the NxN BLAST results to play with could be useful.
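For the exact-match case, the comparison can be sketched in a few lines of Python without any alignment at all: load each file's sequences into a hash-based lookup and take the difference. This is only a minimal sketch under stated assumptions: the inputs are plain FASTA, "identical" means the same residue string after upper-casing, and the function names `read_fasta` and `unique_to_each` are made up for this example. It is not how CD-HIT or USEARCH work internally.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or any iterable of lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # treat case differences as identical
    if header is not None:
        yield header, "".join(chunks)


def unique_to_each(records_a, records_b):
    """Return two dicts (sequence -> header) for sequences found in only one set."""
    # If a sequence occurs more than once within a set, the last header wins.
    seqs_a = {seq: hdr for hdr, seq in records_a}
    seqs_b = {seq: hdr for hdr, seq in records_b}
    only_a = {seq: hdr for seq, hdr in seqs_a.items() if seq not in seqs_b}
    only_b = {seq: hdr for seq, hdr in seqs_b.items() if seq not in seqs_a}
    return only_a, only_b


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead.
    set_a = [">p1", "MKV", "LLA", ">p2", "MAAA"]
    set_b = [">q1", "mkvlla", ">q2", "MTTT"]
    only_a, only_b = unique_to_each(read_fasta(set_a), read_fasta(set_b))
    print(only_a)
    print(only_b)
```

Each sequence is hashed once, so the work grows with the total input size rather than with the number of pairwise comparisons, and two sets of about 100K proteins should fit comfortably in memory on an ordinary machine.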


Yes, it depends on what one wants. Thanks.
