Getting Sequence Data From Database
12.0 years ago
aj123 ▴ 120

Hello folks,

Are there any scripts to extract 25% and 90% non-homologous data from protein databases like PDB or ASTRAL? I know this is commonly done, so I assume there are openly available scripts that people use regularly to build these datasets for their analyses.

Thank you! Anj.

sequence pdb dataset • 1.7k views
12.0 years ago

Anj,

I think you should rephrase your question. Maybe I am missing the point, but "25% and 90% non-homologous data from protein databases" does not make a lot of sense to me. First, two sequences are either homologous or they are not; when we compare sequences using percentages, we are talking about similarity or identity, not homology. Second, "% non-similar" is still not a meaningful measure.

12.0 years ago
aj123 ▴ 120

OK, essentially I meant: how do I download, from a database, a set of protein sequences that share no more than 25% sequence identity with one another? I know this is commonly done, and I was wondering whether there are commonly available scripts for it.

--thanks!

12.0 years ago
Bill Pearson ★ 1.0k

If you are interested in non-homologous proteins whose structures are known, a widely used resource is the Astral dataset from http://scop.berkeley.edu/astral/

This dataset provides sets of sequences (with known structures) that share less than 40% or less than 95% identity.

Note that the 40% identity set will contain many homologous proteins. However, it is a useful first step.
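If you need a cutoff that is not offered as a ready-made set (25%, for example), the filtering idea itself is easy to sketch. The Python below is only an illustration, not how Astral builds its subsets: it assumes you have already computed pairwise percent identities with some alignment tool, and it greedily keeps a sequence only if it is not too similar to one that was already kept. The function name `greedy_nonredundant` and the sequence IDs are hypothetical, and any pair missing from the hit list is assumed to fall below the cutoff.

```python
from typing import Dict, Iterable, List, Set, Tuple


def greedy_nonredundant(
    ids: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    max_identity: float,
) -> List[str]:
    """Greedily pick sequences so that no kept pair has identity >= max_identity.

    ids          -- sequence IDs in order of preference (earlier IDs win ties)
    hits         -- (id_a, id_b, percent_identity) tuples from a prior alignment run
    max_identity -- cutoff in percent, e.g. 25.0
    Pairs absent from `hits` are ASSUMED to be below the cutoff.
    """
    # Record, for each sequence, which others are at or above the cutoff.
    too_similar: Dict[str, Set[str]] = {}
    for a, b, pident in hits:
        if a != b and pident >= max_identity:
            too_similar.setdefault(a, set()).add(b)
            too_similar.setdefault(b, set()).add(a)

    kept: List[str] = []
    kept_set: Set[str] = set()
    for seq_id in ids:
        # Keep the sequence only if none of its close neighbours was kept already.
        if too_similar.get(seq_id, set()).isdisjoint(kept_set):
            kept.append(seq_id)
            kept_set.add(seq_id)
    return kept


# Hypothetical IDs and identities, for illustration only.
example_ids = ["seqA", "seqB", "seqC", "seqD"]
example_hits = [
    ("seqA", "seqB", 62.0),
    ("seqB", "seqC", 31.0),
    ("seqC", "seqD", 18.0),
]

print(greedy_nonredundant(example_ids, example_hits, 25.0))
print(greedy_nonredundant(example_ids, example_hits, 90.0))
```

The result depends on the order of `ids`, so put the sequences you would rather keep (better resolution, for instance) first. As with the 40% set, low identity between the kept sequences does not mean they are non-homologous.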
