How to retrieve representatives at 30% sequence identity from the PDB
3.1 years ago
bosimiya • 0

I plan to use the PDB advanced search to build a test set of protein sequences. Most of the selection criteria (chain length, resolution, macromolecule type, etc.) are easy to apply there.

But there is one more restriction: I only want representative sequences at 30% sequence identity. How do I achieve this?

3.1 years ago
Mensur Dlakic ★ 27k

First, download a FASTA file with all PDB sequences.

Next, cluster them down to 30% identity using MMseqs2 and keep one representative sequence per cluster.
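The two steps above could be scripted roughly like this. Treat it as a sketch under a few assumptions rather than a definitive recipe: it assumes you have already downloaded the wwPDB file `pdb_seqres.txt.gz`, that its headers look like `>101m_A mol:protein length:154  MYOGLOBIN` with each sequence on a single line, and that `mmseqs` is installed and on your PATH. The helper names, the minimum length of 30, and the 80% coverage cutoff are my own illustrative choices; only the 30% identity threshold comes from the question.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def filter_protein_records(lines, min_len=30):
    """Yield (header, sequence) for protein chains of at least min_len residues.

    Assumes pdb_seqres-style records: a header containing 'mol:protein'
    or 'mol:na', followed by the sequence on one line.
    """
    header = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            header = line
        elif header is not None and line:
            if "mol:protein" in header and len(line) >= min_len:
                yield header, line
            header = None


def mmseqs_command(fasta, prefix, tmp_dir, min_seq_id=0.3, coverage=0.8):
    """Build an 'mmseqs easy-cluster' call for the given identity cutoff.

    The coverage value and coverage mode are assumptions; tune them to taste.
    """
    return [
        "mmseqs", "easy-cluster", str(fasta), str(prefix), str(tmp_dir),
        "--min-seq-id", str(min_seq_id),
        "-c", str(coverage),
        "--cov-mode", "0",
    ]


def read_representatives(tsv_lines):
    """Return unique representative IDs from a <prefix>_cluster.tsv file.

    Each row is 'representative<TAB>member', so column 1 is what we want.
    """
    reps, seen = [], set()
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        rep = fields[0]
        if rep not in seen:
            seen.add(rep)
            reps.append(rep)
    return reps


def main(seqres=Path("pdb_seqres.txt.gz"), prefix=Path("pdb30")):
    if not seqres.exists() or shutil.which("mmseqs") is None:
        print("Need pdb_seqres.txt.gz in the working directory and mmseqs on PATH.")
        return
    filtered = Path("pdb_protein.fasta")
    # Step 1: keep only protein chains from the full PDB sequence dump.
    with gzip.open(seqres, "rt") as src, open(filtered, "w") as out:
        for header, seq in filter_protein_records(src):
            out.write(f"{header}\n{seq}\n")
    # Step 2: cluster at 30% identity; MMseqs2 writes <prefix>_rep_seq.fasta.
    subprocess.run(mmseqs_command(filtered, prefix, "tmp"), check=True)
    with open(f"{prefix}_cluster.tsv") as tsv:
        reps = read_representatives(tsv)
    print(f"{len(reps)} representatives; sequences are in {prefix}_rep_seq.fasta")


if __name__ == "__main__":
    main()
```

The representative IDs should come out in entry_chain form (e.g. `101m_A`), so you can intersect them with the entries that pass your resolution and other advanced-search filters.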
