How to retrieve representatives at 30% sequence identity from the PDB
7 months ago
bosimiya • 0

I plan to use the PDB Advanced Search to filter sequences and build a test set of protein sequences. The selection criteria will probably be chain length, resolution, macromolecule type, etc., all of which are easy to set there.

But there is one more requirement: keeping only representative chains at 30% sequence identity. How can I do this?

7 months ago
Mensur Dlakic ★ 14k

First, download a FASTA file containing all PDB sequences (the RCSB distributes one as `pdb_seqres.txt`).
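A minimal sketch of applying the easy filters locally to that file, assuming the `pdb_seqres.txt` header layout `>101m_A mol:protein length:154  MYOGLOBIN` (PDB ID and chain, then molecule type). The function names and length thresholds are illustrative. Resolution is not in this file, so that filter would still have to come from the Advanced Search results.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed header layout: ">101m_A mol:protein length:154  MYOGLOBIN"
HEADER_RE = re.compile(r"^>(\S+)\s+mol:(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (multi-line sequences allowed)."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_protein_chains(lines: Iterable[str], min_len: int = 50,
                          max_len: int = 1000) -> List[Tuple[str, str]]:
    """Keep protein chains whose sequence length lies in [min_len, max_len]."""
    kept = []
    for header, seq in read_fasta(lines):
        m = HEADER_RE.match(header)
        if m is None:
            continue  # header not in the assumed format; skip it
        chain_id, mol_type = m.group(1), m.group(2)
        if mol_type == "protein" and min_len <= len(seq) <= max_len:
            kept.append((chain_id, seq))
    return kept


def write_fasta(records: List[Tuple[str, str]], path: str) -> None:
    """Write (chain_id, sequence) records as a plain FASTA file for clustering."""
    with open(path, "w") as fh:
        for chain_id, seq in records:
            fh.write(f">{chain_id}\n{seq}\n")
```

Typical use would be `write_fasta(filter_protein_chains(open("pdb_seqres.txt")), "filtered.fasta")`, producing the input for the clustering step. Filtering before clustering keeps the representatives drawn only from chains that already meet the other criteria.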

Next, cluster them at 30% sequence identity with MMseqs2 and keep one representative sequence per cluster.
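One way to drive this step from Python, as a sketch that assumes the `mmseqs` binary is on the `PATH`. It calls `mmseqs easy-cluster` with `--min-seq-id 0.3`; the coverage settings (`-c 0.8`, `--cov-mode 0`) and the helper names are assumptions to tune, not part of the answer above. `easy-cluster` writes `<prefix>_cluster.tsv` (representative/member pairs) alongside `<prefix>_rep_seq.fasta`; the code reads the TSV to list representative IDs.

```python
import subprocess
from typing import Iterable, List, Set


def mmseqs_cluster_cmd(fasta: str, out_prefix: str, tmp_dir: str,
                       min_seq_id: float = 0.3, coverage: float = 0.8) -> List[str]:
    """Build the easy-cluster command line (coverage settings are assumed defaults)."""
    return [
        "mmseqs", "easy-cluster", fasta, out_prefix, tmp_dir,
        "--min-seq-id", str(min_seq_id),
        "-c", str(coverage),
        "--cov-mode", "0",
    ]


def representatives_from_tsv(lines: Iterable[str]) -> List[str]:
    """Return unique representative IDs (first column) in order of first appearance."""
    seen: Set[str] = set()
    reps: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rep, _member = line.split("\t")
        if rep not in seen:
            seen.add(rep)
            reps.append(rep)
    return reps


def run_clustering(fasta: str, out_prefix: str = "clusterRes",
                   tmp_dir: str = "tmp") -> List[str]:
    """Run MMseqs2 (hypothetical wrapper) and return the representative chain IDs."""
    subprocess.run(mmseqs_cluster_cmd(fasta, out_prefix, tmp_dir), check=True)
    with open(f"{out_prefix}_cluster.tsv") as fh:
        return representatives_from_tsv(fh)
```

If only the sequences are needed, `<prefix>_rep_seq.fasta` can be used directly as the non-redundant test set; parsing the TSV is useful when the representative IDs must be matched back to the Advanced Search results.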
