Hi. How can I find "Identical Protein Groups" in the PDB database from the command line? So far, I have been using the following command, which successfully retrieves the primary PDB entry but unfortunately does not include the "Identical Protein Groups", aka "Related Structures":
blastp -query fasta_file -out output.blast -evalue 1e-60 -db pdbaa -num_threads 4 -outfmt 7
For example, https://www.rcsb.org/structure/5WFL has the following "Primary Citation of Related Structures:"
5WFV, 5WFL, 5WG1, 5WHO, 5WHL, 5WIY. (all 7 structures are from one manuscript).
My blastp command only picks up one of these PDB codes. How can this command line be adjusted to include ALL 7 PDB codes?
When you run a BLAST search, hits are picked up purely on the basis of their primary sequence. That is perhaps the reason you are not able to find the other structures. You could try making your e-value cutoff less strict. As far as I know, IPG is an NCBI database. If you check the identical/related protein query at PDB, it brings up many other structures. The exact search term is:
QUERY: Structure Similarity WHERE ( PDB ID = "5WFL" AND Assembly ID = "1" AND Shape Match = "Strict" )
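If you do relax the e-value cutoff, you will want a quick way to see which PDB chains come back at each threshold. Below is a minimal sketch that reads the tabular output written by -outfmt 7 (comment lines start with "#", data rows are tab-separated with the subject ID in column 2 and the e-value in column 11) and lists the unique subject IDs under a chosen cutoff. The function name pdb_hits and the sample rows are made up for illustration and are not real BLAST results. It also assumes the default 12-column layout.

```python
from typing import List


def pdb_hits(blast_text: str, max_evalue: float = 1e-60) -> List[str]:
    """Return unique subject IDs from BLAST tabular output (-outfmt 6 or 7)
    whose e-value is at or below max_evalue, in order of first appearance.

    Assumes the default 12-column layout:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, evalue, bit score.
    """
    hits: List[str] = []
    seen = set()
    for line in blast_text.splitlines():
        line = line.strip()
        # -outfmt 7 adds "#" comment lines; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default-layout data row
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue and subject not in seen:
            seen.add(subject)
            hits.append(subject)
    return hits


if __name__ == "__main__":
    # Invented sample rows, for illustration only (not real BLAST output).
    sample = (
        "# BLASTP\n"
        "query1\t5WFL_A\t100.0\t300\t0\t0\t1\t300\t1\t300\t1e-170\t600\n"
        "query1\t5WFV_A\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-50\t200\n"
    )
    print(pdb_hits(sample))                   # strict cutoff
    print(pdb_hits(sample, max_evalue=1e-10))  # looser cutoff
```

To use it on your own run, read the file written by -out (output.blast in the command above) and pass its contents to the function, then compare the lists you get at a few different cutoffs. The looser cutoff of 1e-10 here is an arbitrary choice, not a recommendation.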