Question: Downloading sequences from PDB: how do I get only protein residues and not ligand residues?
furgfurg wrote (2.2 years ago):

The following Python code adds ANY residue that has an atom named 'CA' to the "protein", even if that atom belongs to a ligand. How can I change it to check whether the residue name is in a list of standard residue names, so that only protein residues are collected?

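residues = structure.topology.residues()  # structure is loaded earlier (not shown)
protein_atoms = []
for res in residues:
    atom_names = []
    atom_index = []
    for atom in res.atoms():
        atom_names.append(atom.name)
        atom_index.append(atom.index)
    # test once per residue, after all of its atoms have been collected
    if 'CA' in atom_names:
        protein_atoms = protein_atoms + atom_index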
Written 2.2 years ago by furgfurg (modified 2.2 years ago)
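A minimal sketch of the residue-name check the question asks for: instead of testing for an atom named 'CA', test whether the residue name is one of the 20 standard amino acids. The loop in the question looks like OpenMM's Topology API (res.name, res.atoms(), atom.index); that is an assumption, so the Atom and Residue classes below are hypothetical stand-ins that only mimic those attributes and let the example run on its own. Files that use names such as HID/HIE (protonation states) or MSE (selenomethionine) would need those names added to the set.

```python
from dataclasses import dataclass, field

# The 20 standard amino-acid residue names used in PDB files.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


# Hypothetical stand-ins that only mimic the attributes used by the loop in
# the question (res.name, res.atoms(), atom.name, atom.index).
@dataclass
class Atom:
    name: str
    index: int


@dataclass
class Residue:
    name: str
    atom_list: list = field(default_factory=list)

    def atoms(self):
        return iter(self.atom_list)


def protein_atom_indices(residues):
    """Collect atom indices of residues whose name is a standard amino acid."""
    protein_atoms = []
    for res in residues:
        # Decide by residue name, not by the presence of an atom named 'CA'.
        if res.name.strip().upper() in STANDARD_RESIDUES:
            protein_atoms.extend(atom.index for atom in res.atoms())
    return protein_atoms


residues = [
    Residue("GLY", [Atom("N", 0), Atom("CA", 1), Atom("C", 2), Atom("O", 3)]),
    Residue("CA", [Atom("CA", 4)]),  # calcium ion: has a 'CA' atom, but is not protein
    Residue("HOH", [Atom("O", 5)]),  # water
]
print(protein_atom_indices(residues))  # [0, 1, 2, 3]
```

With a real structure, the stand-in list would be replaced by structure.topology.residues(). A set is used for the standard names so each membership test is a constant-time lookup.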

I'm just trying to figure out how to change it to distinguish between a ligand and a protein. I probably should've added the line residues = structure.topology.residues() above. But yes, the PDB file I eventually want to use contains a ligand and a protein (and water).

Written 2.2 years ago by furgfurg (modified 2.2 years ago)
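Since the file also contains water, a name-based check can sort every residue into protein, water, or everything else (ligands, ions). This is a minimal sketch assuming PDB residue naming; HOH is the PDB convention for water, while WAT, TIP3 and SOL are assumed extras for files written by simulation tools. classify_residue and group_residue_names are hypothetical helpers, not part of any library.

```python
from collections import defaultdict

# The 20 standard amino-acid residue names used in PDB files.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# HOH is the PDB convention for water; WAT, TIP3 and SOL are assumed extras
# for files written by simulation tools.
WATER_RESIDUES = {"HOH", "WAT", "TIP3", "SOL"}


def classify_residue(name):
    """Return 'protein', 'water' or 'other' (ligands, ions, ...) for a residue name."""
    name = name.strip().upper()
    if name in STANDARD_RESIDUES:
        return "protein"
    if name in WATER_RESIDUES:
        return "water"
    return "other"


def group_residue_names(names):
    """Group residue names by the category returned by classify_residue."""
    groups = defaultdict(list)
    for name in names:
        groups[classify_residue(name)].append(name)
    return dict(groups)


print(group_residue_names(["MET", "LYS", "HOH", "ATP", "MG", "HOH"]))
# {'protein': ['MET', 'LYS'], 'water': ['HOH', 'HOH'], 'other': ['ATP', 'MG']}
```

In the question's loop, classify_residue(res.name) == "protein" would replace the 'CA' test, and the "other" group is where the ligand ends up.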

You can do all that in Chimera if you just need a quick solution. Also, the Python source code of Chimera is here:

https://www.cgl.ucsf.edu/chimera/docs/sourcecode.html so you could just look for the function that implements Select -> Residues -> Standard Amino Acids, possibly with some overhead attached.

Written 2.2 years ago by Michael Dondrup (modified 2.2 years ago)

I'm not familiar with Chimera, but I'll check it out. Thank you.

Written 2.2 years ago by furgfurg (modified 2.2 years ago)

This looks like a post straight from an exam, so don't expect an immediate response ;) However, it is an interesting question about downloading sequences from the PDB, so we could keep it; just wait a few days before providing the answer.

Written 2.2 years ago by Michael Dondrup

I'm not sure what the end goal is here. Do you just want a list of all the protein residues, or are you looking to do something with the atoms specifically?

Written 2.2 years ago by Joe

I assume the question lacks context. To add some: given a PDB file (or the output of the PDB API) as input, extract only the standard amino-acid residues, excluding the ligand (this becomes more complex if the input is a protein complex), and write the result either to a FASTA file or to a PDB file that contains only the protein atoms and no ligands. I searched Biostars for this relatively simple use case but couldn't find it.

For example, with http://www.rcsb.org/structure/1DLH, how would you get only the sequence of the MHC, without the bound peptide? Or with http://www.rcsb.org/structure/1Y1Y, how would you get only the protein sequence, without the bound RNA?

Written 2.2 years ago by Michael Dondrup (modified 2.2 years ago)
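For the PDB-file-in, FASTA-out use case described above, here is a minimal pure-Python sketch. It reads the fixed-column ATOM/HETATM records (residue name in columns 18-20, chain ID in column 22, residue number plus insertion code in columns 23-27), keeps only the 20 standard amino acids, and builds one sequence per chain. Assumptions: only the first model is used, modified residues such as MSE are dropped, and the result covers only residues that have coordinates (the SEQRES records hold the full deposited sequence). Nucleotide residue names (e.g. A, C, G, U in RNA) are not in the table, so bound RNA is dropped automatically. A bound peptide like the one in 1DLH is itself made of standard amino acids and passes the filter; the hypothetical min_length parameter is only a crude heuristic for skipping such short chains. chain_sequences and to_fasta are illustrative helpers, not a library API.

```python
# One-letter codes for the 20 standard amino acids (three-letter PDB names).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text):
    """Return {chain_id: sequence} from standard residues in ATOM/HETATM records."""
    sequences = {}
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # use only the first model of multi-model (e.g. NMR) files
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        res_name = line[17:20].strip()
        chain_id = line[21:22]
        res_id = (chain_id, line[22:27])  # residue number plus insertion code
        if res_name not in THREE_TO_ONE or res_id in seen:
            continue  # non-protein residue, or another atom of a counted residue
        seen.add(res_id)
        sequences[chain_id] = sequences.get(chain_id, "") + THREE_TO_ONE[res_name]
    return sequences


def to_fasta(sequences, name, min_length=1):
    """Format one FASTA record per chain, skipping chains shorter than min_length."""
    return "".join(
        f">{name}_{chain_id}\n{seq}\n"
        for chain_id, seq in sequences.items()
        if len(seq) >= min_length
    )


# Truncated example records: only columns 1-27 are read.
pdb_text = "\n".join([
    "ATOM      1  CA  MET A   1",
    "ATOM      2  CA  LYS A   2",
    "HETATM    3 CA    CA A 101",
    "ATOM      4  CA  GLY B   1",
    "HETATM    5  O   HOH A 201",
])
print(to_fasta(chain_sequences(pdb_text), "demo"))
```

To use it on a downloaded entry, read the file's text, pass it to chain_sequences, and write the string returned by to_fasta to a .fasta file.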

As it stands, the code identifies protein residues by looking for an atom named 'CA'. However, because some ligands also contain an atom named 'CA', this returns both protein and ligand residues. How could it be changed so that ONLY protein residues are added to the list?

Written 2.2 years ago by furgfurg (modified 2.2 years ago)
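The 'CA' test fails because non-protein groups can contain an atom with that name; a calcium ion, for example, is stored as residue CA with a single atom named CA. Filtering on the residue name avoids this. For the other output mentioned earlier in the thread, a PDB file with only protein atoms, here is a minimal sketch that works directly on PDB text. It assumes standard fixed-column formatting. Coordinate-related records (ATOM, HETATM, ANISOU, TER) are kept only for standard amino-acid residue names, CONECT records are dropped because they may point at removed atoms, and all other records pass through unchanged. protein_only_pdb is a hypothetical helper.

```python
# The 20 standard amino-acid residue names used in PDB files.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Records that carry a residue name in columns 18-20.
RESIDUE_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def protein_only_pdb(pdb_text):
    """Return PDB text keeping only residue records of standard amino acids."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(RESIDUE_RECORDS):
            if line[17:20].strip() not in STANDARD_RESIDUES:
                continue  # ligand, ion, water or nucleotide
        elif line.startswith("CONECT"):
            continue  # bonds may reference atoms that were removed
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = "\n".join([
    "HEADER    EXAMPLE",
    "ATOM      1  CA  MET A   1",
    "HETATM    2 CA    CA A 101",
    "CONECT    1    2",
    "END",
])
print(protein_only_pdb(sample))
```

Note that the calcium line is removed even though its atom is named CA, because the decision is made on the residue name.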