Question: Finding Single Domain Proteins
Fernando30 wrote:

Hi,

I am just beginning to find my way through protein science, and I have a question: I want a list of all single-domain proteins in the PDB. Is there a list like that?

I tried playing with both CATH and SCOP, but I am not getting anywhere. Does someone have a list of all the single-domain proteins? It does not matter whether they are all-alpha or mixed; I just need a list of them.

What I mean is: let's say a domain is as defined by SCOP (or CATH). I just want a list of single-domain proteins. Thanks, Fernando

domain • 2.9k views
ADD COMMENTlink written 8.8 years ago by Fernando30

Can you clarify what you mean by single-domain proteins? (It might help to state what your research question is.)

  1. Peptides which only have 1 functional domain, ignoring overlaps. These would be identifiable by Pfam or RPS-BLAST search against the PDBAA sequence database for domain architecture.

  2. 3D structures that show only 1 domain, ignoring small ligands.

  3. Something else?

ADD REPLYlink written 8.8 years ago by Eric T.2.6k

For example, lysozyme is a single-domain protein, so I define a single domain as something that cannot be further divided (unlike hemoglobin, which has 4 domains).

Actually, is there a way to find all small globular proteins? These are usually single domains (~100-150 residues).

I am sorry if these questions sound trivial!

I am looking to compare the structures of different small globular proteins (not using RMSD or FASTA, just visual comparison using PyMOL).

ADD REPLYlink written 8.8 years ago by Fernando30
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

The following Java program scans UniProt and searches for entries that have a cross-reference to PDB and one and only one cross-reference to PROSITE.

First, generate the XML unmarshaller with:

 xjc -d . "http://www.uniprot.org/docs/uniprot.xsd"

then compile (javac Biostar14046.java) and run (java Biostar14046) the following program:

import java.net.URL;
import java.util.zip.GZIPInputStream;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

import org.uniprot.uniprot.DbReferenceType;
import org.uniprot.uniprot.Entry;

public class Biostar14046
    {
    void run() throws Exception
        {
        // JAXB unmarshaller for the classes generated by xjc
        JAXBContext jc = JAXBContext.newInstance("org.uniprot.uniprot");
        Unmarshaller u=jc.createUnmarshaller();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // stream the gzipped Swiss-Prot XML instead of loading the whole file
        XMLEventReader r= factory.createXMLEventReader(new GZIPInputStream(new URL("ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz").openStream()));
        while(r.hasNext())
            {
            // skip every event until the start of the next <entry>
            XMLEvent evt=r.peek();
            if(!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("entry")))
                {
                r.next();
                continue;
                }
            // unmarshal one <entry> at a time
            Entry entry=(Entry)u.unmarshal(r);
            int countprosite=0;
            String pdb=null;
            for(DbReferenceType ref:entry.getDbReference())
                {
                if(ref.getType().equals("PDB") && ref.getId()!=null)
                    {
                    // only the last PDB identifier of the entry is kept
                    pdb=ref.getId();
                    }
                else if(ref.getType().equals("PROSITE"))
                    {
                    countprosite++;
                    }
                }
            // keep entries with exactly one PROSITE reference and at least one PDB reference
            if(countprosite!=1 || pdb==null) continue;

            // getAccession() returns a list, printed as [acc1, acc2, ...]
            System.out.println(entry.getAccession()+"\t"+pdb);
            }
        }
    public static void main(String[] args) throws Exception
        {
        new Biostar14046().run();
        }
    }

Result:

[Q58097]    2Z61
[P49777, Q9URU7]    1IUF
[P02718]    1OLK
[Q08AH3, B3KTT9, O75202]    3GPC
[P26276]    3C04
[Q9ZCD3]    3MX6
[O35381, P97437]    2JQD
[Q9NQW6, Q5CZ78, Q6NSK5, Q9H8Y4, Q9NVN9, Q9NVP0]    2Y7B
[O43747, O75709, O75842, Q9UG09, Q9Y3U4]    1IU1
[P53068, D6VV95]    1GQP
[P07741, Q3KP55, Q68DF9]    1ZN9
[O50202]    2WFW
[P63590, Q48ZH6, Q9A0E5]    2OCZ
[P0AC38, P04422, P78140, Q2M6G5]    1JSW
[P0ABB8, P39168, Q2M665]    3GWI
[P33447]    1BW0
[P56547]    1RKR
[Q9X108]    1UP7
[P52664]    1HZO
[P0C2P0, P78986, Q0CGS9]    2Z3J
[P14315]    3LK4
[P57730, A2RRF8]    1DGN
[A5JTM5]    1NZY
[Q28960]    1N5D
[P80075, A0AV77, P78388]    1ESR
[P18181, Q545K2]    2PTV
[P31997, O60399, Q16574]    2DKS
[P30429, Q5BHI5]    3LQR
[P36222, B2R7B0, P30923, Q8IVA4, Q96HI7]    1NWU
[Q5PXQ6]    1TMX
[P01524]    1GIB
[Q96LI5, Q9UF92]    3NGQ
[Q9DBL7, A2BFA8, Q3TVZ2, Q8K3Y4]    2F6R
[P49347]    1CNV
[P02526, A2TJU8]    4GCR
[P32081, P41017, Q45690]    2I5M
[P01443]    1KBT
[Q6F495, Q3MV17]    2D04
(...)
ADD COMMENTlink modified 5.4 years ago • written 8.8 years ago by Pierre Lindenbaum129k

xjc -d . "http://www.uniprot.org/support/docs/uniprot.xsd" --> When I run this, it shows an error like: [ERROR] schema_reference.4: Failed to read schema document 'http://www.uniprot.org/support/docs/uniprot.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>. Could you please fix this?

ADD REPLYlink written 5.4 years ago by venu6.7k
xjc -d . "http://www.uniprot.org/docs/uniprot.xsd"
ADD REPLYlink written 5.4 years ago by Pierre Lindenbaum129k

Here is a link to a screenshot: http://tinypic.com/view.php?pic=2hmeagx&s=8#.VP6dKHWUc8o When I run the program (java Biostar14046), I get these exceptions.

ADD REPLYlink written 5.4 years ago by venu6.7k

Compile *all* the classes generated by xjc, not just Biostar14046.java (something like `find ./ -type f -name "*.java" | xargs javac` should work).

ADD REPLYlink written 5.4 years ago by Pierre Lindenbaum129k

I tried, but I could not get any results.

ADD REPLYlink written 5.4 years ago by venu6.7k
$ xjc -d tmp/WS http://www.uniprot.org/docs/uniprot.xsd
parsing a schema...
compiling a schema...
org/uniprot/uniprot/CitationType.java
org/uniprot/uniprot/CofactorType.java
org/uniprot/uniprot/CommentType.java
org/uniprot/uniprot/ConsortiumType.java
org/uniprot/uniprot/DbReferenceType.java
org/uniprot/uniprot/Entry.java
org/uniprot/uniprot/EventType.java
org/uniprot/uniprot/EvidenceType.java
org/uniprot/uniprot/EvidencedStringType.java
org/uniprot/uniprot/FeatureType.java
org/uniprot/uniprot/GeneLocationType.java
org/uniprot/uniprot/GeneNameType.java
org/uniprot/uniprot/GeneType.java
org/uniprot/uniprot/ImportedFromType.java
org/uniprot/uniprot/InteractantType.java
org/uniprot/uniprot/IsoformType.java
org/uniprot/uniprot/KeywordType.java
org/uniprot/uniprot/LocationType.java
org/uniprot/uniprot/MoleculeType.java
org/uniprot/uniprot/NameListType.java
org/uniprot/uniprot/ObjectFactory.java
org/uniprot/uniprot/OrganismNameType.java
org/uniprot/uniprot/OrganismType.java
org/uniprot/uniprot/PersonType.java
org/uniprot/uniprot/PositionType.java
org/uniprot/uniprot/PropertyType.java
org/uniprot/uniprot/ProteinExistenceType.java
org/uniprot/uniprot/ProteinType.java
org/uniprot/uniprot/ReferenceType.java
org/uniprot/uniprot/SequenceType.java
org/uniprot/uniprot/SourceDataType.java
org/uniprot/uniprot/SourceType.java
org/uniprot/uniprot/StatusType.java
org/uniprot/uniprot/SubcellularLocationType.java
org/uniprot/uniprot/Uniprot.java
org/uniprot/uniprot/package-info.java

$ javac -sourcepath tmp/WS Biostar14046.java  tmp/WS/org/uniprot/uniprot/*

$ java -cp tmp/WS:. Biostar14046
[P01386]    1TXB
[Q8QGR0, P80970, Q9PRZ5]    3NEQ
[P17174, B2R6R7, B7Z7E9, Q5VW80]    3II0
[P08874]    2RO4
[Q8N6N7, A6NCI2, B3KTG8]    3EPY
[Q9SWS1, Q42137]    4O7G
[P25984, Q9R559]    3FTN
[Q9Y4W6, Q6P1L0]    2LNA

 

ADD REPLYlink written 5.4 years ago by Pierre Lindenbaum129k

Hello, it now works with https instead of http. It also doesn't work with Java versions > 8. It is slow, though; is there a faster method available? Thank you ~ Shashank

ADD REPLYlink written 11 months ago by Shashank Pritam0
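
On the Java > 8 point in the comment above: the javax.xml.bind (JAXB) classes the program relies on were removed from the JDK in Java 11, while the StAX API (javax.xml.stream) is still part of it. One possible workaround, shown here only as a minimal sketch, is to apply the same filter with StAX alone and skip the xjc step entirely. The class name Biostar14046Stax and the use of a locally downloaded uniprot_sprot.xml.gz are assumptions for illustration, not part of the original answer. Unlike the original, it prints only the first accession of each entry, and whether it is noticeably faster has not been measured.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical JAXB-free variant of Biostar14046 (name chosen for illustration).
class Biostar14046Stax {
    /**
     * Scans UniProt XML and returns "accession<TAB>pdbId" for every entry with
     * exactly one PROSITE cross-reference and at least one PDB cross-reference.
     */
    static List<String> scan(InputStream in) {
        List<String> hits = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            boolean inEntry = false;
            int depth = 0;      // nesting depth below the current <entry>
            int prosite = 0;
            String accession = null;
            String pdb = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (!inEntry) {
                        if (name.equals("entry")) {
                            inEntry = true;
                            depth = 0;
                            prosite = 0;
                            accession = null;
                            pdb = null;
                        }
                        continue;
                    }
                    depth++;
                    // Only direct children of <entry> count; nested dbReference
                    // elements (citations, evidence sources) are ignored.
                    if (depth != 1) continue;
                    if (name.equals("accession")) {
                        String text = r.getElementText(); // also consumes </accession>
                        depth--;
                        if (accession == null) accession = text; // first accession only
                    } else if (name.equals("dbReference")) {
                        String type = r.getAttributeValue(null, "type");
                        String id = r.getAttributeValue(null, "id");
                        if ("PDB".equals(type) && id != null) pdb = id; // last PDB id wins
                        else if ("PROSITE".equals(type)) prosite++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && inEntry) {
                    if (depth == 0) { // this is </entry>
                        if (prosite == 1 && pdb != null) hits.add(accession + "\t" + pdb);
                        inEntry = false;
                    } else {
                        depth--;
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse UniProt XML", e);
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: Biostar14046Stax <uniprot_sprot.xml.gz>");
            return;
        }
        try (InputStream in = new GZIPInputStream(new FileInputStream(args[0]))) {
            for (String line : scan(in)) System.out.println(line);
        }
    }
}
```

Reading from a local copy of the file rather than the FTP URL is a design choice of this sketch; it lets the scan be rerun without downloading the release again.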
Fernando30 wrote:

For example, lysozyme is a single-domain protein, so I define a single domain as something that cannot be further divided (unlike hemoglobin, which has 4 domains).

Actually, is there a way to find all small globular proteins? These are usually single domains (~100-150 residues).

I am sorry if these questions sound trivial!

I am looking to compare the structures of different small globular proteins (not using RMSD or FASTA, just visual comparison using PyMOL).

ADD COMMENTlink written 8.8 years ago by Fernando30

You should update your original question instead of adding a new answer. This one will be deleted soon.

ADD REPLYlink written 8.8 years ago by Pierre Lindenbaum129k
San Francisco, CA
Eric T.2.6k wrote:

Some combination of these strategies might do:

  1. Filter NCBI-PDBAA for sequences with length less than, say, 600.

  2. Fetch the FASTA records or PDB files for those matching sequences. Use Biopython to filter for proteins that have (a) one sequence in the FASTA record (via Bio.SeqIO), or (b) one chain in the structure (via Bio.PDB). This will lose a lot of PDB entries where the biological unit is monomeric but the crystal was solved with multiple identical chains -- but I think that's OK for your purposes.

  3. Run RPS-BLAST or HMMER on the PDBAA database, and use a script to filter for sequences that only have one distinct domain. Use a somewhat stringent E-value cutoff to reduce the number of overlapping hits you get. (The possibility of overlapping hits and multiple profile matches for a single domain can make this tricky.)
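
Strategies 1 and 3 above could be scripted roughly as follows. This is only a sketch in Java (to match the program earlier in the thread) under stated assumptions: the class SingleDomainFilter and its methods are hypothetical, the FASTA records are already loaded as lines, and the domain hits have been converted beforehand to a made-up three-column, tab-separated form (sequence ID, domain ID, E-value) rather than read from the native RPS-BLAST or HMMER output.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; names and input formats are assumptions.
class SingleDomainFilter {
    /** Strategy 1: IDs of FASTA records shorter than maxLen residues. */
    static Set<String> shortSequences(List<String> fastaLines, int maxLen) {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        String id = null;
        for (String raw : fastaLines) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                // the ID is the first whitespace-delimited token of the header
                id = line.substring(1).trim().split("\\s+")[0];
                lengths.put(id, 0);
            } else if (id != null && !line.isEmpty()) {
                lengths.merge(id, line.length(), Integer::sum);
            }
        }
        Set<String> keep = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : lengths.entrySet()) {
            if (e.getValue() < maxLen) keep.add(e.getKey());
        }
        return keep;
    }

    /**
     * Strategy 3: IDs of sequences whose hits at or below maxEvalue all
     * belong to one distinct domain. Each line is assumed to be
     * "seqId<TAB>domainId<TAB>eValue" (an illustrative format).
     */
    static Set<String> singleDomain(List<String> hitLines, double maxEvalue) {
        Map<String, Set<String>> domains = new LinkedHashMap<>();
        for (String line : hitLines) {
            String[] f = line.split("\t");
            if (f.length < 3 || Double.parseDouble(f[2]) > maxEvalue) continue;
            domains.computeIfAbsent(f[0], k -> new HashSet<>()).add(f[1]);
        }
        Set<String> keep = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : domains.entrySet()) {
            if (e.getValue().size() == 1) keep.add(e.getKey());
        }
        return keep;
    }
}
```

The two result sets can be intersected with retainAll to combine the strategies. Note that this counts distinct domain IDs, so several copies of the same domain still pass, while two different profiles matching the same region count as two domains; that is the overlap problem mentioned in point 3.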

ADD COMMENTlink written 8.8 years ago by Eric T.2.6k