Question: What Is Your Experience With The STRING (Interactions) Database?
Giovanni M Dall'Olio (27k), London, UK, wrote 10.7 years ago:

STRING is a database of predicted protein-protein interactions at EMBL. It clusters the results from many protein-protein interaction databases, such as MINT, and also uses information from KEGG pathways and Reactome to provide the best annotations for the interactions of a protein.

I am a bit confused by the results I see there: when I look at the genes in the pathway I am studying, I see many errors and annotations that I don't understand.

What is your experience with STRING? If you'd like to do me a favor, go there and look at the interactions annotated for a gene you already know. Do you see anything weird?

ppi subjective • 7.4k views
modified 3.1 years ago by Dattatray Mongad (350) • written 10.7 years ago by Giovanni M Dall'Olio (27k)
Khader Shameer (18k), Manhattan, NY, wrote 10.6 years ago:

I have used STRING in three projects and I am still using it for large-scale protein-protein interaction data analysis. I have downloaded the data and worked on the PPI data of 5 eukaryotic model organisms. I strongly recommend STRING if you are looking for prokaryotic PPI data or if you are working on global-scale PPI network analysis in any given organism. An exceptional advantage of STRING is that it derives PPI information from multiple approaches, yet every single interaction is scored using a scoring scheme. This makes it easy to filter for the specific interactions you are interested in (for example, you can get human PPIs that have a score >0.7 from experimental approaches) and thus reduce the false positive rate. Another interesting aspect of STRING is the predicted interactions that are not reported in DIP or HPRD (if you are looking for literature-curated, experimental annotations, I strongly recommend HPRD); this is something really exciting. You may find interesting connections (not yet proven, though) that can lead you to new biological insights. The STRING team also maintains an interesting blog with new releases, code snippets, API details, etc.
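As a rough illustration of the score-based filtering described above, here is a minimal Python sketch. It assumes the downloadable "detailed" links file is space-separated with one column per evidence channel plus `combined_score`, and that scores are stored as integers scaled by 1000 (so 0.7 becomes 700); check the header of the release you actually download. The protein IDs and scores in the sample are made up.

```python
import csv
import io

# Toy excerpt in the layout ASSUMED for STRING's space-separated
# "detailed" links download; IDs and scores are made up for illustration.
SAMPLE = """protein1 protein2 neighborhood fusion cooccurence coexpression experimental database textmining combined_score
9606.PROT_A 9606.PROT_B 0 0 0 62 812 900 455 985
9606.PROT_A 9606.PROT_C 0 0 0 300 0 0 720 760
9606.PROT_B 9606.PROT_D 0 0 0 0 701 0 0 701
"""


def filter_by_channel(handle, channel="experimental", min_score=0.7):
    """Yield (protein1, protein2, score) for rows whose `channel` score exceeds min_score.

    Assumes scores are stored as integers scaled by 1000 (0.7 -> 700).
    """
    reader = csv.DictReader(handle, delimiter=" ")
    for row in reader:
        score = int(row[channel]) / 1000  # rescale to 0..1
        if score > min_score:
            yield row["protein1"], row["protein2"], score


# Human-style example: keep only pairs with experimental evidence above 0.7.
hits = list(filter_by_channel(io.StringIO(SAMPLE)))
print(hits)
```

For a real download you would pass an open file handle instead of `io.StringIO(SAMPLE)`; the `channel` argument lets you apply the same threshold to any other evidence column.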

written 10.6 years ago by Khader Shameer (18k)

Have you looked at their website or downloadable files? AFAIK, STRING basically uses Ensembl IDs in its PPI files but provides a separate mapping file to map from other identifiers. Mapping a gene to a pathway is not always straightforward. Consider this scenario: one gene has n transcripts, and only one of them may be involved in the pathway. The n transcripts code for n splice variants of the same protein, so merging the transcript IDs into one gene ID is not wrong.
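A minimal sketch of merging protein-level interactions into gene-level ones, assuming you have already loaded a protein-to-gene mapping (here a small hypothetical dict standing in for the mapping file). Isoform pairs that collapse onto the same gene are dropped, and the highest score is kept when several isoform pairs map to the same gene pair; both are my own design choices for illustration, not STRING's behavior.

```python
from collections import defaultdict

# Hypothetical protein -> gene mapping, standing in for the separate
# identifier-mapping file; all IDs below are made up for illustration.
PROTEIN_TO_GENE = {
    "ENSP_1a": "GENE1",  # two splice variants of the same gene
    "ENSP_1b": "GENE1",
    "ENSP_2": "GENE2",
    "ENSP_3": "GENE3",
}


def collapse_to_genes(edges, mapping):
    """Merge protein-level edges into gene-level edges, keeping the best score.

    Edges with an unmapped protein, or whose two proteins map to the
    same gene, are skipped.
    """
    best = defaultdict(float)
    for p1, p2, score in edges:
        g1, g2 = mapping.get(p1), mapping.get(p2)
        if g1 is None or g2 is None or g1 == g2:
            continue
        key = tuple(sorted((g1, g2)))  # undirected: (A, B) == (B, A)
        best[key] = max(best[key], score)
    return dict(best)


edges = [
    ("ENSP_1a", "ENSP_2", 0.65),
    ("ENSP_1b", "ENSP_2", 0.90),   # other isoform, same gene pair
    ("ENSP_3", "ENSP_1a", 0.40),
    ("ENSP_1a", "ENSP_1b", 0.99),  # isoforms of one gene: dropped
    ("ENSP_X", "ENSP_2", 0.80),    # unmapped protein: dropped
]
result = collapse_to_genes(edges, PROTEIN_TO_GENE)
print(result)
```

Keeping the maximum is only one option; averaging, or counting how many isoform pairs support a gene pair, may suit other questions better.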

written 10.6 years ago by Khader Shameer (18k)

I looked at the genes in the pathway that I studied and found a lot of errors, including genes with similar names being merged into one, and many false positives caused by genes being in the same pathway in some database. And my pathway is hardly poorly annotated: it was already described in the '80s...

written 10.6 years ago by Giovanni M Dall'Olio (27k)
David Nusinow (260), Boston, MA, wrote 10.6 years ago:

I've been using STRING extensively, but not for protein-protein interaction work. STRING, as you note, is a bit of a mutt in terms of the different data sources it mines. Some that you're missing include a broad literature-based search, as well as gene expression data sets. So if you're interested primarily in physical interactions or any other single type of data source, STRING is a poor choice for your work. On the other hand, STRING does provide confidence scores for each association, as well as annotation of their data source types (with the license), so you can use those to filter out interactions derived from data types you don't want to see.
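To drop the evidence types you don't trust, one option is to recombine only the channels you keep. The sketch below uses a naive noisy-OR, 1 - prod(1 - s_i), which assumes the channels are independent. It is a simplification that deliberately omits any prior correction, so it will not reproduce STRING's own combined score; the channel names and values are hypothetical.

```python
import math


def combine_channels(channel_scores, keep):
    """Naive noisy-OR over the kept channels (each score in 0..1).

    Simplification: treats channels as independent and applies no prior
    correction, so the result is NOT STRING's combined_score.
    """
    return 1 - math.prod(1 - channel_scores.get(c, 0.0) for c in keep)


# Hypothetical per-channel scores for one association.
scores = {"experimental": 0.5, "database": 0.6, "textmining": 0.9, "coexpression": 0.3}

# Keep only physical-interaction-like evidence; ignore text mining and coexpression.
physical = combine_channels(scores, keep=("experimental", "database"))
everything = combine_channels(scores, keep=tuple(scores))
print(round(physical, 3), round(everything, 3))
```

Here the association looks strong when all channels are pooled but noticeably weaker once text mining and coexpression are excluded, which is exactly the kind of edge worth inspecting before trusting it as a physical interaction.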

written 10.6 years ago by David Nusinow (260)
Istvan Albert ♦♦ (84k), University Park, USA, wrote 10.7 years ago:

I have not used STRING in particular, but I have worked with protein interactions before (the DIP dataset). I recall that even experimentally produced protein-protein interactions may have very high false positive rates (as for false negatives, who knows?). Some papers claim that up to 50% of the interactions were spurious, and repeated experiments showed very small overlaps. Predictions may be even less reliable.

At the same time, the DIP dataset performed substantially better if we only considered the interactions for which there were multiple sources of evidence, so that may be a strategy to consider in your case as well.
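The multiple-evidence filter can be sketched in a few lines of Python: count how many distinct sources report each unordered protein pair and keep the pairs with at least two. The source labels and protein names below are hypothetical placeholders, not actual DIP fields.

```python
from collections import defaultdict


def multi_evidence_pairs(records, min_sources=2):
    """Return unordered pairs supported by at least `min_sources` distinct sources.

    `records` is an iterable of (source, protein_a, protein_b) tuples.
    """
    support = defaultdict(set)
    for source, a, b in records:
        if a == b:
            continue  # ignore self-interactions in this sketch
        support[frozenset((a, b))].add(source)  # frozenset: order-independent pair
    return {pair for pair, sources in support.items() if len(sources) >= min_sources}


records = [
    ("screen_1", "P1", "P2"),
    ("screen_2", "P2", "P1"),    # same pair, reversed, second source
    ("screen_1", "P1", "P3"),
    ("screen_1", "P1", "P3"),    # repeated by the same source: still one source
    ("literature", "P3", "P4"),
    ("screen_2", "P4", "P3"),
]
result = multi_evidence_pairs(records)
print(result)
```

Storing a set of sources per pair means a pair reported twice by the same source still counts once, which is the point of requiring independent evidence.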

written 10.7 years ago by Istvan Albert ♦♦ (84k)