What Information Do The Files For The STRING Protein Downloads Contain?
11.6 years ago
bruce.moran ▴ 970

I cannot find the header for the downloadable protein.links.detailed.v9.05.txt.gz (at STRING downloads).

The file looks like:

  9913.ENSBTAP00000000003 9913.ENSBTAP00000007925 0 0 0 157 0 0 0 157
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000457 0 0 0 0 0 800 0 800
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000477 0 0 0 0 0 800 0 800
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000695 0 0 0 0 0 0 228 228
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000968 0 0 0 0 0 800 0 800

Can someone let me know what columns 3-10 are? I am assuming they are as described here (scroll to bottom):

http://string-db.org/help/topic/org.string-db.docs/ch04.html#d0e636

but cannot find confirmation of this.

Thoughts on what scores to filter by are also appreciated!

Thanks in advance,

Bruce.

11.6 years ago

Not to play devil's advocate, but why would you need a second confirmation? The information that you found seems very reliable and appears to be the official designation.

nscore - neighborhood score, (computed from the inter-gene nucleotide count).
fscore - fusion score (derived from fused proteins in other species).
pscore - co-occurrence score of the phyletic profile (derived from similar absence/presence patterns of genes).
hscore - homology score, the degree of homology of the interactors (trivial and normally not reported in STRING).
ascore - coexpression score (derived from similar pattern of mRNA expression measured by DNA arrays and similar technologies).
escore - experimental score (derived from experimental data, such as affinity chromatography).
dscore - database score (derived from curated data of various databases).
tscore - textmining score (derived from the co-occurrence of gene/protein names in abstracts).

As to your second question, it all depends on the problem that you are trying to solve.


On the same FAQ they say to awk '$3 > 700' for a 'combined score'. So I am a bit worried that the scores are different. The neighborhood score above didn't seem to be a 'combined' score.
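The `awk '$3 > 700'` command presumably targets a file whose third column is the combined score, which does not obviously match the ten-column rows in the question. A minimal Python sketch of the same filter for the detailed file follows. It assumes, without confirmation for v9.05, that the last column is the combined score; the rows are copied from the question and the 700 cutoff comes from the FAQ command.

```python
# Rows copied from the question (the v9.05 detailed file has no header line).
SAMPLE = """\
9913.ENSBTAP00000000003 9913.ENSBTAP00000007925 0 0 0 157 0 0 0 157
9913.ENSBTAP00000000005 9913.ENSBTAP00000000457 0 0 0 0 0 800 0 800
9913.ENSBTAP00000000005 9913.ENSBTAP00000000477 0 0 0 0 0 800 0 800
9913.ENSBTAP00000000005 9913.ENSBTAP00000000695 0 0 0 0 0 0 228 228
9913.ENSBTAP00000000005 9913.ENSBTAP00000000968 0 0 0 0 0 800 0 800
"""


def parse_rows(text):
    """Split whitespace-separated rows into (protein1, protein2, [int scores])."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], fields[1], [int(x) for x in fields[2:]]))
    return rows


def filter_by_last_column(rows, cutoff=700):
    """Keep rows whose last score exceeds the cutoff.

    ASSUMPTION: the last column is the combined score.
    """
    return [row for row in rows if row[2][-1] > cutoff]


rows = parse_rows(SAMPLE)

# Sanity check on the sample: is the last column at least as large as
# every other score in the same row, as a combined score would be here?
last_is_max = all(scores[-1] >= max(scores[:-1]) for _, _, scores in rows)

kept = filter_by_last_column(rows)
print(last_is_max, len(kept))
```

In these five rows the last column always equals the single non-zero score, which is consistent with it being a combined score but does not prove it for the whole file.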

I am looking to make an interaction network graph (RedeR) and use these interactions as a basis for this. So I want to include genes from an RNAseq experiment that are highly expressed (but not DE) and overlay DE genes on these based on my interactions from STRING.
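The overlay described above could be prepared roughly as follows before handing the result to a graph tool. This is only a sketch: the `expressed` and `de_genes` sets are made up for illustration, the function name `build_network` is hypothetical, and the edges are assumed to have been filtered by score already.

```python
def build_network(edges, expressed, de_genes):
    """Keep edges whose two endpoints are both expressed, and flag DE nodes.

    edges     : iterable of (protein_a, protein_b, score) tuples
    expressed : set of protein IDs considered highly expressed
    de_genes  : set of protein IDs called differentially expressed
    """
    kept = [(a, b, s) for a, b, s in edges if a in expressed and b in expressed]
    nodes = {n for a, b, _ in kept for n in (a, b)}
    is_de = {n: n in de_genes for n in sorted(nodes)}
    return kept, is_de


# Edges taken from the rows in the question (last column used as the score).
edges = [
    ("9913.ENSBTAP00000000005", "9913.ENSBTAP00000000457", 800),
    ("9913.ENSBTAP00000000005", "9913.ENSBTAP00000000477", 800),
    ("9913.ENSBTAP00000000005", "9913.ENSBTAP00000000968", 800),
]

# HYPOTHETICAL gene sets, for illustration only.
expressed = {
    "9913.ENSBTAP00000000005",
    "9913.ENSBTAP00000000457",
    "9913.ENSBTAP00000000968",
}
de_genes = {"9913.ENSBTAP00000000968"}

kept_edges, node_is_de = build_network(edges, expressed, de_genes)
for node, flag in node_is_de.items():
    print(node, "DE" if flag else "expressed")
```

The edge list and the per-node DE flag can then be written out as two tables and loaded as edges and node attributes.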


One way to double-check would be to use the STRING API to make some requests to the website, then compare the results with the contents of the file.


Yes, just had that same thought now! Thanks for your help.

10.0 years ago

In the protein.links.detailed file of the STRING database, there are 8 different scores.

As I remember, the last one is the combined score (which is not mentioned above or in the FAQ).

Can anyone please confirm which column is then the experimental score?

Thanks
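For what it is worth, later STRING releases ship the detailed links file with a header line, and as far as I know the order is the one used below: seven evidence channels followed by the combined score, with no homology column. Whether the headerless v9.05 file uses the same order is an assumption. Under that assumption, this sketch labels a row from the question and reports the 1-based position of the experimental score.

```python
# ASSUMPTION: column order taken from the header of later STRING releases;
# not confirmed for v9.05, which has no header line.
# ("cooccurence" is spelled as in that header.)
ASSUMED_COLUMNS = [
    "protein1", "protein2",
    "neighborhood", "fusion", "cooccurence", "coexpression",
    "experimental", "database", "textmining", "combined_score",
]


def label_row(line):
    """Turn one whitespace-separated row into a dict keyed by the assumed names."""
    fields = line.split()
    if len(fields) != len(ASSUMED_COLUMNS):
        raise ValueError(
            f"expected {len(ASSUMED_COLUMNS)} fields, got {len(fields)}"
        )
    record = dict(zip(ASSUMED_COLUMNS, fields))
    for name in ASSUMED_COLUMNS[2:]:
        record[name] = int(record[name])  # score columns are integers
    return record


# Row copied from the question.
row = label_row(
    "9913.ENSBTAP00000000005 9913.ENSBTAP00000000968 0 0 0 0 0 800 0 800"
)

# 1-based position, i.e. the number awk would use after the dollar sign.
experimental_column = ASSUMED_COLUMNS.index("experimental") + 1
print(experimental_column, row["database"], row["combined_score"])
```

With this ordering the experimental score would be column 7, and the 800 values in the sample rows would be database scores rather than experimental ones.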
