Question

What information do the files for the STRING protein downloads contain?

11.1 years ago

bruce.moran ▴ 960

I cannot find the header for the downloadable protein.links.detailed.v9.05.txt.gz (at STRING downloads).

The file looks like:

  9913.ENSBTAP00000000003 9913.ENSBTAP00000007925 0 0 0 157 0 0 0 157
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000457 0 0 0 0 0 800 0 800
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000477 0 0 0 0 0 800 0 800
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000695 0 0 0 0 0 0 228 228
  9913.ENSBTAP00000000005 9913.ENSBTAP00000000968 0 0 0 0 0 800 0 800

Can someone tell me what columns 3-10 are? I am assuming they are as described here (scroll to the bottom):

http://string-db.org/help/topic/org.string-db.docs/ch04.html#d0e636

but cannot find confirmation of this.

Thoughts on what scores to filter by are also appreciated!

Thanks in advance,

Bruce.

interaction network • 2.9k views

updated 2.2 years ago by Ram 43k • written 11.1 years ago by bruce.moran ▴ 960

score 0 · Answer 1 · 2013-03-22

11.1 years ago

Istvan Albert 100k

Not to play devil's advocate, but why would you need a second confirmation? The information you found seems reliable and appears to be the official designation.

nscore - neighborhood score (computed from the inter-gene nucleotide count).
fscore - fusion score (derived from fused proteins in other species).
pscore - co-occurrence score of the phyletic profile (derived from similar absence/presence patterns of genes).
hscore - homology score, the degree of homology of the interactors (trivial and normally not reported in STRING).
ascore - coexpression score (derived from similar patterns of mRNA expression measured by DNA arrays and similar technologies).
escore - experimental score (derived from experimental data, such as affinity chromatography).
dscore - database score (derived from curated data in various databases).
tscore - text-mining score (derived from the co-occurrence of gene/protein names in abstracts).
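If that list is right, the file could be read with named columns as in the minimal Python sketch below. The column names are my assumption, not a confirmed header: they follow the list above with hscore dropped (it is normally not reported) and a combined score as the last column, which fits the sample rows in the question, where column 10 always equals the single non-zero score.

```python
import gzip

# ASSUMED column order for protein.links.detailed.v9.05.txt.gz (hypothetical
# names: the homology score is left out and the last column is taken to be
# the combined score).
COLUMNS = [
    "protein1", "protein2",
    "neighborhood", "fusion", "cooccurence", "coexpression",
    "experimental", "database", "textmining", "combined_score",
]


def parse_line(line):
    """Split one whitespace-separated row into a dict of named fields."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(COLUMNS[:2], fields[:2]))
    # Scores appear to be integers scaled to 0-1000 (e.g. 800, 157).
    row.update((name, int(v)) for name, v in zip(COLUMNS[2:], fields[2:]))
    return row


def read_detailed(path):
    """Yield parsed rows from a gzipped detailed links file.

    Lines that do not start with a digit (e.g. a header line, if the file
    has one) are skipped; data rows start with the taxon ID, e.g. 9913.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line[:1].isdigit():
                yield parse_line(line)
```

For example, looping over read_detailed("protein.links.detailed.v9.05.txt.gz") would yield one dict per interaction, so row["experimental"] can be read by name instead of by position.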

As to your second question, it all depends on the problem you are trying to solve.

11.1 years ago by Istvan Albert 100k

The same FAQ suggests awk '$3 > 700' to keep interactions with a 'combined score' above 700, so I am a bit worried that the scores are different: going by the list above, column 3 would be the neighborhood score, which doesn't seem to be a 'combined' score.
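A hedged guess at the discrepancy: the FAQ's awk '$3 > 700' was presumably written for the plain protein.links file, where column 3 is the only score. If the last column of the detailed file is the combined score (an assumption, but it fits the sample rows), the equivalent filter there would be awk '$10 > 700'. The same idea as a small Python sketch; COMBINED_COL, CUTOFF and filter_by_combined are illustrative names, and 700 is the FAQ's cutoff:

```python
# ASSUMPTION: the combined score is the last (10th) column of the detailed file.
COMBINED_COL = 10  # 1-based, like awk's $10
CUTOFF = 700       # threshold taken from the FAQ's awk example


def filter_by_combined(lines, col=COMBINED_COL, cutoff=CUTOFF):
    """Yield (protein1, protein2, score) for rows whose score exceeds cutoff.

    Same idea as: awk '$10 > 700' on the detailed file.
    """
    for line in lines:
        fields = line.split()
        # Skip blank/short lines and a non-numeric header row, if present.
        if len(fields) < col or not fields[col - 1].isdigit():
            continue
        score = int(fields[col - 1])
        if score > cutoff:
            yield fields[0], fields[1], score
```

To run it on the download, open the file with gzip.open(path, "rt") and pass the handle to filter_by_combined; lowering cutoff keeps more, lower-confidence edges.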

I am looking to build an interaction network graph (with RedeR) using these interactions as its basis. I want to include genes from an RNA-seq experiment that are highly expressed (but not DE) and overlay the DE genes on them, connected by my STRING interactions.
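For the RedeR plan, one way to prepare the input is to keep only STRING edges whose two endpoints are in your gene set and to flag which nodes are DE. The helper below is hypothetical and assumes your RNA-seq gene IDs have already been mapped to STRING protein IDs (e.g. 9913.ENSBTAP...), which is a separate step. The resulting edge list and node flags could then be exported to RedeR or any other graph tool.

```python
def build_subnetwork(edges, expressed, de_genes):
    """Keep STRING edges whose two endpoints are both in the gene set.

    edges:     iterable of (protein1, protein2, score) tuples
    expressed: IDs of highly expressed (non-DE) genes, same ID type as edges
    de_genes:  IDs of differentially expressed genes
    Returns (kept_edges, node_attributes); node_attributes maps each node
    to {"de": bool} so DE genes can be highlighted when plotting.
    """
    # DE genes are included as nodes alongside the highly expressed genes.
    allowed = set(expressed) | set(de_genes)
    kept = []
    seen = set()
    for p1, p2, score in edges:
        if p1 in allowed and p2 in allowed:
            # If an interaction is listed in both directions (A-B and B-A),
            # keep only one copy per unordered pair.
            key = frozenset((p1, p2))
            if key not in seen:
                seen.add(key)
                kept.append((p1, p2, score))
    nodes = {n for p1, p2, _ in kept for n in (p1, p2)}
    attrs = {n: {"de": n in de_genes} for n in sorted(nodes)}
    return kept, attrs
```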

11.1 years ago by bruce.moran ▴ 960

One way to double-check would be to use the STRING API: make some requests to the website, then compare the results with the contents of the file.
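The comparison half of that check could look like the sketch below. It leaves out the HTTP request on purpose, because the exact API endpoint and output format depend on the STRING version; it assumes both the file rows and the API results have already been parsed into dicts keyed by (protein1, protein2). compare_scores is a hypothetical helper name.

```python
def compare_scores(file_scores, api_scores, tolerance=0):
    """Compare per-pair scores from the downloaded file with API results.

    Both arguments map (protein1, protein2) tuples to integer scores.
    Pairs are treated as unordered, so (A, B) and (B, A) are the same edge.
    Returns a list of (pair, file_score, api_score) for pairs whose scores
    differ by more than `tolerance` or that are missing on one side (None).
    """
    def normalise(scores):
        # Sort each pair so (A, B) and (B, A) share a key; if both
        # orientations are present, the later one wins.
        return {tuple(sorted(pair)): s for pair, s in scores.items()}

    left, right = normalise(file_scores), normalise(api_scores)
    mismatches = []
    for pair in sorted(set(left) | set(right)):
        a, b = left.get(pair), right.get(pair)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches.append((pair, a, b))
    return mismatches
```

An empty result for a handful of spot-checked proteins would be reasonable evidence that the columns were read correctly.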

11.1 years ago by Istvan Albert 100k

Yes, just had that same thought now! Thanks for your help.

11.1 years ago by bruce.moran ▴ 960

Ram · Answer 2 · 2014-11-20

9.4 years ago

Sapan Mandloi • 0

In the protein.links.detailed file of the STRING database, there are 8 different scores.

As I remember, the last one is the combined score (which is not mentioned above or in the FAQ).

Can anyone please confirm which column would then be the experimental score?

Thanks
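One way to test that recollection against the rows posted in the question: if column 10 is a combined score, then in rows where only one evidence channel is non-zero it should equal that channel's score, since combining a single piece of evidence with nothing should leave it unchanged. A minimal Python sketch; the channel names and their order are assumptions (the FAQ list minus the homology score), and under that order the experimental score would be column 7.

```python
# ASSUMED order of the channel scores in columns 3-9 (hypothetical names);
# column 10 is the column being tested as the combined score.
CHANNELS = ["neighborhood", "fusion", "cooccurence", "coexpression",
            "experimental", "database", "textmining"]


def last_column_is_combined(lines):
    """Check rows that have exactly one non-zero channel score.

    Returns (rows_checked, rows_consistent), where a row is consistent if
    its last column equals its single non-zero channel score.
    """
    checked = consistent = 0
    for line in lines:
        fields = line.split()
        # Skip malformed lines and a non-numeric header row, if present.
        if len(fields) != 2 + len(CHANNELS) + 1 or not fields[2].isdigit():
            continue
        scores = [int(v) for v in fields[2:]]
        channels, last = scores[:-1], scores[-1]
        nonzero = [s for s in channels if s > 0]
        if len(nonzero) == 1:
            checked += 1
            if nonzero[0] == last:
                consistent += 1
    return checked, consistent


def column_of(channel):
    """1-based column number of a channel under the assumed order."""
    return CHANNELS.index(channel) + 3
```

On the five sample rows from the question, every row has exactly one non-zero channel and it matches column 10, which is consistent with (though not proof of) the last column being the combined score.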

updated 2.2 years ago by Ram 43k • written 9.4 years ago by Sapan Mandloi • 0