Working with big network data
5.5 years ago
moranr ▴ 270

Hi,

I have ~9.5 TB of data files. Each file contains a network in Gephi's input format: numerical data in ASCII. I want to be able to work on the compressed files directly, to run some quantitative surveys and other analyses on each file.
I would like to get the data below 4 TB so it can at least be stored on a single hard drive. I would also like something fast, as I will be working with the data continuously. So far I have found lzop (default compression) to be the best. Does anyone know anything better for this, or have any advice for working with data like this?
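One way to "work on compressed files" without ever materializing the decompressed data is to stream it line by line. A minimal sketch using Python's standard-library gzip module (lzop itself has no stdlib reader; an `.lzo` file could be streamed the same way through a `lzop -dc` subprocess pipe instead — the function name and file layout here are hypothetical):

```python
import gzip

def count_edges(path):
    """Stream a gzip-compressed edge list and count non-blank lines
    without holding the decompressed file in memory."""
    n = 0
    with gzip.open(path, "rt") as fh:  # "rt" decodes the stream as text
        for line in fh:
            if line.strip():  # skip blank lines
                n += 1
    return n

# For lzop archives the same streaming idea works via a pipe, e.g.:
#   subprocess.Popen(["lzop", "-dc", path], stdout=subprocess.PIPE)
# and iterating over the pipe's stdout instead of the gzip handle.
```

The same loop shape works for any per-line survey (degree counts, edge weights, node IDs), so only one pass over each compressed file is needed.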

Thanks, R

networks Big Data
5.5 years ago

A couple of suggestions:

  • remove any unnecessary metadata
  • use a graph database, e.g. neo4j (which by the way works quite well with Gephi)

Never knew about graph databases. Amazing, thank you.


One of the limitations of neo4j is that you only get one database per installation. The common practice when you have multiple graphs is to put them all in the same database and use a flag or property to differentiate them.


So just something simple like concatenating the node definitions together and the edge lists together, then combining the two? Could the flag be an extra column in each network file with the file ID or something? Or does neo4j have a way of setting flags?
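That extra-column idea can be sketched in a few lines. A toy illustration (the file contents, graph IDs, and helper names here are hypothetical, and this is plain CSV munging, not neo4j's own import API — neo4j itself would express the flag as a node/relationship property or label):

```python
import csv
import io

def tag_rows(reader, graph_id):
    """Append a graph-ID column to every row of a node or edge table."""
    for row in reader:
        yield row + [graph_id]

def merge_tables(tables):
    """Concatenate several (graph_id, csv_text) tables into one tagged list.

    Each entry of `tables` is a (graph_id, text) pair, where the text is
    header-less CSV from one network file.
    """
    merged = []
    for graph_id, text in tables:
        reader = csv.reader(io.StringIO(text))
        merged.extend(tag_rows(reader, graph_id))
    return merged

# Two hypothetical edge lists from different network files:
net_a = "1,2\n2,3\n"
net_b = "1,2\n"
edges = merge_tables([("netA", net_a), ("netB", net_b)])
# Each edge now carries its source network's ID as a final column,
# so the combined file can be split back apart later.
```

The same tagging would be applied to the node tables before concatenating, so node IDs from different networks never collide.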

