Question: Long-Term Hosting For Large Annotation Files?
10
8.2 years ago by
Chris Miller20k
Washington University in St. Louis, MO
Chris Miller20k wrote:

I'm getting ready to release an R package that makes use of some moderately large genome annotation files (let's say 500 MB to 1 GB total). I'd like to host them somewhere other than our lab's website, because I plan on continuing to maintain this package after I move on.

Scripts are easy to throw up onto GitHub, Google Code, etc., but these larger files may be a problem. I know GitHub, for example, limits the storage space available to its non-premium users.

So what's the best place to host these files long-term, given that I'd like the hosting to be free, accessible for download without intermediate screens (so my script can pull them down) and have reasonable tolerance for taking up storage space and bandwidth?

annotation data • 2.3k views
ADD COMMENTlink modified 7.8 years ago by Giovanni M Dall'Olio26k • written 8.2 years ago by Chris Miller20k
9
8.2 years ago by
Brad Chapman9.2k
Boston, MA
Brad Chapman9.2k wrote:

There are a couple of developing efforts for solving this problem:

Given that you need access to the raw data from a URL, it sounds like Dryad might be the right solution. Plus you get good karma by supporting these awesome projects.

ADD COMMENTlink modified 8.2 years ago • written 8.2 years ago by Brad Chapman9.2k
1

Agreed with everyone's concerns; it's the ol' chicken and egg problem. To have confidence in a site you want to see lots of use and great data, but to get people using it you need to develop confidence. The best thing we can do is to get the word out and use them when workable.

ADD REPLYlink written 8.2 years ago by Brad Chapman9.2k

I agree. Data repositories like Dryad are the way to go, since you can wrap your files in searchable metadata and provide links to the published article.

ADD REPLYlink written 8.2 years ago by Casey Bergman17k

Biotorrents is a great idea but there's never been any good data on there.

ADD REPLYlink written 8.2 years ago by Will4.5k

I'm excited about up-and-coming projects like this, but I'm a little hesitant to go with a new site. Who knows if it will be around in a few years?

ADD REPLYlink written 8.2 years ago by Chris Miller20k
8
8.2 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

The solution that I use is to register a domain name and make it point to a server in your current lab. When you then move on to a different lab, you can simply set up a server there, change the DNS information for your domain to point to that and everything will continue to work as before.

In other words, what matters is - in my opinion - not so much where you store the data, but that you have a stable URL that continues to point to the data when you move them.
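
As a rough illustration of the stable-URL idea, here is a minimal sketch of a download helper that keeps the address in one place. It is written in Python for brevity (the same pattern would apply in an R package), and the domain `annotations.example.org`, the `v1/` path, and the function names are all hypothetical, not anything from the original package.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Hypothetical domain owned by the maintainer. If the files move to a new
# server, only the DNS record changes; this constant stays the same.
BASE_URL = "https://annotations.example.org/v1/"


def annotation_url(filename, base_url=BASE_URL):
    """Build the download URL for one annotation file."""
    return urljoin(base_url, filename)


def fetch_annotation(filename, cache_dir, download=urlretrieve):
    """Download a file into cache_dir unless a local copy already exists.

    `download` takes (url, destination_path); it defaults to urlretrieve
    and can be swapped out, for example in tests.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        download(annotation_url(filename), str(target))
    return target
```

Because released versions of the package only ever see the domain name, switching hosts later should not require users to update anything.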

ADD COMMENTlink modified 8.2 years ago • written 8.2 years ago by Lars Juhl Jensen11k
7
8.2 years ago by
brentp22k
Salt Lake City, UT
brentp22k wrote:

You mention Google Code; I think that's the easiest solution. Each project gives you 2 GB of data storage and you can link directly to each file.

Google Docs also has 1 GB of storage for any file type, and you can buy more space.

Another option would be to pay less than $11 per month for the cheapest cloud server from Rackspace, or from whichever hosting service you prefer. Then register a domain and give that as the address (rather than an IP address), so you can just update your DNS if you switch servers.

ADD COMMENTlink written 8.2 years ago by brentp22k

Nice, I wasn't aware that they gave out 2GB of space. That should be enough for my purposes.

ADD REPLYlink written 8.2 years ago by Chris Miller20k
5
8.2 years ago by
Manhattan, NY
Khader Shameer17k wrote:

I would recommend packaging the annotation data as a '.db' package and declaring it as a required package for your main R package. For example, BioConductor provides various annotation resources, some of which are bigger than 1 GB. Technically, your data will be stored in a SQLite backend that can be accessed via R / BioC.

BioConductor Accessing Annotation Data provides information about popular .db packages. Annotation resources section explains various use-cases. If you are implementing your tool as a BioC package, you may also check AnnotationDbi, AnnBuilder and PAnnBuilder as potential packages to get you started with implementation.
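
To make the SQLite-backend idea concrete, here is a minimal sketch that loads a gzipped tab-delimited file into SQLite and queries it by region. It is written in Python only for illustration and is not how BioConductor builds its .db packages. The four-column layout (chromosome, start, end, gene) with no header row, the table name, and the function names are all assumptions.

```python
import csv
import gzip
import sqlite3


def load_annotations(tsv_gz_path, db_path):
    """Load a gzipped TSV (chrom, start, end, gene; no header) into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation "
        "(chrom TEXT, start_pos INTEGER, end_pos INTEGER, gene TEXT)"
    )
    with gzip.open(tsv_gz_path, "rt", newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        # Numeric strings are converted to integers by the column affinity.
        conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_region ON annotation (chrom, start_pos)"
    )
    conn.commit()
    return conn


def genes_in_region(conn, chrom, start, end):
    """Return gene names whose interval overlaps [start, end] on chrom."""
    cursor = conn.execute(
        "SELECT gene FROM annotation "
        "WHERE chrom = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (chrom, end, start),
    )
    return [row[0] for row in cursor]
```

The resulting single database file can then be shipped or downloaded as one unit, and lookups no longer require reading the whole text file into memory.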

ADD COMMENTlink written 8.2 years ago by Khader Shameer17k

I may do further cleanup and conform to bioconductor standards someday, but at this point, getting my code to the point where it runs on Windows would be rather onerous. (It includes some calls to bash built-ins that speed up execution significantly). I will definitely keep this in mind for the future though.

ADD REPLYlink written 8.2 years ago by Chris Miller20k
1
8.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum114k wrote:

Is it some kind of structured data (XML, JSON)? How about storing it on http://www.freebase.com/ ?

ADD COMMENTlink written 8.2 years ago by Pierre Lindenbaum114k

It's compressed tab-delimited text files, mostly.

ADD REPLYlink written 8.2 years ago by Chris Miller20k
1
8.2 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

Some journals, for example Source Code for Biology and Medicine, provide unlimited hosting for the data relating to the article you publish. The best solution is to publish an article about your library and then upload the data as supplementary material.

ADD COMMENTlink written 8.2 years ago by Giovanni M Dall'Olio26k
1
8.2 years ago by
Fred Fleche4.2k
Paris, France
Fred Fleche4.2k wrote:

https://www.dropbox.com

ADD COMMENTlink modified 8.1 years ago • written 8.2 years ago by Fred Fleche4.2k
3

To be honest with you, if this were a new user I would be against it, but if someone has 1000-some reputation points then I don't really mind. It is like earning "political capital" that they can spend as they wish ;-)

It may sound like a double standard, and maybe it is.

ADD REPLYlink written 8.2 years ago by Istvan Albert ♦♦ 78k

that's not a host?

ADD REPLYlink written 8.2 years ago by Niek De Klein2.4k

I'm a Dropbox user myself, and I like the suggestion. I have a bit of a problem with posting referral links here, though. It's not that this instance is egregiously bad, but it's a slippery slope, and a zero-tolerance policy for this kind of affiliate link makes sense, in my opinion. Anyone else have thoughts on this?

ADD REPLYlink written 8.2 years ago by Chris Miller20k

@Chris: if it bothers you to see a referral link then I will delete it. It is not a problem. Sorry for the inconvenience. Most of all, I am happy that you like the suggestion.

ADD REPLYlink written 8.1 years ago by Fred Fleche4.2k
1
8.1 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

Check out Bitbucket. They have recently changed their policy on free accounts and now offer an unlimited number of repositories with unlimited disk space, for free.

ADD COMMENTlink written 8.1 years ago by Giovanni M Dall'Olio26k