Long-Term Hosting For Large Annotation Files?
8
10
14.2 years ago

I'm getting ready to release an R package that makes use of some moderately large genome annotation files (say 500 MB to 1 GB in total). I'd like to host them somewhere other than our lab's website, because I plan to keep maintaining this package after I move on.

Scripts are easy to put up on GitHub, Google Code, etc., but these larger files may be a problem. I know GitHub, for example, limits the storage space available to its non-premium users.

So what's the best place to host these files long-term, given that I'd like the hosting to be free, accessible for download without intermediate screens (so my script can pull the files down), and reasonably tolerant of the storage space and bandwidth involved?

annotation data • 4.8k views
ADD COMMENT
9
14.2 years ago

There are a couple of developing efforts for solving this problem:

Given that you need access to the raw data from a URL, it sounds like Dryad might be the right solution. Plus you get good karma by supporting these awesome projects.

ADD COMMENT
1

Agreed with everyone's concerns; it's the ol' chicken and egg problem. To have confidence in a site you want to see lots of use and great data, but to get people using it you need to develop confidence. The best thing we can do is to get the word out and use them when workable.

ADD REPLY
0

I agree. Data repositories like Dryad are the way to go, since you can wrap your files in searchable metadata and provide links to the published article.

ADD REPLY
0

Biotorrents is a great idea but there's never been any good data on there.

ADD REPLY
0

I'm excited about up-and-coming projects like this, but I'm a little hesitant to go with a new site. Who knows if it will be around in a few years?

ADD REPLY
8
14.2 years ago

The solution that I use is to register a domain name and make it point to a server in your current lab. When you then move on to a different lab, you can simply set up a server there, change the DNS information for your domain to point to that and everything will continue to work as before.

In other words, what matters is - in my opinion - not so much where you store the data, but that you have a stable URL that continues to point to the data when you move them.
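As a rough sketch of what this buys you on the client side: if every download goes through one base URL, then moving the data only means updating DNS (or that one constant). The example below is Python rather than R, purely for illustration; `BASE_URL`, `fetch_annotation` and the cache layout are hypothetical names, not part of any existing package.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical base URL; in practice this would be a domain you control.
BASE_URL = "http://annotations.example.org/data"


def fetch_annotation(filename, cache_dir, base_url=BASE_URL):
    """Return a local path to `filename`, downloading it only if not cached."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        return target  # already downloaded; no network access needed

    url = f"{base_url.rstrip('/')}/{filename}"
    # Download to a temporary file first so an interrupted transfer
    # never leaves a truncated file under the final name.
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    Path(tmp.name).replace(target)
    return target
```

A package would call something like `fetch_annotation("genes.tsv.gz", "~/.cache/mypkg")` (after expanding the path) the first time the data are needed. In R, `download.file` could play the role of `urlopen` here.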

ADD COMMENT
7
14.2 years ago
brentp 24k

You mention Google Code; I think that's the easiest solution. Each project gives you 2 GB of data storage, and you can link directly to each file.

Google Docs also has 1 GB of storage for any file type, and you can buy more space.

Another option would be to pay less than $11 per month for the cheapest cloud server from Rackspace, or from whichever hosting service you prefer. Then register a domain and give that as the address (rather than an IP address), so you can just update your DNS if you switch servers.

ADD COMMENT
0

Nice, I wasn't aware that they gave out 2GB of space. That should be enough for my purposes.

ADD REPLY
5
14.2 years ago

I would recommend packaging the annotation data as a separate '.db' package and declaring it as a required package (a dependency) of your main R package. BioConductor, for example, provides a range of annotation resources, some of them larger than 1 GB. Technically, your data will be stored in a SQLite backend that can be accessed from R / BioC.

BioConductor Accessing Annotation Data provides information about popular .db packages. Annotation resources section explains various use-cases. If you are implementing your tool as a BioC package, you may also check AnnotationDbi, AnnBuilder and PAnnBuilder as potential packages to get you started with implementation.
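To make the "SQLite backend" idea concrete, here is a minimal sketch of loading tab-delimited annotation rows into a SQLite table and querying them. It is written in Python only to illustrate the storage idea and is not how AnnotationDbi builds its packages; the table name `genes`, its three columns, and both helper functions are made up for this example.

```python
import csv
import io
import sqlite3


def load_annotations(conn, tsv_handle):
    """Load tab-delimited rows (gene_id, symbol, chrom) into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "gene_id TEXT PRIMARY KEY, symbol TEXT, chrom TEXT)"
    )
    reader = csv.reader(tsv_handle, delimiter="\t")
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO genes VALUES (?, ?, ?)", reader)


def lookup_symbol(conn, gene_id):
    """Return the symbol for `gene_id`, or None if it is not in the table."""
    row = conn.execute(
        "SELECT symbol FROM genes WHERE gene_id = ?", (gene_id,)
    ).fetchone()
    return row[0] if row else None


# Tiny made-up example; real annotation files would be read from disk.
example = io.StringIO("g1\tABC1\tchr1\ng2\tXYZ2\tchr2\n")
conn = sqlite3.connect(":memory:")
load_annotations(conn, example)
print(lookup_symbol(conn, "g2"))
```

The point of the design is that lookups go through an indexed table instead of re-parsing a large text file on every call.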

ADD COMMENT
0

I may do further cleanup and conform to bioconductor standards someday, but at this point, getting my code to the point where it runs on Windows would be rather onerous. (It includes some calls to bash built-ins that speed up execution significantly). I will definitely keep this in mind for the future though.

ADD REPLY
1
14.2 years ago

Is it some kind of structured data (XML, JSON)? How about storing it on http://www.freebase.com/ ?

ADD COMMENT
0

It's compressed tab-delimited text files, mostly.
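For files like these, a download script can cheaply check that it received the real file and not an intermediate web page before parsing it. The sketch below assumes the compression is gzip (the post only says "compressed") and uses Python for illustration; `is_gzip` and `read_tsv_gz` are hypothetical helper names.

```python
import csv
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_tsv_gz(path):
    """Yield rows from a gzip-compressed tab-delimited file, one at a time."""
    if not is_gzip(path):
        # A host that inserts a landing page typically returns HTML instead.
        raise ValueError(f"{path} is not gzip data; was a web page downloaded?")
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")
```

Because `read_tsv_gz` is a generator, rows are streamed and the whole file never has to fit in memory; the gzip check runs when iteration starts.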

ADD REPLY
1
14.2 years ago

Some journals, for example Source Code for Biology and Medicine, provide unlimited hosting for the data relating to the article you publish. The best solution is to publish an article about your library and then upload the data as supplementary materials.

ADD COMMENT
1
14.2 years ago

https://www.dropbox.com

ADD COMMENT
3

To be honest with you, if this were a new user I would be against it, but if someone has 1000-some reputation points then I don't really mind it. It is like earning "political capital" that they can spend as they wish ;-)

It may sound like a double standard, and maybe it is.

ADD REPLY
0

that's not a host?

ADD REPLY
0

I'm a Dropbox user myself, and I like the suggestion. I have a bit of a problem with posting referral links here, though. It's not that this instance is egregiously bad, but it's a slippery slope, and a zero-tolerance policy for this kind of affiliate link makes sense, in my opinion. Anyone else have thoughts on this?

ADD REPLY
0

@Chris: if it bothers you to see a referral link, then I will delete it. It is not a problem. Sorry for the inconvenience. Most of all, I am happy that you like the suggestion.

ADD REPLY
1
14.1 years ago

Check out Bitbucket. They recently changed their policy on free accounts and now give an unlimited number of repositories with unlimited disk space, for free.

ADD COMMENT
