Bioinformatics Server on AWS
4.2 years ago
jyp327 • 0

I am currently trying to set up a server for a small bioinformatics company that at times handles proprietary information. We would like to use AWS, and I've been told they want a custom URL they can simply go to where, after a company login, they can access tools like BLAST or the IMGT database and run these analyses in-house, so that the proprietary info doesn't get sent to public servers. As such, I know we would need compute, data storage, and some sort of domain under which we can centralize these applications.

What combination of AWS services would be suitable for this task? I've looked into S3, EC2, Route 53, AppStream, and more, but due to my lack of experience with AWS, I'm not sure what the best setup would look like.

Thanks!

AWS • 2.2k views

NCBI provides an AWS AMI for BLAST. My best advice is to start from https://ncbi.github.io/blast-cloud/, then determine whether you and your team have the necessary expertise to help this company.
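For a concrete starting point, a minimal sketch of launching that AMI with boto3 might look like the following. The AMI ID, key pair, and security group below are placeholders, not real values; the actual BLAST AMI ID is listed in the blast-cloud documentation linked above.

```python
import boto3

# Placeholder values: look up the real BLAST AMI ID at
# https://ncbi.github.io/blast-cloud/ and substitute your own
# key pair and security group.
BLAST_AMI_ID = "ami-0123456789abcdef0"       # placeholder, not the actual AMI
KEY_NAME = "my-keypair"                      # placeholder
SECURITY_GROUP_ID = "sg-0123456789abcdef0"   # placeholder

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId=BLAST_AMI_ID,
    InstanceType="m5.2xlarge",  # BLAST is memory-hungry; size to your databases
    KeyName=KEY_NAME,
    SecurityGroupIds=[SECURITY_GROUP_ID],
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched BLAST instance {instance_id}")
```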


You should find the optimal solution to your problem. If you don't know what services AWS provides, then how do you know you should be using AWS?

4.2 years ago
Brice Sarver ★ 3.8k

I'm not a web app developer, but I'll list a few things off the top of my head re: compute and storage.

You'll also need to set up a Virtual Private Cloud (VPC) and harden it using standard networking approaches. Instances are spawned into the default VPC unless another one is specified. Users and privileges can be assigned and managed through IAM.
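To make that concrete, here is a minimal boto3 sketch of creating a non-default VPC with a private subnet and a locked-down security group. The CIDR ranges and names are illustrative assumptions, not recommendations:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a dedicated VPC rather than relying on the default one.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# A private subnet for the analysis instances.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
subnet_id = subnet["Subnet"]["SubnetId"]

# A security group that only allows HTTPS from the company's address
# range (203.0.113.0/24 is a documentation range used as a placeholder).
sg = ec2.create_security_group(
    GroupName="bioinfo-internal",
    Description="HTTPS from company network only",
    VpcId=vpc_id,
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
    }],
)
```

Restricting ingress to the company's address range is one example of the "standard networking approaches" mentioned above; a real deployment would also need route tables, and a NAT gateway or VPN for outbound traffic.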

Backups of your instances (EBS snapshots) are managed through EC2, and you can manage object lifecycles using the various storage tiers of S3. Note that you control what happens when a file is deleted, etc. Object storage is different from block storage (and is unsuitable for OS installation), but data moving between S3 and EC2 within the same region travels over Amazon's backbone with no egress charges; there are small charges per batch of requests. I would also recommend looking into Elastic File System (EFS), since it grows on demand, as opposed to reserving large, static EBS volumes when you spin up your EC2 instances.
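As an illustration of S3 lifecycle management, a rule that transitions objects to a colder tier and eventually expires them could be set with boto3 like this (the bucket name, prefix, and timings are made up for the example):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and timings, purely illustrative.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bioinfo-results",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-results",
            "Status": "Enabled",
            "Filter": {"Prefix": "results/"},
            # Move to the cheaper infrequent-access tier after 30 days...
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            # ...and delete after a year.
            "Expiration": {"Days": 365},
        }]
    },
)
```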

You should also consider whether shared tenancy will work (i.e., multiple instances from different customers on the same physical server) or whether you need dedicated instances.
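If you decide you need dedicated hardware, tenancy is requested at launch time; a short sketch, again with a placeholder AMI:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placement.Tenancy controls whether the instance shares physical hardware.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "dedicated"},  # "default" = shared hardware
)
```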

There's no easy way to start out learning all the tools besides jumping into some training, but I hope this helps.

13 months ago
Simon ▴ 40

I realize this answer comes so late that it is unlikely to help the original poster, but since the technology has moved on quite a bit in three years, I thought I would point out an alternative approach for those in the same situation. If you don't want to spend the time and resources to set everything up yourself in AWS, second-generation bioinformatics platforms like Basepair (and, I think, LifeBit) let you simply plug in the EC2 and S3 resources in your own AWS account after you have created it. The platform then abstracts away all of the DevOps you would normally have to do yourself, and this setup means no proprietary data ever gets sent to public servers. It also means you can still benefit from any credits AWS might be offering you as a newcomer to their ecosystem. Just saying...
