Question: Looking for AWS EC2 advice for a metagenomic pipeline
7 months ago by jonathon481

Hi all,

I am looking to set up one or more EC2 instances to run my virus discovery pipeline. I have ~20 paired-end 100 bp libraries (HiSeq 2500), each likely around 19.4M spots (3.9 Gbases).
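For scale, here is a quick back-of-the-envelope check of the total input. It assumes every library is close to the 19.4M-spot example, with two 100 bp reads per spot. Real libraries will differ.

```python
# Back-of-the-envelope input volume for the run described above.
# Assumption: every library matches the quoted example
# (19.4M spots, two 100 bp reads per spot); real sizes will vary.
N_LIBRARIES = 20
SPOTS_PER_LIBRARY = 19.4e6
READ_LENGTH_BP = 100
READS_PER_SPOT = 2  # paired-end

bases_per_library = SPOTS_PER_LIBRARY * READ_LENGTH_BP * READS_PER_SPOT
total_bases = bases_per_library * N_LIBRARIES

print(f"per library:   {bases_per_library / 1e9:.2f} Gbases")
print(f"all libraries: {total_bases / 1e9:.1f} Gbases")
```

This matches the quoted 3.9 Gbases per library and gives roughly 78 Gbases across all 20.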

Basic pipeline:

  1. Trinity to assemble paired reads

  2. Estimate abundance using RSEM

  3. blastn the assembled contigs against the NCBI nucleotide (nt) database

  4. DIAMOND blastx the assembled contigs against the nr database
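Below is a minimal sketch of the four steps as per-sample command lines. It builds them in Python without executing anything. The flags are standard options of Trinity, Trinity's `align_and_estimate_abundance.pl`, BLAST+ and DIAMOND. Several details are assumptions: the thread count, the memory limit, the file names, the database names (`nt`, `nr.dmnd`) and the assembly path `<outdir>/Trinity.fasta`. Newer Trinity releases may write the final assembly elsewhere, so check these against the installed versions.

```python
import shlex
from pathlib import Path
from typing import List


def pipeline_commands(sample: str, r1: str, r2: str, threads: int = 32,
                      max_memory: str = "200G") -> List[List[str]]:
    """Return the four pipeline steps for one sample as argv lists.

    Nothing is executed here; the lists can be printed or passed to
    subprocess.run(). Paths and database names are assumptions.
    """
    outdir = f"trinity_{sample}"  # Trinity expects "trinity" in the dir name
    # Assumption: older Trinity layout, assembly written inside outdir.
    assembly = str(Path(outdir) / "Trinity.fasta")
    return [
        # 1. De novo assembly of the paired reads
        ["Trinity", "--seqType", "fq", "--left", r1, "--right", r2,
         "--CPU", str(threads), "--max_memory", max_memory,
         "--output", outdir],
        # 2. Abundance estimation with RSEM via Trinity's helper script
        ["align_and_estimate_abundance.pl", "--transcripts", assembly,
         "--seqType", "fq", "--left", r1, "--right", r2,
         "--est_method", "RSEM", "--aln_method", "bowtie2",
         "--trinity_mode", "--prep_reference",
         "--output_dir", f"rsem_{sample}"],
        # 3. Nucleotide search against a locally installed nt database
        ["blastn", "-query", assembly, "-db", "nt", "-outfmt", "6",
         "-num_threads", str(threads), "-out", f"{sample}.blastn.tsv"],
        # 4. Translated search against a DIAMOND-formatted nr database
        ["diamond", "blastx", "--db", "nr.dmnd", "--query", assembly,
         "--threads", str(threads), "--out", f"{sample}.diamond.tsv"],
    ]


if __name__ == "__main__":
    for argv in pipeline_commands("lib01", "lib01_1.fq.gz", "lib01_2.fq.gz"):
        print(shlex.join(argv))
```

Keeping each step as an argv list avoids shell-quoting problems and makes it easy to run the same steps on any instance.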

I have run this on an r5.8xlarge (256 GB memory, 32 vCPUs) before, but I am wondering:

Is there a better instance type for this, or would it be more efficient to run on multiple instances in parallel?
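Each library is processed independently, so splitting samples across several instances is embarrassingly parallel. The trade-off is that each instance would need its own copy of the nt and nr databases, which adds storage cost. The following is a minimal round-robin sketch. The sample names `lib01` to `lib20` and the instance count are hypothetical.

```python
from typing import Dict, List


def partition_samples(samples: List[str], n_instances: int) -> Dict[int, List[str]]:
    """Round-robin split of independent samples across instances."""
    if n_instances < 1:
        raise ValueError("need at least one instance")
    buckets: Dict[int, List[str]] = {i: [] for i in range(n_instances)}
    for idx, sample in enumerate(samples):
        buckets[idx % n_instances].append(sample)
    return buckets


# Hypothetical sample names; four instances chosen only for illustration.
samples = [f"lib{i:02d}" for i in range(1, 21)]
plan = partition_samples(samples, 4)
for instance, assigned in plan.items():
    print(instance, assigned)
```

Round-robin only balances sample counts. If library sizes differ a lot, sorting by input size first would balance run time better.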

Is on-demand pricing the way to go, or should I try Spot Instances (I have never used them before)?
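AWS can reclaim Spot Instances with a two-minute warning. A Spot-friendly driver should therefore be able to restart without redoing finished samples. The minimal sketch below writes one completion-marker file per sample. It assumes the markers live on storage that survives the instance, for example a retained EBS volume or files synced to S3. `run_resumable` and the marker naming are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


def run_resumable(samples: Sequence[str], done_dir: Path,
                  run_sample: Callable[[str], None]) -> List[str]:
    """Run each sample once, skipping samples that already have a marker.

    run_sample must raise on failure so that no marker is written for
    an incomplete sample. Returns the samples processed in this call.
    """
    done_dir.mkdir(parents=True, exist_ok=True)
    processed: List[str] = []
    for sample in samples:
        marker = done_dir / f"{sample}.done"
        if marker.exists():
            continue  # finished before an interruption; skip it
        run_sample(sample)
        marker.touch()  # only reached if run_sample did not raise
        processed.append(sample)
    return processed


# Demo with a throwaway directory and a no-op "pipeline".
demo_dir = Path(tempfile.mkdtemp())
first = run_resumable(["lib01", "lib02"], demo_dir, lambda s: None)
second = run_resumable(["lib01", "lib02", "lib03"], demo_dir, lambda s: None)
print(first, second)
```

In practice, `run_sample` would run each command for the sample with `subprocess.run(argv, check=True)`. A failed step then raises, and no marker is written.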

My time frame is flexible, but faster is always better. I have a budget of ~US$1300 and will likely use an Asia Pacific region. I'm not certain this will cover all samples; any remaining samples will be processed on a local server.
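The budget question comes down to how many instance-hours each pricing model buys. The hourly rates below are placeholders, not real AWS prices. Substitute current on-demand and Spot prices for the chosen instance type and Asia Pacific region, and subtract fixed costs such as storage first.

```python
def affordable_hours(budget_usd: float, hourly_usd: float,
                     fixed_costs_usd: float = 0.0) -> float:
    """Instance-hours the budget buys after fixed costs (e.g. storage)."""
    if hourly_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return max(budget_usd - fixed_costs_usd, 0.0) / hourly_usd


BUDGET = 1300.0
# Placeholder rates, NOT real AWS prices: look up current on-demand and
# Spot prices for the instance type and region before relying on this.
ON_DEMAND_RATE = 2.00
SPOT_RATE = 0.50

for label, rate in [("on-demand", ON_DEMAND_RATE), ("spot", SPOT_RATE)]:
    print(f"{label}: {affordable_hours(BUDGET, rate):.0f} instance-hours")
```

Spot prices vary over time and interrupted work has to be redone, so treat the Spot figure as an upper estimate.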

To store the read files and databases, I estimate I would need about 700 GB of storage. Is General Purpose SSD (EBS) storage recommended for this?
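If the instance is launched from a script, the 700 GB volume and the Spot option can both be passed as keyword arguments to boto3's EC2 `run_instances`. The sketch below assumes an Amazon Linux-style root device name (`/dev/xvda`) and the `gp2` General Purpose SSD volume type. `launch_params` is a hypothetical helper, and the AMI ID is a placeholder.

```python
from typing import Any, Dict


def launch_params(ami_id: str, instance_type: str = "r5.8xlarge",
                  volume_gb: int = 700, use_spot: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EC2 client run_instances()."""
    params: Dict[str, Any] = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [{
            # Root device name depends on the AMI; /dev/xvda is an assumption.
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "VolumeSize": volume_gb,  # size in GiB
                "VolumeType": "gp2",      # General Purpose SSD
                # Keep the volume if the instance is terminated (e.g. a Spot
                # interruption); delete it manually later to stop charges.
                "DeleteOnTermination": False,
            },
        }],
    }
    if use_spot:
        params["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        }
    return params
```

A possible call is `boto3.client("ec2", region_name="ap-southeast-2").run_instances(**launch_params("ami-xxxxxxxx"))`. The region (Sydney) is only an example from the Asia Pacific group.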

I am familiar with AWS, so I would most likely use it here, but I am always willing to look into other options.

Thank you!

Tags: aws, blast, assembly