Looking for AWS EC2 advice for a metagenomic pipeline
3.9 years ago
jonathon481 ▴ 10

Hi all,

I am looking to set up one or more EC2 instances to run my virus discovery pipeline. I have ~20 paired-end libraries (100 bp, HiSeq 2500), each likely around 19.4M spots and 3.9G bases.

Basic pipeline:

  1. Trinity to assemble paired reads

  2. Estimate abundance using RSEM

  3. blastn assembled contigs against Nucleotide database

  4. diamond blastx assembled contigs against nr database
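For concreteness, the four steps above could be sketched per library roughly as follows. This is only an illustration, not my exact setup: the function name `build_commands`, the file names, the thread and memory values, and the database names `nt` and `nr` are assumptions, and the location of the assembled contigs depends on the Trinity version. The sketch only builds the command lines; it does not run anything.

```python
from pathlib import Path


def build_commands(sample, left, right, threads=32, max_memory="200G",
                   nt_db="nt", nr_db="nr"):
    """Build the four pipeline commands for one paired-end library.

    All paths, database names, and resource values are illustrative
    assumptions; adjust them to the instance and Trinity version in use.
    """
    out_dir = Path(f"trinity_{sample}")
    # Assumption: contigs are written inside the output directory.
    # Newer Trinity releases may write <out_dir>.Trinity.fasta instead.
    contigs = out_dir / "Trinity.fasta"

    # Step 1: de novo assembly of the paired reads.
    trinity = [
        "Trinity", "--seqType", "fq",
        "--left", left, "--right", right,
        "--CPU", str(threads), "--max_memory", max_memory,
        "--output", str(out_dir),
    ]
    # Step 2: abundance estimation with RSEM via Trinity's helper script.
    rsem = [
        "align_and_estimate_abundance.pl",
        "--transcripts", str(contigs), "--seqType", "fq",
        "--left", left, "--right", right,
        "--est_method", "RSEM", "--aln_method", "bowtie2",
        "--trinity_mode", "--prep_reference",
        "--thread_count", str(threads),
        "--output_dir", f"rsem_{sample}",
    ]
    # Step 3: nucleotide search of the contigs.
    blastn = [
        "blastn", "-query", str(contigs), "-db", nt_db,
        "-outfmt", "6", "-num_threads", str(threads),
        "-out", f"{sample}.blastn.tsv",
    ]
    # Step 4: translated search; the database must be in DIAMOND format.
    diamond = [
        "diamond", "blastx", "--query", str(contigs), "--db", nr_db,
        "--outfmt", "6", "--threads", str(threads),
        "--out", f"{sample}.diamond.tsv",
    ]
    return [trinity, rsem, blastn, diamond]


if __name__ == "__main__":
    # Hypothetical file names for one library.
    for cmd in build_commands("lib01", "lib01_R1.fq.gz", "lib01_R2.fq.gz"):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Since the libraries are independent, the same function would work unchanged whether they run one after another on a single instance or one per instance in parallel.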

I have run this on an r5.8xlarge (256 GB memory, 32 vCPUs) before, but I am wondering:

Is there a better instance type to run this on, or would it be more efficient to run it on multiple instances in parallel?

Is the on-demand pricing model the way to go, or should I try spot instances (which I have never used before)?

My time frame is flexible, but faster is always better. I have a budget of ~US$1,300 to work with and will likely use a server in the Asia Pacific region. I'm not certain this will cover all samples; any remaining samples will be completed on a local server.
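To make the budget question concrete, here is a rough sketch of the arithmetic. The hourly price of 2.50 USD is a placeholder and not a quoted AWS rate; the real on-demand or spot price for the chosen region and instance type would need to be plugged in. The function name `instance_hours` and its parameters are made up for illustration.

```python
def instance_hours(budget_usd, hourly_usd, storage_gb=0.0,
                   storage_usd_per_gb_month=0.0, months=1.0):
    """Instance hours affordable after setting aside storage cost.

    Every price passed in is an assumption supplied by the caller.
    """
    storage_cost = storage_gb * storage_usd_per_gb_month * months
    return (budget_usd - storage_cost) / hourly_usd


if __name__ == "__main__":
    budget = 1300.0   # from the post
    libraries = 20    # from the post
    hourly = 2.50     # placeholder price, NOT an actual AWS rate
    hours = instance_hours(budget, hourly)
    print(f"total instance hours: {hours:.1f}")
    print(f"hours per library: {hours / libraries:.1f}")
```

Comparing the hours-per-library figure against the runtime I saw on the r5.8xlarge would show whether the budget covers all samples at a given price, and running it once with an on-demand price and once with a spot price would show how much the pricing model changes that.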

To store the files and databases, I estimate I would need 700 GB of storage. Is general purpose SSD storage recommended for this?

I am familiar with AWS, so I would most likely use it in this case, but I am always willing to look into other options.

Thank you!

assembly blast aws • 807 views