7.3 years ago
KJ Kunz
I have been trying to use BWA-MEM on an AWS Cloudman cluster to align paired-end reads (150 bp) from an NGS run. Each read file is ~130 GB, so I am estimating the resulting BAM will be ~100 GB. I have configured the cluster a couple of different ways, but the mapping just crawls along, and I end up terminating it after only a few gigs because the AWS costs start spiraling. Does anyone have recommendations on how to configure an efficient cluster for this alignment, or any other good options for mapping in an AWS/EC2 environment? Thanks!
Are you sure the read files are ~130 GB each? Did you uncompress them? There is generally no need to do that.
While there may be some efficiencies to be had by trying to be clever about how you set AWS up, you are still going to incur charges (if you find the small test cost spiraling, then AWS may not be the right option). That said, you must have seen this Wiki page?
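For reference, a typical invocation that keeps everything compressed end to end might look like the sketch below. The filenames are hypothetical, and it assumes bwa and samtools are installed and the reference has already been indexed with bwa index. bwa mem reads gzipped FASTQ directly, and piping straight into samtools sort avoids ever writing a huge intermediate SAM to disk:

```shell
# Hypothetical filenames; assumes bwa and samtools are on PATH and
# hg38.fa has already been indexed with `bwa index hg38.fa`.
# The guard simply skips the run if the tools are not installed.
if command -v bwa >/dev/null && command -v samtools >/dev/null; then
    # bwa mem accepts gzipped FASTQ directly -- no need to decompress.
    # Piping into samtools sort avoids a large intermediate SAM file.
    bwa mem -t 16 hg38.fa reads_R1.fastq.gz reads_R2.fastq.gz \
        | samtools sort -@ 4 -o aligned.sorted.bam -
fi
```

Tune -t (alignment threads) and -@ (sort threads) to the instance type you provision; the cost of an under-threaded run on a large EC2 instance adds up quickly.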
Thanks, genomax2. I was using uncompressed versions because I thought I had read that BWA-MEM required them; I'm reading otherwise now. Thanks for the link; I'm not sure I had seen that page yet. I think I will create a couple of compressed subset test files and re-run to see what the processing time looks like. I may have to abandon this if it still doesn't look cost-effective. I was trying to explore AWS/EC2, but it may not end up being an option here.
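One way to build those compressed subset test files is to take the first N reads from each gzipped FASTQ (4 lines per read) and re-gzip, without ever fully decompressing the originals. The sketch below synthesizes a tiny stand-in FASTQ so the commands run end to end; in practice reads_R1.fastq.gz would be the real input, and head stops reading as soon as it has its lines, so the full 130 GB file is never traversed:

```shell
# Synthesize a tiny gzipped FASTQ (250 reads) as a stand-in for the
# real reads_R1.fastq.gz, so this sketch is runnable end to end.
for i in $(seq 1 250); do
    printf '@read%d\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n' "$i"
done | gzip > reads_R1.fastq.gz

# Keep the first 100 reads (FASTQ = 4 lines per read) and re-gzip.
# head exits early, so only the start of the big file is ever read.
gzip -dc reads_R1.fastq.gz | head -n 400 | gzip > subset_R1.fastq.gz
```

Run the same command for reads_R2.fastq.gz with the same -n so the pairs stay in sync, then time an alignment on the two subsets before committing to a full run.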