Computational resources for WGS variant calling
alesssia ▴ 580 · asked 4.4 years ago

Dear all,

We have WGS data for about 2,000 individuals (30x coverage, ~100 GB per file). We would like to align them with bwakit and then call variants with GATK HaplotypeCaller, something I have never done before at this scale (or with such large files).
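For concreteness, here is the sort of per-sample command pair I have in mind; the sample and reference names are placeholders, and the exact flags would need to be checked against the installed bwakit and GATK versions (best-practice preprocessing such as duplicate marking and BQSR is omitted here):

    # Step 1: alignment via bwakit's run-bwamem wrapper
    # (run-bwamem prints a shell script, hence the pipe to sh; -s sorts the output)
    ./bwa.kit/run-bwamem -t 16 -s \
        -R '@RG\tID:sample01\tSM:sample01\tPL:ILLUMINA' \
        -o sample01 hs38DH.fa sample01_R1.fq.gz sample01_R2.fq.gz | sh

    # Step 2: per-sample gVCF with GATK HaplotypeCaller (GATK4 syntax shown)
    gatk --java-options "-Xmx16g" HaplotypeCaller \
        -R hs38DH.fa \
        -I sample01.aln.bam \
        -O sample01.g.vcf.gz \
        -ERC GVCF

With 2,000 samples, the per-sample gVCFs would then presumably be joint-genotyped afterwards (e.g. GenomicsDBImport followed by GenotypeGVCFs).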

We have limited computational resources, so we will be applying for time on an external OpenStack cluster (something I am not familiar with). For the application I need to prepare a list of computational requirements, and I would like to gather suggestions from people more expert than me.

In your opinion, how much memory would I need per sample, and how long would the whole run take?

I have been told that each node in the OpenStack cluster has 48 cores and 512 GB of RAM (which could be split into virtual machines of 24 cores x 256 GB, 12 cores x 128 GB, etc.), with a 50 GB local disk and shared storage mounted via NFS.

Thank you very much in advance, any suggestions will be highly appreciated!

Tags: WGS · bwakit · GATK · variant calling · OpenStack
Answer · 4.3 years ago

You have enough memory (some GATK 3.x steps can take up to 64 GB). In theory, you might finish this in 12 days or so.
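For what it's worth, a back-of-envelope calculation shows how an estimate in that ballpark falls out; the ~150 core-hours per 30x genome and the 24-node allocation below are assumptions, not benchmarks:

    # Hypothetical numbers -- substitute your own pilot-run measurements.
    SAMPLES=2000
    CORE_HOURS_PER_SAMPLE=150   # alignment + HaplotypeCaller combined, assumed
    NODES=24                    # assumed cluster allocation
    CORES_PER_NODE=48
    TOTAL_CORE_HOURS=$(( SAMPLES * CORE_HOURS_PER_SAMPLE ))  # 300,000
    CLUSTER_CORES=$(( NODES * CORES_PER_NODE ))              # 1,152
    echo "wall-clock days: ~$(( TOTAL_CORE_HOURS / CLUSTER_CORES / 24 ))"
    # -> about 10 days at perfect utilization; real runs lose time to I/O,
    #    scheduling gaps and NFS contention, hence "12 days or so".

Run a handful of samples first and scale from the measured times before committing to a requirements list.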

The local storage for scratch space might be an issue; is 50 GB perhaps a typo? My phone has more storage than that.

Perhaps more importantly, bring an experienced bioinformatics engineer on board to design the pipeline and the proper handling of sequencing and technical metadata. Otherwise, debugging the pipeline and handling subsequent runs will take far more time than a computer could ever waste.
