We have WGS data for about 2,000 individuals (30x coverage, ~100 GB per file). We would like to align them using bwakit and then call variants with GATK HaplotypeCaller, something I have never done before at this scale (or with such large files).
We have limited computational resources, so we will be applying for access to an external OpenStack cluster (something I am not familiar with). For the application I need to prepare a list of computational requirements, and I would like to gather some suggestions from people with more experience than me.
In your opinion, how much memory would I need per sample, and how long would alignment and variant calling take?
I have been told that each node in the OpenStack cluster has 48 cores and 512 GB of RAM (so a node could be split into 24 cores x 256 GB, 12 cores x 128 GB, etc.), with a local disk of 50 GB and further storage mounted via NFS.
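For reference, here is the back-of-envelope arithmetic behind the figures above, as a rough Python sketch. It only restates the numbers from this post; the function names are my own, and it assumes one ~100 GB file per sample and decimal units (1 TB = 1000 GB). It says nothing about runtime or peak memory of bwakit or GATK, which is exactly what I am asking about.

```python
# Back-of-envelope sizing using only the figures quoted in this post.
N_SAMPLES = 2000        # "about 2000 individuals"
GB_PER_SAMPLE = 100     # "~100G per file"; assumes one file per sample
NODE_CORES = 48
NODE_RAM_GB = 512
LOCAL_DISK_GB = 50


def total_input_tb(n_samples, gb_per_sample):
    """Total raw input size in TB (decimal: 1 TB = 1000 GB)."""
    return n_samples * gb_per_sample / 1000


def node_splits(cores, ram_gb, parts=(1, 2, 4)):
    """(cores, RAM in GB) per slice when a node is divided into equal parts."""
    return [(cores // p, ram_gb // p) for p in parts]


def fits_on_local_disk(gb_per_sample, local_disk_gb):
    """True if one sample's input file fits on a node's local disk."""
    return gb_per_sample <= local_disk_gb


if __name__ == "__main__":
    print("Total input (TB):", total_input_tb(N_SAMPLES, GB_PER_SAMPLE))
    for cores, ram in node_splits(NODE_CORES, NODE_RAM_GB):
        print(f"{cores} cores x {ram} GB RAM")
    print("One sample fits on local disk:",
          fits_on_local_disk(GB_PER_SAMPLE, LOCAL_DISK_GB))
```

Two things stand out to me from this: the raw input alone is on the order of 200 TB, and a single ~100 GB file does not fit on the 50 GB local disk, so I assume all input and intermediate files would have to go through the NFS mount.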
Thank you very much in advance, any suggestion will be highly appreciated!