I need to run samtools mpileup on whole-genome sequencing data from 38 individuals. I intend to parallelize the process by splitting it by chromosome. I thought of splitting by smaller regions to get more parallel chunks, but I was told that each mpileup process uses a fair amount of memory and will segfault if it runs out.
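For concreteness, here is a minimal sketch of the per-chromosome split I have in mind. The file names (`ref.fa`, `bams.txt`), the output naming, and the worker count are placeholders rather than my real setup, and the helper names are made up for illustration. Capping `max_workers` is how I would try to keep total memory bounded, though I have not measured how much each process actually needs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_mpileup_cmd(region, ref="ref.fa", bam_list="bams.txt"):
    """Return the argv for one samtools mpileup run restricted to `region`.

    `ref` and `bam_list` are placeholder paths; `bam_list` is assumed to be
    a text file with one BAM path per line.
    """
    return [
        "samtools", "mpileup",
        "-f", ref,                  # indexed reference FASTA
        "-r", region,               # restrict to one chromosome or region
        "-b", bam_list,             # file listing the input BAMs
        "-o", f"{region}.pileup",   # one output file per region
    ]


def run_all(regions, max_workers=4):
    """Run one mpileup per region, at most `max_workers` at a time."""
    cmds = [build_mpileup_cmd(r) for r in regions]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # check=True raises if samtools exits non-zero, including when the
        # process is killed, so a failed chunk is not silently ignored.
        return list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
```

Calling something like `run_all(["chr1", "chr2", "chrX"], max_workers=4)` would launch the jobs; the chromosome names have to match those in the BAM headers and in the reference.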
I am looking for tips on how to speed up the mpileup calls. From past experience, I think it took 2 weeks to run mpileup on 100 individuals for chr1.
I also have separate ref.fa files for male and female subjects. Is it all right to use the male ref.fa for all individuals?