Dear community members,
I have a large set of variants to genotype (over 6 million) and many WGS samples (available as both BAM and VCF files).
My previous strategy was to read the list of variants and then iterate through the VCF files with a custom Python script. However, I anticipate this will be very slow for such a large number of samples.
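For context, the slow part of my current approach is scanning each whole VCF per variant. If the VCFs are bgzipped and tabix-indexed, each variant can instead be fetched by region. A minimal sketch of how I would organize such lookups (the variant-list format here is made up for illustration, and the pysam call is shown only in a comment):

```python
# Sketch: per-variant region lookup instead of scanning whole VCFs.
# Assumes a whitespace-separated variant list "chrom pos ref alt"
# (hypothetical format) and bgzipped + tabix-indexed sample VCFs.

def parse_variant(line):
    """Parse one 'chrom pos ref alt' line into a (chrom, pos, ref, alt) tuple."""
    chrom, pos, ref, alt = line.split()
    return chrom, int(pos), ref, alt

def region(chrom, pos):
    """1-based single-position region string for tabix-style queries."""
    return f"{chrom}:{pos}-{pos}"

# With pysam, the per-sample lookup would then look roughly like:
#   vcf = pysam.VariantFile("sample.vcf.gz")    # needs a .tbi/.csi index
#   for rec in vcf.fetch(chrom, pos - 1, pos):  # fetch() takes 0-based coords
#       ...  # compare rec.ref/rec.alts and read the genotype
```

This turns each lookup into an indexed seek rather than a linear scan, but it is still one query per variant per sample, so I am not sure it scales either.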
Is there a way to quickly genotype a huge WGS cohort? Should I use BAM or VCF files for that?
Another issue is that the VCFs were called on GRCh38, while my variants for genotyping are in hg19 coordinates. For some variants the reference allele changed in GRCh38, so for those the VCFs alone may not be enough, but this is a minor problem...
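To illustrate the ref-allele issue: after liftover, an hg19 REF/ALT pair can end up swapped relative to the GRCh38 reference base, in which case genotypes also need flipping. A hedged sketch of the check I have in mind (the function name is mine; real normalization would also have to handle strand flips, indels, and multi-allelic sites, which this ignores):

```python
def reconcile_alleles(ref_hg19, alt_hg19, ref_grch38):
    """Reconcile a biallelic SNV's alleles after hg19 -> GRCh38 liftover.

    Returns (ref, alt, swapped). If the GRCh38 reference base equals the
    old ALT, the alleles are swapped and genotypes would need flipping
    (0/0 <-> 1/1). Strand flips and indels are not handled here.
    """
    if ref_grch38 == ref_hg19:
        return ref_hg19, alt_hg19, False
    if ref_grch38 == alt_hg19:
        return alt_hg19, ref_hg19, True
    raise ValueError("neither allele matches the GRCh38 reference")
```

For example, an hg19 A/G variant where GRCh38 now has G as the reference would come back as (G, A, swapped=True).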