I read this paper and do not fully understand the statement "Variant calling was performed on all 2,073 BAM files using the GATK UnifiedGenotyper". In this paper, 2,073 mice were sequenced at ~0.6X coverage and UnifiedGenotyper was used to call variants. The method is described as follows:
Variant calling was performed on all 2,073 BAM files using the GATK UnifiedGenotyper with thresholds
-stand_emit_conf 30, as well as the following options for building variant quality recalibration tables:
-A QualByDepth -A HaplotypeScore -A BaseQualityRankSumTest -A ReadPosRankSumTest -A MappingQualityRankSumTest -A RMSMappingQuality -A DepthOfCoverage -A FisherStrand -A HardyWeinberg -A HomopolymerRun. Raw VCF files from the variant calling step for all chromosomes except the Y chromosome were pooled together for VQSR using the GATK VariantRecalibrator under SNP mode. Training, known and true sets for building the positive model are the SNPs that segregate among the classical laboratory strains of the MGP (2011 release REL-1211) on all chromosomes except the Y chromosome.
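To make the question concrete, here is a minimal sketch of how I imagine the quoted options would be assembled into a single multi-sample call. It is only my reading of the methods text, not the authors' actual command. I am assuming GATK 3.x command-line syntax, where `-I` can be given once per BAM file; the jar path, reference name, BAM file names, and output name are hypothetical placeholders.

```python
# Annotations copied from the quoted methods text.
ANNOTATIONS = [
    "QualByDepth", "HaplotypeScore", "BaseQualityRankSumTest",
    "ReadPosRankSumTest", "MappingQualityRankSumTest", "RMSMappingQuality",
    "DepthOfCoverage", "FisherStrand", "HardyWeinberg", "HomopolymerRun",
]


def build_ug_command(bam_paths, reference, out_vcf,
                     gatk_jar="GenomeAnalysisTK.jar"):
    """Assemble one multi-sample UnifiedGenotyper call (GATK 3.x style assumed).

    All paths are placeholders; this only builds the argument list and
    does not run GATK.
    """
    cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
    # One -I per BAM file, so every sample goes into the same call.
    for bam in bam_paths:
        cmd += ["-I", bam]
    # Threshold quoted in the paper's methods.
    cmd += ["-stand_emit_conf", "30"]
    # One -A per annotation requested for the later VQSR step.
    for name in ANNOTATIONS:
        cmd += ["-A", name]
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names standing in for the 2,073 BAM files.
    bams = [f"mouse_{i:04d}.bam" for i in range(1, 2074)]
    cmd = build_ug_command(bams, "reference.fasta", "all_samples.raw.vcf")
    print(cmd.count("-I"))  # number of BAM inputs in the command
```

If this reading is right, all 2,073 BAM files would be passed to one UnifiedGenotyper run, which is exactly the part I am unsure about.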
For the UnifiedGenotyper step, I saw that GATK's multi-sample SNP calling setting samples up to 250 individuals. How can roughly 2,000 samples be combined in this step? Or am I misunderstanding something?