I am performing the initial genotype calling for around 2000 samples typed on the Illumina HumanOmniExpress chip (>900,000 SNPs). I have no previous experience with genotype calling, so I am (broadly) following the protocol published by Guo et al. (http://www.nature.com/nprot/journal/v9/n11/abs/nprot.2014.174.html).
I have reached step 24 in the protocol: after having GenomeStudio automatically cluster all SNPs, I now need to examine autosomal SNPs with a low GenTrain score.
The article suggests manually reviewing SNPs with a GenTrain score <0.7 and adjusting the cluster positions where SNPs appear "fixable". A colleague also suggested that I exclude all SNPs with a GenTrain score <0.4. My problem is that this leaves a list of over 15,000 SNPs with a score between 0.4 and 0.7.
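To make the thresholds concrete, here is a minimal sketch of the triage I am describing, done outside GenomeStudio on an exported SNP table. The tab-delimited layout, the column names `Name` and `GenTrain Score`, and the SNP IDs in the example are my assumptions for illustration; they are not taken from the protocol, and the function name `triage_snps` is hypothetical.

```python
import csv
import io

def triage_snps(table, low=0.4, high=0.7,
                name_col="Name", score_col="GenTrain Score"):
    """Split SNP names into exclude / review / keep bins by GenTrain score.

    `table` is any file-like object holding a tab-delimited SNP table.
    The column names are assumptions; adjust them to match the real export.
    """
    bins = {"exclude": [], "review": [], "keep": []}
    for row in csv.DictReader(table, delimiter="\t"):
        score = float(row[score_col])
        if score < low:
            bins["exclude"].append(row[name_col])   # below 0.4: drop outright
        elif score < high:
            bins["review"].append(row[name_col])    # 0.4 to <0.7: manual review
        else:
            bins["keep"].append(row[name_col])      # 0.7 and above: accept
    return bins

# Made-up three-SNP table standing in for a real export
example = io.StringIO(
    "Name\tGenTrain Score\n"
    "rs_a\t0.35\n"
    "rs_b\t0.55\n"
    "rs_c\t0.82\n"
)
result = triage_snps(example)
```

In this sketch a score of exactly 0.4 lands in the review bin, and nothing here restricts the table to autosomal SNPs. The "review" list is the one that comes to over 15,000 SNPs in my data.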
Do people really go through this many SNPs manually?
I think this could take days. Manually moving cluster positions for a fixable SNP only takes a few seconds, but I assume I also need to zero the remaining SNPs, which also takes non-negligible time (at least, I haven't found a quick keyboard shortcut for it). Then, in subsequent steps, I am supposed to go through the whole process again using cluster separation scores and several other criteria! Surely this is not practical?
So, am I missing some obvious shortcuts? How do people generally approach this manual calling? (Or should I just be zeroing all these SNPs?)
Any advice would be greatly appreciated.