Question: GenomeStudio manual calling - how many SNPs?
3.3 years ago by
United Kingdom
Nick40 wrote:

I am performing the initial genotype calling for around 2000 samples typed on the Illumina HumanOmniExpress chip (>900,000 SNPs). I have no previous experience of genotype calling so I am (broadly) following the protocol published by Guo et al (

I have reached step 24 in the protocol: after having GenomeStudio automatically cluster all SNPs, I now need to examine autosomal SNPs with a low GenTrain score.

The article suggests manually reviewing SNPs with a GenTrain score <0.7 and adjusting the cluster positions where SNPs appear "fixable". A colleague also suggested that I exclude all SNPs with a GenTrain score <0.4. My problem is that this still leaves a list of over 15,000 SNPs with a score between 0.4 and 0.7.
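To make the thresholds concrete, here is a minimal sketch of the same triage done outside GenomeStudio on an exported SNP table. The column names (`Name`, `Chr`, `GenTrain Score`), the tab-separated layout, and the example rows are assumptions for illustration only, not taken from the protocol; adjust them to match the actual export.

```python
import csv
import io

EXCLUDE_BELOW = 0.4  # hard cutoff suggested by the colleague
REVIEW_BELOW = 0.7   # manual-review cutoff suggested by the article
AUTOSOMES = {str(i) for i in range(1, 23)}


def triage_snps(rows, name_col="Name", chr_col="Chr", score_col="GenTrain Score"):
    """Sort autosomal SNPs into exclude / review / keep by GenTrain score.

    Column names are assumed; non-autosomal rows are skipped entirely.
    """
    buckets = {"exclude": [], "review": [], "keep": []}
    for row in rows:
        if row[chr_col] not in AUTOSOMES:
            continue
        score = float(row[score_col])
        if score < EXCLUDE_BELOW:
            buckets["exclude"].append(row[name_col])
        elif score < REVIEW_BELOW:
            buckets["review"].append(row[name_col])
        else:
            buckets["keep"].append(row[name_col])
    return buckets


# Made-up rows standing in for an exported SNP table (hypothetical data).
example = io.StringIO(
    "Name\tChr\tGenTrain Score\n"
    "snp_a\t1\t0.35\n"
    "snp_b\t2\t0.55\n"
    "snp_c\t3\t0.82\n"
    "snp_d\tX\t0.50\n"
    "snp_e\t22\t0.70\n"
    "snp_f\t5\t0.40\n"
)
buckets = triage_snps(csv.DictReader(example, delimiter="\t"))
for label, names in buckets.items():
    print(label, len(names), names)
```

With real data, the size of the `review` bucket is the number of SNPs that would need manual inspection, which is the count in question here.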

Do people really go through this many SNPs manually?

I think this could take days. Manually moving cluster positions for fixable SNPs will take a few seconds, but I assume I also need to zero the remaining SNPs - this also takes non-negligible time (at least, I haven't found a quick keyboard shortcut to do this). Then, in subsequent steps I am supposed to go through the whole process again using cluster separation scores and several other criteria! Surely this is not practical?

So, am I missing some obvious shortcuts? How do people generally approach this manual calling? (Or should I just be zeroing all these SNPs?)

Any advice would be greatly appreciated.


genotyping genomestudio • 1.5k views
3.3 years ago by
vassialk190 wrote:

You can focus on the genes you need to know about, filter SNPs based on their scores, cluster SNPs using other software, or use PCA and similar algorithms to reduce dimensionality. You can also use NextGENe, UGENE, CLC Genomics, DNASTAR, and open-source tools such as VCFtools.


Thanks for your suggestions. Just so that I can interpret them in the context of a GenomeStudio project:

  • In some projects I can appreciate that focusing on genes of interest will reduce the number of manual calls needed. However, in this case we are interested in genome-wide applications.
  • Your other suggestions all seem to be saying that I should consider skipping this step and adding some additional QC steps once I have exported the data from GenomeStudio. Is this what you meant?
  • Could you just clarify what you meant by PCA algorithms to reduce dimensionality? How can this be used to identify poorly called SNPs?


Reply written 3.3 years ago by Nick40