GenomeStudio manual calling - how many SNPs?
6.1 years ago
Nick

I am performing the initial genotype calling for around 2000 samples typed on the Illumina HumanOmniExpress chip (>900,000 SNPs). I have no previous experience of genotype calling so I am (broadly) following the protocol published by Guo et al (http://www.nature.com/nprot/journal/v9/n11/abs/nprot.2014.174.html).

I have reached step 24 in the protocol: after having GenomeStudio automatically cluster all SNPs, I now need to examine autosomal SNPs with a low GenTrain score.

The article suggests manually reviewing SNPs with a GenTrain score <0.7 and adjusting the cluster positions where a SNP appears "fixable". A colleague also suggested excluding all SNPs with a GenTrain score <0.4. My problem is that this still leaves a list of over 15,000 SNPs with scores between 0.4 and 0.7.
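A minimal Python sketch of this triage, assuming the SNP table has been exported from GenomeStudio as tab-delimited text with columns named `Name`, `Chr` and `GenTrain Score` (check these against your own export header). The 0.4 and 0.7 cut-offs are the ones mentioned above; the sample rows and the function name `triage_by_gentrain` are hypothetical. It only sorts SNP names into bins and does not change any clusters.

```python
import csv
import io

# Hypothetical excerpt of a GenomeStudio SNP table exported as tab-delimited text.
# Column names are assumptions; adjust them to match your export.
SAMPLE_EXPORT = """Name\tChr\tGenTrain Score
rs0001\t1\t0.85
rs0002\t2\t0.55
rs0003\t3\t0.32
rs0004\tX\t0.50
rs0005\t22\t0.70
"""

AUTOSOMES = {str(c) for c in range(1, 23)}


def triage_by_gentrain(rows, exclude_below=0.4, review_below=0.7):
    """Split autosomal SNPs into 'exclude', 'review' and 'pass' bins by GenTrain score."""
    bins = {"exclude": [], "review": [], "pass": []}
    for row in rows:
        if row["Chr"] not in AUTOSOMES:
            continue  # only autosomal SNPs are reviewed at this step
        score = float(row["GenTrain Score"])
        if score < exclude_below:
            bins["exclude"].append(row["Name"])
        elif score < review_below:
            bins["review"].append(row["Name"])
        else:
            bins["pass"].append(row["Name"])
    return bins


rows = csv.DictReader(io.StringIO(SAMPLE_EXPORT), delimiter="\t")
bins = triage_by_gentrain(rows)
for label, names in bins.items():
    print(label, len(names), names)
```

On a real export, the length of `bins["review"]` is the size of the manual-review list.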

Do people really go through this many SNPs manually?

I think this could take days. Manually moving the cluster positions of a fixable SNP takes a few seconds per SNP, but I assume I also need to zero the remaining SNPs, which also takes non-negligible time (at least, I haven't found a quick keyboard shortcut for it). Then, in subsequent steps, I am supposed to go through the whole process again using cluster separation scores and several other criteria! Surely this is not practical?
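As a rough check on that estimate, the arithmetic below multiplies the ~15,000 SNPs by a per-SNP handling time. The timings (3, 5 and 10 seconds per SNP) and the helper name `review_hours` are assumptions for illustration, not measurements.

```python
def review_hours(n_snps, seconds_per_snp, passes=1):
    """Rough wall-clock hours to review n_snps manually, repeated over `passes` passes."""
    return n_snps * seconds_per_snp * passes / 3600


N_SNPS = 15_000  # SNPs with GenTrain score between 0.4 and 0.7, from the post

# Hypothetical per-SNP handling times in seconds (adjusting clusters or zeroing).
for seconds in (3, 5, 10):
    print(f"{seconds} s/SNP: {review_hours(N_SNPS, seconds):.1f} h for one pass")
```

At 5 s per SNP a single pass is roughly 21 hours of clicking, and each later pass over a list of similar size adds about as much again.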

So, am I missing some obvious shortcuts? How do people generally approach this manual calling? (Or should I just be zeroing all these SNPs?)

Any advice would be greatly appreciated.

Nick

6.1 years ago
vassialk

You can focus on the genes you need to know about; you can filter SNPs on QC scores; you can cluster SNPs using other software; and you can use PCA and similar algorithms to reduce dimensionality. You can also use NextGene, UGENE, CLC Genomics, DNASTAR, and open-source tools such as VCFtools.
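One common way to use PCA here, sketched as an assumption rather than what this answer necessarily meant: after export, run PCA on a samples x SNPs matrix of genotypes coded 0/1/2 (copies of one allele) and look for structure that should not be there. Sample scores reveal outlier samples or batch/plate effects; SNPs with unusually large loadings on a component that tracks a batch are candidates for a closer look in GenomeStudio. The function name `genotype_pca` and the random test matrix are hypothetical, and at 2,000 samples x 900,000 SNPs a dedicated tool (for example PLINK's `--pca`) is the usual choice rather than a dense SVD.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs genotype matrix coded 0/1/2, with np.nan for no-calls.

    Returns (sample scores, SNP loadings) for the top n_components.
    """
    g = np.asarray(genotypes, dtype=float)
    # Mean-impute no-calls per SNP so missing values do not drive the components.
    col_means = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), col_means, g)
    # Centre each SNP column.
    g = g - g.mean(axis=0)
    # Thin SVD: rows of vt are SNP loadings; u * s gives sample scores.
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings


rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 50)).astype(float)
genotypes[0, 0] = np.nan  # one no-call
scores, loadings = genotype_pca(genotypes)

# SNPs with the largest absolute loading on PC1 are candidates for inspection.
top_snps = np.argsort(-np.abs(loadings[:, 0]))[:5]
print(scores.shape, loadings.shape, top_snps)
```

With random data nothing stands out; on real data you would colour the sample scores by plate or batch before deciding whether any SNP loadings are suspicious.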


Thanks for your suggestions. Just so that I can interpret them in the context of a GenomeStudio project:

  • I appreciate that in some projects focusing on genes of interest would reduce the number of manual calls needed. However, in this case we are interested in genome-wide applications.
  • Your other suggestions all seem to be saying that I should consider skipping this step and adding some additional QC steps once I have exported the data from GenomeStudio. Is this what you meant?
  • Could you just clarify what you meant by PCA algorithms to reduce dimensionality? How can this be used to identify poorly called SNPs?
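If the manual review is skipped or shortened, the usual replacement is SNP-level QC on the exported calls. A minimal sketch, assuming genotypes are exported per SNP as `AA`/`AB`/`BB` with `NC` for no-calls: it computes the call rate and a chi-square Hardy-Weinberg test with 1 degree of freedom, since a badly clustered SNP often shows up as a low call rate or a large deficit or excess of heterozygotes. The function names and the thresholds in `flag_snp` (call rate below 0.98, HWE p below 1e-6) are illustrative assumptions; studies choose their own cut-offs, and an exact HWE test is often preferred.

```python
import math


def snp_qc(calls):
    """Return (call rate, HWE chi-square p-value) for one SNP.

    calls: genotype strings "AA", "AB", "BB", or "NC" for a no-call
    (A/B allele coding is assumed).
    """
    n_aa, n_ab, n_bb = calls.count("AA"), calls.count("AB"), calls.count("BB")
    n = n_aa + n_ab + n_bb
    call_rate = n / len(calls) if calls else 0.0
    if n == 0:
        return call_rate, float("nan")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return call_rate, 1.0  # monomorphic: HWE test not informative
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return call_rate, math.erfc(math.sqrt(chi2 / 2))


def flag_snp(calls, min_call_rate=0.98, min_hwe_p=1e-6):
    """True if the SNP fails either (illustrative) threshold."""
    call_rate, hwe_p = snp_qc(calls)
    return call_rate < min_call_rate or hwe_p < min_hwe_p


examples = {
    "balanced": ["AA"] * 25 + ["AB"] * 50 + ["BB"] * 25,
    "no_hets": ["AA"] * 50 + ["BB"] * 50,
    "two_no_calls": ["AA"] * 24 + ["AB"] * 50 + ["BB"] * 24 + ["NC"] * 2,
}
for name, calls in examples.items():
    call_rate, hwe_p = snp_qc(calls)
    print(f"{name}: call rate {call_rate:.2f}, HWE p {hwe_p:.3g}, flagged={flag_snp(calls)}")
```

The `no_hets` case mimics a SNP whose heterozygote cluster has been merged into the homozygote clusters: its call rate is perfect, but the HWE test flags it.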

Thanks
