We are running into a bottleneck in a study aiming to identify copy number variants (CNVs) in multiple separate groups of subjects. We are genotyping on Omni2.5-8 chips (~2.3 million markers) and cleaning up and analyzing the data in GenomeStudio. The data cleanup, prior to CNV calling, is taking weeks to perform.
I am following the cleanup procedures described in “technote_infinium_genotyping_data_analysis”: basically, sorting on various metrics and zeroing SNPs that performed poorly. Following cleanup, I exclude the failed SNPs and save the project with those SNPs excluded; the clean project is then used for CNV calling with multiple algorithms. I’ve written some C++ programs to speed up annotating the CNV calls.
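To make the cleanup step concrete, here is a rough sketch of the kind of metric-based filtering I mean, run on an exported tab-delimited SNP table rather than by hand in the GUI. The thresholds, the function name `failed_snps`, and the example rows are all made up for illustration, and the column names are assumed to match the export; none of this is taken from the tech note.

```python
import csv
import io

# Hypothetical cutoffs, chosen only for illustration (not from the tech note).
# Column names are assumed to match the exported SNP table headers.
THRESHOLDS = {
    "Call Freq": 0.98,
    "Cluster Sep": 0.30,
}

def failed_snps(table, thresholds=THRESHOLDS):
    """Return the names of SNPs with any metric below its minimum."""
    reader = csv.DictReader(table, delimiter="\t")
    failed = []
    for row in reader:
        for column, minimum in thresholds.items():
            if float(row[column]) < minimum:
                failed.append(row["Name"])
                break  # one failing metric is enough to flag the SNP
    return failed

# Made-up rows standing in for an exported SNP table.
EXAMPLE = (
    "Name\tCall Freq\tCluster Sep\n"
    "rs1\t0.999\t0.85\n"
    "rs2\t0.950\t0.90\n"
    "rs3\t0.995\t0.10\n"
)

if __name__ == "__main__":
    print(failed_snps(io.StringIO(EXAMPLE)))
```

A list like this would only flag candidates in one pass; it does not replace inspecting borderline cluster plots, which is where most of my time goes.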
The literature mentions data cleanup prior to CNV calling only briefly, if at all, so I’m in the dark as to whether data cleanup normally takes weeks and simply goes unmentioned because it is so mundane and standard. Or is there something I am missing that would make life a lot easier?