Data Cleanup Prior to CNV Calling
8.6 years ago
Robert Sicko ▴ 630


We are running into a bottleneck in a study aiming to identify copy number variants in multiple separate groups of subjects. We are genotyping with Omni2.5-8 chips (~2.3 million markers) and doing the analysis and cleanup in GenomeStudio. The data cleanup, prior to CNV calling, is taking weeks to perform.

I am following the cleanup procedures described in “technote_infinium_genotyping_data_analysis”: basically, sorting on various metrics and zeroing SNPs that performed poorly. Following cleanup, I exclude the failed SNPs and save the project without them; the clean project is then used for CNV calling with multiple algorithms. I’ve written some C++ programs to speed up annotating the CNV calls.
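To make the "sort on a metric, zero what falls below a cutoff" step concrete, here is a minimal sketch of how the threshold-based part could in principle be scripted on an exported SNP table. Everything specific in it is an assumption for illustration: the tab-delimited export, the column names (`Name`, `Call Freq`, `GenTrain Score`, `Cluster Sep`), and the cutoffs are hypothetical placeholders, not values taken from the technote. It does not replace the manual inspection of cluster plots.

```python
import csv
import io

# Hypothetical column names and cutoffs (assumptions, not technote values);
# adjust them to match the actual exported SNP table.
THRESHOLDS = {
    "Call Freq": 0.98,
    "GenTrain Score": 0.70,
    "Cluster Sep": 0.30,
}


def failed_snps(handle, thresholds=THRESHOLDS, name_col="Name"):
    """Return the names of SNPs that fall below any cutoff.

    `handle` is an open text file (or file-like object) containing a
    tab-delimited table with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    failed = []
    for row in reader:
        for col, cutoff in thresholds.items():
            try:
                value = float(row[col])
            except (KeyError, ValueError, TypeError):
                # A missing or non-numeric metric is treated as a failure.
                failed.append(row[name_col])
                break
            # Written as "not >=" so that NaN values also count as failures.
            if not (value >= cutoff):
                failed.append(row[name_col])
                break
    return failed


if __name__ == "__main__":
    # Tiny made-up table, only to show the call; not real data.
    demo = (
        "Name\tCall Freq\tGenTrain Score\tCluster Sep\n"
        "rs1\t0.999\t0.85\t0.60\n"
        "rs2\t0.950\t0.90\t0.70\n"
    )
    print(failed_snps(io.StringIO(demo)))
```

The returned list of names could then serve as the exclusion list, so that only borderline SNPs need to be reviewed by eye.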

The literature mentions data cleanup prior to CNV calling only briefly, if at all, so I’m in the dark as to whether cleanup normally takes weeks and simply goes unmentioned because it is so mundane and standard. Or is there something I am missing that would make life a lot easier?

Thanks, Bob

edit: added microarray tag

cnv illumina copynumber microarray
Surely someone has experience with cleaning up data prior to CNV calling, right? I'm not complaining; if it does indeed take this much time, so be it. I just want to make sure I'm not missing something. Thanks.


