Question: Difference between SNP-array genotype datasets and WES genotype datasets
2.7 years ago by
Hi all,

I have handled many genotype datasets, from both SNP arrays and whole-exome sequencing (WES). I found that the standard PLINK QC methods for GWAS cannot simply be applied to WES genotype data. For example, the relatedness check flags more outliers, and the heterozygosity rates are on a smaller scale. I find it hard to understand why. Many people have told me that if you have WES or WGS data, there is no need for SNP-array data, but I suspect there is some critical difference between them. Any explanation would be appreciated.
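For reference, the heterozygosity check mentioned above can be sketched as follows. This is a minimal Python/NumPy sketch, assuming genotypes are already loaded as a samples x variants matrix of alt-allele counts (0/1/2, NaN for missing) and that no variant is missing in every sample. The function name `heterozygosity_qc` and the 3-SD cutoff are illustrative choices, not PLINK's interface. The F coefficient follows the observed-vs-expected homozygosity idea behind PLINK's `--het`, but PLINK's exact expected-count calculation may differ.

```python
import numpy as np


def heterozygosity_qc(genotypes: np.ndarray, n_sd: float = 3.0):
    """Per-sample heterozygosity QC on a samples x variants matrix (hypothetical helper).

    genotypes: alt-allele counts coded 0/1/2, with np.nan for missing calls.
    Returns (het_rate, f_coef, outlier_mask).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)

    # Alt-allele frequency per variant, estimated from the non-missing calls only.
    p = np.nansum(g, axis=0) / (2.0 * called.sum(axis=0))

    n_called = called.sum(axis=1)
    obs_het = np.sum(g == 1, axis=1)  # NaN == 1 is False, so missing calls are not counted
    obs_hom = n_called - obs_het

    # Expected homozygous count under Hardy-Weinberg, summed over each sample's called variants.
    exp_hom_per_variant = 1.0 - 2.0 * p * (1.0 - p)
    exp_hom = np.where(called, exp_hom_per_variant, 0.0).sum(axis=1)

    het_rate = obs_het / n_called
    f_coef = (obs_hom - exp_hom) / (n_called - exp_hom)

    # Flag samples whose F lies more than n_sd standard deviations from the cohort mean.
    z = (f_coef - f_coef.mean()) / f_coef.std()
    return het_rate, f_coef, np.abs(z) > n_sd
```

Because the expected homozygosity depends on allele frequencies, a marker set dominated by rare variants gives a different het-rate and F distribution than an array does, so thresholds tuned on array data may not carry over unchanged.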


2.7 years ago by Kevin Blighe
A genotyping array typically assays SNP positions spread roughly evenly across the entire genome, regardless of whether they fall in intergenic or coding regions. WES, by contrast, targets only the coding regions (exons), but it is far superior to an array for screening coding variants. WES is also much better suited to burden testing and novel variant discovery, since an array can only genotype variants that were selected in advance.
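To make the coverage difference concrete, the sketch below keeps only the genome-wide (array-style) SNP positions that fall inside exome capture targets, i.e. the footprint the two data types actually share. It assumes targets are given as BED-style intervals (0-based, half-open), SNP positions are 1-based as in VCF, and intervals on the same chromosome do not overlap (merge them first otherwise). `build_index`, `in_targets` and the example coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(bed_intervals):
    """Group (chrom, start, end) BED intervals by chromosome and sort them by start.

    Assumes intervals on the same chromosome are non-overlapping.
    """
    index = defaultdict(list)
    for chrom, start, end in bed_intervals:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in index.items()}
    return dict(index), starts


def in_targets(chrom, pos, index, starts):
    """True if 1-based position pos lies in a BED interval [start, end).

    A BED interval [start, end) covers 1-based positions start+1 .. end.
    """
    if chrom not in index:
        return False
    # Index of the last interval whose 0-based start is <= pos - 1.
    i = bisect_right(starts[chrom], pos - 1) - 1
    if i < 0:
        return False
    start, end = index[chrom][i]
    return start < pos <= end


# Hypothetical exome targets and array SNP positions.
targets = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 0, 50)]
array_snps = [("chr1", 50), ("chr1", 150), ("chr1", 300),
              ("chr1", 600), ("chr2", 10), ("chr3", 5)]
index, starts = build_index(targets)
exonic = [(c, p) for c, p in array_snps if in_targets(c, p, index, starts)]
print(exonic)  # [('chr1', 150), ('chr1', 600), ('chr2', 10)]
```

A binary search per SNP keeps the lookup at O(log n) per position; for real data, tools such as bedtools do the same job at scale.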

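Specifically for relatedness, though, the key difference is that intergenic regions (and thus intergenic SNPs) are less conserved, whereas exons are highly conserved. Because purifying selection keeps most coding variants rare, an exome dataset contains proportionally fewer common, informative SNPs than a genome-wide array, and those SNPs are clustered within genes rather than spread across the genome; the shift towards rare variants also tends to lower per-sample heterozygosity rates. You could therefore argue that estimating relatedness from coding regions alone, using WES, biases the results, so you will likely notice differences in relatedness estimates between WES and a genome-wide genotyping array.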
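To see what a relatedness estimator actually counts, here is a sketch of a KING-style allele-sharing kinship estimate, phi_ij = (N_{Aa,Aa} - 2 * N_{AA,aa}) / (N_Aa(i) + N_Aa(j)), where N_{Aa,Aa} is the number of sites where both samples are heterozygous and N_{AA,aa} the number where they carry opposite homozygous genotypes. It is a minimal sketch assuming biallelic 0/1/2 genotypes with no missing calls; `king_kinship` is a hypothetical name, and the KING or PLINK 2 (`--make-king`) documentation should be checked for the exact form they implement. Expected values are about 0.5 for duplicates, 0.25 for first-degree relatives and 0 for unrelated pairs.

```python
import numpy as np


def king_kinship(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise KING-style kinship from a samples x variants 0/1/2 matrix (no missing calls).

    phi_ij = (N_{Aa,Aa} - 2 * N_{AA,aa}) / (N_Aa(i) + N_Aa(j))
    """
    g = np.asarray(genotypes)
    het = (g == 1).astype(float)      # 1 where the sample is heterozygous
    hom_ref = (g == 0).astype(float)  # 1 where the sample is homozygous reference
    hom_alt = (g == 2).astype(float)  # 1 where the sample is homozygous alternate

    # N_{Aa,Aa}: sites where both samples are heterozygous, for every pair at once.
    both_het = het @ het.T
    # N_{AA,aa}: opposite-homozygote (IBS0) sites, counted in both orientations.
    opposite_hom = hom_ref @ hom_alt.T + hom_alt @ hom_ref.T

    n_het = het.sum(axis=1)
    denom = n_het[:, None] + n_het[None, :]
    return (both_het - 2.0 * opposite_hom) / denom
```

The estimate is driven entirely by heterozygous and opposite-homozygous sites, so a marker set dominated by rare, gene-clustered variants gives it less to work with. One way to probe the bias discussed above is to run the same function on all exome variants and on a common-variant subset (for example MAF >= 5%) and compare the estimates for the same pairs.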

Hope this helps.


