basic manipulation of SNP arrays for population genetics
9.9 years ago
platypus998 ▴ 20

Could you tell me a basic workflow for manipulating SNP array data?

Suppose that I have two data sets:

population 1: a ped file genotyped on an Affymetrix array

population 2: a ped file genotyped on an Illumina array

My understanding is:

  1. run strand_check.py
    1. select a bgl file of reference panel data (all populations or sub-populations in 1000 Genomes)
    2. convert the ped files (target populations) to bgl format (using ped_to_bgl)
    3. make .markers files from the .bim files (writing a script)
    4. split the .bgl and .markers files by chromosome for both the reference and target populations (writing a script)
    5. get the SNPs common to the reference and each target population from the .bgl and .markers files generated in #1-3 and #1-4 (writing a script)
    6. run strand_check.py on the output of #1-5
  2. bgl_to_ped
  3. plink merge
  4. then do population analyses (PCA, admixture, migration, etc.)
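
To make steps 1.5 and 1.6 concrete, here is a minimal Python sketch of the kind of script I have in mind. It is not strand_check.py itself, only a stand-in that shows the logic: it reads two PLINK .bim files (six whitespace-separated columns, with the SNP ID in column 2 and the alleles in columns 5 and 6), keeps the SNPs present in both, and labels each one as same strand, flipped, strand-ambiguous (A/T or C/G), or mismatched. The function names `read_bim`, `classify`, and `compare_bims` are hypothetical, and matching by SNP ID is an assumption. If the two arrays name their SNPs differently, matching by chromosome and position would be needed instead.

```python
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map SNP ID -> (allele1, allele2) from the lines of a PLINK .bim file."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        alleles[fields[1]] = (fields[4].upper(), fields[5].upper())
    return alleles


def classify(ref: Tuple[str, str], target: Tuple[str, str]) -> str:
    """Compare the allele pair of one SNP between reference and target."""
    if any(a not in COMPLEMENT for a in ref + target):
        return "unresolved"  # e.g. "0" (missing allele) or an indel code
    ref_set, target_set = set(ref), set(target)
    ref_flipped = {COMPLEMENT[a] for a in ref}
    if ref_set == ref_flipped or target_set == {COMPLEMENT[a] for a in target}:
        return "ambiguous"  # A/T or C/G: strand cannot be told from alleles
    if ref_set == target_set:
        return "same"
    if ref_flipped == target_set:
        return "flip"
    return "mismatch"


def compare_bims(ref_lines: Iterable[str],
                 target_lines: Iterable[str]) -> Dict[str, str]:
    """Return SNP ID -> status for the SNPs shared by both .bim files."""
    ref = read_bim(ref_lines)
    target = read_bim(target_lines)
    return {snp: classify(ref[snp], target[snp])
            for snp in ref if snp in target}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    reference = ["1 rs1 0 100 A G", "1 rs2 0 200 A T", "1 rs3 0 300 C T"]
    target = ["1 rs1 0 100 T C", "1 rs2 0 200 A T", "1 rs3 0 300 T C"]
    for snp, status in compare_bims(reference, target).items():
        print(snp, status)
```

With real files, the lists would be replaced by open file handles, for example `compare_bims(open("ref.bim"), open("target.bim"))`, where both file names are placeholders.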

Is this correct?

Sometimes an array has genotype data coded as "0". In that case, which option should I adopt?

  1. replace genotype "0" with the genotype given in each platform's annotation file, then run strand_check.py
  2. remove the sites genotyped as "0" from the initial ped files before running strand_check.py
  3. replace genotype "0" with the genotype given in the annotation file and record the replaced positions before running strand_check.py. To avoid losing too many SNP sites, set the replaced genotypes back to "0" after running strand_check.py (step 1.6 in the workflow above).
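
For option 2, it may help to see how many sites would actually be lost. The sketch below is a rough Python example, assuming the standard PLINK ped layout (six leading columns, then two allele columns per marker, with "0" as the missing-allele code). It computes the fraction of samples with a missing call at each site and lists the sites above a chosen cutoff. The names `missing_rate_per_site` and `sites_over` and the cutoff value are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List

MISSING = "0"       # PLINK's default missing-allele code in ped files
N_LEADING_COLS = 6  # FID, IID, father, mother, sex, phenotype


def missing_rate_per_site(ped_lines: Iterable[str]) -> List[float]:
    """Fraction of samples with a missing call at each site, in file order."""
    counts = None
    n_samples = 0
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        geno = fields[N_LEADING_COLS:]
        if len(geno) % 2 != 0:
            raise ValueError("odd number of allele columns")
        n_sites = len(geno) // 2
        if counts is None:
            counts = [0] * n_sites
        elif len(counts) != n_sites:
            raise ValueError("samples have different numbers of sites")
        for i in range(n_sites):
            # A call counts as missing if either allele is the missing code.
            if geno[2 * i] == MISSING or geno[2 * i + 1] == MISSING:
                counts[i] += 1
        n_samples += 1
    if counts is None:
        return []
    return [c / n_samples for c in counts]


def sites_over(rates: List[float], cutoff: float) -> List[int]:
    """Zero-based indices of sites whose missing rate exceeds the cutoff."""
    return [i for i, r in enumerate(rates) if r > cutoff]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    ped = [
        "F1 S1 0 0 1 -9 A G 0 0 C C",
        "F2 S2 0 0 2 -9 A A T T 0 0",
        "F3 S3 0 0 1 -9 G G T C 0 0",
    ]
    rates = missing_rate_per_site(ped)
    print(rates)
    print(sites_over(rates, 0.5))
```

The site indices follow the order of the rows in the matching .map file, so they can be translated back to SNP IDs from there.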
SNP • 2.3k views