9.9 years ago
platypus998
Could you tell me a basic workflow for handling SNP array data from different platforms?
Suppose that I have two data sets:
population 1: a ped file genotyped on an Affymetrix array
population 2: a ped file genotyped on an Illumina array
My understanding is:
1. run strand_check.py
   1-1. select a bgl file of reference genome data (all pops or sub-pops in 1000 Genomes)
   1-2. convert the ped files (target pops) to bgl file format (using ped_to_bgl)
   1-3. make .markers files from the .bim files (writing a script)
   1-4. split the .bgl and .markers files by chromosome for both reference and target pops (writing a script)
   1-5. get the SNPs common to the reference and the target pop from the .bgl and .markers files generated in #1-3 and #1-4 (writing a script)
   1-6. run strand_check.py on the output of #1-5
2. bgl_to_ped
3. plink merge
4. then do population analyses (PCA, admixture, migration, etc.)
Is this correct?
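The scripted steps (.bim to .markers, splitting by chromosome, and intersecting SNPs) could be sketched roughly as follows. This is only an illustration under stated assumptions: the .bim input is taken to have the standard six PLINK columns (chromosome, SNP ID, genetic distance, bp position, allele 1, allele 2), the .markers output is taken to be "SNP ID, position, allele 1, allele 2", and the function names are hypothetical, not part of any existing tool. Matching SNPs by ID is also an assumption; if the two platforms name SNPs differently, matching by chromosome and position may be more appropriate.

```python
from collections import defaultdict


def bim_to_markers(bim_lines):
    """Group .bim rows by chromosome as (snp_id, position, allele1, allele2) tuples."""
    by_chrom = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        chrom, snp_id, _cm, pos, a1, a2 = fields[:6]
        by_chrom[chrom].append((snp_id, int(pos), a1, a2))
    # Sort each chromosome by base-pair position
    return {c: sorted(rows, key=lambda r: r[1]) for c, rows in by_chrom.items()}


def common_snps(ref_markers, target_markers):
    """Return SNP IDs present in both marker lists, in reference order."""
    target_ids = {row[0] for row in target_markers}
    return [row[0] for row in ref_markers if row[0] in target_ids]


def format_markers(rows):
    """Render marker tuples as whitespace-delimited .markers lines."""
    return "\n".join(f"{snp} {pos} {a1} {a2}" for snp, pos, a1, a2 in rows)


# Toy example (made-up SNPs, not real data)
bim = ["1 rs1 0 100 A G", "1 rs2 0 50 C T", "2 rs3 0 10 A C"]
markers = bim_to_markers(bim)
print(format_markers(markers["1"]))
```

The per-chromosome dictionary covers the splitting step, and `common_snps` would be called once per chromosome with the reference and target marker lists.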
Sometimes an array has genotype data with "0". In that case, which approach should I adopt?
- replace genotype "0" with the genotype from the annotation file of each platform, then run strand_check.py
- remove sites with "0" genotypes from the initial ped files before running strand_check.py
- replace genotype "0" with the genotype from the annotation file and record the replaced positions before running strand_check.py. To avoid an excessive reduction in SNP sites, set the replaced genotypes back to "0" after running strand_check.py (1-6 in the example above).
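The third option (fill in, remember where, then set back to "0") could look roughly like the sketch below. It is an illustration only: genotypes are held as a plain list of per-sample allele lists rather than a real ped file, and `fill_alleles` is a hypothetical stand-in for whatever the platform annotation file would supply for each allele column.

```python
def mask_missing(genotypes, fill_alleles, missing="0"):
    """Fill missing alleles and return (filled copy, list of replaced positions)."""
    filled = [row[:] for row in genotypes]  # copy so the input is not modified
    replaced = []
    for i, row in enumerate(filled):
        for j, allele in enumerate(row):
            if allele == missing:
                row[j] = fill_alleles[j]  # hypothetical annotation allele
                replaced.append((i, j))
    return filled, replaced


def restore_missing(genotypes, replaced, missing="0"):
    """Set the recorded positions back to the missing code."""
    restored = [row[:] for row in genotypes]
    for i, j in replaced:
        restored[i][j] = missing
    return restored


# Toy example: two samples, two allele columns (made-up data)
genos = [["A", "0"], ["0", "G"]]
filled, positions = mask_missing(genos, fill_alleles=["A", "G"])
# ... the strand check would run on `filled` here ...
restored = restore_missing(filled, positions)
```

Recording (sample, column) pairs keeps the restore step independent of whatever the strand check does to the allele values themselves, as long as the sample and marker order is unchanged.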