My goal is to convert Illumina data to forward/reverse strand data (Affymetrix). I have 10K, 60K, 80K and 660K data from both sources.
I have the following types of files:
***.ref
snp CHROMOSOME POSITION COUNT_ALLELE OTHER_ALLELE
snp1 17 190170 G A
snp2 17 469495 A G
--- One line for each SNP

***.dat
pig_id snp1 snp2 snp3 snp4 --- snp58448
11641206 1 0 1 0 0
11324561 1 0 2 0 0
14561322 2 1 1 0 1
13513507 0 0 2 0 0
--- One line per animal, one column per SNP
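To make the two layouts concrete, here is a minimal Python sketch of how such files could be read. It assumes both files are whitespace-delimited with a single header line, which is only what the excerpts above suggest. The function names `read_ref` and `read_dat` are made up for illustration, and the genotype codes are simply kept as integers without interpreting them.

```python
import io


def read_ref(handle):
    """Read a .ref-style file into {snp: (chrom, position, count_allele, other_allele)}.

    Assumes one header line followed by five whitespace-separated fields per SNP.
    """
    handle.readline()  # skip the header line
    snps = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        name, chrom, pos, count_allele, other_allele = fields
        snps[name] = (chrom, int(pos), count_allele, other_allele)
    return snps


def read_dat(handle):
    """Read a .dat-style file into {pig_id: {snp: genotype_code}}.

    Assumes the header is 'pig_id' followed by the SNP names, one animal per line.
    """
    snp_names = handle.readline().split()[1:]
    genotypes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        genotypes[fields[0]] = dict(zip(snp_names, map(int, fields[1:])))
    return genotypes


# Small in-memory example using the values shown above (first two SNPs only).
ref_text = "snp CHROMOSOME POSITION COUNT_ALLELE OTHER_ALLELE\nsnp1 17 190170 G A\nsnp2 17 469495 A G\n"
dat_text = "pig_id snp1 snp2\n11641206 1 0\n11324561 1 0\n"

ref = read_ref(io.StringIO(ref_text))
dat = read_dat(io.StringIO(dat_text))
print(ref["snp1"])
print(dat["11641206"])
```

With real files, the `io.StringIO` objects would be replaced by handles from `open(...)`.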
I also have an Illumina HD annotation file which looks as follows:
Illumina Inc.
[Heading]
Descriptor File Name    GGP HD Porcine.csv
Assay Format    Infinium HD Ultra
Date Manufactured    5/15/2013
Loci Count    68516
[Assay]
IlmnID Name IlmnStrand SNP AddressA_ID AlleleA_ProbeSeq AddressB_ID AlleleB_ProbeSeq GenomeBuild Chr MapInfo Ploidy Species Source SourceVersion SourceStrand SourceSeq TopGenomicSeq BeadSetID
LD-Porcine80K_ALGA0000022-0_T_F_2164561890-0_T_F_2165597341 ALGA0000022 TOP [A/G] 19808437 *seq* 10.2 1 865364 diploid Sus scrofa rs80958395 0 TOP *seq* [A/G]877
When I compare SNPs from the Illumina annotation file with the Affymetrix annotation file, I find inconsistencies in the SNP calls.
For example, after manually checking some values, I find the following:
Illumina TOP A/G is sometimes called T/C in Affymetrix, but sometimes it is also called A/G in Affymetrix.
Illumina TOP A/C is (sometimes?) called T/G in Affymetrix.
Illumina BOT T/G is (sometimes?) called A/C in Affymetrix.
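In each of the cases above, the Affymetrix alleles are either identical to the Illumina alleles or their base-wise complement (A<->T, C<->G). A minimal Python sketch of that check is given below. The function name `compare_alleles` and the labels it returns are made up for illustration. It only compares allele letters as unordered pairs, so it does not by itself say which strand is forward, and it cannot resolve A/T or C/G SNPs, whose complement is the same pair.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def compare_alleles(illumina, affymetrix):
    """Compare two allele pairs such as 'A/G' and 'T/C'.

    Returns 'same', 'complement', 'ambiguous' (A/T or C/G SNPs, where the
    complement is the same pair) or 'mismatch'. Allele order is ignored.
    """
    ilmn = set(illumina.upper().split("/"))
    affy = set(affymetrix.upper().split("/"))
    ilmn_comp = {COMPLEMENT[base] for base in ilmn}

    if ilmn == ilmn_comp:
        return "ambiguous"  # strand cannot be inferred from the letters alone
    if ilmn == affy:
        return "same"
    if ilmn_comp == affy:
        return "complement"
    return "mismatch"


# The cases listed above:
for ilmn, affy in [("A/G", "T/C"), ("A/G", "A/G"), ("A/C", "T/G"), ("T/G", "A/C")]:
    print(ilmn, affy, compare_alleles(ilmn, affy))
```

Running this over all SNPs shared by the two annotation files would show how many fall into each class, which may help to tell a systematic strand difference from real discrepancies.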
I hope somebody will be able to help me out here.