I need some help understanding the structure of a variant call file containing Affymetrix data.
I have a RAW data file in the following format:
probeset_id     CEL_call_code  chromosome  position   rsid
AFFX-SP-000001  CC             10          121336954  rs10466213
AFFX-SP-000002  CG             12          23048418   rs10770943
AFFX-SP-000004  GG             17          56334747   rs11079221
AFFX-SP-000005  GG             11          85910686   rs12285109
AFFX-SP-000006  CG             15          60865412   rs12913890
The problem is that some SNPs with the same rsid appear more than once with different call codes, sometimes at the same position and sometimes at different positions, as in these two examples:
Same position with different CEL call code
probeset_id  CEL_call_code  chromosome  position  rsid
AX-96108113  AC             4           6301295   rs1801214
AX-96108115  TC             4           6301295   rs1801214
Different position with different CEL call code
probeset_id   CEL_call_code  chromosome  position   rsid
AX-123355923  CACA           7           117642463  rs121908784
AX-96064890   AA             7           117642464  rs121908784
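To make the problem concrete, here is a minimal Python sketch of how such duplicated rsids could be listed. It assumes the file is whitespace-delimited with the header shown above. The function names (`read_rows`, `conflicting_rsids`, `summarize`) are hypothetical, and the sample rows are just the ones quoted in this post. It only flags the conflicts; it does not decide which call is correct.

```python
import io
from collections import defaultdict

# Sample rows copied from the examples in this post.
# For a real file, pass an open file handle instead of io.StringIO(SAMPLE).
SAMPLE = """probeset_id CEL_call_code chromosome position rsid
AFFX-SP-000001 CC 10 121336954 rs10466213
AX-96108113 AC 4 6301295 rs1801214
AX-96108115 TC 4 6301295 rs1801214
AX-123355923 CACA 7 117642463 rs121908784
AX-96064890 AA 7 117642464 rs121908784
"""


def read_rows(handle):
    """Yield one dict per data row, keyed by the header names (assumed whitespace-delimited)."""
    header = next(handle).split()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def conflicting_rsids(handle):
    """Return {rsid: rows} for rsids that are reported with more than one call code."""
    by_rsid = defaultdict(list)
    for row in read_rows(handle):
        by_rsid[row["rsid"]].append(row)
    return {
        rsid: rows
        for rsid, rows in by_rsid.items()
        if len({r["CEL_call_code"] for r in rows}) > 1
    }


def summarize(conflicts):
    """Label each conflicting rsid by whether its rows share one chromosome/position."""
    summary = {}
    for rsid, rows in conflicts.items():
        positions = {(r["chromosome"], r["position"]) for r in rows}
        summary[rsid] = "same position" if len(positions) == 1 else "different positions"
    return summary


conflicts = conflicting_rsids(io.StringIO(SAMPLE))
summary = summarize(conflicts)
for rsid, label in summary.items():
    calls = ", ".join(f"{r['probeset_id']}={r['CEL_call_code']}" for r in conflicts[rsid])
    print(f"{rsid}: {label} ({calls})")
```

Grouping by rsid rather than by position is a deliberate choice here, because the second example shows the same rsid reported at two adjacent positions.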
How is this possible, and how do I know which CEL call code is the correct one for SNPs that appear multiple times in the same file?
Thank you, any suggestion would be very much appreciated.