I am analyzing some data from 2015 and the original postdoc has moved on. The file is named "072312_FinalReport.txt" and has ~197k SNP measurements for 60 people. I have figured out the meaning of all of the columns, except for the last column.
Can someone explain what the last column, "SNP" ([T/G] etc.), means? I can't find documentation for this file format (the "txt" file extension is not very helpful). Thank you!
SNP Name         Sample ID  Allele1 - Top  Allele2 - Top  GC Score  SNP
chr1:109457160   2          C              C              0.8609    [T/G]
chr1:109457233   2          C              C              0.7725    [T/G]
chr1:109457614   2          -              -              0.0000    [T/C]
chr1:109457618   2          A              A              0.5787    [T/C]
UPDATE: In case it would be helpful to have a bigger example, here are 30 random rows:
SNP Name         Sample ID  Allele1 - Top  Allele2 - Top  GC Score  SNP
rs10838158       11         A              A              0.9435    [A/G]
chr1:170643885   83         A              G              0.8454    [A/G]
rs17868322       13         G              G              0.8427    [T/C]
rs1983076        8          G              G              0.8717    [C/G]
chr17:65847329   83         G              G              0.9091    [T/C]
rs12123144       59         G              G              0.8699    [T/C]
chr6:43861303    8          G              G              0.8048    [T/C]
chr10:44140519   40         A              A              0.9412    [T/C]
chr7:71808639    5          G              G              0.6323    [G/C]
chr15_60123726   77         C              C              0.5374    [T/G]
rs4821651        98         A              A              0.8674    [A/G]
chr16:19623531   83         A              A              0.7984    [A/G]
rs4856138        62         A              A              0.8110    [A/G]
rs1918761        24         G              G              0.9187    [A/G]
rs13074345       80         G              G              0.9386    [T/C]
chr1:63006481    3          -              -              0.0000    [T/C]
rs12349196       74         G              G              0.9474    [A/G]
chr11:100142349  91         A              A              0.9136    [T/C]
chr6:134235603   98         A              A              0.9410    [T/C]
chr7:150279883   2          G              G              0.8980    [T/C]
chr3:12220516    43         C              C              0.4148    [T/G]
chr1:109619829   102        -              -              0.0000    [T/G]
chr5:156402386   34         G              G              0.9539    [A/G]
chr11:47962357   11         G              G              0.9013    [A/G]
chr11:46698484   96         A              C              0.9002    [T/G]
rs217454         65         A              G              0.8505    [T/C]
chr11:27530273   77         A              A              0.8543    [T/C]
rs13201824       37         C              C              0.7432    [T/G]
chr6:118655523   36         A              A              0.9507    [A/G]
rs7791822        13         A              A              0.9600    [T/C]
I heard back from Illumina:
"You will need the manifest file in order to determine the SNP. The manifest describes the SNP or probe content on a BeadChip.
Since the data that you have is from an older array the manifest is not currently listed online. In the link below I have sent the manifest file to you."
So the "SNP" column represents the probe content: each probe is chosen to detect a SNP based on the data in the "TopGenomicSeq" column of the cardio-metabo_chip_11395247_a.csv manifest file provided by Illumina. Thanks!
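For anyone who lands here with a similar file, here is a minimal Python sketch of how the report rows could be matched to the manifest. It rests on several assumptions that I have not checked against the real files: that the report is tab-delimited with the header row shown above, that the manifest CSV has an "[Assay]" section ending at "[Controls]", and that this section has "Name", "SNP" and "TopGenomicSeq" columns. The function names are my own, and the flanking sequence in the toy manifest is a made-up placeholder, not real manifest content.

```python
import csv


def read_manifest_assays(lines):
    """Return {Name: row} for rows in the (assumed) [Assay] section."""
    assay_lines = []
    in_assay = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[Assay]"):
            in_assay = True
            continue
        if stripped.startswith("[Controls]"):
            break
        if in_assay and stripped:
            assay_lines.append(stripped)
    # First collected line is the column header of the assay table.
    return {row["Name"]: row for row in csv.DictReader(assay_lines)}


def annotate_report(report_lines, assays):
    """Yield report rows with the manifest's TopGenomicSeq attached.

    Rows whose "SNP Name" is missing from the manifest get None.
    """
    for row in csv.DictReader(report_lines, delimiter="\t"):
        probe = assays.get(row["SNP Name"])
        row["TopGenomicSeq"] = probe["TopGenomicSeq"] if probe else None
        yield row


# Toy manifest: the layout and the sequence are placeholders for illustration.
manifest_text = """[Heading]
[Assay]
Name,SNP,TopGenomicSeq
rs10838158,[A/G],ACGT[A/G]TGCA
[Controls]
"""

# Two rows copied from the question, written as tab-delimited text.
report_text = (
    "SNP Name\tSample ID\tAllele1 - Top\tAllele2 - Top\tGC Score\tSNP\n"
    "rs10838158\t11\tA\tA\t0.9435\t[A/G]\n"
    "chr1:170643885\t83\tA\tG\t0.8454\t[A/G]\n"
)

assays = read_manifest_assays(manifest_text.splitlines())
annotated = list(annotate_report(report_text.splitlines(), assays))

for row in annotated:
    print(row["SNP Name"], row["SNP"], row["TopGenomicSeq"])
```

With the real files you would pass open file handles in place of the `splitlines()` lists. If the report has extra header lines before the column names, those would need to be skipped first.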