Top/Bottom Confusion For Illumina SNP Calls
10.6 years ago
Perry ▴ 290

Here is the problem: Illumina reports SNP genotypes as AA, AB, or BB. The meaning of A and B depends on whether Illumina designates the SNP as "top" or "bottom" strand. One complication is that I don't have the original data. All I have is the processed Illumina SNP file with the SNP number and genotype call (AA, AB, BB). These calls should be uniquely translatable into nucleotides.

1) let's assume for a moment that the SNP calls are from an ILMN_Human_1M chip

2) let's say for rs13536 I have a call of BB

3) what nucleotides does this correspond to on the positive strand of the reference genome?

According to Illumina:

Top Strand, Bottom Strand

1: A-G , T-C

2: A-C , T-G

So if I go to dbSNP for rs13536, and I see T/C, I'm dealing with the bottom strand, and I can use this to get the nucleotides.
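The lookup described above can be sketched in Python. This is a minimal illustration, not Illumina's own code: it assumes the rule from the table (A/G and A/C are top, T/C and T/G are bottom) together with the technote convention that allele A is the A nucleotide on top and the T nucleotide on bottom. The function names `designate` and `translate` are made up for this example. Ambiguous SNPs (A/T and C/G) need the flanking sequence and are rejected here, and the result is given on the designated strand, which is not necessarily the positive strand of the reference genome.

```python
def designate(alleles):
    """Return (strand, allele_A, allele_B) for an unambiguous SNP such as 'T/C'.

    Assumed rule: a pair containing A is TOP with allele A = 'A';
    otherwise the pair contains T and is BOT with allele A = 'T'.
    """
    pair = {x.strip().upper() for x in alleles.split("/")}
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"expected two distinct nucleotides, got {alleles!r}")
    if pair in ({"A", "T"}, {"C", "G"}):
        # These cases require sequence walking on the flanks; not handled here.
        raise ValueError(f"{alleles!r} is ambiguous without flanking sequence")
    if "A" in pair:
        strand, allele_a = "TOP", "A"
    else:
        strand, allele_a = "BOT", "T"
    allele_b = (pair - {allele_a}).pop()
    return strand, allele_a, allele_b


def translate(call, alleles):
    """Translate a genotype call ('AA', 'AB', 'BB') into nucleotides."""
    _, allele_a, allele_b = designate(alleles)
    lookup = {"A": allele_a, "B": allele_b}
    return "".join(lookup[c] for c in call.upper())


if __name__ == "__main__":
    print(designate("T/C"))        # strand and A/B alleles for a T/C SNP
    print(translate("BB", "T/C"))  # nucleotides for a BB call
```

Under these assumptions a BB call on a T/C SNP comes out as CC on the bottom strand. Converting that to the reference positive strand still requires knowing how the designated strand is oriented relative to the genome.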

I see that I can solve my problem by determining if the call is top or bottom, by following these instructions:

1 You can compute the top/bottom designation yourself using the data in the /organisms/human_9606/GWAS_arrays/ directory on the dbSNP FTP site.

2 You can look at dbSNP's top/bottom assignment, which you can access if you download the SubSNP.bcp file located in the /database/organism_data/ directory for human. The field that includes the top/bottom data is called SubSNP.top_or_bot_strand. You can access the table DDL for SubSNP in the /database/organism_schema directory.

I do both to make sure my answers are consistent. I grab:



In ILLUMINA.ILLUMINA_Human_1M.xml, rs13536 is top:

<ss batchid="33668" buildid="127" handle="ILLUMINA" linkouturl="http:…/products/arraysreagents/wgghuman1.ilmnHuman1-rs13536" locsnpid="Human1-rs13536" methodclass="other" moltype="genomic" orient="forward" ssid="65715089" strand="top" subsnpclass="snp" validated="by-submitter"> <sequence>





But Illumina states that C/T is bottom. Why is it top here?

In SubSNP_top_or_bot.bcp, rs13536 is bottom, which is consistent with C/T:

13536 B 5

Why is there a conflict between the files?
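A disagreement like this can be checked across all SNPs at once. The sketch below is only an illustration under stated assumptions: rows are whitespace-delimited like `13536 B 5`, the first column is the rs number without the `rs` prefix, and the second column is `B` for bottom (as in the row above) or `T` for top (assumed). The meaning of the third column is not stated here, so it is ignored. The function names are hypothetical, and the XML strand values are assumed to have been collected into a dictionary beforehand.

```python
def parse_top_or_bot(lines):
    """Map 'rs<number>' to 'top' or 'bottom' from rows like '13536 B 5'."""
    codes = {"T": "top", "B": "bottom"}  # 'T' for top is an assumption
    strands = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[1] not in codes:
            continue  # skip blank or unrecognised rows
        strands["rs" + fields[0]] = codes[fields[1]]
    return strands


def find_conflicts(table_strands, xml_strands):
    """Return {rs: (table_value, xml_value)} for SNPs where the sources disagree."""
    shared = table_strands.keys() & xml_strands.keys()
    return {
        rs: (table_strands[rs], xml_strands[rs])
        for rs in shared
        if table_strands[rs] != xml_strands[rs]
    }


if __name__ == "__main__":
    table = parse_top_or_bot(["13536 B 5"])
    # strand="top" taken from the <ss> element for Human1-rs13536
    xml = {"rs13536": "top"}
    print(find_conflicts(table, xml))
```

Listing every conflicting SNP this way would show whether rs13536 is an isolated case or whether the two files use "top" and "bottom" differently throughout.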

dbSNP shows bottom for both ILLUMINA assays. Why is ILLUMINA.ILLUMINA_Human_1M.xml in conflict with these?

10.6 years ago
Jan Oosting ▴ 920

Illumina has a technote on their naming convention of SNPs: “TOP/BOT” Strand and “A/B” Allele

Thanks. It seems to me that this naming convention conflicts with what is presented in ILLUMINA.ILLUMINA_Human_1M.xml. In the naming convention file, rs536477 is A/G and TOP. However, the rs536477 entries in the XML file are A/G and strand='bottom'. Does strand have a different meaning in the XML file?


