Question: Top/Bottom Confusion For Illumina Snp Calls
Perry (Philadelphia) wrote, 8.9 years ago:

Here is the problem: Illumina calls their SNPs AA, AB, BB. The meaning of A and B depends on what they call the "top" or "bottom" strand. One of the problems I am facing is that I don't have the original data. All I have is the Illumina processed SNP file with the SNP number and genotype call (AA, AB, BB). These calls should be uniquely translatable into nucleotides.

1) let's assume for a moment that the SNP calls are from an ILMN_Human_1M chip

2) let's say for rs13536 I have a call of BB

3) what nucleotides does this correspond to on the positive strand of the reference genome?

According to Illumina:

Top Strand, Bottom Strand

1: A-G , T-C

2: A-C , T-G

So if I go to dbSNP for rs13536, and I see T/C, I'm dealing with the bottom strand, and I can use this to get the nucleotides.
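The rule in the table above can be sketched in a few lines of Python. This is only an illustration of the convention as described in this post, under the assumption (taken from Illumina's TOP/BOT technote) that allele A is the A on TOP or the T on BOT, and allele B is the C or G. The function names `designate` and `translate_call` are made up for this example. A/T and C/G SNPs cannot be resolved from the alleles alone and are rejected here.

```python
def designate(observed):
    """Return (strand, allele_a, allele_b) for an allele string like 'T/C'.

    Only unambiguous SNPs are handled: A/G and A/C are TOP,
    T/C and T/G are BOT, as in the table quoted above.
    """
    alleles = set(observed.upper().split("/"))
    if len(alleles) != 2 or not alleles <= set("ACGT"):
        raise ValueError(f"expected two distinct nucleotides, got {observed!r}")
    if alleles in ({"A", "T"}, {"C", "G"}):
        raise ValueError(f"{observed!r} is ambiguous; flanking sequence is needed")
    # Assumption: allele A is the A (TOP) or the T (BOT); allele B is the C or G.
    strand, allele_a = ("TOP", "A") if "A" in alleles else ("BOT", "T")
    (allele_b,) = alleles - {allele_a}
    return strand, allele_a, allele_b


def translate_call(call, observed):
    """Translate 'AA', 'AB' or 'BB' into nucleotides on the strand of `observed`."""
    _, allele_a, allele_b = designate(observed)
    lookup = {"A": allele_a, "B": allele_b}
    return "".join(lookup[c] for c in call.upper())


print(designate("T/C"))
print(translate_call("BB", "T/C"))
```

With these assumptions, a BB call for a T/C SNP comes out as CC on the strand where the alleles read T/C. Whether that strand is the positive strand of the reference genome still has to be checked separately.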

I see that I can solve my problem by determining if the call is top or bottom, by following these instructions:

1 You can compute the top/bottom designation yourself using the data in the /organisms/human_9606/GWAS_arrays/ directory on the dbSNP FTP site.

2 You can look at dbSNP's top/bottom assignment, which you can access if you download the SubSNP.bcp file located in the /database/organism_data/ directory for human. The field that includes the top/bottom data is called SubSNP.top_or_bot_strand. You can access the table DDL for SubSNP in the /database/organism_schema directory.

I do both to make sure my answers are consistent. I grab:

1) ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/GWAS_arrays/ILLUMINA.ILLUMINA_Human_1M.xml.gz

2) ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/SubSNP_top_or_bot.bcp.gz

In ILLUMINA.ILLUMINA_Human_1M.xml, rs13536 is top:

    <ss batchid="33668" buildid="127" handle="ILLUMINA" linkouturl="http://www.illumina.com/products/arraysreagents/wgghuman1.ilmnHuman1-rs13536" locsnpid="Human1-rs13536" methodclass="other" moltype="genomic" orient="forward" ssid="65715089" strand="top" subsnpclass="snp" validated="by-submitter">
        <Sequence>
            <Seq5>TTTCGAACCGAGACAGATGGCAGCTAAATGAAGTTTAATTAAAGAATGAG</Seq5>
            <Observed>C/T</Observed>
            <Seq3>GCTGGGGCCCTTTTTATTGGGTACTGCATCTACTTCGACCACAAAAGACG</Seq3>
        </Sequence>
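To make the mismatch concrete, here is a minimal sketch that pulls the strand-related fields out of a record like the one quoted above and compares the `strand` attribute with what the allele pair alone would suggest. The embedded record is a trimmed copy of the quoted one, with a closing `</ss>` added only so the fragment parses by itself. The helper names `summarize` and `strand_from_alleles` are hypothetical, and the layout of the full file around each `<ss>` element is not shown in this post, so this does not attempt to read the whole download.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the quoted record; </ss> is added so it parses standalone.
RECORD = """
<ss handle="ILLUMINA" locsnpid="Human1-rs13536" orient="forward"
    ssid="65715089" strand="top" subsnpclass="snp">
  <Sequence>
    <Observed>C/T</Observed>
  </Sequence>
</ss>
""".strip()


def summarize(ss):
    """Collect the strand-related fields of one <ss> element."""
    return {
        "locsnpid": ss.get("locsnpid"),
        "strand": ss.get("strand"),
        "orient": ss.get("orient"),
        "observed": ss.findtext("Sequence/Observed"),
    }


def strand_from_alleles(observed):
    """'top' if the pair contains A, 'bottom' if it contains T.

    Returns None for A/T and C/G, which the allele pair cannot resolve.
    """
    alleles = set(observed.upper().split("/"))
    if alleles in ({"A", "T"}, {"C", "G"}):
        return None
    if "A" in alleles:
        return "top"
    if "T" in alleles:
        return "bottom"
    return None


info = summarize(ET.fromstring(RECORD))
info["expected"] = strand_from_alleles(info["observed"])
info["consistent"] = info["strand"] == info["expected"]
print(info)
```

For this record the check reports an inconsistency: the attribute says top while C/T alone would be bottom. The sketch only flags the mismatch; it does not say which of the two is right or what the `strand` attribute is meant to describe.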

But Illumina states that C/T is bottom. Why is it top here?

In SubSNP_top_or_bot.bcp, rs13536 is bottom, which is consistent with C/T:

13536 B 5
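A lookup in the .bcp file could be sketched as below. The column layout is an assumption based only on the single row shown above: the first field is taken to be the identifier matched to rs13536 (whether it is really an rs or an ss number is not confirmed here), the second field is taken to be the top/bottom flag, and any further columns are ignored because their meaning is not stated. Reading `T` as "top" is also an assumption, since only `B` appears in this post. The function names are hypothetical.

```python
import gzip

# Assumed meaning of the flag column; only "B" appears in the post.
FLAG_NAMES = {"T": "top", "B": "bottom"}


def parse_line(line):
    """Split one row such as '13536 B 5' into (id, flag); extra columns are ignored."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected row: {line!r}")
    return int(fields[0]), fields[1]


def find_flag(lines, wanted_id):
    """Return 'top'/'bottom' (or the raw flag) for the first row with wanted_id."""
    for line in lines:
        if not line.strip():
            continue
        row_id, flag = parse_line(line)
        if row_id == wanted_id:
            return FLAG_NAMES.get(flag, flag)
    return None


def find_flag_in_file(path, wanted_id):
    """Same lookup, streaming a gzipped copy of the file line by line."""
    with gzip.open(path, "rt") as handle:
        return find_flag(handle, wanted_id)


print(find_flag(["13536 B 5"], 13536))
```

Streaming the file keeps memory use flat, which matters for a table of this size. For example, `find_flag_in_file("SubSNP_top_or_bot.bcp.gz", 13536)` would run the same lookup on the downloaded file.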

Why is there a conflict between the files?

dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs13536) shows bottom for both ILLUMINA assays. Why is ILLUMINA.ILLUMINA_Human_1M.xml in conflict with these?

Jan Oosting (Leiden, NL) wrote, 8.9 years ago:

Illumina has a technote on their naming convention of SNPs: “TOP/BOT” Strand and “A/B” Allele


Thanks. It seems to me that this naming convention conflicts with what is presented in ILLUMINA.ILLUMINA_Human_1M.xml. In the naming convention file, rs536477 is A/G and TOP. However, the rs536477 entries in the XML file are A/G and strand='bottom'. Does strand have a different meaning in the XML file?

Reply written 8.9 years ago by Perry