Question: How to get ancestral SNP states for GRCh37.p13
4.4 years ago by
JMR140
London - United Kingdom
JMR140 wrote:

For a positive selection test that I want to use, I need the ancestral states of all SNPs present in my data.

I checked this FAQ from NCBI, followed the instructions and downloaded a file that contains the rs number, physical position and ancestral state of over 60 million SNPs. However, as a simple test, when I tried to match some SNPs present in my data on rs number and physical position, I didn't get any match. But when I entered the SNP on the dbSNP website, I could find it with a putative ancestral state and a matching physical position.

The last update of the downloaded file was in March 2014, but I couldn't find a reference to the build.

Are there other places where I could get the ancestral states of SNPs? Or find an updated file from dbSNP?

Thank you in advance.

EXTRA INFORMATION FOR COMMENT

Example of an rsSNP in my data:

This is an rsSNP present in my data with its physical position based on the GRCh37 assembly.

rs2823639 17576565

When I check the SNPAncestralAllele.bcp.gz file for this rsSNP I get these matches:

rs2823639    0    A

rs2823639    1050982    A

rs2823639    1052591    A

rs2823639    1056295    A

rs2823639    1056571    A

rs2823639    1061835    A

The information on the dbSNP website is however this:

GRCh38 16204245                    
GRCh37.p13 17576565        

Ancestral allele: A

The ancestral state is the same but the physical position is not. 

ADD COMMENTlink modified 4.4 years ago by Jie Ping20 • written 4.4 years ago by JMR140

Can you post some sample rs#s from your dataset? Also what is the name of the file you downloaded?

Did you have a look at this instruction for getting ancestral SNP state?

http://www.ncbi.nlm.nih.gov/sites/books/NBK44409/#Build.how_do_i_download_a_flat_file_that

 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Siva1.6k

Yes, I checked the instructions and downloaded two files: Allele.bcp.gz and SNPAncestralAllele.bcp.gz.

See the edited question for an example of an rs# from my sample.

Thanks for the help!

ADD REPLYlink written 4.4 years ago by JMR140

I am not sure we are seeing the same SNPAncestralAllele file.

The column definitions for the SNPAncestralAllele file, from human_9606_table.sql, are

CREATE TABLE [SNPAncestralAllele]
(
[snp_id] [int] NOT NULL ,
[ancestral_allele_id] [int] NOT NULL ,
[batch_id] [int] NOT NULL
)
GO

The second column in the table you posted is not the chromosomal position but the batch_id:

rs2823639    0    A

rs2823639    1050982    A

rs2823639    1052591    A

rs2823639    1056295    A

rs2823639    1056571    A

rs2823639    1061835    A

The chromosome position can be obtained from the b142_SNPChrPosOnRef_106.bcp file (for GRCh38). The column definitions for this file (again from human_9606_table.sql) are

CREATE TABLE [b142_SNPChrPosOnRef_106]
(
[snp_id] [int] NOT NULL ,
[chr] [varchar](32) NOT NULL ,
[pos] [int] NULL ,
[orien] [int] NULL ,
[neighbor_snp_list] [int] NULL ,
[isPAR] [varchar](1) NOT NULL
)
GO

The chromosome position for rs2823639 in the b142_SNPChrPosOnRef_106.bcp file is

2823639    21    16204244    0

The reason for the -1 difference in chromosome position in the .bcp file (compared to the dbSNP website) is explained here. In short, the .bcp file stores 0-based positions, whereas the website shows 1-based positions.

The FTP files I linked are for GRCh38. You can get the corresponding files for GRCh37.p13 here
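As a rough illustration of the position offset, here is a minimal Python sketch that parses one row of a SNPChrPosOnRef file and converts it to the 1-based coordinate shown on the dbSNP website. It assumes the .bcp file is tab-separated with the columns in the order given by the CREATE TABLE statement above, and that the stored position is 0-based; the function name parse_chrpos_line is made up for this example.

```python
def parse_chrpos_line(line):
    """Parse one row of a SNPChrPosOnRef .bcp file (assumed tab-separated).

    Returns (rs_id, chromosome, one_based_position), or None when the
    position is missing (the pos column is nullable in the schema).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[2] == "":
        return None
    snp_id, chrom, pos = fields[0], fields[1], fields[2]
    # Assumption: the file position is 0-based, so add 1 to get the
    # coordinate displayed on the dbSNP website.
    return "rs" + snp_id, chrom, int(pos) + 1


# The GRCh38 row quoted above for rs2823639.
print(parse_chrpos_line("2823639\t21\t16204244\t0"))
# → ('rs2823639', '21', 16204245)
```

The converted value agrees with the GRCh38 position 16204245 reported on the website for this SNP. The same conversion would apply to the GRCh37 file before comparing against positions in your own data.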

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Siva1.6k

Thank you so much, Siva! I downloaded the new files for GRCh37 and will try to match my rs numbers and physical positions to them. I have another question though: b142_SNPChrPosOnRef_105.bcp and SNPAncestralAllele.bcp have different numbers of rows. Shouldn't they be the same?

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by JMR140

You are welcome. The b142_SNPChrPosOnRef_105.bcp file has one unique row (chromosome position) for each snp_id, whereas there can be more than one row (multiple submissions/batch_ids) for the same snp_id in the SNPAncestralAllele.bcp file. In the example you posted in your original post, there are 6 batch_ids for 1 snp_id.
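To see the row-count difference concretely, a small Python sketch can count how many ancestral-allele rows each rs number has. The rows below are the six posted in the question, laid out as (rs id, batch_id, allele); the helper name batches_per_snp is only illustrative.

```python
from collections import Counter

# The six rows posted in the question: (rs id, batch_id, ancestral allele).
rows = [
    ("rs2823639", 0, "A"),
    ("rs2823639", 1050982, "A"),
    ("rs2823639", 1052591, "A"),
    ("rs2823639", 1056295, "A"),
    ("rs2823639", 1056571, "A"),
    ("rs2823639", 1061835, "A"),
]


def batches_per_snp(rows):
    """Count how many rows (one per batch_id) exist for each rs id."""
    return Counter(rs_id for rs_id, _batch_id, _allele in rows)


counts = batches_per_snp(rows)
print(len(rows), len(counts), counts["rs2823639"])
# → 6 1 6
```

Six rows collapse to a single SNP here, which is why the ancestral-allele file can have more rows than the position file.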

 

ADD REPLYlink written 4.4 years ago by Siva1.6k

Thank you so much! This solved all my questions!

ADD REPLYlink written 4.4 years ago by JMR140

Hi Siva, I just encountered another problem. For several rsSNPs I found that different batches point to different ancestral alleles. Which batch number should I trust? The latest one? I searched for information on batches on the dbSNP website but couldn't find anything.
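Before deciding which batch to trust, it may help to list exactly which SNPs are affected. The Python sketch below only separates SNPs whose batches agree from those whose batches disagree; it does not pick a winner. The function name split_by_agreement is made up for this example, and the "rs_example" rows are hypothetical data added to show a conflict, not real dbSNP records.

```python
from collections import defaultdict


def split_by_agreement(rows):
    """Split SNPs into those with one ancestral allele and those with several.

    rows: iterable of (rs_id, batch_id, allele) tuples.
    Returns (agreed, conflicting), where agreed maps rs_id -> allele and
    conflicting maps rs_id -> sorted list of the distinct alleles seen.
    """
    alleles = defaultdict(set)
    for rs_id, _batch_id, allele in rows:
        alleles[rs_id].add(allele)
    agreed = {rs: next(iter(a)) for rs, a in alleles.items() if len(a) == 1}
    conflicting = {rs: sorted(a) for rs, a in alleles.items() if len(a) > 1}
    return agreed, conflicting


rows = [
    ("rs2823639", 0, "A"),        # from the question
    ("rs2823639", 1050982, "A"),  # from the question
    ("rs_example", 1, "C"),       # hypothetical conflicting record
    ("rs_example", 2, "T"),       # hypothetical conflicting record
]
agreed, conflicting = split_by_agreement(rows)
print(agreed)
print(conflicting)
```

SNPs that land in the conflicting group could then be excluded from the selection test or checked against another source of ancestral alleles.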

ADD REPLYlink written 4.4 years ago by JMR140
4.4 years ago by
Jie Ping20
China
Jie Ping20 wrote:

You can find 1kg ancestral alleles (actually derived from Ensembl) here.

 

 

ADD COMMENTlink written 4.4 years ago by Jie Ping20

I am going to try that after checking how many SNPs with ancestral information there are in the dbSNP dataset. Thanks.

ADD REPLYlink written 4.4 years ago by JMR140