Genome Coordinates Between Assemblies: Liftover of a SNP Array
10.2 years ago
Jimbou ▴ 950

Hello,

I'm starting a new post about the liftover process for SNPs because several questions came to mind during the work I've done so far, and I want to collect all the issues and answers in one place.

My goal: I want to convert thousands of genome-wide SNP positions (from a SNP array) from hg18 (build 36) to hg19 (build 37). The data are in PLINK ped/map format and grouped by chromosome (22 autosomes plus X and Y).

The first tool I used is, of course, liftOver from UCSC. This works very well: transform the map file into a BED file, lift over, and convert back. But I found out that the rs numbers of some SNPs can also change, for example rs2266988 into rs1129172. Therefore I found it necessary to update the rs numbers as well. I found a nice tutorial and used its Python script. Briefly, the script compares rs numbers against two history files from dbSNP (RsMergeArch and SNPHistory) to update merged rs numbers.
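The map → BED → liftOver → map round trip described above can be sketched like this (filenames and the liftOver invocation are hypothetical; note that .map positions are 1-based while BED intervals are 0-based, half-open):

```python
# Sketch: convert a PLINK .map file to UCSC BED as input for liftOver.
# Filenames are placeholders. .map positions are 1-based; BED is 0-based.

def map_line_to_bed(line):
    """Turn one .map line (chrom, rsID, cM, pos) into a one-bp BED line."""
    chrom, rsid, _cm, pos = line.split()
    start = int(pos) - 1          # 1-based -> 0-based
    return f"chr{chrom}\t{start}\t{start + 1}\t{rsid}"

def map_to_bed(map_path, bed_path):
    """Write a BED file with one interval per SNP in the .map file."""
    with open(map_path) as src, open(bed_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(map_line_to_bed(line) + "\n")

# liftOver is then run on the BED file, e.g. (paths are assumptions):
#   liftOver chr22_hg18.bed hg18ToHg19.over.chain.gz chr22_hg19.bed unmapped.txt
# and the lifted BED starts (+1, back to 1-based) written into a new .map file.
```

SNPs that liftOver cannot place land in the `unmapped.txt` file and have to be excluded from the ped/map files later.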

After doing this, I continue with the new rs numbers, using biomaRt in R to get the new chromosomal (hg19) position for each SNP. In addition, I used the dbSNP file b138_SNPChrPosOnRef.bcp.gz to update, compare, and validate the locations. Here I now run into some trouble when comparing the results: for a small proportion of SNPs the dbSNP file has the annotation "Mapped unambiguously on non-reference assembly only", e.g. rs11090516.
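The comparison step can be sketched roughly like this (the three-column layout of the b138 table, and whether its positions are 0- or 1-based, are assumptions to verify against the dbSNP README before trusting any mismatch calls):

```python
# Sketch: validate lifted positions against a dbSNP position table
# (b138_SNPChrPosOnRef-style rows of snp_id, chromosome, position; this
# layout is an assumption -- check the dbSNP README for the real schema).

def load_dbsnp_positions(rows):
    """Build {rsID: (chrom, pos)} from (snp_id, chrom, pos) tuples.
    Rows with an empty chromosome (e.g. SNPs mapped only to a
    non-reference assembly, like rs11090516) are skipped."""
    table = {}
    for snp_id, chrom, pos in rows:
        if chrom:
            table[f"rs{snp_id}"] = (chrom, int(pos))
    return table

def check_positions(lifted, dbsnp):
    """Split lifted {rsID: (chrom, pos)} into match / mismatch / absent."""
    match, mismatch, absent = [], [], []
    for rsid, loc in lifted.items():
        if rsid not in dbsnp:
            absent.append(rsid)
        elif dbsnp[rsid] == loc:
            match.append(rsid)
        else:
            mismatch.append(rsid)
    return match, mismatch, absent
```

SNPs that come back in the `absent` or `mismatch` buckets are the candidates to inspect manually or drop.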

What exactly does this mean? Should I remove those SNPs?

Finally, I want to update the map files and change the corresponding ped files if some SNPs were excluded.
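The column bookkeeping for that last step looks roughly like this (a minimal sketch; in practice PLINK's `--exclude` option, given a file of SNP IDs, does this directly):

```python
# Sketch: drop excluded SNPs from paired .map/.ped data. In a .ped file each
# SNP occupies two allele columns after the six leading sample columns
# (FID, IID, father, mother, sex, phenotype).

def filter_map_ped(map_lines, ped_lines, excluded):
    """Return (new_map, new_ped) with SNPs whose rsID is in `excluded` removed."""
    keep = [i for i, line in enumerate(map_lines)
            if line.split()[1] not in excluded]   # rsID is column 2 of .map
    new_map = [map_lines[i] for i in keep]
    new_ped = []
    for line in ped_lines:
        fields = line.split()
        meta, alleles = fields[:6], fields[6:]
        kept = []
        for i in keep:
            kept.extend(alleles[2 * i:2 * i + 2])  # two alleles per SNP
        new_ped.append(" ".join(meta + kept))
    return new_map, new_ped
```

Keeping the .map and .ped edits driven by the same index list guarantees the two files stay in sync, which is the main thing that can go wrong here.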

In general, is this a good or appropriate approach to liftover a SNP-array? Do you have any suggestions or improvements?

snp liftover assembly array hg19 • 11k views
@Jimbou, hey, I'm working on something similar to the above and could use some guidance here, as you have done this before. Is there any way I can contact you about this, if that's OK with you?

10.2 years ago
Emily 23k

A really easy way to map variants would be to use the Ensembl Variant Effect Predictor (VEP). You can just upload your list of IDs and it will find their locations on the genome and which genes they map to. If the IDs have been merged into new IDs, it will find the new ones. Using the VEP, you could probably have bypassed most of your steps (liftOver, updating the IDs, and using BioMart) and done it all in one go.

I'm not sure what "Mapped unambiguously on non-reference assembly" means. Looking at rs11090516 in Ensembl, it seems to be mapped normally to a region on chr22.

I'm a lab biologist learning programming and scripting during my PhD, and so far I'm not familiar with the Perl script or the variation API, but I think I'll have to dive into these tools. I checked some "problematic" SNPs using the online tool and it worked very well. Do you know how I can filter out multiple hits? I uploaded a file with the SNP rs6519457 and got 21 entries, corresponding to different transcript positions. Thanks, Emily.

The main purpose of the VEP is to report the genes and transcripts that variants hit, so you will get a hit for every transcript. There isn't a way to filter this out in the online tool, but if you use the script you can add the flag --most-severe to filter down to only one output per gene.

In terms of running the VEP as a script: this doesn't actually require any scripting on your part. You can just download the script and run it from the command line, following the instructions in the documentation for the options. This doesn't require you to know Perl or be familiar with our API.

@Emily, does the VEP also convert rs numbers (rsIDs) from old builds to new ones? As far as I can see, the VEP only does conversion for gene IDs or transcripts.
