Entering edit mode
6 months ago
Kevin Blighe
84k
Hola, just passing by to say 'hi'.
Please post bugs / suggestions as comments to this tutorial.
Automated dbSNP rsID position lookup
rsID to position
GRCh38
cat rsids.list
rs1296488112
rs1226262848
rs1225501837
rs1484860612
rs1235553513
rs1424506967
cat rsids.list | while read rsid ;
do
pos=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=$rsid&retmode=text&rettype=text" | sed 's/<\//\n/g' | grep -o -P '\<CHRPOS\>.{0,15}' | cut -f2 -d">") ;
echo -e "${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1484860612
1:13646 rs1235553513
1:13647 rs1424506967
rsID to position
GRCh37
This is the same as above, except that the tag returned that contains the GRCh37 co-ordinates is CHRPOS_PREV_ASSM
, not CHRPOS
:
cat rsids.list
rs1296488112
rs1226262848
rs1225501837
rs1484860612
rs1235553513
rs1424506967
cat rsids.list | while read rsid ;
do
pos=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=$rsid&retmode=text&rettype=text" | sed 's/<\//\n/g' | grep -o -P '\<CHRPOS_PREV_ASSM\>.{0,15}' | cut -f2 -d">") ;
echo -e "${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1484860612
1:13646 rs1235553513
1:13648 rs1424506967
Position to rsID
GRCH38
Here, we have to be wary that more than a single rsID can map to a single position.
cat grch38.bed
chr1 13616 13616
chr1 13621 13621
chr1 13623 13623
chr1 13634 13634
chr1 13646 13646
chr1 13647 13647
sed 's/^chr//g' grch38.bed | cut -f1,2 | while read chr pos ;
do
rsid=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=$pos%5BPOSITION%5D+AND+$chr%5BCHR%5D" | sed 's/<\//\n/g' | grep -e "<Id>" | cut -f2 -d">" | awk '{print "rs"$0}' | tr '\n' ',' | sed 's/,$//g') ;
echo -e "${chr}"":""${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1639595957,rs1484860612
1:13646 rs1235553513
1:13647 rs1424506967
GRCH37
cat grch37.bed
chr1 13616 13616
chr1 13621 13621
chr1 13623 13623
chr1 13634 13634
chr1 13646 13646
chr1 13648 13648
sed 's/^chr//g' grch38.bed | cut -f1,2 | while read chr pos ;
do
rsid=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=$pos%5BPOSITION_GRCH37%5D+AND+$chr%5BCHR%5D" | sed 's/<\//\n/g' | grep -e "<Id>" | cut -f2 -d">" | awk '{print "rs"$0}' | tr '\n' ',' | sed 's/,$//g') ;
echo -e "${chr}"":""${pos}""\t""${rsid}" ;
done ;
1:13616 rs1296488112
1:13621 rs1226262848
1:13623 rs1225501837
1:13634 rs1639595957,rs1484860612
1:13646 rs1235553513
1:13648 rs1424506967
.
Automated genome build lift-over
For automated lift-over in R via rtracklayer, please try this:
# chain file for hg19 to hg38
download.file('http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz', 'hg19ToHg38.over.chain.gz')
R.utils::gunzip('hg19ToHg38.over.chain.gz')
# chain file for hg38 to hg19
download.file('http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz', 'hg38ToHg19.over.chain.gz')
R.utils::gunzip('hg38ToHg19.over.chain.gz')
# import variant positions
positions <- read.table('grch37.bed', sep = '\t', header = FALSE, stringsAsFactors = FALSE)
# perform liftover, assuming hg19 to hg38
library(rtracklayer)
grObject <- GRanges(
seqnames = positions[,1],
ranges = IRanges(start = positions[,2], end = positions[,2]))
chainObject <- import.chain('hg19ToHg38.over.chain')
as.data.frame(liftOver(grObject, chainObject))[,c('seqnames','start','end','width')]
seqnames start end width
1 chr1 13616 13616 1
2 chr1 13621 13621 1
3 chr1 13623 13623 1
4 chr1 13634 13634 1
5 chr1 13646 13646 1
6 chr1 13647 13647 1
Gracias,
Kevin