Question: How can I tell which coordinates (genome build) my SNP data use?
jiumeng66 wrote (2.6 years ago):

Hi, I am new to GWAS. I have some PLINK files (bed/bim/fam) and plan to run imputation on the "Michigan Imputation Server". While preparing my data, I found that "GRCh37 coordinates are required". From my PLINK files I can get the SNP rs numbers together with their chromosome and position. How can I find out which coordinates (genome build) my PLINK files use? Thank you very much.

Tags: snp, assembly
Kevin Blighe wrote (2.6 years ago):

You can do this manually by looking up several SNPs from your data by their rs IDs at:

Once you see which build's positions match the ones in your data, you'll know your genome reference build version.

For example, picking rs1067 at random, the coordinates are:

  • GRCh38, chr3:122414118
  • GRCh37, chr3:122132965

[source: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1067]
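
If you look up more than a couple of rs IDs this way, the comparison with your .bim file can be scripted. The sketch below is a hypothetical helper, not part of PLINK. It assumes you have saved the positions you looked up in a small text file with one `rsID GRCh37_position GRCh38_position` line per SNP (for example `rs1067 122132965 122414118`, from the positions above), and it compares them with column 4 of the .bim file, which holds the base-pair position. dbSNP pages and .bim files both report 1-based positions, so the numbers can be compared directly.

```shell
#!/bin/bash
# count_build_matches BIM REF  (hypothetical helper)
#   BIM: PLINK .bim file (chrom, SNP ID, cM, 1-based position, allele 1, allele 2)
#   REF: one "rsID GRCh37_position GRCh38_position" line per SNP you looked up,
#        e.g. "rs1067 122132965 122414118" (assumed format, not a standard one)
# Prints, for each build, how many looked-up SNPs have the same position in the .bim.
count_build_matches() {
  awk '
    NR == FNR { p37[$1] = $2; p38[$1] = $3; next }   # first file read: REF
    ($2 in p37) {                                      # second file read: BIM
      if ($4 == p37[$2]) m37++
      if ($4 == p38[$2]) m38++
    }
    END { printf "GRCh37\t%d\nGRCh38\t%d\n", m37, m38 }
  ' "$2" "$1"
}
```

Usage: `count_build_matches mydata.bim looked_up.txt`. The build with the higher count is the likely one. It compares positions only, not chromosomes, which is usually enough to tell GRCh37 from GRCh38 but is not a full consistency check.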

Kevin


Thank you so much. Your advice is very useful.

(Reply by jiumeng66, 2.6 years ago.)
Alex Reynolds (Seattle, WA USA) wrote (2.6 years ago):

If you want to do this locally, you can build a searchable resource and query it as often as you like.

1) Download the SNPs and write them to a text file sorted by SNP ID.

For hg38 (GRCh38), for instance:

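$ export LC_ALL=C   # must be exported so that sort below uses plain byte order
$ # keep name ($5), chrom ($2), 0-based chromStart ($3) and chromStart+1 from the tab-separated UCSC table
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/snp147.txt.gz \
   | gunzip -c \
   | awk -v FS="\t" -v OFS="\t" '{ print $5,$2,$3,($3+1) }' \
   | sort -k1,1 \
   > hg38.snp147.sortedByName.txt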
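
Since the Michigan Imputation Server asks for GRCh37, you may want the same lookup file for hg19 as well. The sketch below wraps step 1 in two shell functions so the assembly and table names can be swapped. Both function names are hypothetical, and `build_snp_index hg19 snp147` assumes that a table of that name, with the same column layout (bin, chrom, chromStart, chromEnd, name, ...), exists under the hg19 `database/` directory on the UCSC download server; check the directory listing before relying on it.

```shell
#!/bin/bash
# Reformat a UCSC snpNNN table read on stdin into "name<TAB>chrom<TAB>start<TAB>start+1",
# sorted by SNP name in byte order, as in step 1 above.
# UCSC snp table columns: 1 bin, 2 chrom, 3 chromStart (0-based), 4 chromEnd, 5 name, ...
snp_table_to_sorted_by_name() {
  awk -v FS="\t" -v OFS="\t" '{ print $5, $2, $3, ($3 + 1) }' \
    | LC_ALL=C sort -k1,1
}

# Hypothetical wrapper: download one assembly/table pair and write the lookup file,
# e.g. build_snp_index hg19 snp147 (assumes that table exists for that assembly).
build_snp_index() {
  local db="$1" table="$2"
  wget -qO- "http://hgdownload.cse.ucsc.edu/goldenpath/${db}/database/${table}.txt.gz" \
    | gunzip -c \
    | snp_table_to_sorted_by_name \
    > "${db}.${table}.sortedByName.txt"
}
```

Running `build_snp_index hg19 snp147` would write `hg19.snp147.sortedByName.txt` next to the hg38 file, and the query in step 3 then works the same way on either file.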

2) Install pts-line-bisect:

$ git clone https://github.com/pts/pts-line-bisect.git
$ cd pts-line-bisect && make && cd ..

3) Run a query:

$ rs_of_interest=rs2814778
$ # -p prints every line whose ID starts with the query; an exact match, if present, sorts first
$ ./pts-line-bisect/pts_lbsearch -p hg38.snp147.sortedByName.txt "${rs_of_interest}" \
   | head -1 \
   | cut -f2-
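
The query prints the chromosome, chromStart and chromStart+1 from the UCSC table. UCSC stores chromStart 0-based, whereas dbSNP web pages and PLINK .bim files report 1-based positions, so add one before comparing with your data. A minimal sketch; `to_one_based` is a hypothetical helper name:

```shell
#!/bin/bash
# Convert "chrom<TAB>start<TAB>end" lines (UCSC 0-based start) read on stdin
# into 1-based "chrom:position", the convention of dbSNP pages and PLINK .bim files.
to_one_based() {
  awk -v FS="\t" '{ printf "%s:%d\n", $1, $2 + 1 }'
}
```

For example, append `| to_one_based` to the query above to get a `chrom:position` string that can be compared with column 4 of your .bim file.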

Step 3 can be put into a script so that you can re-run it with your SNP of interest on demand.

For instance, save the following as search_snps.sh:

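#!/bin/bash
# search_snps.sh: print the coordinates of the SNP ID given as the first argument
pts_lbsearch_bin=./pts-line-bisect/pts_lbsearch
sorted_snp_txt=hg38.snp147.sortedByName.txt
[ $# -eq 1 ] || { echo "usage: $0 rsID" >&2; exit 1; }
"${pts_lbsearch_bin}" -p "${sorted_snp_txt}" "$1" | head -1 | cut -f2-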

Then:

$ chmod +x search_snps.sh
$ ./search_snps.sh rs2814778
...
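
Rather than querying SNPs one at a time, you can also check every SNP in your .bim file against the sorted file in one pass with the standard `join` utility, since both inputs can be sorted by SNP ID. This is a sketch under assumptions: `count_position_agreement` is a hypothetical name, the lookup file has the four columns written in step 1, and `sort` and `join` both run in the C locale so their ordering matches. rs IDs that map to more than one location in the UCSC table produce more than one joined row.

```shell
#!/bin/bash
# count_position_agreement BIM SORTED_SNP_TXT  (hypothetical helper)
#   BIM:            PLINK .bim file (chrom, SNP ID, cM, 1-based position, allele 1, allele 2)
#   SORTED_SNP_TXT: lookup file from step 1 (name, chrom, 0-based start, start+1), C-locale sorted
# Prints how many joined rows have the .bim position equal to the UCSC start + 1.
count_position_agreement() {
  local bim="$1" sorted_snp_txt="$2" tab
  tab=$(printf '\t')
  # Keep "SNP ID <TAB> position" from the .bim and sort it the same way as the lookup file.
  awk -v OFS="\t" '{ print $2, $4 }' "$bim" \
    | LC_ALL=C sort -t "$tab" -k1,1 \
    | LC_ALL=C join -t "$tab" - "$sorted_snp_txt" \
    | awk -v FS="\t" '
        # Joined row: 1 ID, 2 .bim position, 3 chrom, 4 UCSC start, 5 start+1
        { n++; if ($2 == $4 + 1) agree++ }
        END { printf "%d of %d matched rows agree\n", agree, n }'
}
```

Running it once against an hg38 lookup file and once against an hg19 one (for example `count_position_agreement mydata.bim hg38.snp147.sortedByName.txt`) should show a much higher agreement for the build your data actually use.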

Thank you so much. It's very kind of you to reply to me in such detail.

(Reply by jiumeng66, 2.6 years ago.)

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.


(Reply by Pierre Lindenbaum, 2.5 years ago.)
Powered by Biostar version 2.3.0