Question: How can I tell which genome build my SNP coordinates use?
20 months ago by
jiumeng6630
jiumeng6630 wrote:

Hi, I am new to GWAS. I have some PLINK files (bed/bim/fam) and plan to do imputation using the "Michigan Imputation Server". While preparing my data, I found that "GRCh37 coordinates are required". My PLINK files give me the SNP rs numbers with their chromosomes and positions. How can I check which genome build those coordinates are on? Thank you very much.

20 months ago by
Kevin Blighe50k
Kevin Blighe50k wrote:

You can do this manually by looking up a few of the SNPs in your data by their 'rs' ID in a SNP database such as dbSNP (the source of the example below).

Once the positions in your data match the positions listed for one build, you'll know your genome reference build version.

As a randomly chosen example, the coordinates for rs1067 are:

  • GRCh38, chr3:122414118
  • GRCh37, chr3:122132965

[source: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1067]
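The same check can be scripted once a few SNPs have been looked up. The sketch below assumes a standard PLINK .bim layout (chromosome, SNP ID, genetic distance, base-pair position, two alleles) and a small hand-made table of known positions. The function name `count_build_matches` and the file names are hypothetical, and the only reference row shown is the rs1067 example above; in practice you would add several more SNPs looked up the same way.

```shell
#!/bin/bash
# Sketch: count how many .bim positions agree with each build.
# known.tsv (hypothetical, hand-made): rsID <TAB> GRCh37 position <TAB> GRCh38 position
# .bim columns: chrom, rsID, genetic distance, base-pair position, allele 1, allele 2

count_build_matches() {
    local known="$1" bim="$2"
    awk '
        # First file: remember the known positions for each rsID
        NR == FNR { p37[$1] = $2; p38[$1] = $3; next }
        # Second file (.bim): compare column 4 with both builds
        ($2 in p37) {
            if ($4 == p37[$2]) n37++
            if ($4 == p38[$2]) n38++
        }
        END { printf "GRCh37 matches: %d\nGRCh38 matches: %d\n", n37, n38 }
    ' "$known" "$bim"
}

# Example usage (file names are placeholders):
# printf 'rs1067\t122132965\t122414118\n' > known.tsv
# count_build_matches known.tsv mydata.bim
```

If nearly all matches fall under one build, that is most likely the build of your data. The sketch compares positions only and does not check the chromosome column.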

Kevin


Thank you so much. Your advice is very useful.

written 20 months ago by jiumeng66
20 months ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

If you want to do things locally, you can make a searchable resource you can query as you like.

1) Get SNPs and write them into a text file sorted by SNP ID.

For hg38, for instance:

$ export LC_ALL=C   # export it so that sort really uses bytewise order
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/snp147.txt.gz \
   | gunzip -c \
   | awk -v OFS="\t" '{ print $5,$2,$3,($3+1) }' \
   | sort -k1,1 \
   > hg38.snp147.sortedByName.txt

2) Install pts-line-bisect:

$ git clone https://github.com/pts/pts-line-bisect.git
$ cd pts-line-bisect && make && cd ..

3) Run a query:

$ rs_of_interest=rs2814778
$ ./pts-line-bisect/pts_lbsearch -p hg38.snp147.sortedByName.txt ${rs_of_interest} \
   | head -1 \
   | cut -f2-
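One caveat, as far as I understand the `-p` option: it performs a prefix search, so if the exact ID is missing from the file, `head -1` could return a SNP whose ID merely starts with the query. A minimal sketch of a stricter filter follows; the function name `exact_snp_hit` is hypothetical, and it assumes the four tab-separated columns written in step 1.

```shell
# Sketch: keep only the first line whose ID column equals the query exactly.
# Reads candidate lines (rsID, chrom, start, end) on stdin and prints
# chrom, start, end, or nothing when there is no exact match.
exact_snp_hit() {
    local id="$1"
    awk -F'\t' -v OFS='\t' -v id="$id" '$1 == id { print $2, $3, $4; exit }'
}

# Example usage with the lookup file from step 1:
# ./pts-line-bisect/pts_lbsearch -p hg38.snp147.sortedByName.txt rs2814778 \
#    | exact_snp_hit rs2814778
```

An empty result then means the ID was not found, rather than silently reporting a neighbouring SNP.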

Step 3 can be put into a script so that you can re-run it with your SNP of interest on demand.

For instance:

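#!/bin/bash
# Usage: ./search_snps.sh <rsID>
pts_lbsearch_bin=./pts-line-bisect/pts_lbsearch
sorted_snp_txt=hg38.snp147.sortedByName.txt
# Prefix-search the sorted file, keep the first hit, drop the ID column
"${pts_lbsearch_bin}" -p "${sorted_snp_txt}" "$1" | head -1 | cut -f2-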

Then:

$ ./search_snps.sh rs2814778
...
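To answer the original question for a whole data set rather than one SNP at a time, the lookup file can be compared against the .bim file in a single pass. This is a sketch under some assumptions: the function name `count_position_matches` is hypothetical, the lookup file has the four columns written in step 1, and column 4 (start + 1) is treated as the 1-based position, which holds for single-base substitutions but not necessarily for indels.

```shell
# Sketch: how many .bim positions agree with a lookup table?
# .bim file:   chrom, rsID, genetic distance, 1-based position, alleles
# lookup file: rsID, chrom, 0-based start, start+1 (as written in step 1)
count_position_matches() {
    local bim="$1" lookup="$2"
    awk '
        # Load the .bim file first: it is far smaller than the lookup table
        NR == FNR { pos[$2] = $4; total++; next }
        # Stream the lookup table; an rsID may appear on several rows
        ($1 in pos) {
            if (!($1 in seen)) { seen[$1] = 1; found++ }
            if (pos[$1] == $4 && !($1 in hit)) { hit[$1] = 1; same++ }
        }
        END { printf "%d in .bim, %d found, %d same position\n", total, found, same }
    ' "$bim" "$lookup"
}

# Example usage (mydata.bim is a placeholder):
# count_position_matches mydata.bim hg38.snp147.sortedByName.txt
```

Running it once per build, for example against an hg19 table built the same way (assuming UCSC provides one at the analogous path), should show a much higher share of identical positions for the build your data is actually on.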

Thank you so much. It's very kind of you to reply to me in such detail.

written 20 months ago by jiumeng66

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 18 months ago by Pierre Lindenbaum