Convert 9 million marker names to rsIDs
Asked 4.3 years ago by mk19726

I have been sent GWAS summary statistics that I want to use for Mendelian randomization studies. However, the file only contains marker names (chr:pos) rather than SNP rsIDs. There are 9 million observations.

Is there an easy and quick way to convert them to rsIDs?

I am a first-year PhD student with a non-programming background, so fairly simple explanations would be greatly appreciated!

Thank you very much for your help, Daniel

Tags: SNP, R • 2.2k views

Comment: Does it also have alleles?


Reply: Yes, it has both the effect allele and the other allele.

Answer (4.3 years ago):

Extract CHROM, POS and ID from dbSNP, write them as CHROM:POS,RSID and sort on CHROM:POS (this step is time-consuming):

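# Stream the dbSNP VCF, skip header lines (those starting with '#'), print CHROM:POS,ID and sort on the CHROM:POS field
wget -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz" | gunzip -c | awk -F '\t' '!/^#/ {printf("%s:%s,%s\n",$1,$2,$3)}' | LC_ALL=C sort -T . -t, -k1,1 > dbsnp.csv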
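
A minimal sketch for checking what the extraction step produces, run on a tiny made-up VCF excerpt instead of the full dbSNP download (the file names toy.vcf and toy_dbsnp.csv and all records are hypothetical, not real dbSNP entries). It uses the `!/^#/` pattern to skip header lines; a pattern such as `/[^#]/` would not skip them, because a line like `##fileformat=VCFv4.0` contains characters other than '#'.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical miniature VCF (made-up records, not real dbSNP entries):
# two header lines, then three tab-separated data lines.
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\n1\t200\trs1002\tC\tT\n1\t100\trs1001\tA\tG\n10\t50\trs1003\tG\tA\n' > toy.vcf

# Drop lines starting with '#', print CHROM:POS,ID, then sort on the
# first comma-separated field using plain byte (C locale) ordering.
awk -F '\t' '!/^#/ {printf("%s:%s,%s\n",$1,$2,$3)}' toy.vcf \
  | LC_ALL=C sort -t, -k1,1 > toy_dbsnp.csv

cat toy_dbsnp.csv
```

This should print `10:50,rs1003`, `1:100,rs1001` and `1:200,rs1002`, in that order. "10:50" sorts before "1:100" because the C-locale sort compares characters rather than numbers; that lexicographic order is what `join` expects, so it should not be "fixed" with a numeric sort.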

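Sort your own data on CHROM:POS. This assumes that your file is comma-separated with the CHROM:POS marker in the first column, that the chromosome notation matches dbSNP ('1', not 'chr1'), and that your data use the same genome build as the dbSNP release.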

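# Sort your comma-separated file on its first field (CHROM:POS), with the same C-locale ordering as dbsnp.csv
LC_ALL=C sort -T . -t, -k1,1 yourlist.txt > sorted.csv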
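
GWAS summary files are often tab-separated, have a header line, and may write chromosomes as "chr1". A minimal sketch of bringing such a file into the form the sort and join commands expect, assuming a made-up layout (a hypothetical toy_gwas.txt with MarkerName in column 1, then effect allele, other allele and beta) and no commas inside any field; adjust the steps to your actual columns.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical GWAS summary file (made-up values): tab-separated, one header
# line, marker name (chr:pos) in column 1, then effect allele, other allele, beta.
printf 'MarkerName\tEffectAllele\tOtherAllele\tBeta\nchr1:200\tT\tC\t0.05\nchr10:50\tA\tG\t-0.02\nchr1:100\tG\tA\t0.01\n' > toy_gwas.txt

# 1. tail -n +2     : drop the header line
# 2. sed 's/^chr//' : 'chr1:200' -> '1:200' to match dbSNP chromosome naming
# 3. tr '\t' ','    : make the file comma-separated like dbsnp.csv
# 4. sort           : same C-locale sort on field 1 as used for dbsnp.csv
tail -n +2 toy_gwas.txt \
  | sed 's/^chr//' \
  | tr '\t' ',' \
  | LC_ALL=C sort -t, -k1,1 > toy_sorted.csv

cat toy_sorted.csv
```

The result should be `10:50,A,G,-0.02`, `1:100,G,A,0.01` and `1:200,T,C,0.05`: the marker first, followed by the remaining columns, ready to be joined on field 1.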

Join both lists on CHROM:POS:

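# Output lines: CHROM:POS,RSID,<remaining columns of your file>; markers not found in dbsnp.csv are dropped
LC_ALL=C join -t, -1 1 -2 1 dbsnp.csv sorted.csv > output.txt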
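
A toy end-to-end run of the join on made-up, already-sorted inputs (join_dbsnp.csv, join_gwas.csv and the rsIDs in them are hypothetical). It illustrates two behaviours worth checking on the real output: a position that carries several rsIDs in dbSNP yields one output row per rsID, and markers absent from dbSNP are silently dropped, which `join -v 2` can list. Since your file has effect and other alleles, one possible way to resolve duplicate rows would be to also carry REF and ALT (VCF columns 4 and 5) into the dbSNP list and compare them with your alleles; that extension is not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up toy inputs, already in C-locale order on field 1.
# Two hypothetical rsIDs share position 1:100; 1:300 has no dbSNP entry.
printf '10:50,rs1003\n1:100,rs1001\n1:100,rs1004\n1:200,rs1002\n' > join_dbsnp.csv
printf '10:50,A,G,-0.02\n1:100,G,A,0.01\n1:300,C,T,0.03\n' > join_gwas.csv

# Matched markers: CHROM:POS,RSID,<remaining columns of join_gwas.csv>
LC_ALL=C join -t, -1 1 -2 1 join_dbsnp.csv join_gwas.csv > join_output.txt

# Markers from join_gwas.csv with no dbSNP match (dropped by the join above)
LC_ALL=C join -t, -v 2 -1 1 -2 1 join_dbsnp.csv join_gwas.csv > join_unmatched.txt

cat join_output.txt
cat join_unmatched.txt
```

Here join_output.txt should contain three rows (10:50 once, 1:100 twice, once per rsID) and join_unmatched.txt the single line `1:300,C,T,0.03`. Counting rows in both files on the real data is a quick sanity check that the chromosome naming and genome build really match.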
Answer by Emily (4.3 years ago):

The easiest option is to run the variants through the Ensembl Variant Effect Predictor (VEP). It will calculate their consequences on genes and report the rsIDs of known variants at those loci.
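
To use VEP, the chr:pos markers and alleles first have to be written in one of its input formats. A minimal sketch producing VEP's whitespace-separated default format (chromosome, start, end, REF/ALT alleles, strand, optional identifier), assuming a hypothetical tab-separated toy_gwas.txt with a header and columns MarkerName (chr:pos), EffectAllele, OtherAllele, Beta. It ASSUMES the other allele is the reference allele, which may not hold for your file, and it only converts single-base variants. In the command-line VEP, reporting of co-located known variants is, to my knowledge, enabled with `--check_existing`; please confirm this in the VEP documentation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical GWAS summary file (made-up values): tab-separated, header line,
# columns MarkerName (chr:pos), EffectAllele, OtherAllele, Beta.
printf 'MarkerName\tEffectAllele\tOtherAllele\tBeta\n1:100\tG\tA\t0.01\n10:50\tA\tG\t-0.02\n2:300\tAT\tA\t0.04\n' > toy_gwas.txt

# Write VEP default input: chrom, start, end, REF/ALT, strand, identifier.
# ASSUMPTION: OtherAllele is the reference allele. Only single-base (SNV) rows
# are converted; indels need different start/end conventions and are skipped.
awk -F '\t' 'NR > 1 && length($2) == 1 && length($3) == 1 {
  split($1, m, ":")                                # m[1] = chrom, m[2] = pos
  printf("%s\t%s\t%s\t%s/%s\t+\t%s\n", m[1], m[2], m[2], $3, $2, $1)
}' toy_gwas.txt > toy_vep_input.txt

cat toy_vep_input.txt
```

For the toy file this gives two lines, `1 100 100 A/G + 1:100` and `10 50 50 G/A + 10:50` (tab-separated), while the 2:300 indel is skipped. Keeping the original marker name as the identifier column makes it easier to match VEP's output back to your summary statistics.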
