Question: Efficient means of mass converting Affy SNP IDs to dbSNP rs ids
3.8 years ago by
devenvyas570
Stony Brook
devenvyas570 wrote:

I am trying to convert the map file for some SNP data I want to use (ftp://ftp.cephb.fr/hgdp_supp10/Harvard_HGDP-CEPH/, specifically ftp://ftp.cephb.fr/hgdp_supp10/Harvard_HGDP-CEPH/all_snp.map.gz) from Affy IDs to dbSNP rs IDs.

I am trying to find an efficient way to do this. I have the annotation file for the Axiom Human Origins array that the data come from, so I know the proper IDs.

I was wondering if anyone could suggest a good bash/Python/Perl-based method for this. The idea I had in mind was the

sed -i 's/*Affy#*/*rs#*/g' *filename*

method, but I figure this would not be the most efficient, as it would require >600,000 separate replacements. Any suggestions? Thanks!

bash dbsnp affy python perl • 1.5k views
ADD COMMENTlink modified 3.4 years ago by stolarek.ir580 • written 3.8 years ago by devenvyas570
3.8 years ago by
dylan.storey60
United States
dylan.storey60 wrote:

If you can use a command-line tool, it will likely be more efficient than relying on Perl.

I'd suggest writing your output to a new file instead of doing an in-place replace, so that you have an untouched original to come back to in the event of a mistake.

    --   Dylan B. Storey

Post Doctoral Researcher University of California, Davis 1089 Veterinary Medicine Drive 4016 VM3B Davis Ca 95616 1-(714)-425-0620


"There is a single light of science, and to brighten it anywhere is to brighten it everywhere."- Isaac Asimov

ADD COMMENTlink written 3.8 years ago by dylan.storey60

Any suggestions on what kind of command line tool to use? I am new to this stuff. I have previously been told to use a scripting language like Perl/Python, but I am not sure how to implement it.

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by devenvyas570

It sounded like you had a sed solution.

ADD REPLYlink written 3.8 years ago by dylan.storey60

Yeah, but that takes >600,000 lines of code and has to re-read the entire file for every single line.

ADD REPLYlink written 3.8 years ago by devenvyas570

Why do you think it would re-read the file every time? Sed works on a stream. You literally posted the exact solution you needed as a one-line sed command. Am I missing something about the replacement pattern?

ADD REPLYlink written 3.8 years ago by dylan.storey60

There are >600,000 different replacements to do

E.g.

Affx-23483877 changes to rs10000011
Affx-21814892 changes to rs1000002
Affx-25184125 changes to rs10000023
Affx-23002973 changes to rs1000003
Affx-23821302 changes to rs10000041
Affx-23694978 changes to rs10000046
Affx-23977073 changes to rs10000057
Affx-24508350 changes to rs10000073
Affx-24214680 changes to rs10000092

and so on...

 

ADD REPLYlink written 3.8 years ago by devenvyas570

If you already have the one-to-one mappings in a file, build a hash in Perl (or a dict in Python) by reading in and parsing that file (maybe 5 lines).

Then read your other file in and do the lookups and replacements.
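
A minimal sketch of that dict-based approach in Python. It assumes the mapping has been exported as a two-column, whitespace-separated file (Affy ID, then rs ID) and that the map file keeps the SNP ID in its second column, as in a PLINK-style .map file; neither layout is confirmed in this thread. The function names are hypothetical. Output goes to a new file, so the original stays untouched, and IDs without a mapping are left as they are.

```python
import gzip


def load_mapping(path):
    """Read 'old_id new_id' pairs into a dict with one pass over the file."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                mapping[fields[0]] = fields[1]
    return mapping


def convert_map(in_path, out_path, mapping, id_column=1):
    """Rewrite the ID column of a map file; return how many IDs were replaced.

    Assumption: columns are whitespace-separated and the SNP ID sits at
    index `id_column` (0-based). Output columns are tab-separated.
    """
    opener = gzip.open if in_path.endswith(".gz") else open
    replaced = 0
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            # One dict lookup per line, so the whole job is a single pass.
            if len(fields) > id_column and fields[id_column] in mapping:
                fields[id_column] = mapping[fields[id_column]]
                replaced += 1
            dst.write("\t".join(fields) + "\n")
    return replaced
```

It could be called as `convert_map("all_snp.map.gz", "all_snp.rs.map", load_mapping("affy_to_rs.txt"))`, where `affy_to_rs.txt` and `all_snp.rs.map` are placeholder file names. Each file is read once, so the >600,000 replacements cost one dictionary lookup per line rather than one pass per replacement.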


ADD REPLYlink written 3.8 years ago by dylan.storey60
3.4 years ago by
stolarek.ir580
Poland
stolarek.ir580 wrote:

Posting in case anyone has a similar problem.

This is easily done with PLINK's ID-update option (--update-name in PLINK 1.9).
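
If PLINK is used, it needs a list pairing each old ID with its new one. Below is a sketch in Python that writes such a list from a dict of Affy ID to rs ID. The function name is hypothetical. The two-column layout (old ID, then new ID) is an assumption based on the defaults of PLINK 1.9's --update-name, and treating "---" as a missing rs ID is an assumption about the annotation file, so check both against your own files and PLINK version.

```python
def write_update_list(mapping, out_path, missing=("", "---")):
    """Write 'old_id<TAB>new_id' lines; return the number of lines written.

    Entries whose new ID is missing, or whose new ID was already used,
    are skipped so that the output contains no duplicate rs IDs.
    """
    seen = set()
    written = 0
    with open(out_path, "w") as out:
        for old_id, new_id in mapping.items():
            if new_id in missing or new_id in seen:
                continue
            seen.add(new_id)
            out.write(f"{old_id}\t{new_id}\n")
            written += 1
    return written
```

The resulting file would then be passed to PLINK along the lines of `plink --bfile data --update-name update_list.txt --make-bed --out data_rs`, where `data`, `update_list.txt` and `data_rs` are placeholder names.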

ADD COMMENTlink written 3.4 years ago by stolarek.ir580