Efficient means of mass converting Affy SNP IDs to dbSNP rs ids
6.4 years ago
devenvyas ▴ 680

I am trying to convert a map file for some SNP data I want to use from Affy ids to dbSNP rs ids (ftp://ftp.cephb.fr/hgdp_supp10/Harvard_HGDP-CEPH/, specifically ftp://ftp.cephb.fr/hgdp_supp10/Harvard_HGDP-CEPH/all_snp.map.gz).

I am trying to find an effective way to do this. I have the annotation file for the Axiom Human Origins array from which the data come, so I know the proper IDs.

I was wondering if anyone could suggest a good bash/Python/Perl-based method to do this. The idea I had in mind was the

sed -i 's/*Affy#*/*rs#*/g' *filename*

method, but I figure this would not be the most efficient as it would be >600,000 replacements. Any suggestions? Thanks!

affy dbSNP Python Perl Bash • 2.4k views
6.4 years ago
dylan.storey ▴ 60

If you can use a command-line tool, it will likely be more efficient than relying on Perl.

I'd suggest you pipe your output to a new file instead of doing an in-place replace (sed -i), just to ensure you have an untainted file to come back to in the event of a mistake.

    --   Dylan B. Storey

Post Doctoral Researcher University of California, Davis 1089 Veterinary Medicine Drive 4016 VM3B Davis Ca 95616 1-(714)-425-0620

"There is a single light of science, and to brighten it anywhere is to brighten it everywhere."- Isaac Asimov


Any suggestions on what kind of command line tool to use? I am new to this stuff. I have previously been told to use a scripting language like Perl/Python, but I am not sure how to implement it.


It sounded like you had a sed solution.


Yeah, but that takes >600,000 lines of code and has to re-read the entire file for every single line.


Why do you think it would re-read the file every time? sed works on a stream. You literally posted the exact solution you needed as a one-line sed command. Am I missing something about the replacement pattern?


There are >600,000 different replacements to do, one per SNP.

E.g.

Affx-23483877 changes to rs10000011
Affx-21814892 changes to rs1000002
Affx-25184125 changes to rs10000023
Affx-23002973 changes to rs1000003
Affx-23821302 changes to rs10000041
Affx-23694978 changes to rs10000046
Affx-23977073 changes to rs10000057
Affx-24508350 changes to rs10000073
Affx-24214680 changes to rs10000092

and so on...


If you already have the one-to-one mappings in a file, build a hash (or dict) in Perl (or Python) by reading in and parsing that file (maybe 5 lines).

Then read your other file in, look up each ID, and replace it.
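A minimal Python sketch of that dict approach, under some assumptions that are not from the thread: the mapping has been exported as a two-column whitespace-delimited file (Affy ID, then rs ID), and the .map file keeps the SNP ID in its second column. The function names `load_mapping` and `convert_map` are made up for illustration, and the chromosome and position values in the demo are invented. Each line is read once and each lookup is a constant-time dict access, so there is no need for 600,000 separate sed expressions.

```python
import io


def load_mapping(handle):
    """Read 'affy_id rs_id' pairs (whitespace-delimited) into a dict."""
    mapping = {}
    for line in handle:
        fields = line.split()
        # Skip comment lines and lines without two columns.
        if len(fields) >= 2 and not line.startswith("#"):
            mapping[fields[0]] = fields[1]
    return mapping


def convert_map(in_handle, out_handle, mapping, id_col=1):
    """Copy a whitespace-delimited map file, swapping known IDs.

    IDs missing from the mapping are left unchanged. Output columns are
    tab-separated. Returns the number of IDs replaced.
    """
    replaced = 0
    for line in in_handle:
        fields = line.split()
        if len(fields) > id_col and fields[id_col] in mapping:
            fields[id_col] = mapping[fields[id_col]]
            replaced += 1
        out_handle.write("\t".join(fields) + "\n")
    return replaced


if __name__ == "__main__":
    # In-memory demo; the ID pairs come from the thread, the chromosome
    # and position columns are invented placeholders.
    pairs = io.StringIO("Affx-23483877 rs10000011\nAffx-21814892 rs1000002\n")
    snp_map = io.StringIO("4\tAffx-23483877\t0\t100\n4\tAffx-99999999\t0\t200\n")
    out = io.StringIO()
    count = convert_map(snp_map, out, load_mapping(pairs))
    print(out.getvalue(), end="")
    print("replaced:", count)
```

For real files, the `io.StringIO` objects would be replaced with `open(...)` handles, writing to a new output file so the original map file stays untouched.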


6.0 years ago
stolarek.ir ▴ 670

Posting in case anyone has a similar problem.

This is easily done with plink update