Question

get gene name from rsID

0

4 months ago

a3532321 • 0

I have a list of rsIDs in an xlsx file and need the gene name for each one. Running this command in the terminal returns the gene name:

esearch -db snp -query "rs573455" | esummary | xtract -pattern GENE_E -element NAME | sort | uniq
CEP164

But when I run the same command from the Python script below, I only get a result for some of the rsIDs. Why is this happening?

import subprocess

rsIDs = [
    "rs573455",
    "rs7215121",
    "rs2873296",
    "rs6672420",
    "rs6664445"
]

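for rsID in rsIDs:
    # Run the same EDirect pipeline as above through the shell, once per rsID.
    query = f"esearch -db snp -query {rsID} | esummary | xtract -pattern GENE_E -element NAME | sort | uniq"
    result = subprocess.run(query, shell=True, stdout=subprocess.PIPE, text=True)

    # Only stdout is captured; anything written to stderr is not stored in result.
    print(f"{rsID}:")
    print(result.stdout)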

Output:

rs573455: 
rs7215121:
rs2873296:
rs6672420: RUNX3 RUNX3-AS1
rs6664445: SPOCD1

dbSNP • 431 views

updated 4 months ago by Ram 43k • written 4 months ago by a3532321 • 0

score 2 · Accepted Answer · 2023-12-07

2

4 months ago

Ram 43k

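Not all variants fall in coding regions. rs2873296, for example, is NC_000001.10:g.21721038A>G, which is in a non-coding region. When dbSNP has no gene annotated for a variant, the xtract step has no GENE_E element to match, so the output for that rsID is empty.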
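Note that rs573455 returns CEP164 when the command is run by hand but nothing from the script, so a missing annotation may not be the only cause of the empty lines. Below is a minimal sketch, not a definitive fix, that keeps the pipeline from the question but also captures stderr, so an empty result can be told apart from a failed lookup. The names `build_command`, `classify`, `lookup`, and the `pause` parameter are hypothetical. The delay between calls and the `stdin=subprocess.DEVNULL` setting are assumptions: NCBI limits unauthenticated clients to about three requests per second, and EDirect tools can behave differently when standard input is not a terminal.

```python
import shlex
import subprocess
import time

# Same EDirect pipeline as in the question; only the rsID is substituted.
PIPELINE = ("esearch -db snp -query {rsid} | esummary | "
            "xtract -pattern GENE_E -element NAME | sort | uniq")


def build_command(rsid):
    """Return the shell pipeline with the rsID safely quoted."""
    return PIPELINE.format(rsid=shlex.quote(rsid))


def classify(returncode, stdout, stderr):
    """Separate 'no gene annotated' from 'the lookup itself failed'.

    The exit code of a shell pipeline is that of its last command (uniq),
    so messages on stderr are usually the more informative signal.
    """
    genes = stdout.split()
    if genes:
        return "genes", genes
    if returncode != 0 or stderr.strip():
        return "error", [stderr.strip()]
    return "no_gene", []


def lookup(rsid, pause=0.5):
    """Run the pipeline for one rsID (requires EDirect and network access)."""
    result = subprocess.run(
        build_command(rsid),
        shell=True,
        capture_output=True,       # keep stderr as well as stdout
        text=True,
        stdin=subprocess.DEVNULL,  # assumption: keep EDirect from reading stdin
    )
    time.sleep(pause)  # assumption: spacing requests avoids NCBI rate limits
    return classify(result.returncode, result.stdout, result.stderr)


# Example use (uncomment to query NCBI):
# for rsid in ["rs573455", "rs2873296", "rs6672420"]:
#     print(rsid, *lookup(rsid))
```

If an rsID comes back as "error", the captured message should indicate whether the request was rejected or the tools were not found. A "no_gene" result is consistent with a variant that has no gene annotation in dbSNP.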
