3.5 years ago
f-rasmussen
I have a list of 500,000 SNPs for which I want to obtain the gene names. I tried searching with biomaRt:
library(data.table)
library(biomaRt)
rs <- fread("SNPs.txt")
ensembl_version = "https://dec2016.archive.ensembl.org"
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
getBM(attributes=c("refsnp_id", "associated_gene"), filters="snp_filter", values=rs, mart=ensembl, uniqueRows=TRUE)
However, many of the SNPs return NA or simply nothing, as shown here:
refsnp_id associated_gene
1 rs425277 PRKCZ
2 rs1571149
3 rs1240707
4 rs1240708
5 rs873927
6 rs880051 SSU72
7 rs904589
8 rs908742
9 rs909823
10 rs925905
11 rs7290
12 rs7407
13 rs1878745
14 rs2296716 SSU72
15 rs2298217
16 rs2459994
When I look up on dbSNP some of the rsIDs that did not produce a gene name, they are in fact associated with a gene name in that database. My question is then: how can I connect biomaRt to dbSNP and retrieve the correct gene names for all the SNPs in the list 'SNPs.txt'?
Hi Alex! Thanks for this information. I have a list of SNPs in a .csv file along with other columns. Can you please tell me how I can supply those SNPs in the csv file in place of the following format?
variants = [ 'rs425277', 'rs1571149', 'rs1240707', 'rs1240708', 'rs873927', 'rs880051', 'rs1878745', 'rs2296716', 'rs2298217', 'rs2459994' ]
Thanks in advance
The exact code will depend on your CSV file:
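The snippet that accompanied this reply is not reproduced here, so what follows is only a minimal sketch of the general idea: read one column of a CSV into a Python list that can stand in for the hard-coded variants list. It uses the standard-library csv module, which may differ from what was originally posted. The column name SNP and the sample rows are assumptions for illustration; substitute whatever header your own file uses.

```python
import csv
import io

# Stand-in for the contents of a real CSV file.
# The header names (SNP, CHR, P) are hypothetical; use your file's own headers.
csv_text = """SNP,CHR,P
rs425277,1,0.01
rs1571149,1,0.20
rs1240707,1,0.03
"""

def read_variants(handle, column="SNP"):
    """Return the non-empty values of one column from an open CSV handle."""
    reader = csv.DictReader(handle)
    # Rows where the column is missing or empty are skipped.
    return [row[column].strip() for row in reader if row.get(column)]

variants = read_variants(io.StringIO(csv_text))
print(variants)
```

For a file on disk, open it with open("your_file.csv", newline="") and pass the handle to read_variants in place of the io.StringIO object; the file name here is a placeholder.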
Thanks Alex
Hi Alex, I used this method to get gene names earlier. But when I tried it again, I got the following error:
Can you please suggest how I can address this issue? TIA
The KeyError you are getting is telling you there is no column (or "key") called SNP in the dataframe SNPs. You may need to print out the list of column headers to find out what columns you do have, perhaps via print(SNPs.head()) or similar.
Thank you so much, Alex! That's helpful
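For anyone hitting the same KeyError: a header that looks right can still fail an exact lookup if it carries stray whitespace or different capitalisation. The sketch below is a hypothetical helper (find_column is not part of pandas or any library) that matches a wanted name against the available headers while ignoring case and surrounding spaces, and lists the available headers when nothing matches. The example headers are made up.

```python
def find_column(columns, wanted):
    """Return the actual header matching `wanted`, ignoring case and surrounding whitespace."""
    # Map each normalised header back to the header exactly as it appears.
    normalised = {str(c).strip().lower(): c for c in columns}
    key = wanted.strip().lower()
    if key not in normalised:
        raise KeyError(f"no column like {wanted!r}; available columns: {list(columns)}")
    return normalised[key]

# Hypothetical headers, including one with a leading space and lower case.
headers = [" snp", "CHR", "P"]
actual = find_column(headers, "SNP")
print(repr(actual))
```

With a pandas dataframe, the same helper could be given SNPs.columns, and the returned header then used for the lookup, e.g. SNPs[find_column(SNPs.columns, "SNP")].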