I'm trying to map positions to rsids that I have in a data.table. Below is the code I use, where wt is my weights table with the rsids I need positions for, rsLst is a list of the unique rsids in wt split into chunks of 500, and psLst holds the new tables with positions mapped.
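library(biomaRt)
library(data.table)
library(parallel)
# Connect to the GRCh37 archive of the Ensembl variation mart
grch37 <- useMart(biomart = "ENSEMBL_MART_SNP",
                  host = "grch37.ensembl.org",
                  path = "/biomart/martservice",
                  dataset = "hsapiens_snp")
# Look up chromosome and position for one chunk of rsids
pullPos <- function(x) {
  d <- getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end"),
             filters = "snp_filter",
             values = x, mart = grch37)
  as.data.table(d)
}
wt <- readRDS("combinedWeightFiles.rds")
rsids <- wt[, unique(rsid)]
# Split the unique rsids into chunks of 500 and query each chunk
rsLst <- split(rsids, ceiling(seq_along(rsids) / 500))
psLst <- mclapply(rsLst, pullPos, mc.cores = 32)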
However, when I check the first chunk of results with:
psLst[[1]][chr_name %in% as.character(1:23)]
Instead of a table with 500 rsids, there are only 497, so 3 are missing:
c("rs12774134", "rs4531365", "rs11819128")
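A minimal sketch of how the missing IDs in a chunk can be found, by comparing the IDs sent in the query with the IDs that came back. The `queried` and `returned` objects and the rsid values in them are hypothetical stand-ins (not real variants) for one element of rsLst and the matching element of psLst; only base R is used here.

```r
# Hypothetical stand-in for one chunk of rsids sent to the mart.
queried <- c("rsA", "rsB", "rsC", "rsD")

# Hypothetical stand-in for the table returned for that chunk;
# the column name matches the 'refsnp_id' attribute requested above.
returned <- data.frame(
  refsnp_id = c("rsA", "rsC"),
  chr_name  = c("10", "10"),
  stringsAsFactors = FALSE
)

# IDs that were queried but have no row in the result.
missing_ids <- setdiff(queried, returned$refsnp_id)
print(missing_ids)
```

With the real objects, the same comparison would be something like `setdiff(rsLst[[1]], psLst[[1]]$refsnp_id)`, assuming the chunk and its result share the same list position.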
I checked Ensembl and the rsids exist there with their positions: rs12774134, rs4531365, rs11819128
I also tried using gprofiler2 with the following code:
library(gprofiler2)
gsnpense("rs12774134", filter_na = TRUE)
[1] rs_id chromosome start end strand ensgs gene_names
[8] variants
<0 rows> (or 0-length row.names)
Yet when I search for the rsid on g:Profiler's site, it gives me the link to the Ensembl page linked above. Any ideas why these rsids aren't being mapped properly with either tool?
And yes, I am supposed to be using GRCh37.
I did, but I'm new to using BioMart and didn't know that flag would cause them not to map.
Is there a way of ignoring the flags and still pulling the positions?
The flagged variants are not stored in the BioMart database, so no, not with BioMart.