I'm wondering how to determine which dbSNP build (e.g. 150 or 151) the variation annotations are based on. For example, I can see that the GRCh37 mart is based on Ensembl Variation 94. The SNP attributes state that the source is dbSNP, but not which build:
    grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice")
    listMarts(grch37)
                   biomart               version
    1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 94
    2     ENSEMBL_MART_SNP  Ensembl Variation 94
    3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 94

    variation = useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice")
    listDatasets(variation)[12,]
            dataset
    12 hsapiens_snp
                                                                          description
    12 Human Short Variants (SNPs and indels excluding flagged variants) (GRCh37.p13)
          version
    12 GRCh37.p13

    snps = useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_snp")
    getBM(attributes=c('refsnp_id', 'refsnp_source', 'refsnp_source_description',
                       'chr_name', 'chrom_start', 'chrom_end',
                       'minor_allele', 'minor_allele_freq', 'minor_allele_count',
                       'consequence_allele_string'),
          filters = 'snp_filter', values = "rs123", mart = snps)
      refsnp_id refsnp_source
    1     rs123         dbSNP
                                     refsnp_source_description chr_name chrom_start
    1 Variants (including SNPs and indels) imported from dbSNP        7    24966446
      chrom_end minor_allele minor_allele_freq minor_allele_count
    1  24966446            C          0.292133               1463
      consequence_allele_string
    1                       C/A
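For reference, here is a minimal sketch of how one could scan the mart's attribute list for anything that mentions a source, version, or build. The helper `find_attributes` and its default pattern are hypothetical names of my own, not part of biomaRt; only `useMart` and `listAttributes` are biomaRt functions, and I am assuming `listAttributes` returns its usual `name` and `description` columns. Whether any matching attribute actually carries the dbSNP build is exactly what I don't know.

```r
# Hypothetical helper (not part of biomaRt): keep the rows of an attribute
# table whose name or description matches a pattern, ignoring case.
find_attributes <- function(attrs, pattern = "source|version|build") {
  hits <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hits, , drop = FALSE]
}

# The query itself needs network access to the GRCh37 mart, so it is only
# run in an interactive session.
if (interactive()) {
  snps <- biomaRt::useMart(biomart = "ENSEMBL_MART_SNP",
                           host = "grch37.ensembl.org",
                           path = "/biomart/martservice",
                           dataset = "hsapiens_snp")
  print(find_attributes(biomaRt::listAttributes(snps)))
}
```

Any attribute this turns up could then be added to the `attributes` vector of the `getBM` call above to see what it returns for rs123.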
1) Is this information available programmatically through biomaRt?
2) If not, is http://grch37.ensembl.org/info/genome/variation/species/sources_documentation.html the best place to find this information? Unfortunately, the documentation link on that page (http://grch37.ensembl.org/info/genome/variation/prediction/sources_phenotype_documentation.html) appears to be broken at the moment.
An additional source of confusion: searching for an individual SNP on the GRCh37 Ensembl website (e.g. http://grch37.ensembl.org/Homo_sapiens/Variation/Explore?r=7:24965946-24966946;v=rs123;vdb=variation;vf=119) gives information from dbSNP build 150: "Original source: Variants (including SNPs and indels) imported from dbSNP (release 150)", even though "Ensembl GRCh37 release 94" is stated at the bottom of the page. So perhaps we can't assume that what http://grch37.ensembl.org/info/genome/variation/species/sources_documentation.html says holds for all Ensembl 94/GRCh37 data?