Question: How to use R biomaRt to get flanking sequence for a batch of SNPs
woofung wrote (2.7 years ago):

Hello all,

I have had a hard time finding a way to retrieve the flanking sequences for a batch of SNPs. Here is the code I tried:

library(biomaRt)

# select the human SNP mart (the default Ensembl release is on GRCh38)
ensembl.snp <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp")
s <- c('rs429358', 'rs429337')
snp <- getBM(attributes = c('refsnp_id', 'minor_allele_freq', 'snp',
                            'ensembl_peptide_allele', 'allele',
                            'chr_name', 'mapweight'),
             filters = 'snp_filter',
             values = s,
             mart = ensembl.snp)
snp

It returns:

  minor_allele_freq   snp refsnp_id ensembl_peptide_allele allele chr_name mapweight
1          0.281550 %T/G%  rs429337                           T/G        8         1
2          0.150559 %T/C%  rs429358                    C/R    T/C       19         1
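In the output above, the 'snp' column holds each variant as %T/G%: the alleles between two % markers, with no flanking bases because none were requested. Assuming that requested flanks come back on either side of those markers (for example ACG%T/G%TTA), a small helper can split that column into upstream, allele and downstream pieces. split_flanks() is a hypothetical name for illustration, not a biomaRt function:

```r
# Hypothetical helper (not part of biomaRt): split strings of the assumed form
# "<upstream>%<alleles>%<downstream>" into three character columns.
split_flanks <- function(x) {
  parts <- strsplit(x, "%", fixed = TRUE)
  # strsplit() drops a trailing empty field (e.g. "%T/G%" -> c("", "T/G")),
  # so pad each result to exactly three fields
  parts <- lapply(parts, function(p) c(p, rep("", max(0, 3 - length(p))))[1:3])
  data.frame(
    upstream   = vapply(parts, `[`, character(1), 1),
    alleles    = vapply(parts, `[`, character(1), 2),
    downstream = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# usage on the format seen above plus an assumed flanked string
split_flanks(c("%T/G%", "ACG%T/C%TTA"))
```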

But if I add "upstream_flank" to the attributes, like this:

snp <- getBM(attributes = c('refsnp_id', 'minor_allele_freq', 'snp',
                            'ensembl_peptide_allele', 'allele',
                            'chr_name', 'mapweight', 'upstream_flank'),
             filters = 'snp_filter',
             values = s,
             mart = ensembl.snp)

it always returns this error:

Error in getBM(attributes = c("refsnp_id", "minor_allele_freq", "snp",  : The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query. Please report this to the mailing list.

Does anyone know how to solve this? Thanks.

(Tags: snp, R, software error. Question written 2.7 years ago by woofung, last modified by Jean-Karim Heriche.)

I think that in getBM(), sequences have to be treated as filters. Alternatively, you could use the getSequence() function. Also, did you do as the error message asks and report it to the biomaRt mailing list?

(Reply by Jean-Karim Heriche, 2.7 years ago)

I don't know how to. But by googling, I found a post from 4 years ago in which someone reported the same problem, without a solution.

(Reply by woofung, 2.7 years ago)
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote (2.7 years ago):

I think using filters for flanking sequence with getBM goes like this (untested):

snp <- getBM(attributes = c('refsnp_id', 'minor_allele_freq', 'snp',
                            'ensembl_peptide_allele', 'allele',
                            'chr_name', 'mapweight'),
             filters = c('snp_filter', 'upstream_flank'),
             values = list(s, 1000),
             mart = ensembl.snp)

The reason is that you need a way to specify the amount of sequence you need.
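To make that pairing concrete: getBM() matches filters[i] with values[[i]], so with more than one filter the values argument has to be a list, and the flank length is simply the value supplied for the flank filter. A minimal sketch of building the two arguments together; flank_query_args() is a hypothetical helper, and the flank filter names are the ones discussed in this thread, not ones confirmed by listFilters():

```r
# Hypothetical helper: build parallel 'filters' and 'values' arguments for getBM().
# values must be a list here, because one element is a vector of SNP IDs and the
# others are single numbers (flank lengths in bases).
flank_query_args <- function(ids, upstream = 20, downstream = 20) {
  list(
    filters = c("snp_filter", "upstream_flank", "downstream_flank"),
    values  = list(ids, upstream, downstream)
  )
}

args <- flank_query_args(c("rs429358", "rs429337"), upstream = 20, downstream = 20)
str(args)
```

Keeping both vectors in one place makes it harder to add a filter without its matching value, which getBM() would otherwise pair up incorrectly.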


It doesn't work because "upstream_flank" only appears in the attributes list, not as a filter.

(Reply by woofung, 2.7 years ago)

Actually, this question has already been answered on Biostars here. It seems I was on the right track, but checkFilters=FALSE is additionally required.

(Reply by Jean-Karim Heriche, 2.7 years ago)
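Building on the checkFilters=FALSE suggestion, a sketch of a complete query. It assumes the SNP mart accepts 'upstream_flank' and 'downstream_flank' as filters once the client-side check is skipped, and that the flanking bases then appear in the 'snp' column; both are assumptions, and the query has not been verified against the current Ensembl release. get_snp_flanks() is a hypothetical wrapper:

```r
# Hypothetical wrapper around biomaRt::getBM(). Assumes the flank filters are
# accepted server-side even though listFilters() does not list them.
get_snp_flanks <- function(ids, flank = 20, mart) {
  biomaRt::getBM(
    attributes   = c("refsnp_id", "snp"),
    filters      = c("snp_filter", "upstream_flank", "downstream_flank"),
    values       = list(ids, flank, flank),  # one list element per filter
    mart         = mart,
    checkFilters = FALSE  # skip the check that rejects filters missing from listFilters()
  )
}

# Usage (needs the biomaRt package and network access):
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  snp.mart <- biomaRt::useEnsembl(biomart = "snp", dataset = "hsapiens_snp")
  print(get_snp_flanks(c("rs429358", "rs429337"), flank = 20, mart = snp.mart))
}
```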

I used listFilters() to look up the filters in the ENSEMBL_MART_SNP database, which is different from the post you linked. There are no filters related to flanking sequence. I still tried anyway, but as expected, R finds no upstream_flank filter.

I also tried checkFilters=FALSE before; it didn't help.

Here is the code I tested.

snp.dataset <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")  # select dataset

d <- c('rs429358', 'rs429337')
snp <- getBM(attributes = c('refsnp_id', 'minor_allele_freq', 'snp',
                            'ensembl_peptide_allele', 'allele', 'chr_name'),
             filters = c('snp_filter', 'upstream_flank'),
             values = list(d, 20),  # one list element per filter
             mart = snp.dataset,
             checkFilters = FALSE)
snp

(Reply by woofung, 2.7 years ago)