Question: How to use R biomaRt to get flanking sequence for a batch of SNPs
woofung wrote, 23 months ago:

Hello all,

I had a hard time trying to find a way to retrieve flanking sequence for a batch of SNPs. Here is the code I tried:

library(biomaRt)

# Connect to the Ensembl variation mart (human SNPs)
ensembl.snp <- useEnsembl(biomart="snp", dataset="hsapiens_snp", GRCh=38)
s <- c('rs429358', 'rs429337')
snp <- getBM(attributes=c('refsnp_id',
                          'minor_allele_freq',
                          'snp',
                          'ensembl_peptide_allele',
                          'allele',
                          'chr_name',
                          'mapweight'),
             filters='snp_filter',
             values=s,
             mart=ensembl.snp)
snp

And it returns this:

  minor_allele_freq   snp refsnp_id ensembl_peptide_allele allele chr_name mapweight
1          0.281550 %T/G%  rs429337                           T/G        8         1
2          0.150559 %T/C%  rs429358                    C/R    T/C       19         1
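In the output above, the snp column holds the alleles between two % signs, with nothing on either side. Assuming that any flanking sequence, once it is returned, appears before the first % and after the second one, the column could be split with a small helper like the sketch below. The function name split_snp_flank and the output column names are made up for illustration, and the assumed layout should be checked against real output.

```r
# Split strings of the form "<upstream>%<alleles>%<downstream>" into parts.
# Assumption: flanking sequence sits on either side of the %...% allele block.
split_snp_flank <- function(x) {
  m <- regmatches(x, regexec("^([^%]*)%([^%]*)%([^%]*)$", x))
  # Entries that do not match the expected layout become NA
  rows <- lapply(m, function(p) {
    if (length(p) == 4) p[2:4] else rep(NA_character_, 3)
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("upstream", "alleles", "downstream")
  out
}

# Example with the value returned above and a made-up flanked string
split_snp_flank(c("%T/G%", "ACGT%T/C%GGCA"))
```

With a real result, the helper would be applied to snp$snp.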

But if I add "upstream_flank" to the attributes, like

snp <- getBM(attributes=c('refsnp_id'
                      ,'minor_allele_freq'
                      ,'snp'
                      ,'ensembl_peptide_allele'
                      , 'allele'
                      , 'chr_name'
                      ,'mapweight', 'upstream_flank'),
                filters = 'snp_filter'
              ,value=s

                , mart = ensembl.snp)

it always returns this:

Error in getBM(attributes = c("refsnp_id", "minor_allele_freq", "snp",  : The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query. Please report this to the mailing list.

Does anyone know how to solve this? Thanks.


I think that in getBM(), sequences have to be treated as filters; alternatively, you could use the getSequence() function. Also, did you do as asked in the error message and report it to the biomaRt mailing list?

Reply written 23 months ago by Jean-Karim Heriche

I don't know how to. But by googling, I found a post from 4 years ago in which some people reported the same problem, without a solution.

Reply written 23 months ago by woofung
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 23 months ago:

I think using filters for flanking sequence with getBM goes like this (untested):

snp <- getBM(attributes=c('refsnp_id',
                          'minor_allele_freq',
                          'snp',
                          'ensembl_peptide_allele',
                          'allele',
                          'chr_name',
                          'mapweight'),
             filters=c('snp_filter', 'upstream_flank'),
             values=list(s, 1000),
             mart=ensembl.snp)

The reason is that you need a way to specify the amount of sequence you need.


It doesn't work because "upstream_flank" is only found in the attributes list, not in the filters list.
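A quick way to see where a name such as upstream_flank is listed is to look it up in the tables returned by listAttributes() and listFilters(), both of which have a name column. The helper below is a minimal sketch; where_listed is a made-up name, and the commented usage assumes the ensembl.snp mart object from the question.

```r
# Report whether a name appears in the attribute and/or filter tables.
# `attrs` and `filts` are data frames with a `name` column, as returned by
# biomaRt::listAttributes(mart) and biomaRt::listFilters(mart).
where_listed <- function(name, attrs, filts) {
  c(attribute = name %in% attrs$name,
    filter    = name %in% filts$name)
}

# Hypothetical usage (needs a live mart connection):
# where_listed("upstream_flank",
#              listAttributes(ensembl.snp),
#              listFilters(ensembl.snp))
```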

Reply written 23 months ago by woofung

Actually, the question has already been answered on Biostars here. It seems I was on the right track but what's additionally required is checkFilters=FALSE.
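As a rough, untested sketch of that approach: pass the flank lengths as extra filters, switch off filter checking, and read the sequence from the 'snp' attribute. The wrapper name get_snp_flanks is made up, and 'downstream_flank' is an assumption by analogy with 'upstream_flank'; behaviour may differ between biomaRt and Ensembl versions.

```r
# Sketch only: request flanking sequence around SNPs via the 'snp' attribute.
# Assumes the biomaRt package is installed and `mart` is a connected SNP mart.
# 'downstream_flank' is assumed to mirror 'upstream_flank'.
get_snp_flanks <- function(ids, flank, mart) {
  biomaRt::getBM(
    attributes   = c("refsnp_id", "snp"),
    filters      = c("snp_filter", "upstream_flank", "downstream_flank"),
    values       = list(ids, flank, flank),  # one list element per filter
    mart         = mart,
    checkFilters = FALSE                     # skip validation against listFilters()
  )
}

# Hypothetical usage with the mart object from the question:
# get_snp_flanks(c("rs429358", "rs429337"), 20, ensembl.snp)
```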

Reply written 23 months ago by Jean-Karim Heriche

I used listFilters() to look for filters in the ENSEMBL_MART_SNP database, which is different from the one in the post you linked. There are no filters related to flanking sequence. I tried anyway, but as expected, R finds no upstream_flank filter.

I also tried checkFilters=FALSE before; it didn't help.

Here is the code I tested.

snp.dataset <- useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp") # select dataset
d <- c('rs429358', 'rs429337')
snp <- getBM(attributes=c('refsnp_id',
                          'minor_allele_freq',
                          'snp',
                          'ensembl_peptide_allele',
                          'allele',
                          'chr_name'),
             filters=c('snp_filter', 'upstream_flank'),
             values=list(d, 20), # one list element per filter
             mart=snp.dataset, checkFilters=FALSE)
snp
Reply written 23 months ago by woofung