Question: How Do I Use Biomart To Get Upstream Flanking Sequence For A Gene?
Arturo_M wrote (6.6 years ago):

Hi. I'm trying to get the 100bp upstream sequences of some genes from A. gambiae. I'm using biomaRt and my query looks like this:

vector<-useMart("vectorbase_mart_13",dataset="agambiae_eg_gene")
agambiaeseq<-getBM(attributes=c('start_position','end_position','chromosome_name','strand','ensembl_gene_id','gene_flank','upstream_flank'),filters='ensembl_gene_id',value='AGAP004677', mart=vector)

I know that for the attribute 'upstream_flank' I should put the value 100 but I just don't know where.

Thank you for your attention.

Tags: biomart, bioconductor

I changed the title of your question to be more specific (just "Biomart" is too generic for a forum with many BioMart questions) and formatted your question so the code displays properly (four spaces before each code line is all that is needed). Welcome to Biostar and thanks for your question! I've made an attempt at an answer below.

written 6.6 years ago by Obi Griffith
Obi Griffith (Washington University, St Louis, USA) wrote (6.6 years ago):

The biomaRt documentation for 'getBM' says: "Sometimes attributes where a value needs to be specified, for example upstream_flank with value 20 for obtaining upstream sequence flank regions of length 20bp, are treated as filters in BioMarts. To enable such a query to work, one must specify the attribute as a filter and set checkFilters = FALSE for the query to work." Also note that, for the 'values' argument, "If multiple filters are specified then the argument should be a list of vectors of which the position of each vector corresponds to the position of the filters in the filters argument."
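As a small offline sketch of that positional pairing (no BioMart connection needed), the filters vector and the values list can be built together so that they stay aligned. The helper name `flank_query_args` is made up for illustration and is not part of biomaRt.

```r
# Hypothetical helper (not part of biomaRt): build matching 'filters' and
# 'values' arguments for a getBM() upstream-flank query.
flank_query_args <- function(gene_ids, flank_bp) {
  stopifnot(is.character(gene_ids), length(gene_ids) >= 1,
            is.numeric(flank_bp), length(flank_bp) == 1, flank_bp > 0)
  filters <- c("ensembl_gene_id", "upstream_flank")
  # One list element per filter, in the same order as 'filters':
  # element 1 holds the gene IDs, element 2 the flank length in bp.
  values <- list(gene_ids, flank_bp)
  list(filters = filters, values = values)
}

query <- flank_query_args("AGAP004677", 100)
str(query)
```

The two elements could then be supplied as `filters = query$filters` and `values = query$values` in a getBM call, together with `checkFilters = FALSE`.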

So, does this do what you are looking for?

library(biomaRt)
mart <- useMart("vectorbase_mart_13", dataset = "agambiae_eg_gene")
# 'upstream_flank' is passed as a filter (its value is the flank length in bp), so checkFilters must be FALSE
agambiaeseq <- getBM(attributes = c('gene_flank', 'start_position', 'end_position', 'chromosome_name', 'strand', 'ensembl_gene_id'),
                     filters = c('ensembl_gene_id', 'upstream_flank'),
                     values = list(ENSG = 'AGAP004677', Upstream = 100), mart = mart, checkFilters = FALSE)

The output looks like:

gene_flank start_position end_position chromosome_name strand ensembl_gene_id
ATCTCAAAATGGCAACATGTCAAACGCTAAGAAGACACCTCTTCTATATTCCACCTTGATTTGAACGGTAACATTCAGTAGTCCGTGGCTTTCGGATTAT         157348       186936              2L     -1      AGAP004677

It seems to correspond to what I imagine the equivalent query would look like in the VectorBase BioMart web interface.
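A quick sanity check on the result is to confirm that the flank has the requested length and contains only nucleotide letters. The sketch below uses the sequence from the output above, typed in by hand rather than fetched from the mart; the variable names `flank` and `requested_bp` are just for illustration.

```r
# Sequence copied from the output shown above; in a live session this would be
# agambiaeseq$gene_flank instead.
flank <- "ATCTCAAAATGGCAACATGTCAAACGCTAAGAAGACACCTCTTCTATATTCCACCTTGATTTGAACGGTAACATTCAGTAGTCCGTGGCTTTCGGATTAT"
requested_bp <- 100  # the value given for the 'upstream_flank' filter

length_ok  <- nchar(flank) == requested_bp   # flank length matches the request
letters_ok <- grepl("^[ACGTN]+$", flank)     # only nucleotide characters present

print(c(length_ok = length_ok, letters_ok = letters_ok))
```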


You have resolved my problem, thanks a lot!

written 6.6 years ago by Arturo_M

Great. Glad to help. If you find the forum useful, please stick around, contribute more good questions, answers ... and vote! ;-)

written 6.6 years ago by Obi Griffith