Question: How Do I Use Biomart To Get Upstream Flanking Sequence For A Gene?
7.5 years ago, Arturo_M wrote:

Hi. I'm trying to get the 100bp upstream sequences of some genes from A. gambiae. I'm using biomaRt and my query looks like this:

    agambiaeseq <- getBM(attributes = c('start_position', 'end_position', 'chromosome_name', 'strand',
                                        'ensembl_gene_id', 'gene_flank', 'upstream_flank'),
                         filters = 'ensembl_gene_id', value = 'AGAP004677', mart = vector)

I know that for the attribute 'upstream_flank' I should put the value 100, but I just don't know where.

Thank you for your attention.


I changed the title of your question to be more specific (just "Biomart" is too generic for a forum with many BioMart questions) and formatted your question so the code displays properly (four spaces before each code line is all that is needed). Welcome to Biostar, and thanks for your question! I've attempted an answer below.

(comment by Obi Griffith, 7.5 years ago)
7.5 years ago, Obi Griffith (Washington University, St Louis, USA) wrote:

The biomaRt documentation for 'getBM' says: "Sometimes attributes where a value needs to be specified, for example upstream_flank with value 20 for obtaining upstream sequence flank regions of length 20bp, are treated as filters in BioMarts. To enable such a query to work, one must specify the attribute as a filter and set checkFilters = FALSE for the query to work."

It also notes that for the 'values' argument, "If multiple filters are specified then the argument should be a list of vectors of which the position of each vector corresponds to the position of the filters in the filters argument." In other words, 'upstream_flank' moves from 'attributes' to 'filters', and the value 100 goes in the matching position of 'values'.
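To make the positional rule concrete, here is a small base-R sketch. 'check_filter_values()' is a hypothetical helper, not part of biomaRt; it only checks that the 'values' list supplies one vector per filter, in the same order as 'filters', and labels each entry with its filter name so the pairing is easy to inspect before the list is handed to getBM().

```r
# Hypothetical helper, not part of biomaRt: check that 'values' supplies one
# vector per filter, in the same order as 'filters', and label each entry
# with its filter name so the pairing is easy to inspect.
check_filter_values <- function(filters, values) {
  if (!is.list(values)) {
    stop("with several filters, 'values' should be a list")
  }
  if (length(values) != length(filters)) {
    stop(sprintf("%d filters but %d value vectors", length(filters), length(values)))
  }
  # In this sketch only position matters: any names already on 'values'
  # (such as ENSG / Upstream) are overwritten by the filter names.
  names(values) <- filters
  values
}

filters <- c("ensembl_gene_id", "upstream_flank")
values  <- list(ENSG = "AGAP004677", Upstream = 100)  # the list used in the getBM() query in this answer
str(check_filter_values(filters, values))
```

A mismatch, such as two filters but a single value, stops with an error here instead of producing a confusing query.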

So, does this do what you are looking for?

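    # 'mart' is your Mart object for the VectorBase dataset ('vector' in your query)
    agambiaeseq <- getBM(attributes = c('gene_flank', 'start_position', 'end_position',
                                        'chromosome_name', 'strand', 'ensembl_gene_id'),
                         filters = c('ensembl_gene_id', 'upstream_flank'),  # upstream_flank goes in as a filter
                         values = list(ENSG = 'AGAP004677', Upstream = 100), # one element per filter, same order
                         mart = mart, checkFilters = FALSE)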

The output looks like:

    gene_flank start_position end_position chromosome_name strand ensembl_gene_id

It seems to correspond to how I imagine the same query would look in the VectorBase BioMart web interface.
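Since the question asks for exactly 100 bp, it may be worth checking the length of each returned 'gene_flank' string. A minimal base-R sketch follows; 'check_flank_length()' is a hypothetical helper (not part of biomaRt) and assumes the result data frame has the 'gene_flank' and 'ensembl_gene_id' columns requested in the query above.

```r
# Hypothetical helper, not part of biomaRt: report the length of each returned
# flank sequence and whether it equals the requested flank length 'n'.
# Assumes 'res' has the 'gene_flank' and 'ensembl_gene_id' columns.
check_flank_length <- function(res, n = 100) {
  len <- nchar(as.character(res$gene_flank))
  data.frame(ensembl_gene_id  = res$ensembl_gene_id,
             flank_length     = len,
             as_requested     = len == n,
             stringsAsFactors = FALSE)
}

# Usage on the getBM() result from this thread:
# check_flank_length(agambiaeseq, n = 100)
```

A FALSE in 'as_requested' might mean the flank length was not applied as intended, or that the sequence was cut short near the end of a chromosome or scaffold.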


You have resolved my problem, thanks a lot!

(reply by Arturo_M, 7.5 years ago)

Great. Glad to help. If you find the forum useful, please stick around, contribute more good questions, answers ... and vote! ;-)

(reply by Obi Griffith, 7.5 years ago)