Retrieve initial query in getBM()
3.8 years ago

Hello,

I am writing code that uses biomaRt to retrieve data for given genomic regions, and I have several regions in a list. Is there any way to include the original "chromosomal_region" query among the returned attributes? In other words, I want to know which query each result row came from.

Here is my command:

results=getBM(
  attributes = c("hgnc_symbol","ensembl_gene_id", "chromosome_name", "start_position", "end_position","gene_biotype"),
  filters = c("chromosomal_region"),
  values = list(chromosomal_region=unlist(as.list(data$query))),
  mart = ensemble)

I would like the queried chromosomal region to appear as a column in the results.

getBM biomaRt R gene • 989 views

If it's not present in the list of available attributes, you cannot request it. But you can add it afterwards with the findOverlaps function from the IRanges Bioconductor package.
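A minimal sketch of that idea, using GenomicRanges (which builds on IRanges and handles chromosome names). The `queries` vector and the toy `results` data frame are hypothetical stand-ins for `data$query` and the real getBM() output, and the sketch assumes the queries are written as "chr:start:end":

```r
library(GenomicRanges)

# Hypothetical query strings, assumed to be in "chr:start:end" form
queries <- c("1:1000:5000", "2:200:900")

# Toy stand-in for the data frame returned by getBM()
results <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC"),
  chromosome_name = c("1", "1", "2"),
  start_position  = c(1500L, 8000L, 100L),
  end_position    = c(2500L, 9000L, 300L),
  stringsAsFactors = FALSE
)

# Split each query string into chromosome, start, and end
parts <- do.call(rbind, strsplit(queries, ":", fixed = TRUE))
query_gr <- GRanges(
  seqnames = parts[, 1],
  ranges   = IRanges(start = as.integer(parts[, 2]),
                     end   = as.integer(parts[, 3]))
)
gene_gr <- GRanges(
  seqnames = results$chromosome_name,
  ranges   = IRanges(start = results$start_position,
                     end   = results$end_position)
)

# One hit per (gene, region) pair that overlaps
hits <- findOverlaps(gene_gr, query_gr)

# Keep the matched genes and attach the query string they overlapped
annotated <- cbind(
  results[queryHits(hits), ],
  chromosomal_region = queries[subjectHits(hits)]
)
```

A gene that overlaps more than one region appears once per region in `annotated`, which is usually what you want when tracing results back to queries.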

3.8 years ago
loughrae ▴ 90

I don’t think that’s possible, probably because the chromosomal region is custom and not stored in the database as one of the gene attributes. You’d need to get it to return your filters, which I don’t think it does.

I deal with this by left-joining my table of chromosomal regions to the results using bedtools:

bedtools intersect -loj -a regions.bed -b ensemblgenes.bed > joined.bed

You’ll first need to convert both your regions and the getBM results to BED format and save them as tab-delimited files before running this on the command line. Alternatively, you could use IRanges as @lessismore suggested; I just find bedtools more intuitive.
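A rough sketch of that conversion step in R, writing the two files named in the bedtools command above. The `queries` vector and the toy `results` data frame are hypothetical stand-ins for your own data, and the queries are assumed to be in "chr:start:end" form. BED starts are 0-based while Ensembl coordinates are 1-based, hence the `- 1L`:

```r
# Hypothetical query strings, assumed to be in "chr:start:end" form
queries <- c("1:1000:5000", "2:200:900")

# Toy stand-in for the data frame returned by getBM()
results <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC"),
  chromosome_name = c("1", "1", "2"),
  start_position  = c(1500L, 8000L, 100L),
  end_position    = c(2500L, 9000L, 300L),
  stringsAsFactors = FALSE
)

# Regions: chrom, 0-based start, end, and the original query string as name
parts <- do.call(rbind, strsplit(queries, ":", fixed = TRUE))
regions_bed <- data.frame(
  chrom = parts[, 1],
  start = as.integer(parts[, 2]) - 1L,
  end   = as.integer(parts[, 3]),
  name  = queries
)

# Genes: same four-column layout, with the Ensembl gene ID as name.
# Coordinates are kept as integers so write.table() does not switch
# to scientific notation for large positions.
genes_bed <- data.frame(
  chrom = results$chromosome_name,
  start = as.integer(results$start_position) - 1L,
  end   = as.integer(results$end_position),
  name  = results$ensembl_gene_id
)

# BED files are tab-delimited with no header, quotes, or row names
write_bed <- function(df, path) {
  write.table(df, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
write_bed(regions_bed, "regions.bed")
write_bed(genes_bed, "ensemblgenes.bed")
```

With the query string stored in the name column of regions.bed, each line of joined.bed carries the original region next to the gene it overlaps.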
