Question: Retrieving Phytozome data using the R Bioconductor package biomaRt
Fabio Marroni wrote, 5 months ago:

Hi, I am trying to use biomaRt to retrieve data for P. trichocarpa from Phytozome. This is non-standard, since the usual route is via Ensembl. However, Ensembl has V2.2 of the poplar annotation, and I would like to work with V3.0 (which is available, e.g., on Phytozome).

I issue the following commands (adapted from what I used to retrieve data from Ensembl) and obtain the following output:

library(biomaRt)
# `host` and `mydataset` are assumed to be defined already
# Use a given dataset for analysis
myusemart <- useDataset(as.character(mydataset), mart = useMart("phytozome_mart", host = host))
# List the attributes available for that dataset
allattributes <- listAttributes(myusemart)
resultTable <- getBM(attributes = "organism_name", mart = myusemart)
> allattributes[1:10,]
                name       description
1      organism_name     Organism Name
2        organism_id       Proteome ID
3         gene_name1         Gene Name
4   gene_description       Description
5          chr_name1   Chromosome Name
6  gene_chrom_strand            Strand
7   gene_chrom_start   Gene Start (bp)
8     gene_chrom_end     Gene End (bp)
9      transcript_id PAC Transcript ID
10  transcript_name1   Transcript Name

> resultTable
2 <html><head>
3 302 Found
4 </head><body>
  The document has moved here.
7 </body></html>

So the list of attributes is retrieved correctly, but when I run a query with getBM, the main querying function of biomaRt, I get this strange result, which clearly means something is wrong. Has anyone succeeded in querying Phytozome using biomaRt? Any help will be greatly appreciated!

Mike Smith (EMBL Heidelberg / de.NBI) wrote, 5 months ago:

The short answer is that, for now, I think you have to bypass some of the biomaRt functions and create a Mart object yourself. Give this a try:


phytozomeMart <- new("Mart", 
                 biomart = "phytozome_mart",
                 vschema = "zome_mart", 
                 host = "")

The rest of your code should work using this object now, e.g.:

mysets <- listDatasets(phytozomeMart)
mydataset <- mysets$dataset[mysets$dataset == "phytozome"]
myusemart <- useDataset(as.character(mydataset), mart = phytozomeMart)

allattributes <- listAttributes(mart = myusemart)
resultTable <- getBM(attributes = "organism_name", mart = myusemart)

Checking the content:

> resultTable[1:5, ,drop = FALSE]
1 Smoellendorffii
2         Cpapaya
3       Rcommunis
4        Csativus
5       Vvinifera
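
To guard against the redirect problem silently returning HTML again, a check along the following lines could be run on any getBM result. This is only a sketch: the helper name looks_like_html is hypothetical and not part of biomaRt, and the patterns are simply taken from the output shown in the question.

```r
# Hypothetical helper (not part of biomaRt): flags a result table whose
# cells look like an HTML page rather than query data.
looks_like_html <- function(result) {
  cells <- as.character(unlist(result, use.names = FALSE))
  any(grepl("<html|</body>|302 Found", cells, ignore.case = TRUE))
}

# Toy data frames standing in for getBM() output
bad  <- data.frame(V1 = c("<html><head>", "302 Found", "</head><body>"))
good <- data.frame(organism_name = c("Smoellendorffii", "Cpapaya"))

looks_like_html(bad)   # TRUE
looks_like_html(good)  # FALSE
```

Calling looks_like_html(resultTable) right after getBM would then flag a redirect page before it is mistaken for data.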

As for why this is necessary: it looks like this instance of BioMart automatically redirects you to an HTTPS server when you run the query, so you need to access port 443. Although you can provide a port argument to useMart, it is currently overridden if a port is specified in the registry on the server. For this mart, the registry references port 80 throughout. Since biomaRt expects that document to be up to date, it uses the values there and fails. This is obviously annoying, so I'll have a think about how best to deal with this situation.
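
To make the precedence problem concrete, here is a toy model of the behaviour described above. It is not biomaRt's actual code: resolve_port and its arguments are hypothetical names, and the logic is only one reading of the explanation (a port listed in the server's registry wins over the one passed by the user).

```r
# Hypothetical illustration only; biomaRt's internals may differ.
# user_port:     port supplied by the caller
# registry_port: port advertised in the server's registry (NA if absent)
resolve_port <- function(user_port, registry_port = NA) {
  if (!is.na(registry_port)) registry_port else user_port
}

resolve_port(user_port = 443, registry_port = 80)  # 80: the user's 443 is ignored
resolve_port(user_port = 443)                      # 443: nothing overrides it
```

Under this model, asking for port 443 has no effect as long as the registry lists port 80, which is why building the Mart object by hand sidesteps the issue.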


Thank you very much. I am very happy with your patch!

Reply written 5 months ago by Fabio Marroni