Retrieving phytozome data using the R bioconductor package biomaRt
5.5 years ago
Fabio Marroni

Hi, I am trying to use biomaRt to retrieve data for P. trichocarpa from Phytozome. This is non-standard, because the usual route is via Ensembl. However, the plants.ensembl.org database has v2.2 of the poplar annotation, and I would like to work with v3.0 (which is available, e.g., on Phytozome).

I issue the following commands (adapted from what I used to retrieve data from Ensembl) and obtain the following output:

host <- "phytozome.jgi.doe.gov"
myset <- "phytozome"
library(biomaRt)
# List the marts and datasets available on the Phytozome host
mylist <- listMarts(host = host)
mysets <- listDatasets(useMart("phytozome_mart", host = host))
mydataset <- mysets$dataset[mysets$dataset == myset]
allattributes <- listAttributes(useDataset(dataset = as.character(mydataset), mart = useMart("phytozome_mart", host = host)))
# Use a given dataset for analysis
myusemart <- useDataset(as.character(mydataset), mart = useMart("phytozome_mart", host = host))
resultTable <- getBM(attributes = "organism_name", mart = myusemart)
> allattributes[1:10,]
name       description
1      organism_name     Organism Name
2        organism_id       Proteome ID
3         gene_name1         Gene Name
4   gene_description       Description
5          chr_name1   Chromosome Name
6  gene_chrom_strand            Strand
7   gene_chrom_start   Gene Start (bp)
8     gene_chrom_end     Gene End (bp)
9      transcript_id PAC Transcript ID
10  transcript_name1   Transcript Name

> resultTable
organism_name
1
3                                                                           302 Found
5                                                                                     Found
6 The document has moved https://phytozome.jgi.doe.gov/biomart/martservice?>here.
7                                                                                     </body></html>


So the list of attributes is retrieved correctly, but when I run a query with getBM(), the main query function of biomaRt, I get this strange result: the table contains the HTML of a "302 Found" redirect page instead of organism names, which clearly means something is wrong. Has anyone succeeded in querying Phytozome using biomaRt? Any help will be greatly appreciated!

5.5 years ago
Mike Smith

The short answer is that, for now, I think you have to bypass some of the biomaRt functions and create a Mart object yourself. Give this a try:

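library(biomaRt)

# Build the Mart object by hand so that the https URL on port 443 is used,
# instead of the port-80 values taken from the server's registry
phytozomeMart <- new("Mart",
                     biomart = "phytozome_mart",
                     vschema = "zome_mart",
                     host = "https://phytozome.jgi.doe.gov:443/biomart/martservice")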


The rest of your code should now work with this object, e.g.:

mysets <- listDatasets(phytozomeMart)
mydataset <- mysets$dataset[mysets$dataset == "phytozome"]
myusemart <- useDataset(as.character(mydataset), mart = phytozomeMart)

allattributes <- listAttributes(mart = myusemart)
resultTable <- getBM(attributes = "organism_name", mart = myusemart)


Checking the content:

> resultTable[1:5, ,drop = FALSE]
organism_name
1 Smoellendorffii
2         Cpapaya
3       Rcommunis
4        Csativus
5       Vvinifera
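As far as I understand it, getBM() works by sending an XML query to the martservice URL held in the Mart object. The base-R sketch below builds a roughly equivalent query for the call above. build_biomart_query() is a hypothetical helper, and the Query attributes (formatter, header, uniqueRows, datasetConfigVersion) follow the generic BioMart 0.7 query format rather than anything confirmed for Phytozome.

```r
# Hypothetical helper: build a BioMart XML query roughly equivalent to
# getBM(attributes = ..., mart = myusemart). Generic BioMart 0.7 format;
# "zome_mart" and "phytozome" come from the answer above.
build_biomart_query <- function(dataset, attributes, vschema = "zome_mart") {
  # One <Attribute/> element per requested attribute
  attr_xml <- paste0('<Attribute name="', attributes, '"/>', collapse = "")
  paste0(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE Query>',
    '<Query virtualSchemaName="', vschema, '" formatter="TSV" header="0" ',
    'uniqueRows="1" datasetConfigVersion="0.6">',
    '<Dataset name="', dataset, '" interface="default">',
    attr_xml,
    '</Dataset></Query>'
  )
}

query_xml <- build_biomart_query("phytozome", "organism_name")
cat(query_xml, "\n")
```

Assuming the endpoint still accepts queries, the string could be submitted by hand, e.g. with httr::POST("https://phytozome.jgi.doe.gov:443/biomart/martservice", body = list(query = query_xml)), to look at the raw reply directly.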


As for why this is necessary: it looks like this BioMart instance automatically redirects you to an https server when you run the query, so you need to access port 443. Although you can provide a port argument to useMart, it is currently overridden if a port is specified in the registry on the server. For this mart, the registry is located at https://phytozome.jgi.doe.gov/biomart/martservice?type=registry and references port 80 throughout. Since biomaRt expects that document to be up to date, it uses the values there and the query fails (you get the 302 redirect page instead of data). This is obviously annoying, so I'll have a think about how best to deal with this situation.
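To make the port mismatch concrete, here is a minimal base-R sketch that reads the host, port and path out of a registry entry. The registry_xml string is a hypothetical excerpt modelled on the usual BioMart registry layout (a MartURLLocation element with host, port, path and serverVirtualSchema attributes), not a copy of the Phytozome document, and registry_attr() is an illustrative helper using a simple regex rather than a real XML parser.

```r
# Hypothetical registry excerpt, modelled on the BioMart registry format
# (not the literal Phytozome document)
registry_xml <- '<MartRegistry>
  <MartURLLocation name="phytozome_mart" host="phytozome.jgi.doe.gov"
    port="80" path="/biomart/martservice" serverVirtualSchema="zome_mart"
    visible="1" default="1"/>
</MartRegistry>'

# Illustrative helper: return every value of a given attribute found in the
# registry text (simple regex, no XML parser)
registry_attr <- function(xml, attr) {
  pattern <- paste0('\\b', attr, '="([^"]*)"')
  hits <- regmatches(xml, gregexpr(pattern, xml, perl = TRUE))[[1]]
  sub(pattern, "\\1", hits, perl = TRUE)
}

# Roughly the URL implied by the registry vs. the one used in the workaround
registry_url <- paste0("http://", registry_attr(registry_xml, "host"), ":",
                       registry_attr(registry_xml, "port"),
                       registry_attr(registry_xml, "path"))
working_url <- "https://phytozome.jgi.doe.gov:443/biomart/martservice"
print(c(registry = registry_url, workaround = working_url))
```

Comparing the two URLs shows what the hand-built Mart object sidesteps: the scheme (http vs https) and the port (80 vs 443).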


Thank you very much. I am very happy with your patch!


I am attempting to use this solution; however, I get the following error when running the getBM() command:

Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery,  :
The query to the BioMart webservice returned an invalid result.
The number of columns in the result table does not equal the number of attributes in the query.
Please report this on the support site at http://support.bioconductor.org
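The error message describes a simple consistency check: the raw tab-separated reply is turned into a table, and its column count must equal the number of requested attributes. The sketch below mimics that check in base R. check_result_columns() is a hypothetical helper, not biomaRt's internal code, and the input strings are made-up toy data. It may also explain the HTML seen in the original question: with a single attribute, an HTML page containing no tabs parses into exactly one column and passes the check.

```r
# Hypothetical helper mimicking the check described in the error message:
# parse the raw reply as TSV and compare its column count with the number
# of requested attributes. Not biomaRt's internal implementation.
check_result_columns <- function(raw_tsv, attributes) {
  tbl <- read.delim(text = raw_tsv, header = FALSE, sep = "\t",
                    quote = "", stringsAsFactors = FALSE)
  ncol(tbl) == length(attributes)
}

attrs <- c("organism_name", "gene_name1")

# Made-up toy replies
good <- "orgA\tgeneA1\norgB\tgeneB1"
bad  <- "<html><head><title>302 Found</title></head>\n<body>The document has moved</body></html>"

check_result_columns(good, attrs)            # TRUE: 2 columns, 2 attributes
check_result_columns(bad, attrs)             # FALSE: the HTML page is 1 column
check_result_columns(bad, "organism_name")   # TRUE: HTML slips through
```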


I am using biomaRt 2.44.4. I see that this error was already reported on the Bioconductor support site several years ago. Should I post it again? Thank you in advance for any help.


Yes, you should start a new thread on the Bioconductor support site, including the code that generates the error and the output of sessionInfo().