Question: How do I import RDF data into R?
 
16
 
 

What approach are you using to import Resource Description Framework data into R? There is minimal support in the R package Rredland, but that seems rather spartan. There was an interesting package, Rswub, but that was lost in time. I also noted Rsparql, but the project does not seem to have delivered anything yet. And, of course, I can do something manually... What are your best practices for using RDF data from, for example, Bio2RDF?

 
 
 
written 21 months ago by Michael Dondrup
 

Sorry, you lost me... Swedish RDF?

written 21 months ago by Egon Willighagen
 
1

Your first link points to the Swedish version of Wikipedia. For the English version: http://en.wikipedia.org/wiki/Resource_Description_Framework

written 21 months ago by David Quigley
 

Oh, crap... OK, fixing... stupid, we're-so-smart-we-know-where-you-live websites... :(

written 21 months ago by Egon Willighagen
 

Ah! Sorry about that; fixed now.

written 21 months ago by Egon Willighagen

2 answers

 
8
 
 
 

I started a package for just this purpose yesterday. It is available from CRAN, though functionality is still a bit limited:

library(rrdf)
m1 = load.rdf("one.rdf")
m2 = load.rdf("two.rdf")
m3 = combine.rdf(m1, m2)
summarize.rdf(m3)
sparql.rdf(m3, "SELECT ?s ?p { ?s ?p ?o }")

It wraps Jena and uses rJava to interface with it.

There is in fact also a Bioconductor package called Rredland.

Because the rrdf package now also supports SPARQL queries against remote databases, you can also do (following this BioStar answer):

library(rrdf)

endpoint = "http://rdf.farmbio.uu.se/chembl/sparql"

query = "
SELECT ?organism ?instance
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query)

As of version 1.4 you can also use one of the SPARQL variables as the row names. For example, to get a single column with the protein names as row names, you do:

query = "
SELECT ?organism ?title
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://purl.org/dc/elements/1.1/title> ?title ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query, rowvarname="title")

This results in an R matrix like:

                                                      organism                       
Maltase-glucoamylase                                  "Homo sapiens"                 
Sulfonylurea receptor 2                               "Homo sapiens"                 
Voltage-gated T-type calcium channel alpha-1H subunit "Homo sapiens"                 
Dihydrofolate reductase                               "Escherichia coli (strain K12)"
Tyrosine-protein kinase ABL                           "Homo sapiens"                 
DNA-directed RNA polymerase beta chain                "Escherichia coli (strain K12)"
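
Once you have a character matrix like this, plain base R takes over. A minimal sketch, assuming the result has the shape shown above: the matrix below is rebuilt by hand from three of those rows rather than fetched from the endpoint, and the names `df` and `organisms` are only illustrative.

```r
# Hand-built stand-in for the sparql.remote() result shown above
# (an assumption for illustration; no endpoint is queried here).
data <- matrix(
  c("Homo sapiens", "Homo sapiens", "Escherichia coli (strain K12)"),
  ncol = 1,
  dimnames = list(
    c("Maltase-glucoamylase", "Sulfonylurea receptor 2", "Dihydrofolate reductase"),
    "organism"
  )
)

# Convert to a data frame, keeping the protein names as row names
df <- as.data.frame(data, stringsAsFactors = FALSE)

# Count how many targets there are per organism
organisms <- table(df$organism)

# Look up the organism for one protein by its row name
df["Dihydrofolate reductase", "organism"]
```

Working with a data frame rather than the raw matrix makes it easier to add further SPARQL variables as columns later on.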
 
 
 
 
5
 
 
 

The following hints are all far from perfect and will require some experimenting on your side, but here's my best guess (I only have worst practices for language interfaces, not for reading data from BioRDF):

  • The Redland C library has many language bindings (Perl, Python, Ruby). If these bindings are more complete than Rredland, you could use, e.g., the Perl binding with RSPerl or the Python binding with RPy.
  • There are Java libraries out there; see the StackExchange answer. They can be interfaced using, e.g., SJava or (less nicely) JRI.
  • Extend the Rredland package to add the functionality you need (probably the cleanest option, but it takes a lot of your time).

I would probably go for the SJava solution first, because there are at least four Java libraries to choose from. I have had some mixed experiences with language bindings, but in the end RSPerl and SJava worked with Perl and Java for me, and I heard that RPy works nicely too. So it should be possible in principle™ to access the libraries too. Whatever solution you come up with will likely be appreciated by the BioC community.

 
 
 

Done, see my own answer.

written 12 months ago by Egon Willighagen
 