How Do I Import RDF Data Into R?
11.9 years ago

What approach are you using to import Resource Description Framework (RDF) data into R? There is minimal support in the R package Rredland, but that seems rather spartan. There was an interesting Rswub, but that was lost in time. I also noticed Rsparql, but that project does not seem to have delivered anything yet. And, of course, I can do something manually... What are your best practices for using RDF data from, for example, Bio2RDF?


Your first link points to the Swedish version of Wikipedia. For the English version: http://en.wikipedia.org/wiki/Resource_Description_Framework


Sorry, you lost me... Swedish RDF?


Oh, crap... OK, fixing... stupid, we're-so-smart-we-know-where-you-live websites... :(


Ah! Sorry about that; fixed now.

11.3 years ago

I started a package for just this purpose yesterday. It is available from CRAN, although its functionality is still a bit limited:

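library(rrdf)
# m1 and m2 are assumed to be existing rrdf stores (loading them is not shown)
m3 = combine.rdf(m1, m2)   # merge both stores into a new store
summarize.rdf(m3)          # summarize the combined store
sparql.rdf(m3, "SELECT ?s ?p { ?s ?p ?o }")   # run a SPARQL query on the local store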


It wraps Jena and uses rJava to interface with it.

There is in fact also a Bioconductor package called Rredland.

Because the rrdf package now also supports SPARQL queries against remote endpoints, you can also do the following (based on this BioStar answer):

library(rrdf)

endpoint = "http://rdf.farmbio.uu.se/chembl/sparql"

query = "
SELECT ?organism ?instance
WHERE {
?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
<http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query)


As of version 1.4, you can also use one of the SPARQL variables as the row names. For example, to get a single column with the protein names as row names, you do:

query = "
SELECT ?organism ?title
WHERE {
?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
<http://purl.org/dc/elements/1.1/title> ?title ;
<http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query, rowvarname="title")


This results in an R matrix like:

                                                      organism
Maltase-glucoamylase                                  "Homo sapiens"
Sulfonylurea receptor 2                               "Homo sapiens"
Voltage-gated T-type calcium channel alpha-1H subunit "Homo sapiens"
Dihydrofolate reductase                               "Escherichia coli (strain K12)"
Tyrosine-protein kinase ABL                           "Homo sapiens"
DNA-directed RNA polymerase beta chain                "Escherichia coli (strain K12)"
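Because the result is an ordinary character matrix, plain base-R tools work on it directly. The sketch below rebuilds only the six example rows shown above by hand (a stand-in for the sparql.remote() result, since the endpoint may not be reachable), then looks up one target by its row name and counts targets per organism.

```r
# Stand-in for the sparql.remote() result: the six example rows shown above,
# with the target titles as row names and a single "organism" column.
data <- matrix(
  c("Homo sapiens", "Homo sapiens", "Homo sapiens",
    "Escherichia coli (strain K12)", "Homo sapiens",
    "Escherichia coli (strain K12)"),
  ncol = 1,
  dimnames = list(
    c("Maltase-glucoamylase", "Sulfonylurea receptor 2",
      "Voltage-gated T-type calcium channel alpha-1H subunit",
      "Dihydrofolate reductase", "Tyrosine-protein kinase ABL",
      "DNA-directed RNA polymerase beta chain"),
    "organism"))

# Look up a single target by its row name
dhfr_organism <- data["Dihydrofolate reductase", "organism"]

# Count how many of these targets belong to each organism
counts <- table(data[, "organism"])
```

The same indexing and table() calls should apply unchanged to a full query result, as long as it has the same row-name and column layout.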

11.9 years ago

The following hints are all far from perfect and will require some experimenting on your side, but here is my best guess (I only have worst practices for language interfaces, not for reading data from Bio2RDF):

• The Redland C library has many language bindings (Perl, Python, Ruby). If these bindings are more complete than Rredland, you could use, e.g., the Perl binding with RSPerl, or the Python binding with RPy.
• There are Java libraries out there; see the StackExchange answer. They can be interfaced using, e.g., SJava or (less nicely) JRI.
• Pimping the Rredland package to add the functionality you need (perhaps the cleanest option, but it will take a lot of your time).

I would probably try the SJava solution first, because there are at least four Java libraries to choose from. I have had mixed experiences with language bindings, but in the end RSPerl and SJava worked with Perl and Java for me, and I have heard that RPy works nicely too. So it should be possible in principle™ to access these libraries as well. Whatever solution you come up with will likely be appreciated by the BioC community.
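For the "do something manually" route mentioned in the question, one possible sketch, assuming the data can be obtained as N-Triples (one triple per line), is a small base-R reader that turns each line into a subject/predicate/object row of a data frame. The helper name parse_ntriples is hypothetical, and this is a deliberately simplified line-based reader, not a conforming RDF parser: it does not handle escaped quotes inside literals or other serializations such as RDF/XML or Turtle.

```r
# Hypothetical helper (not from any package): simplified N-Triples reader.
# Handles <IRI>, _:blank and "literal" terms (optionally with @lang or
# ^^<datatype>), but NOT escaped quotes inside literals.
parse_ntriples <- function(lines) {
  lines <- trimws(lines)
  # Drop blank lines and comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  term <- '(<[^>]*>|_:\\S+|"[^"]*"(?:@[A-Za-z0-9-]+|\\^\\^<[^>]*>)?)'
  pattern <- paste0("^", term, "\\s+", term, "\\s+", term, "\\s*\\.$")
  # Each successful match yields the full line plus three captured terms
  m <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  ok <- lengths(m) == 4
  if (any(!ok)) warning(sum(!ok), " line(s) could not be parsed and were skipped")
  if (!any(ok)) {
    return(data.frame(subject = character(0), predicate = character(0),
                      object = character(0), stringsAsFactors = FALSE))
  }
  parts <- do.call(rbind, m[ok])
  data.frame(subject = parts[, 2], predicate = parts[, 3],
             object = parts[, 4], stringsAsFactors = FALSE)
}

# Illustrative input; for a real file you would pass readLines("yourfile.nt")
nt <- c(
  '<http://example.org/target/1> <http://purl.org/dc/elements/1.1/title> "Dihydrofolate reductase"@en .',
  '# comment lines are skipped',
  '<http://example.org/target/1> <http://example.org/organism> _:org1 .'
)
triples <- parse_ntriples(nt)
```

Keeping the terms as raw strings (brackets, quotes and language tags intact) makes the result easy to filter with subset() or reshape later, at the cost of having to strip the syntax yourself when you need plain values.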
