How to Access Open Data Hub Data With R and SPARQL
Changed in version 2023.1: notify users of SPARQL endpoint reachable upon request only.
Datasets and their data in the Open Data Hub can be accessed using R, a software for statistical analysis and Open Data Hub’s SPARQL endpoint.
Warning
The SPARQL endpoint is currently not active, but can be activated upon request to . However, the ODH SPARQL portal, which contains sample data and queries, can be accessed at https://sparql.opendatahub.com/.
This howto shows you a method to retrieve data from the Open Data Hub, but does not address other features like, for example, plotting fetched data on a map.
It is also assumed you have already installed R on your workstation as well as the required R’s SPARQL library from CRAN.
Note
The SPARQL package is currently archived as apparently not maintained anymore. In this howto, we use the latest version available, which is available at the above mentioned link.
Install SPARQL library
To install the SPARQL library, simply execute on Debian-like systems:
~# Rscript -e "install.packages('SPARQL')"
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/SPARQL_1.16.tar.gz'
Content type 'application/x-gzip' length 6548 bytes
==================================================
downloaded 6548 bytes
* installing *source* package ‘SPARQL’ ...
** package ‘SPARQL’ successfully unpacked and MD5 sums checked
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (SPARQL)
The downloaded source packages are in
‘/tmp/RtmpISkL9Z/downloaded_packages’
If you see the message * DONE (SPARQL), the installation was successful.
If you see instead any ERROR, like those reported below, please refer to section Troubleshooting:
Warning messages:
1: In install.packages("SPARQL") :
installation of package ‘RCurl’ had non-zero exit status
2: In install.packages("SPARQL") :
installation of package ‘SPARQL’ had non-zero exit status
In order to fetch data, you need:
An endpoint, which for Open Data Hub is https://sparql.opendatahub.com/sparql
a SPARQL query, that you can simply copy from one of the precooked queries at https://sparql.opendatahub.com/ We’ll be using this one:
PREFIX schema: <https://schema.org/> PREFIX geo: <http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#> PREFIX noi: <https://noi.example.org/ontology/odh#> SELECT ?pos ?posLabel WHERE { ?p a noi:Pizzeria ; geo:asWKT ?pos ; schema:name ?posLabel ; schema:geo ?geo . FILTER (lang(?posLabel) = "it") } LIMIT 10
An R script to put all together
1library(SPARQL) 2 3endpoint <- "https://sparql.opendatahub.com/sparql" 4 5query <- 6'PREFIX schema: <https://schema.org/> 7PREFIX geo: <http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#> 8PREFIX noi: <https://noi.example.org/ontology/odh#> 9 10SELECT ?pos ?posLabel 11WHERE { 12 ?p a noi:Pizzeria ; 13 geo:asWKT ?pos ; 14 schema:name ?posLabel ; 15 schema:geo ?geo . 16 FILTER (lang(?posLabel) = "it") 17} 18LIMIT 10' 19 20result_set <- SPARQL(endpoint,query) 21print(result_set)
The script above can be saved in a file called R-demo.r
and
executed using the Rscript R-demo.r command. The output
will be similar to:
~# Rscript R-demo.r
Loading required package: XML
Loading required package: RCurl
$results
pos
1 "POINT (11.440394 46.511651)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
2 "POINT (11.200728 46.729921)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
3 "POINT (11.9412 46.9803)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
4 "POINT (11.4278 46.4135)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
5 "POINT (11.326362 46.310963)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
6 "POINT (12.279453 46.733497)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
7 "POINT (10.867335 46.622179)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
8 "POINT (11.241217 46.246141)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
9 "POINT (11.598339 46.40688)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
10 "POINT (12.0114 46.7474)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
posLabel
1 "Ristorante Pizzeria Bar Pirpamer"@it
2 "Bar Pizzeria Alpenhof"@it
3 "Ahrner Wirt Ristorante Pizzeria"@it
4 "Ristorante Pizzeria Adler"@it
5 "Hotel Al Mulino"@it
6 "Ristorante Pizzeria Zentral"@it
7 "Hotel Ristorante Bar Rasthof Vermoi"@it
8 "Hotel Grünwald"@it
9 "Hennenstall"@it
10 "Après Ski Bar Pizzeria Ristorante "Gassl""@it
In the script, all data fetched are kept into the result_set variable and can be manipulated at will using R libaries.
Troubleshooting
SPARQL installation fails!
When installing a package, R tries to satisfy all the package’s dependencies and installs any missing library required by the package. If you still stumble upon errors, like for example:
Warning messages:
1: In install.packages("SPARQL") :
installation of package ‘RCurl’ had non-zero exit status
2: In install.packages("SPARQL") :
installation of package ‘SPARQL’ had non-zero exit status
It means that SPARQL’s dependency RCurl also failed. In this case it is not easy to spot the root cause, which is a missing package in the OS installation, called libcurl4-gnutls-dev. To install it on a Debian-like system, use as root the following command:
~# apt-get install libcurl4-gnutls-dev
I have some strange warning when executing the script!
If you execute a query and the outcome is not a result set but some error message similar to the following ones, please verify that the URL of the SPARQL endpoint is correct: https://sparql.opendatahub.com/sparql
Opening and ending tag mismatch: meta line 5 and head
Opening and ending tag mismatch: meta line 4 and html
Premature end of data in tag meta line 3
Premature end of data in tag head line 2
Premature end of data in tag html line 1