How to Access Open Data Hub Data With R and SPARQL

Changed in version 2023.1: Added a notice that the SPARQL endpoint is reachable upon request only.

Datasets in the Open Data Hub, and the data they contain, can be accessed using R, a software environment for statistical analysis, together with the Open Data Hub’s SPARQL endpoint.

Warning

The SPARQL endpoint is currently not active, but can be activated upon request to . However, the ODH SPARQL portal, which contains sample data and queries, can be accessed at https://sparql.opendatahub.com/.

This howto shows one method to retrieve data from the Open Data Hub. It does not cover further processing of the fetched data, such as plotting it on a map.

It is also assumed that R is already installed on your workstation, together with the required SPARQL library for R from CRAN.

Note

The SPARQL package is currently archived, apparently because it is no longer maintained. This howto uses the latest available version, which can be found at the link mentioned above.

Install SPARQL library

To install the SPARQL library on a Debian-like system, run the following command as root:

~# Rscript -e "install.packages('SPARQL')"
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/SPARQL_1.16.tar.gz'
Content type 'application/x-gzip' length 6548 bytes
==================================================
downloaded 6548 bytes

* installing *source* package ‘SPARQL’ ...
** package ‘SPARQL’ successfully unpacked and MD5 sums checked
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (SPARQL)

The downloaded source packages are in
   ‘/tmp/RtmpISkL9Z/downloaded_packages’

If you see the message * DONE (SPARQL), the installation was successful.

If you instead see errors like the ones reported below, please refer to the Troubleshooting section:

Warning messages:
1: In install.packages("SPARQL") :
  installation of package ‘RCurl’ had non-zero exit status
2: In install.packages("SPARQL") :
  installation of package ‘SPARQL’ had non-zero exit status
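Because the package is archived, a plain install.packages('SPARQL') may not find it on every CRAN mirror. The following is a minimal sketch of an alternative, assuming that archived source tarballs follow CRAN's usual src/contrib/Archive/<package>/ layout and that version 1.16 (the one shown in the log above) is the one you want. The helper names cran_archive_url and install_archived_sparql are hypothetical and used only for illustration.

```r
# Build the URL of an archived source tarball.
# Assumption: CRAN keeps archived sources under src/contrib/Archive/<package>/.
cran_archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

# Hypothetical helper: install the dependencies of SPARQL (XML and RCurl),
# then install the archived tarball directly from its URL.
install_archived_sparql <- function(version = "1.16") {
  install.packages(c("XML", "RCurl"))
  install.packages(cran_archive_url("SPARQL", version),
                   repos = NULL, type = "source")
}

url <- cran_archive_url("SPARQL", "1.16")
print(url)

# Nothing is installed until you call the helper yourself:
# install_archived_sparql()
```

The installation step is kept inside a function so that you can first check the URL in a browser before anything is downloaded.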

In order to fetch data, you need:

  1. An endpoint, which for Open Data Hub is https://sparql.opendatahub.com/sparql

  2. A SPARQL query, which you can copy from one of the ready-made queries at https://sparql.opendatahub.com/. This howto uses the following one:

    PREFIX schema: <https://schema.org/>
    PREFIX geo: <http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#>
    PREFIX noi: <https://noi.example.org/ontology/odh#>
    
    SELECT ?pos ?posLabel
    WHERE {
      ?p a noi:Pizzeria ;
         geo:asWKT ?pos ;
         schema:name ?posLabel ;
         schema:geo ?geo .
      FILTER (lang(?posLabel) = "it")
    }
    LIMIT 10
    
  3. An R script that puts everything together:

     library(SPARQL)

     # SPARQL endpoint of the Open Data Hub
     endpoint <- "https://sparql.opendatahub.com/sparql"

     # The query is enclosed in single quotes because it contains double quotes
     query <-
     'PREFIX schema: <https://schema.org/>
     PREFIX geo: <http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#>
     PREFIX noi: <https://noi.example.org/ontology/odh#>

     SELECT ?pos ?posLabel
     WHERE {
       ?p a noi:Pizzeria ;
          geo:asWKT ?pos ;
          schema:name ?posLabel ;
          schema:geo ?geo .
       FILTER (lang(?posLabel) = "it")
     }
     LIMIT 10'

     # Send the query to the endpoint and print what comes back
     result_set <- SPARQL(endpoint,query)
     print(result_set)
    

The script above can be saved in a file called R-demo.r and executed using the Rscript R-demo.r command. The output will be similar to:

~# Rscript R-demo.r
 Loading required package: XML
 Loading required package: RCurl
 $results
                                                                                 pos
 1  "POINT (11.440394 46.511651)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 2  "POINT (11.200728 46.729921)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 3      "POINT (11.9412 46.9803)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 4      "POINT (11.4278 46.4135)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 5  "POINT (11.326362 46.310963)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 6  "POINT (12.279453 46.733497)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 7  "POINT (10.867335 46.622179)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 8  "POINT (11.241217 46.246141)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 9   "POINT (11.598339 46.40688)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
 10     "POINT (12.0114 46.7474)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>
                                          posLabel
 1           "Ristorante Pizzeria Bar Pirpamer"@it
 2                      "Bar Pizzeria Alpenhof"@it
 3            "Ahrner Wirt Ristorante Pizzeria"@it
 4                  "Ristorante Pizzeria Adler"@it
 5                            "Hotel Al Mulino"@it
 6                "Ristorante Pizzeria Zentral"@it
 7        "Hotel Ristorante Bar Rasthof Vermoi"@it
 8                             "Hotel Grünwald"@it
 9                                "Hennenstall"@it
 10 "Après Ski Bar Pizzeria Ristorante "Gassl""@it

In the script, all fetched data is stored in the result_set variable and can be manipulated at will using R libraries.
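As an illustration of such manipulation, the sketch below turns the raw literals into plain columns using only base R. To stay runnable without a live endpoint, it works on a small stand-in data frame built from two rows of the sample output above; with real data you would start from result_set$results instead. The column names lon, lat and name are chosen here for illustration only.

```r
# Stand-in for result_set$results, copied from the sample output
results <- data.frame(
  pos = c(
    '"POINT (11.440394 46.511651)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>',
    '"POINT (11.200728 46.729921)"^^<http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral>'
  ),
  posLabel = c(
    '"Ristorante Pizzeria Bar Pirpamer"@it',
    '"Bar Pizzeria Alpenhof"@it'
  ),
  stringsAsFactors = FALSE
)

# A WKT point is written as "POINT (x y)", that is, longitude then latitude
pattern <- "POINT \\(([-0-9.]+) ([-0-9.]+)\\)"
matches <- regmatches(results$pos, regexec(pattern, results$pos))

# Each match holds: the full match, the first group, the second group
results$lon <- as.numeric(sapply(matches, `[`, 2))
results$lat <- as.numeric(sapply(matches, `[`, 3))

# Drop the surrounding quotes and the trailing language tag (for example @it)
results$name <- sub('^"(.*)"@[A-Za-z-]+$', "\\1", results$posLabel)

print(results[, c("name", "lon", "lat")])
```

With numeric lon and lat columns in place, the data frame can be passed on to whichever plotting or mapping library you prefer.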

Troubleshooting

SPARQL installation fails!

When installing a package, R tries to satisfy all of the package’s dependencies and installs any missing library the package requires. The installation can nevertheless fail with errors such as:

Warning messages:
1: In install.packages("SPARQL") :
  installation of package ‘RCurl’ had non-zero exit status
2: In install.packages("SPARQL") :
  installation of package ‘SPARQL’ had non-zero exit status

This means that the installation of RCurl, a dependency of SPARQL, failed as well. The root cause is not easy to spot from these messages: a package called libcurl4-gnutls-dev is missing from the OS installation. To install it on a Debian-like system, run the following command as root:

~# apt-get install libcurl4-gnutls-dev

I get strange warnings when executing the script!

If you execute a query and the outcome is not a result set but error messages similar to the following ones, please verify that the URL of the SPARQL endpoint is correct: https://sparql.opendatahub.com/sparql

Opening and ending tag mismatch: meta line 5 and head
Opening and ending tag mismatch: meta line 4 and html
Premature end of data in tag meta line 3
Premature end of data in tag head line 2
Premature end of data in tag html line 1
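Messages like these mention html, head and meta tags, which suggests that the server answered with an HTML page rather than with SPARQL results in XML. The sketch below shows one way to tell the two apart, given a response body as a string. The function is_sparql_results and both sample bodies are hypothetical and made up for illustration; fetching the actual body (for example with getURL from the RCurl package) is left out so that the example runs offline.

```r
# Hypothetical helper: TRUE if a response body looks like SPARQL XML results,
# FALSE if it looks like an HTML page (or like neither of the two).
is_sparql_results <- function(body) {
  has_sparql_root <- grepl("<sparql[ >]", body)
  has_html_root <- grepl("<html[ >]", body, ignore.case = TRUE)
  has_sparql_root && !has_html_root
}

# Made-up sample bodies, for illustration only
html_page <- "<html><head><meta charset='utf-8'></head><body>Not found</body></html>"
xml_results <- paste0(
  "<?xml version='1.0'?>",
  "<sparql xmlns='http://www.w3.org/2005/sparql-results#'>",
  "<head><variable name='pos'/></head><results/></sparql>"
)

print(is_sparql_results(html_page))
print(is_sparql_results(xml_results))
```

If the check fails for the body returned by your endpoint URL, the URL most likely points to a web page, such as the portal at https://sparql.opendatahub.com/, instead of the endpoint itself.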