How to Access Open Data Hub Data With R and SPARQL

New in version 2021.04.

Datasets and their data in the Open Data Hub can be accessed from R, a language and environment for statistical computing, by sending SPARQL queries to the Open Data Hub's SPARQL endpoint.

This howto shows one way to retrieve data from the Open Data Hub. It does not cover further processing, such as plotting the fetched data on a map.

It also assumes that R is already installed on your workstation, together with the SPARQL package from a CRAN mirror (for example via install.packages("SPARQL")).

In order to fetch data, you need:

  1. An endpoint, which for the Open Data Hub is https://sparql.opendatahub.bz.it/sparql

  2. A SPARQL query. You can simply copy one of the precooked queries available at https://sparql.opendatahub.bz.it/; this howto uses the following one:

    PREFIX schema: <http://schema.org/>
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX noi: <http://noi.example.org/ontology/odh#>
    
    SELECT ?pos ?posLabel
    WHERE {
      ?p a noi:Pizzeria ;
         geo:asWKT ?pos ;
         schema:name ?posLabel ;
         schema:geo ?geo .
      FILTER (lang(?posLabel) = "it")
    }
    LIMIT 10
    
  3. An R script to put it all together:

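    # Load the SPARQL package (XML and RCurl are loaded as dependencies)
    library(SPARQL)
    
    # SPARQL endpoint of the Open Data Hub
    endpoint <- "https://sparql.opendatahub.bz.it/sparql"
    
    # The query from step 2, as a single R string
    query <-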
    'PREFIX schema: <http://schema.org/>
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX noi: <http://noi.example.org/ontology/odh#>
    
    SELECT ?pos ?posLabel
    WHERE {
      ?p a noi:Pizzeria ;
         geo:asWKT ?pos ;
         schema:name ?posLabel ;
         schema:geo ?geo .
      FILTER (lang(?posLabel) = "it")
    }
    LIMIT 10'
    
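    # Send the query to the endpoint; the returned list holds the
    # result table in its $results element
    result_set <- SPARQL(endpoint, query)
    print(result_set)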
    
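The query in the script is a fixed string. If you want to try another language tag or a different number of results without editing the query by hand, a small function can generate it. This is only an illustrative sketch: build_query is a hypothetical helper, not part of the SPARQL package, and asking for another language assumes the dataset actually carries labels in that language.

```r
# Hypothetical helper (not part of the SPARQL package): returns the
# pizzeria query used in this howto, with the language tag of the
# FILTER and the LIMIT as parameters.
build_query <- function(lang = "it", limit = 10L) {
  # Accept only simple language tags such as "it", "de" or "en", so the
  # value cannot break out of the quoted string inside FILTER.
  stopifnot(is.character(lang), length(lang) == 1L,
            grepl("^[A-Za-z]{2,3}(-[A-Za-z0-9]+)*$", lang))
  limit <- as.integer(limit)
  stopifnot(length(limit) == 1L, !is.na(limit), limit > 0L)
  template <- 'PREFIX schema: <http://schema.org/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX noi: <http://noi.example.org/ontology/odh#>

SELECT ?pos ?posLabel
WHERE {
  ?p a noi:Pizzeria ;
     geo:asWKT ?pos ;
     schema:name ?posLabel ;
     schema:geo ?geo .
  FILTER (lang(?posLabel) = "%s")
}
LIMIT %d'
  sprintf(template, lang, limit)
}

# Same query as in the script, but for German labels and 5 results
query <- build_query(lang = "de", limit = 5)
cat(query, "\n")
```

The returned string can be passed to SPARQL(endpoint, query) in place of the hardcoded one. The language tag is checked against a simple pattern because it is pasted inside a quoted SPARQL literal; without the check, a value containing a double quote could alter the query.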

The R script from step 3 can be saved in a file called R-demo.r and executed with the Rscript R-demo.r command. The output will be similar to:

~# Rscript R-demo.r
 Loading required package: XML
 Loading required package: RCurl
 $results
                                                                                 pos
 1  "POINT (11.440394 46.511651)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 2  "POINT (11.200728 46.729921)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 3      "POINT (11.9412 46.9803)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 4      "POINT (11.4278 46.4135)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 5  "POINT (11.326362 46.310963)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 6  "POINT (12.279453 46.733497)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 7  "POINT (10.867335 46.622179)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 8  "POINT (11.241217 46.246141)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 9   "POINT (11.598339 46.40688)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
 10     "POINT (12.0114 46.7474)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
                                          posLabel
 1           "Ristorante Pizzeria Bar Pirpamer"@it
 2                      "Bar Pizzeria Alpenhof"@it
 3            "Ahrner Wirt Ristorante Pizzeria"@it
 4                  "Ristorante Pizzeria Adler"@it
 5                            "Hotel Al Mulino"@it
 6                "Ristorante Pizzeria Zentral"@it
 7        "Hotel Ristorante Bar Rasthof Vermoi"@it
 8                             "Hotel Grünwald"@it
 9                                "Hennenstall"@it
 10 "Après Ski Bar Pizzeria Ristorante "Gassl""@it

In the script, all fetched data are stored in the result_set variable (the table itself is in result_set$results) and can be processed further with any R library.
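As the output shows, each cell comes back as an RDF literal string, still wrapped in quotes and followed by a language tag (@it) or a datatype (^^<...wktLiteral>). A minimal base-R sketch for turning them into plain names and numeric coordinates could look like this. It assumes the cells have exactly the shape shown above; literal_value and wkt_point_coords are hypothetical helper names, and the three sample rows are copied from the output so the snippet runs offline. With live data you would start from results <- result_set$results instead.

```r
# Sample rows copied from the output above (live data: result_set$results)
results <- data.frame(
  pos = c(
    '"POINT (11.440394 46.511651)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>',
    '"POINT (11.200728 46.729921)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>',
    '"POINT (11.9412 46.9803)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>'
  ),
  posLabel = c(
    '"Ristorante Pizzeria Bar Pirpamer"@it',
    '"Bar Pizzeria Alpenhof"@it',
    '"Ahrner Wirt Ristorante Pizzeria"@it'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip the surrounding quotes and the trailing
# @lang tag or ^^<datatype> from an RDF literal string.
literal_value <- function(x) {
  sub('^"(.*)"(@[A-Za-z-]+|\\^\\^<[^>]*>)?$', "\\1", x)
}

# Hypothetical helper: extract longitude and latitude from
# WKT strings of the form "POINT (lon lat)".
wkt_point_coords <- function(wkt) {
  m <- regmatches(wkt, regexec("POINT \\(([-0-9.]+) ([-0-9.]+)\\)", wkt))
  data.frame(
    lon = as.numeric(vapply(m, `[`, character(1), 2)),
    lat = as.numeric(vapply(m, `[`, character(1), 3))
  )
}

# One row per pizzeria: plain name plus numeric coordinates
pizzerias <- data.frame(
  name = literal_value(results$posLabel),
  wkt_point_coords(literal_value(results$pos)),
  stringsAsFactors = FALSE
)
print(pizzerias)
```

Regular expressions keep the sketch free of extra dependencies, but they only handle POINT geometries; for other WKT shapes a dedicated spatial package is the sturdier choice.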

Troubleshooting

SPARQL installation fails!

When installing a package, R tries to satisfy all of its dependencies and installs any missing package it requires. If you still run into errors like the following:

Warning messages:
1: In install.packages("SPARQL") :
  installation of package ‘RCurl’ had non-zero exit status
2: In install.packages("SPARQL") :
  installation of package ‘SPARQL’ had non-zero exit status

This means that RCurl, a dependency of SPARQL, failed to install, and SPARQL failed along with it. The root cause is not easy to spot: a system package called libcurl4-gnutls-dev is missing from the operating system. To install it on a Debian-like system, run the following command as root:

~# apt-get install libcurl4-gnutls-dev

I get strange warnings when executing the script!

If you run a query and, instead of a result set, you get error messages similar to the following, verify that the URL of the SPARQL endpoint is correct: https://sparql.opendatahub.bz.it/sparql

Opening and ending tag mismatch: meta line 5 and head
Opening and ending tag mismatch: meta line 4 and html
Premature end of data in tag meta line 3
Premature end of data in tag head line 2
Premature end of data in tag html line 1
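The tags named in these messages (meta, head, html) suggest that the server answered with an HTML page, which the XML parser then tried to read as a SPARQL result. To check this before parsing, a helper like the hypothetical response_kind below can classify a raw response body. It is a sketch under the assumption that results are requested in the SPARQL XML results format, whose root element is <sparql>; the live-endpoint lines at the end are left as comments because they need RCurl and network access.

```r
# Hypothetical helper: tell a SPARQL XML result apart from an HTML page.
response_kind <- function(body) {
  # Only the beginning of the document is needed to recognize it
  start <- tolower(substr(trimws(body), 1, 500))
  if (grepl("<sparql", start, fixed = TRUE)) {
    "sparql-xml"
  } else if (grepl("<html", start, fixed = TRUE) ||
             grepl("<!doctype html", start, fixed = TRUE)) {
    "html"
  } else {
    "unknown"
  }
}

# Offline examples of the two kinds of answer
sparql_body <- '<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#"><head/><results/></sparql>'
html_body <- '<!DOCTYPE html>
<html><head><meta charset="utf-8"></head><body></body></html>'
print(response_kind(sparql_body))
print(response_kind(html_body))

# Against a live endpoint (assumes RCurl and network access), roughly:
# library(RCurl)
# url <- paste0(endpoint, "?query=", URLencode(query, reserved = TRUE))
# body <- getURL(url, httpheader = c(Accept = "application/sparql-results+xml"))
# response_kind(body)
```

If response_kind reports "html", the URL most likely points at a web page rather than at the SPARQL endpoint itself.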