OpenData All-in-one. iCMS Virtual Appliance

In recent weeks, the iCMS development team has been configuring a virtual machine that bundles the components needed to build the technological infrastructure behind an open data initiative.

The main goal of this virtual machine is to simplify the process of publishing open data and to reduce the cost of setting up the technological infrastructure, so that all efforts can be focused on locating and modeling datasets.

Under the name “iCMS Virtual Appliance”, the following components have been integrated:

  • Open Data Website / Data Catalog: CKAN 1.7.
  • Content Interoperability Interface: iCMS 1.3.3.
  • Vocabularies Publishing Platform: Neologism 0.5.2.
  • Triple Store / SPARQL Endpoint: OpenLink Virtuoso Open-Source Edition 6.1.5, a hybrid database system developed by OpenLink Software that combines RDF, XML, relational and even object-oriented data management.

In CKAN we classify and define the different datasets through the metadata that describes each one, i.e. description, agencies responsible for maintenance, type of license, access details for the SPARQL endpoint, etc.

Following the iCMS work philosophy, the data are not stored in the DataStore of CKAN, but are offered through iCMS servers (the URL to access the iCMS collection is included in the CKAN dataset as a resource).
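
As a rough illustration of this arrangement, the sketch below builds the kind of dataset record that could be registered in CKAN, with the data itself left on an iCMS server and referenced as a resource. The dataset name, the iCMS URL and the helper function are hypothetical; only the general shape (a dataset carrying a list of resources, each with a URL) follows CKAN's dataset model.

```python
import json


def build_dataset_record(name, title, notes, icms_collection_url, fmt="RDF"):
    """Return a CKAN-style dataset dict whose only resource is an external URL.

    Assumption: field names follow CKAN's usual dataset/resource layout;
    the iCMS URL passed in is purely illustrative.
    """
    return {
        "name": name,
        "title": title,
        "notes": notes,
        "resources": [
            {
                # The data stays in iCMS; CKAN only keeps the link to it.
                "url": icms_collection_url,
                "format": fmt,
                "description": "Collection served by iCMS (not stored in CKAN)",
            }
        ],
    }


record = build_dataset_record(
    name="city-bus-stops",  # hypothetical dataset
    title="City bus stops",
    notes="Location of bus stops, maintained by the transport agency.",
    icms_collection_url="http://icms.example.org/collections/bus-stops",
)
print(json.dumps(record, indent=2))
```

A record like this only describes and links the data, which is why the catalog does not need to be touched when the underlying collection changes.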

This infrastructure allows our clients to publish 5-star datasets and gives stakeholders and reusers several ways to access the information:
  • iCMS: REST API / web services (WS).
  • Virtuoso: SPARQL endpoint.
  • CKAN: Catalog information access services (the data access services are available only in those cases where DataStore is used).
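
To give an idea of what access through the SPARQL endpoint might look like, here is a minimal sketch that builds the GET request URL for a simple query. The host name is hypothetical, the "/sparql" path is the one Virtuoso commonly uses, and the "format" parameter is assumed to be accepted by the endpoint; only the "query" parameter comes from the standard SPARQL protocol.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: hypothetical host; Virtuoso commonly exposes its endpoint at /sparql.
ENDPOINT = "http://opendata.example.org/sparql"

# A generic query that lists ten arbitrary triples from the store.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def sparql_get_url(endpoint, query, result_format="application/sparql-results+json"):
    """Build the URL for a SPARQL query sent over HTTP GET."""
    params = {
        "query": query.strip(),   # standard SPARQL protocol parameter
        "format": result_format,  # assumed endpoint-specific parameter
    }
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be opened with any HTTP client; the sketch stops short of sending the request so that it runs without a live endpoint.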

In short, the iCMS Virtual Appliance provides a single system with the following features:

  • Centralized website which provides a data catalog.
  • Website integration with heterogeneous data repositories such as Drupal, OpenCms, Alfresco or relational databases.
  • Viewers of the information (geospatial, graphs, etc.).
  • SPARQL Endpoint for complex queries on datasets.
  • Data published on the Open Data Website is updated as soon as it changes in the backends / repositories where it is managed (thanks to the subscription / notification mechanism implemented in iCMS).
  • Transformation of information into multiple formats like XML, Atom, RDF, JSON, etc.
  • API access via WS/SOAP and REST.
  • Ability to subscribe to datasets, so that changes in the datasets on the source will be notified to the subscribed systems.
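
The subscription / notification idea mentioned in the list above can be pictured with a small in-memory sketch. This is not the iCMS API: the class, method names and dataset identifiers are all hypothetical, and a real deployment would notify remote systems rather than call local functions.

```python
class DatasetNotifier:
    """Toy publish/subscribe registry keyed by dataset identifier."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, dataset_id, callback):
        """Register a callback to be invoked when the dataset changes."""
        self._subscribers.setdefault(dataset_id, []).append(callback)

    def publish_change(self, dataset_id, change):
        """Notify every subscriber of the dataset; return how many were notified."""
        callbacks = self._subscribers.get(dataset_id, [])
        for callback in callbacks:
            callback(dataset_id, change)
        return len(callbacks)


received = []
notifier = DatasetNotifier()

# A subscribed system simply records the changes it is told about.
notifier.subscribe("bus-stops", lambda ds, change: received.append((ds, change)))

# A change in the source repository is pushed to the subscribers of that dataset.
notified = notifier.publish_change("bus-stops", {"action": "updated", "item": 42})

# Changes to datasets nobody subscribed to reach no one.
ignored = notifier.publish_change("air-quality", {"action": "created", "item": 7})

print(notified, ignored, received)
```

The point of the pattern is that the source pushes changes out, so subscribed systems do not need to poll the repository to stay current.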