Adapting CKAN to Spanish regulations

In democratic countries, citizens' access to public data has long been one of the rights the population demands from its governments. Today we talk about adapting CKAN to Spanish regulations.

A little history

Throughout history, many figures, both well known and anonymous, have fought for information to be shared with the people, firmly convinced that the benefits citizens would gain from this openness would far outweigh the potential harm caused by the malicious use of certain sensitive data.

However, it was not until the 21st century, nicknamed “the information age”, that governments accepted that publishing such information is not only a citizen’s right but also a duty of public organizations towards them.

This, together with globalization and the use of the Internet as the standard way of communicating with institutions, made it necessary to create laws regulating the publication and reuse of this information.

This surge in government open data created the need for new ways for citizens to obtain information without having to request it through bureaucratic procedures that clash with the philosophy of open data. The Internet was the only viable medium, and so began the development of software solutions that would allow the data to be consulted and used.

This is where CKAN (Comprehensive Knowledge Archive Network) comes in: an open source web application initially conceived to store and distribute data, whose main objective was to become a specialized tool for managing scientific documents and their associated metadata.

In 2009, the United Kingdom Government adopted CKAN as the back-end for its Open Data portal, lending the project significant support and causing an important shift in the tool’s ultimate focus: it went from being a tool aimed purely at the scientific field to becoming the leading reference among web tools for managing Open Data portals.

From this moment, which coincided with the release of version 1.0 of the software, many Open Data platforms began to emerge around the world, almost all with one point in common: the use of CKAN to manage their data.

Evolution of CKAN as an Open Data platform

Almost 8 years have passed between version 1.0 and the current version 2.6.2, released in February 2017, during which the platform has received many small and large improvements. To these we must add the large number of extensions developed both by the platform’s core developers and by its broad and active community. This means we are looking not only at a platform with a strong present, but also at one with enormous possibilities for the future.

Version 1.0 was entirely focused on using the tool to manage documents and internal data, not as an end-user interface. Even so, this first release already hinted at where its developers wanted to take it, since it included features that gave the project great versatility, such as the RESTful API for accessing, creating and updating data.

Since then, the CKAN core has been enhanced with new functionality, and failures and vulnerabilities have been constantly corrected, driven by the feedback between the developers and the very active user community.

Among the major improvements added over these years are:

  • Management of organizations and their associated user permissions
  • A RESTful API much more powerful than the initial one
  • Categorization into data sets and resources (Open Data’s own terminology)
  • The option to create new metadata associated with these categories
  • Inclusion of the fixed Open Data metadata required by law
  • Basic statistics and a customizable user interface, allowing CKAN to be used as a standalone front-end
  • Viewers for different types of content
  • DataStore, for storing data in tabular format, etc.
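The RESTful API in the list above is, in current CKAN 2.x releases, the Action API. As a small illustrative sketch (not code from our platform), the helpers below build a GET URL of the form `/api/3/action/<action>` and unwrap the JSON envelope CKAN returns, whose `success`, `result` and `error` keys are part of the documented Action API. The portal address and the dataset name are hypothetical, and a canned response stands in for the network call so the example runs offline.

```python
import json
from urllib.parse import urlencode, urljoin


def action_url(base_url: str, action: str, **params: str) -> str:
    """Build a GET URL for a CKAN Action API call such as package_show."""
    # Normalise the base so that portals hosted under a sub-path also work.
    url = urljoin(base_url.rstrip("/") + "/", f"api/3/action/{action}")
    return f"{url}?{urlencode(params)}" if params else url


def unwrap(body: str) -> dict:
    """Return the 'result' of an Action API response, or raise on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    return payload["result"]


if __name__ == "__main__":
    # Hypothetical portal and dataset name, for illustration only.
    print(action_url("https://opendata.example.org", "package_show", id="bus-stops"))
    # Canned response shaped like an Action API envelope (no network access).
    sample = '{"help": "...", "success": true, "result": {"name": "bus-stops", "num_resources": 2}}'
    print(unwrap(sample)["num_resources"])
```

The same helper can build DataStore queries, for example `action_url(base, "datastore_search", resource_id="<resource id>", limit="5")`; the URL can then be fetched with `urllib.request.urlopen` or any other HTTP client.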

All this and much more has made CKAN what it is today: the reference platform for managing Open Data portals.


Starting to use CKAN

Our relationship with Open Data portals began in 2013 with the Open Data Portal of the Santander City Council. At that point we had to decide whether to develop custom software from scratch or to use one of the tools then available on the market for Open Data portals.

After studying all the options, the decision was simple: we would use as our back-end version 1.8 of CKAN, which had just been released and was already the leading reference among software for Open Data portals.

This covered the management of the data catalog, but although this version of CKAN had a clean, simple look & feel that could be customized through CSS, it did not cover all of our needs.

To make the most of the data that would be stored in CKAN, we decided to give our platform a simple user interface, one we were comfortable with as developers and into which we could incorporate any future idea that would increase user satisfaction with the platform.

We chose the well-known content management system (CMS) WordPress, together with several libraries for creating charts.

OGoov: Our solution adapted to Spanish regulations

Once we had made the appropriate technology decisions, we set to work adapting CKAN to the requirements of the Application Guide of the Technical Interoperability Standard for the Reuse of Information Resources (NTI) regarding the metadata to be stored, both for data sets and for their associated resources.

When we compared the metadata supported by CKAN with those required by the NTI, it became clear that the following metadata had to be added to the system:

Data set

  • Title in several languages (depends on the client’s requirements).
  • Description in several languages (depends on the client’s requirements).
  • Degree of opening.
  • Last known update.
  • Update frequency.
  • Language(s).
  • Geographic coverage.
  • Temporal coverage.
  • Effective date.
  • Related resource(s).
  • Regulation(s).

Resources

  • Title in several languages (depends on the client’s requirements).
  • Description in several languages (depends on the client’s requirements).
  • Additional information on the format.
  • Size in bytes.
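The comparison described above is essentially a set difference between the fields a CKAN data set already has and the fields the NTI requires. The sketch below expresses the additional data-set metadata from the list as hypothetical field identifiers (they are not taken from the NTI text or from CKAN) and reports which ones a given package dictionary is still missing, expanding the multilingual fields once per configured language. The resource list could be checked in exactly the same way.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical identifiers for the extra data-set metadata listed above.
# The names are illustrative only; each installation may choose its own.
NTI_DATASET_FIELDS = (
    "opening_degree",
    "last_update",
    "update_frequency",
    "language",
    "spatial_coverage",
    "temporal_coverage",
    "valid_date",
    "related_resources",
    "regulations",
)
# Fields that must exist once per configured language (hypothetical names).
TRANSLATED_FIELDS = ("title", "description")


def required_fields(languages: Iterable[str]) -> set[str]:
    """All field names an installation needs for the given languages."""
    fields = set(NTI_DATASET_FIELDS)
    for lang in languages:
        fields.update(f"{name}_{lang}" for name in TRANSLATED_FIELDS)
    return fields


def missing_fields(package: dict, languages: Iterable[str]) -> list[str]:
    """Required fields that are absent or empty in a package dict, sorted."""
    return sorted(f for f in required_fields(languages) if not package.get(f))


if __name__ == "__main__":
    pkg = {"title_es": "Paradas de autobús", "description_es": "Listado",
           "update_frequency": "monthly"}
    print(missing_fields(pkg, ["es"]))
```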

To develop these changes, we first decided to create a CKAN extension from scratch that would include the new fields and extend the existing forms to display and save this information.

Once this new extension had been developed and deployed, we observed that, although functionally correct, the solution was not well designed. The main reason was that the multi-language fields were not dynamic: the extension itself had to be revised for each installation, adding and/or removing translation fields according to each client’s needs.

In a second iteration, we decided to use two of the CKAN extensions most widely used by developers: ckanext-scheming, which allows new metadata to be configured for data sets, resources, organizations and groups, and ckanext-fluent, which makes the fields added with the previous extension multilingual.

The ckanext-scheming extension makes it easy to add, modify or remove the metadata associated with data sets, resources, organizations and groups. All that is needed is to define a new schema in JSON format describing each of the fields that will be displayed in the forms.
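To give an idea of what such a schema looks like, the sketch below builds one as a Python dictionary and prints it as JSON. The top-level keys (`scheming_version`, `dataset_type`, `about`, `about_url`, `dataset_fields`, `resource_fields`) and the presets used (`title`, `dataset_slug`, `dataset_organization`, `select`, `date`, `resource_url_upload`, `resource_format_autocomplete`) come from ckanext-scheming; the NTI field names and the choice values are hypothetical, and this is not the actual OGoov schema. In practice the JSON file is usually written by hand; generating it from Python here only keeps all the examples in one language.

```python
import json


def build_schema() -> dict:
    """Return a minimal ckanext-scheming schema with some NTI fields added."""
    return {
        "scheming_version": 1,
        "dataset_type": "dataset",
        "about": "Illustrative NTI-oriented schema (sketch)",
        "about_url": "https://github.com/ckan/ckanext-scheming",
        "dataset_fields": [
            # Core CKAN fields, using presets shipped with ckanext-scheming.
            {"field_name": "title", "label": "Title", "preset": "title",
             "required": True},
            {"field_name": "name", "label": "URL", "preset": "dataset_slug"},
            {"field_name": "notes", "label": "Description",
             "form_snippet": "markdown.html"},
            {"field_name": "owner_org", "label": "Organization",
             "preset": "dataset_organization"},
            # Extra NTI metadata: field names and choice values are hypothetical.
            {"field_name": "update_frequency", "label": "Update frequency",
             "preset": "select",
             "choices": [
                 {"value": "daily", "label": "Daily"},
                 {"value": "monthly", "label": "Monthly"},
                 {"value": "yearly", "label": "Yearly"},
             ]},
            {"field_name": "valid_date", "label": "Effective date",
             "preset": "date"},
            {"field_name": "spatial_coverage", "label": "Geographic coverage"},
        ],
        "resource_fields": [
            {"field_name": "url", "label": "URL",
             "preset": "resource_url_upload"},
            {"field_name": "name", "label": "Name"},
            {"field_name": "format", "label": "Format",
             "preset": "resource_format_autocomplete"},
            # Hypothetical NTI resource fields.
            {"field_name": "format_info",
             "label": "Additional information on the format"},
            {"field_name": "size", "label": "Size in bytes"},
        ],
    }


if __name__ == "__main__":
    # Save this output as a .json file inside an extension so CKAN can load it.
    print(json.dumps(build_schema(), indent=2, ensure_ascii=False))
```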

In our case, we only needed to modify the schemas for data sets and resources. Once the JSON schema has been written, CKAN simply has to be configured to use it instead of the default schema.
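In the CKAN .ini file, this configuration goes through the `ckan.plugins`, `scheming.dataset_schemas` and `scheming.presets` options described in the ckanext-scheming and ckanext-fluent documentation (plugin names `scheming_datasets` and `fluent`). Editing the file by hand is the usual route; to keep all examples in Python, the sketch below applies the same settings to the `[app:main]` section with the standard `configparser` module. The schema path `ckanext.ogoov:schemas/nti_dataset.json` and the extension it points to are hypothetical.

```python
import configparser
import io

# Hypothetical schema location inside a custom extension; adjust to your own.
DATASET_SCHEMA = "ckanext.ogoov:schemas/nti_dataset.json"
# Preset files from ckanext-scheming and ckanext-fluent.
PRESETS = "ckanext.scheming:presets.json ckanext.fluent:presets.json"


def enable_scheming(ini_text: str) -> str:
    """Return ini text with the scheming/fluent settings set in [app:main].

    Raises KeyError if the text has no [app:main] section.
    """
    # CKAN ini files use %(here)s placeholders, so disable interpolation.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    main = parser["app:main"]
    plugins = main.get("ckan.plugins", "").split()
    for plugin in ("scheming_datasets", "fluent"):
        if plugin not in plugins:
            plugins.append(plugin)
    main["ckan.plugins"] = " ".join(plugins)
    main["scheming.dataset_schemas"] = DATASET_SCHEMA
    main["scheming.presets"] = PRESETS
    out = io.StringIO()
    parser.write(out)  # note: comments from the original file are not kept
    return out.getvalue()


if __name__ == "__main__":
    print(enable_scheming("[app:main]\nuse = egg:ckan\nckan.plugins = stats text_view\n"))
```

Because `configparser` drops comments when it rewrites a file, a helper like this is better suited to generated deployment files than to a hand-maintained production.ini.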

Then, depending on the multi-language requirements of each installation, this schema has to be extended with the corresponding options for the languages to be supported.

This is where the ckanext-fluent extension comes into play. It allows us to define the texts associated with the metadata in different languages and to indicate which fields will be shown in the form.
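Since the language list is the only thing that changes between installations, the multilingual part of the schema can be generated from it, which avoids the per-client editing that made our first extension hard to maintain. The sketch below is a minimal illustration under that assumption: it uses the `fluent_text` and `fluent_markdown` presets and the `form_languages` option from ckanext-fluent, together with per-language label dictionaries, which ckanext-scheming accepts. The field names and label texts are hypothetical; for fields that should replace CKAN’s core title and description, check the ckanext-fluent documentation for the preset intended for that purpose.

```python
from __future__ import annotations

import json

# Hypothetical label texts per field, keyed by language code.
LABELS = {
    "title_translated": {"es": "Título", "en": "Title", "ca": "Títol"},
    "notes_translated": {"es": "Descripción", "en": "Description",
                         "ca": "Descripció"},
}


def fluent_field(name: str, preset: str, languages: list[str]) -> dict:
    """Build a ckanext-fluent field definition for the given languages."""
    return {
        "field_name": name,
        "preset": preset,
        # Keep only the label texts for the languages this client supports.
        "label": {lang: LABELS[name][lang] for lang in languages},
        # Languages for which the form shows an input box.
        "form_languages": list(languages),
    }


def translated_fields(languages: list[str]) -> list[dict]:
    """Multilingual data-set fields for one installation's language list."""
    return [
        fluent_field("title_translated", "fluent_text", languages),
        fluent_field("notes_translated", "fluent_markdown", languages),
    ]


if __name__ == "__main__":
    # Adding Catalan for a client only means changing this list.
    print(json.dumps(translated_fields(["es", "en", "ca"]), indent=2,
                     ensure_ascii=False))
```

The resulting entries would be appended to `dataset_fields` in the schema before it is written out, so supporting a new language becomes a configuration change rather than a code change.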

All this allows us both to comply with each of the NTI’s current information requirements and to be ready at any time for future changes to the standard, since fields can easily be added, modified or removed. In other words, we can adapt our open data platform to any change in the standard in a matter of hours.