The Open Science Preservation Platform of ESFRI LIFEWATCH deployed in the Doñana Biological Reserve

We have often highlighted the importance of open data for meeting the strategies proposed by the European Union to achieve its 2020 objectives. Today we focus on open data for research within Europe and, more specifically, on the SCTF of the Doñana Biological Reserve, an infrastructure for ecosystem and biodiversity research headquartered in Seville.

The importance of open data is beyond doubt: it is an essential tool for development and progress in general. In research in particular, open policies are enabling great progress and producing greater synergies in less time.

Before going on, we need some context on the infrastructures and initiatives that play an important role in this field.

EGI – European Grid Infrastructure

EGI is the European Grid Infrastructure; the name refers to the grid computing techniques it uses to carry out its work. Its main objective is to facilitate access to computing resources through a network of interconnected centers in several countries of the European Union, which in turn facilitates and enhances international scientific collaboration.

This federation brings together two types of members: organizations that represent national e-infrastructures (National Grid Initiatives, NGIs) and European Intergovernmental Research Organisations (EIROs).

EGI offers its partners a wide range of services, from consulting support to marketing, but its main function is to create single points of access for all its researchers. This homogenizes software sources and avoids duplication.

This international platform works in the same way in each country's national organization. In Spain, ES-NGI provides a collaborative environment in which Spanish researchers can work together.

ESFRI – European Strategy Forum on Research Infrastructures

ESFRI stands for the European Strategy Forum on Research Infrastructures. It is a strategic instrument for developing Europe’s scientific integration and strengthening its international reach.

Besides supporting the scientific community, ESFRI aims to ensure that research infrastructure planning is framed within the strategic objectives set by the European Union, so that it serves the needs of citizens.

Every year, ESFRI publishes a roadmap in which it summarizes the results achieved and gives an overview of the status of the projects. Its latest published roadmap (2018) lists 18 ongoing projects, divided into five categories: energy; environment; health and food; physical sciences and engineering; and social and cultural innovation.

Over the last year, ESFRI modified and refined its definitions, models and methods. The current methodology distinguishes the following phases: concept development, design, preparation, implementation, operation and conclusion.

LifeWatch, the union of science and open data

Among the projects on this list, LifeWatch ERIC deserves special mention: it is the e-infrastructure for biodiversity and ecosystem research.

LifeWatch is a consortium of European infrastructures led by Spain (its central base is in Seville). The other participants are Belgium, Slovenia, Greece, Italy, the Netherlands and Portugal, with Slovakia as an observer country.

This project aims to remove the limitations that affect scientific research and to meet the need for larger and more varied data. To achieve this, it uses tools such as Big Data analysis, semantic resources, and open and FAIR data.

FAIR is an English acronym formed from “findable”, “accessible”, “interoperable” and “reusable”, which together spell the word “fair”.

Although open data and FAIR data are very similar concepts, they are not exactly the same, since FAIR data does not necessarily have to be open. The key requirement is that the data be accessible, and that may mean accessible to a specific group or to anyone (in which case the data would be open).

For example, experimental data commonly starts out accessible only to the group of people working with it. It then passes through the hands of more people who help to refine the dataset and finally, if so decided, it becomes accessible to everyone and thus becomes open data.
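The difference between accessible and open data can be sketched in a few lines of Python. This is only an illustrative model: the `Access` levels, the `Dataset` class and the dataset name are hypothetical and do not come from LifeWatch or from any FAIR specification.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Hypothetical access levels, from most restricted to fully open."""
    TEAM = 1           # only the group that produced the data
    COLLABORATORS = 2  # a wider, but still specific, group that refines it
    PUBLIC = 3         # anyone


@dataclass
class Dataset:
    name: str
    access: Access = Access.TEAM

    @property
    def is_open(self) -> bool:
        # Data counts as open only when it is accessible to everyone.
        return self.access is Access.PUBLIC

    def widen(self) -> None:
        """Move to the next, wider access level (no effect once public)."""
        if self.access is not Access.PUBLIC:
            self.access = Access(self.access.value + 1)


# Hypothetical dataset following the path described in the text.
observations = Dataset("example-observations")
observations.widen()   # shared with collaborators: accessible, not yet open
observations.widen()   # released to everyone: now open data
```

At every step the dataset is accessible to a defined audience, but `is_open` becomes true only at the last one.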

Because LifeWatch operates in several countries of the European Union, with Spain as the coordinating center, actions can be carried out at the local level without being limited to a single country, offering a broader view of the continent.

The importance of open data in research

As we have pointed out throughout the article, open data is central to projects such as LifeWatch, since it allows information to be shared with other researchers and creates a true scientific community whose members build on each other’s work. These are the advantages offered by open data:

  • Opportunities for synergies increase, so that efforts are combined to achieve objectives in less time.
  • Duplication of projects or lines of research is avoided, since researchers know at any moment what their colleagues are working on, even if they are in another country.
  • The use of information that may be erroneous or obsolete is reduced.
  • Collaboration between researchers is fostered and strengthened, regardless of the research center they are in.
  • In short, the resources are optimized to obtain results in a more efficient way than before.

Making this use of open data in research possible has required some preliminary steps, detailed below:

  • Developing common international standards, since without them collaboration within the community would be impossible: data would be handled in heterogeneous, non-unified ways.
  • Making public investments to offer universities and research groups the infrastructure and tools they need to work together and take proper advantage of the potential of open data.
  • Fostering solidarity among different groups of researchers, overcoming the fear of sharing the results of their own work.

Open Data Preservation Platform

Taking all of these aspects into account, it was decided to create the SCTF-DBR Open Data Preservation Platform so that researchers could manage the complete data life cycle, which we now look at in more detail.

The data life cycle covers all the phases that data passes through, from planning to consumption by third parties. Each stage must be understood in order to offer specific support for it:

  • Data management planning.
  • Acquisition of the data, through either sensors or external repositories.
  • Data storage.
  • Retrieval of data stored in heterogeneous sources.
  • Publication of the data in open data portals following established standards.
  • Data consumption.
  • Preservation of the data.

Originally, the Open Data Preservation Platform had six modules, but completing it required two more: authentication and authorization. The final structure of the platform was therefore as follows: planning, acquisition, open data portal, consumption, preservation, storage, authentication and authorization.

These solutions were used for each of the phases of the cycle:

  • Planning: Solution based on DMPTool, extended to allow the use of ontologies (adding semantics to the data management plans, or DMPs), integration of associated metadata, etc.
  • Acquisition: Python solutions for the monitoring and remote control of sensors, calibration modules, connection with defined data (existing in external repositories or available in remote repositories), etc.
  • Storage and retrieval: Solution built on OneData that allows information to be obtained from heterogeneous sources in a centralized and common way.
  • Publication: Portal based on Invenio, allowing exploitation as open data and assigning a DOI (Digital Object Identifier) to each dataset.
  • Consumption: Development environments for researchers based on Jupyter Notebook.
  • Preservation: Open source tools such as Bacula on physical media (disks, SAN / NAS, tapes, etc.).
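As a quick reference, the phase-to-tool mapping listed above can be written as a small lookup table. This is only a sketch for illustration: the tool names come from the list, while the `PHASE_TOOLS` dictionary, its keys and the `tool_for` helper are hypothetical and are not part of the platform's code.

```python
# Tool named in the article for each phase of the data life cycle.
# The structure and key names are illustrative assumptions.
PHASE_TOOLS = {
    "planning": "DMPTool",
    "acquisition": "Python sensor modules",
    "storage": "OneData",
    "publication": "Invenio",
    "consumption": "Jupyter Notebook",
    "preservation": "Bacula",
}


def tool_for(phase: str) -> str:
    """Return the tool listed for a phase, ignoring case and outer spaces."""
    key = phase.strip().lower()
    if key not in PHASE_TOOLS:
        raise KeyError(f"unknown life cycle phase: {phase!r}")
    return PHASE_TOOLS[key]


print(tool_for("Publication"))
```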

What was initially going to be an open data portal ended up being an Open Science Platform, which acts as the coordinating element and entry point to the remaining modules.

This project was completed in November 2015 and deployed at the Singular Scientific and Technological Facility of the Doñana Biological Reserve (SCTF-DBR), where it is available to the ESFRI LifeWatch research network.

Within this project, led by Telefónica, Viafirma developed five modules of the platform in collaboration with Adevice, which was in charge of the data acquisition module, and Aeonium, a technology start-up responsible for developing the Open Science Platform.