D6.2 - Scientific Workflow Generation¶
- D6.2 - Scientific Workflow Generation
- Executive Summary
- 1. Introduction
- 2. Overall Developmental Approach to Support the Use Cases
- 3. Deployed VRE(s)
- 4. Implementation Plan
- Additional Information:
Executive Summary¶
In the domain of food safety modelling, two use cases and two additional research communities were identified for which web-based scientific data analysis workflows and software-based resources for knowledge sharing and integration are of particular importance. Community-specific Virtual Research Environments (VREs) hosted on the AGINFRA+ gateway (https://aginfra.d4science.org/group/aginfra-gateway) provide a promising infrastructure to support scientific collaboration and knowledge exchange in all four research areas. Specifically, the AGINFRA+ project demonstrated how existing and newly developed web-based services can be customized and integrated into community-specific VREs that support the specific needs of each research community. These VREs include services supporting general collaboration, services for storing and sharing data and knowledge, and computational resources to run simulations. In addition, the VREs demonstrate how external computational resources can be integrated and made available to scientific communities.
Within the AGINFRA+ project the following community-centred VREs were established:
The DEMETER-VRE supports researchers in the identification of emerging risks in the food (and feed) chain (Use case 1). DEMETER stands for Determination and Metrics of Emerging Risks. Specific features that were developed and integrated into this VRE provide innovative data and knowledge visualization services, e.g. a MindMap portlet and a Network Graph visualization service. The VRE has also been successfully used as a project management facility for the EFSA-funded project "DEMETER".
The RAKIP_portal-VRE supports risk assessors and risk modellers in their efforts to share their knowledge (data, mathematical models, simulation results, resources) in a harmonized way (Use case 2). RAKIP stands for Risk Assessment Modelling and Knowledge Integration Platforms. This VRE provides the first community-driven food safety knowledge repository containing, among others, mathematical models from the area of predictive microbial modelling and quantitative microbial risk assessment (QMRA) that are compliant with the harmonized information exchange format FSK-ML. FSK-ML (Food Safety Knowledge Markup Language) is a file format that supports standardized knowledge exchange, i.e. it improves the whole knowledge generation and knowledge dissemination process.
The ORIONKnowledgeHub-VRE supports the One Health community in its efforts to increase harmonization and interpretation of surveillance data from relevant and diverse disciplines. Specifically, this VRE hosts the first OneHealth EJP Glossary and serves as a project management and communication facility for the EJP ORION project.
The Global Foodsource Identifier (GFI)-VRE supports researchers investigating foodborne disease outbreaks. The VRE provides a resource for sharing microbial typing results from foodborne disease outbreaks with the international community and thus facilitates collaboration and interaction between outbreak investigators worldwide. It provides online computational resources like RStudio, DataMiner and Jupyter notebooks to the researchers.
1. Introduction¶
Within the AGINFRA+ project, WP6 (Food Safety Risk Assessment Community) identified several research communities within the food safety domain that could benefit from services available via the D4Science and AGINFRA+ gateway. For two use cases (DEMETER and RAKIP) this work package performed a detailed requirement analysis including the identification of personas and specific software/service needs (see Deliverable D6.1). In addition, it was possible to promote the use of VREs in two additional research communities (EJP ORION and GFI) by provisioning specific community-centred VRE solutions.
All four research communities share the need for a protected, web-based environment that facilitates communication, collaboration and knowledge sharing within these communities. At the same time, there is a need to finalize research results and make them accessible to the general research community. Customized VREs hosted on the AGINFRA+ gateway were considered a promising infrastructure to support these needs. Unique selling points of the VREs are features that facilitate sharing of data, knowledge and mathematical models as well as computational resources. In the following, the features, services, and workflows that were developed for the research communities within the food safety domain are described in detail.
2. Overall Developmental Approach to Support the Use Cases¶
Two VREs have been deployed to support the WP6 use cases (namely DEMETER and RAKIP_portal) and two VREs were deployed additionally for other communities interested in exploring the VRE technology (namely ORIONKnowledgeHub and Global Foodsource Identifier).
Both the DEMETER and the RAKIP_portal VREs serve the Food Safety Risk Assessment community, but they address two independent research areas within it.
The first VRE, DEMETER, supports data analysis and data visualization tasks for the identification of emerging risks in the food (and feed) chain. A community-specific service that facilitates such analysis is the Network Graphs visualization portlet, which was newly incorporated into the DEMETER-VRE. This web service can be applied to any network data set that is time-resolved (e.g. transport data, text mining data and supply chain networks). From a broader perspective, this web service demonstrates how new VRE-based services can support knowledge and resource sharing within a specific community. The DEMETER use case also demonstrates how VRE-based computational resources (like DataMiner, Algorithm Importer and RStudio) support computationally intensive data processing and visualization tasks in communities. Further, the VRE is equipped with services that enabled its use as a project management resource for the currently ongoing EFSA-funded research project "DEMETER" (examples of useful services are the Social Networking, Wiki, Calendar and Survey tools).
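To make the notion of a time-resolved network data set more concrete, the following minimal Python sketch shows one possible representation: a list of edges that each carry a date, filtered to a time window before simple node statistics are computed. The class `TimedEdge`, the helper functions and the sample supply chain are hypothetical and only serve as an illustration; they do not describe the implementation of the Network Graphs portlet.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimedEdge:
    """One link in a network, observed on a given day (hypothetical structure)."""
    source: str
    target: str
    day: date


def edges_in_window(edges, start, end):
    """Return the edges whose date lies in the inclusive window [start, end]."""
    return [e for e in edges if start <= e.day <= end]


def node_degrees(edges):
    """Count how many edges touch each node (undirected degree)."""
    degrees = {}
    for e in edges:
        degrees[e.source] = degrees.get(e.source, 0) + 1
        degrees[e.target] = degrees.get(e.target, 0) + 1
    return degrees


# Toy supply chain data, invented for this example.
edges = [
    TimedEdge("farm_A", "processor_B", date(2019, 1, 5)),
    TimedEdge("processor_B", "retailer_C", date(2019, 1, 20)),
    TimedEdge("farm_A", "retailer_C", date(2019, 3, 2)),
]

january = edges_in_window(edges, date(2019, 1, 1), date(2019, 1, 31))
print(node_degrees(january))
```

Filtering by time window first keeps the later steps (degree counts, layout, rendering) independent of how the time axis is stored.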
The second VRE, RAKIP_portal, supports risk assessors and risk modellers in their efforts to share their knowledge (data, mathematical models, simulation results) in a harmonized way. Specifically, this VRE provides a prototypic community-driven food safety knowledge repository (using the VRE Catalogue functionality), which contains mathematical models from the area of predictive microbial modelling and quantitative microbial risk assessment (QMRA). The RAKIP Model Repository, hosted on a KNIME Server of the German Federal Institute for Risk Assessment (BfR), has been integrated directly into the RAKIP_portal-VRE. This integration facilitates the use of interactive services hosted on the BfR's KNIME Server that allow users to create and define their own model-based simulations for FSK-ML compliant models (see Section "Food Safety Knowledge Markup Language" for details). The repository thereby represents a core achievement: it allows the use of services that are hosted outside the VRE. The new information exchange format FSK-ML (which is also supported by the repository) has been developed with support of the AGINFRA+ project to share food safety models more efficiently. Several highly relevant resources were integrated into the VRE: (1) RStudio, (2) Jupyter notebooks, (3) Galaxy and (4) the ARPHA Writing Tool (AWT). The AWT is able to read FSK-ML compliant model files and to integrate relevant model metadata directly into a new manuscript. The DataMiner computational service has been extended during the AGINFRA+ project and can now execute KNIME workflows. Within the AGINFRA+ project several new resources were developed to address specific needs of the RAKIP community. It is now possible to share information between different VRE components directly, as the workspace is accessible from applications like Jupyter and Galaxy. Further, any DataMiner algorithm can be started with a right-click on a file stored in the workspace of the VRE (the so-called "right-click"-feature).
WP6 was further able to promote the use of AGINFRA+ and CNR resources beyond the originally planned scientific communities. Specifically, WP6 promoted the use of the VRE technology in the One Health community (i.e. in the research project ORION, which is funded under the H2020 European Joint Programming Initiative) and among a group of researchers investigating foodborne disease outbreaks, which was funded under the H2020 COMPARE project.
The ORIONKnowledgeHub-VRE supports the One Health community in its efforts to increase harmonization and interpretation of surveillance data from diverse and adjacent disciplines. A key component of this VRE is the first OneHealth EJP Glossary. In addition, the VRE is used as a project management and communication facility for the EJP ORION project.
The Global Foodsource Identifier (GFI)-VRE supports researchers investigating foodborne disease outbreaks. In that context, a resource for sharing microbial typing results with the international community is needed. The VRE provides this resource in the form of a data management system and thereby facilitates collaboration and interaction between outbreak investigators worldwide. Furthermore, the VRE provides online computational resources like RStudio, DataMiner and Jupyter notebooks to the researchers.
3. Deployed VRE(s)¶
3.1 Overall Description¶
Two VREs have been deployed to support the WP6 use cases and two additional VREs were deployed for other communities interested in exploring the VRE technology. All VREs are available via the AGINFRA+ gateway (https://aginfra.d4science.org/) at the following links:
Global Foodsource Identifier¶
3.2 Semantic Features¶
In the course of the AGINFRA+ project the adoption of a number of new semantic services has been explored and discussed with the corresponding communities. In particular, the German Federal Institute for Risk Assessment (BfR) developed, tested and improved a new open standard for the annotation of food safety knowledge (data and models) called FSK-ML.
Food Safety Knowledge Markup Language (FSK-ML)¶
The Food Safety Knowledge Markup Language (FSK-ML) harmonizes the exchange of food safety knowledge, i.e. mathematical models and data sets, e.g. simulation results. It provides a full set of specifications for accurately describing food safety knowledge. By that, the knowledge becomes interoperable and reusable. One common use of this standard is the encoding of FSKX-container files that can be used to encapsulate food safety model and data as well as their simulation configurations and other relevant metadata (see
Before the project, there was only the Predictive Modelling in Food Markup Language (PMF-ML), which detailed how experimental data and mathematical models from the domain of predictive microbial modelling could be saved and encoded in a software-independent manner. With FSK-ML the constraint of software independence has been relaxed to support the exchange of knowledge and information that is embedded in legacy and new software-dependent models (e.g. models developed in R, Matlab or Python).
FSK-ML aims at harmonizing the exchange of food safety knowledge (e.g. predictive models) including the corresponding metadata. This provides the basis for developing software tools that import and export models in this format and thus avoid an error-prone re-implementation process. The open source software FSK-Lab is such a tool and was developed within the AGINFRA+ project. In addition, knowledge (e.g. models or data sets) provided in an FSK-ML compliant format can be inserted automatically into the Resource Catalogues of the corresponding community VREs by applying a dedicated open source KNIME workflow that is registered as a public resource in the DataMiner.
A core element of FSK-ML is a metadata schema that has been developed by members of the RAKIP community and that is now used to annotate food safety knowledge. The schema is provided as a community-driven online resource and is thus subject to regular improvements (see https://goo.gl/PE4ysP for an interactive list).
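As an illustration of how a container file that bundles a model script with its metadata can be inspected, the following Python sketch builds a toy archive in memory and reads it back. It assumes that the container is a ZIP-based archive, which is not stated in this document, and the file names `metaData.json` and `model.R` as well as the metadata fields are hypothetical placeholders rather than the layout prescribed by FSK-ML.

```python
import io
import json
import zipfile


def list_container(data: bytes):
    """Return the sorted entry names of a ZIP-based container."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())


def read_metadata(data: bytes, entry: str = "metaData.json"):
    """Load one JSON metadata entry from the container (entry name is an assumption)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return json.loads(archive.read(entry).decode("utf-8"))


# Build a toy container in memory; contents are invented for this example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("metaData.json", json.dumps({"name": "toy growth model", "language": "R"}))
    archive.writestr("model.R", "result <- 1 + 1\n")
demo_container = buffer.getvalue()

print(list_container(demo_container))
print(read_metadata(demo_container)["name"])
```

Keeping the metadata in a separate, machine-readable entry is what would allow tools to index a model without executing its script.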
RAKIP Controlled Vocabularies¶
FSK-ML also provides controlled vocabularies for relevant metadata concepts. These vocabularies are used to annotate food safety models and data. The list is based on terms used by other sources like ontologies, standards and tools, e.g. EFSA's Standard Sample Description (https://efsa.onlinelibrary.wiley.com/doi/pdf/10.2903/sp.efsa.2015.EN-918) and others (see https://dx.doi.org/10.1016/j.mran.2018.06.001 for details). These controlled vocabularies are listed in an interactive online resource (https://goo.gl/wbFoZU).
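The way a controlled vocabulary constrains annotation can be sketched in a few lines of Python: each metadata field is checked against its list of allowed terms, and values outside the list are reported. The field names and terms below are hypothetical and are not taken from the RAKIP vocabularies.

```python
# Hypothetical vocabularies; the actual RAKIP lists are maintained online.
CONTROLLED_VOCABULARIES = {
    "language": {"R", "Python", "Matlab"},
    "model_class": {"Predictive model", "Dose-response model"},
}


def find_invalid_terms(metadata, vocabularies):
    """Return {field: value} for each field whose value is not an allowed term."""
    invalid = {}
    for field, allowed in vocabularies.items():
        value = metadata.get(field)
        if value is not None and value not in allowed:
            invalid[field] = value
    return invalid


# Toy annotation, invented for this example.
metadata = {
    "name": "toy model",
    "language": "Fortran",
    "model_class": "Predictive model",
}
print(find_invalid_terms(metadata, CONTROLLED_VOCABULARIES))
```

Fields without a vocabulary (here `name`) are left unchecked, so free-text metadata can coexist with controlled terms.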
As a reference implementation for FSK-ML (see Section "Food Safety Knowledge Markup Language (FSK-ML)" for details), the open source KNIME extension "FSK-Lab" was developed (see Figure 1 for an example; see https://foodrisklabs.bfr.bund.de/fsk-lab/ for details).
Figure 1. Two KNIME-workflows that consist of nodes from the KNIME extension FSK-Lab
BfR supported the conversion of the FSK-ML metadata schema (see Section "Metadata Schema" for details) into an FSK-ML ontology and the generation of the RAKIP vocabulary, both of which are available from within VocBench (work linked to WP2).
3.3 Analytical Features¶
The four VREs from WP6 exploit a number of new or enriched analytical services provided via the AGINFRA+ gateway. The main achievement for these food safety communities is that the DataMiner resource has been extended to allow the execution of KNIME-based data analytics workflows (and that even different versions of KNIME can be used). This innovation opens the way to integrate and deploy existing data analytics knowledge (encoded as KNIME and FSK-Lab workflows) inside the VRE. This functionality overcomes the limitation that DataMiner algorithms cannot invoke elements for user interaction (work related to other WPs).
The following additional services and resources were explored and tested over the course of the AGINFRA+ project:
- A service to create a URI for each model uploaded into the VRE Catalogue
- A service to execute FSK-ML compliant models on a computational cluster provided by the VRE (including multi-core processing)
- DataMiner and Statistical Algorithm Importer services, which offer the opportunity to register and disseminate own data processing services, such as a KNIME-workflow to access social media data (Twitter and Facebook) from within the VRE
- A resource to integrate the KNIME Server model repository as a new tab into the VRE
- A resource to link external services (e.g. KNIME-workflows executed on BfR’s KNIME Server) via WPS and REST (exploited by the RAKIP_portal-VRE)
- A new “right-click”-feature that allows users to invoke any DataMiner algorithm registered to the VRE directly from the workspace
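Because external services are linked via WPS, a request to such a service can be expressed as a plain URL with key-value parameters. The Python sketch below builds WPS 1.0.0 `GetCapabilities` and `DescribeProcess` request URLs without sending them. The endpoint and the process identifier are hypothetical placeholders, and any authentication parameter a real deployment requires is deliberately left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; replace it with the WPS endpoint of the actual service.
WPS_ENDPOINT = "https://wps.example.org/wps/WebProcessingService"


def build_wps_url(endpoint, request, **extra):
    """Build a WPS key-value-pair request URL (WPS 1.0.0 style)."""
    params = {"service": "WPS", "request": request}
    # GetCapabilities negotiates the version; other requests state it explicitly.
    if request != "GetCapabilities":
        params["version"] = "1.0.0"
    params.update(extra)
    return f"{endpoint}?{urlencode(params)}"


capabilities_url = build_wps_url(WPS_ENDPOINT, "GetCapabilities")
describe_url = build_wps_url(
    WPS_ENDPOINT,
    "DescribeProcess",
    identifier="org.example.ModelRunner",  # hypothetical process identifier
)

print(capabilities_url)
print(describe_url)
```

A client would typically call `GetCapabilities` first to discover the registered processes and then `DescribeProcess` to learn the inputs a given process expects.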
3.4 Presentation Features¶
The degree to which the different communities expressed needs and subsequently indicated their willingness to adopt the developed visualization features varied greatly. Over the course of the AGINFRA+ project a number of new visualization services were developed, tested and discussed with the communities (e.g. Mind Maps portlet, View Graphs portlet, Network Visualization portlet and Ontology browser portlet).
BfR developed web-based services that were deployed on BfR’s KNIME Server infrastructure and that could be accessed directly from the VRE. Through the integration of the RAKIP Model Repository (as a tab inside the RAKIP_portal-VRE) and the integration of model-specific links into the VRE Catalogue entries it has been possible to create interactive and user-friendly GUIs to annotate, edit and create food safety models. Due to the user-friendliness and the high degree of interaction between researchers, there is a high uptake potential after the end of the AGINFRA+ project.
A highly relevant service for any research community is an efficient scientific publication tool. The ARPHA Writing Tool efficiently transforms FSK-ML compliant model files into a publication. A journal that is particularly interested in publications with standardized models in the domain of food science is the new Food Modelling Journal (FMJ; https://fmj.pensoft.net/) (this work is linked to WP3 and WP4).
4. Implementation Plan¶
4.1 Components/Features for M12 (Completed Work)¶
The following components were implemented in the RAKIP_portal and DEMETER VREs:
|Core VRE||Basic VRE with standard collaboration features such as file sharing and message posting||
|Wiki||General Wiki for knowledge sharing||
|Calendar||Feature to share relevant dates||
|User Management||Feature to manage VRE members||
|DataMiner||e-Infrastructure service providing data mining algorithms under the Web Processing Service (WPS) standard (see https://gcube.wiki.gcube-system.org/gcube/Data_Mining_Facilities )||RAKIP_portal|
|SAI||Statistical Algorithms Importer (SAI) as a tool to import KNIME-based workflows into the D4Science e-Infrastructure (see https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer )||RAKIP_portal|
|Data Catalogue||The Data Catalogue is a feature that allows users to browse and search documents made available via the shared workspace of the VRE. In the case of the RAKIP_portal this feature will serve as the front end of the RAKIP model repository, as models are stored as FSK-ML formatted files||RAKIP_portal|
|Activity Tracker||Redmine-based feature to create tickets and project roadmaps||RAKIP_portal|
Figure 2 shows the landing page of the RAKIP_portal VRE and Figure 3 the landing page of the DEMETER VRE.
Figure 2. Landing page of the RAKIP_portal-VRE.¶
Figure 3. Landing page of the DEMETER-VRE.¶
4.2 Components/Features for M18 (Completed Work)¶
The following components were deployed in the RAKIP_portal and DEMETER VREs. All components were tested and explored by the corresponding scientific community:
|Semantic feature: VocBench||VocBench is a web-based, multilingual, editing and workflow tool that manages thesauri, authority lists and glossaries in a collaborative manner||RAKIP_portal|
|Analytical feature: FSK-Lab is provided to the community via a KNIME Server integration||
||Two specific portlets to create and visualize graphs were added to the VRE||
|Presentation feature: Mind Maps||This visualization component allows the online creation of mind maps to structure knowledge. It can also be used collaboratively||
|Presentation feature: Ontologies||Access to ontologies via external services that make it possible to automate knowledge generation and extraction||
|Landing page||Content creation and update of the landing page||
|DEMETER Glossary||A glossary with key terminology for the DEMETER-community was added to the Wiki||DEMETER|
|Importing KNIME-workflows (that include R and Python-code)||Import KNIME-workflows (via SAI) that include R and Python-code||DEMETER|
Figures 4 and 5 show the web pages of the integrated RAKIP Model Repository (hosted on the BfR KNIME Server) and the DEMETER Glossary (hosted on the VRE Wiki), respectively.
Figure 4. Screenshot of the RAKIP Model Repository integrated into the RAKIP_portal VRE.¶
Figure 5. The web page of the DEMETER Glossary.¶
4.3 Components/Features for M24 (Completed Work)¶
The following components were developed, deployed, tested and explored by the RAKIP and DEMETER scientific communities:
|Semantic feature: Ontology maintenance||Maintenance of ontologies via VocBench||RAKIP_portal|
|Semantic feature: Ontology visualization||Interactive visualization of ontologies via WebVOWL||RAKIP_portal|
|Analytical features: Specific KNIME-workflows developed to add the following functionalities:|
|fskx-file execution via DataMiner||This workflow accepts an fskx-file via the VRE and executes it in the DataMiner (see FSKX Model Runner )||RAKIP_portal|
|Automatic creation of unique URI/URL for each model||Infrastructure to create a unique URI/URL for each model uploaded into the VRE workspace and the VRE Data Catalogue||RAKIP_portal|
|FSKX Model Runner||KNIME-workflow implemented in the VRE DataMiner that reads and runs an fskx-model selected via the Catalogue or via the "right-click"-feature from a user folder||RAKIP_portal|
|Connection to external services||
|KNIME-workflow Pubmed Abstract||Tool to download metadata from publications available in Pubmed||DEMETER|
|KNIME-workflow EPPOMining||Tool to mine information from the EPPO Database||DEMETER|
|KNIME Workflow Hub||Public service to search for DEMETER emerging risk identification models||DEMETER|
|Analytical features: FSK-Lab features provided to the community via KNIME Server integration||13 new FSK-Lab releases. The main new features developed were:
|DEMETER Glossary||Provision of a searchable glossary in the Wiki||DEMETER|
4.4 Components/Features for M36 (Completed Work)¶
The third project year was dedicated to the establishment of scientific content inside the RAKIP_portal and DEMETER VREs based on those features that have been successfully deployed. The main focus of the work was improving the usability of the features deployed between M24 and M36.
The following components were developed, deployed, tested and explored by the scientific communities:
|Presentation feature: Mind Maps||Improvement of Mind Maps portlet usability and extension of saving options (to VRE-Workspace, browser local storage, as a file on the computer)||
|Presentation feature: Network Graph Visualization||The web service can be applied to any network data set that is time-resolved (e.g. transport data, text mining data, supply chain network)||DEMETER|
|Presentation feature: ARPHA Writing Tool||ARPHA writing tool automatically transforms an fskx-model into an article||RAKIP_portal|
|Analytical features: Specific KNIME-workflows developed to add the following functionalities:|
|KNIME Server portlet to change model settings and run it in the VRE||A portlet to send an fskx-model to a specific workflow on the KNIME Server, where model parameter settings can be adjusted. Afterwards the model is sent back to the VRE and executed in the DataMiner||RAKIP_portal|
|FSKX Catalogue Publisher||DM-algorithm (implemented as KNIME-workflow) that publishes an FSK-ML compliant model to the VRE Catalogue. It extracts all metadata from the file and injects it into a new Catalogue item. Additionally it adds 3 clickable URLs:
|KNIME-Workflow Hub||Improved UI to exchange, rate and share KNIME workflows in the VRE||DEMETER|
|Maintenance of the KNIME-workflow PubmedAbstract||Tool to download Pubmed Publication Metadata on KNIME Server||DEMETER|
|Maintenance of the KNIME-workflow TwitterSentiment||Tool to mine information from the Twitter Database on the KNIME Server||DEMETER|
|Maintenance of the KNIME-workflow EPPOMining||Tool to mine information from the EPPO Database on the KNIME Server||DEMETER|
|KNIME-workflow to publish content into the VRE||KNIME-workflow to directly publish content into the VRE Catalogue||ORIONKnowledgeHub|
|GFI VRE Catalogue Data Extraction Workflow||KNIME-workflow to extract all data from the VRE Catalogue. These data are then placed in a table-file that can be used further||Global Foodsource Identifier|
|Analytical features: FSK-Lab Features provided to the community via KNIME Server integration||11 new FSK-Lab releases. Main new features developed:
|Data catalogue: OpenFSMR Models publication||Upload of models from the OpenFSMR repository into the catalogue||RAKIP_portal|
|Data catalogue: RAKIP Glossary||A glossary with key terminology for the RAKIP-community was provided||RAKIP_portal|
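The data extraction step listed above for the GFI VRE, in which catalogue data is placed in a table file, can be pictured with a minimal Python sketch. It flattens a list of catalogue items into CSV text. The item structure, the field names and the function `catalogue_to_csv` are hypothetical, and the actual workflow is a KNIME-workflow whose internals are not described here.

```python
import csv
import io


def catalogue_to_csv(items):
    """Flatten a list of catalogue items (dicts) into CSV text.

    The header is the sorted union of all keys; missing values stay empty.
    """
    fieldnames = sorted({key for item in items for key in item})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Toy catalogue items, invented for this example.
items = [
    {"title": "Isolate 1", "organism": "Salmonella", "country": "DE"},
    {"title": "Isolate 2", "organism": "Listeria"},
]
table = catalogue_to_csv(items)
print(table)
```

Taking the union of all keys as the header keeps items with differing metadata fields in one table instead of dropping the less common fields.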
Additional Information:¶
- RAKIP Model Repository (https://aginfra.d4science.org/group/rakip_portal/model-repository)
- FSK-ML (https://foodrisklabs.bfr.bund.de/fsk-ml-food-safety-knowledge-markup-language/)
- FSK-Lab (https://foodrisklabs.bfr.bund.de/fsk-lab/)
- AGINFRA+ VRE (https://aginfra.d4science.org/group/aginfra-gateway)