D4.4 - Open Science Publication Technologies


While open access as a publishing model led to dramatic changes in the way scientists communicate their results and succeeded in challenging the traditional business strategies of academic publishers, this model soon proved insufficient as far as access to the underlying data was concerned. Open data, in turn, raised further questions: how data can be re-used and research results reproduced, how transparent peer review is and, more generally, how scientific evaluation is performed. Over time, these and similar developments morphed into what we now call "open science" or, in more general terms, the transformation of research into a primarily collaborative rather than a primarily competitive endeavour (Penev 2017).


The concept of open science (Nielsen 2011, Pontika et al. 2015, see also the TED talk video of Michael Nielsen) refers to a whole range of issues around opening up the research life cycle, the most important of which are: (1) Open access, (2) Open data, (3) Free and Open-source software, (4) Reproducible research, (5) Open peer-review, (6) Open science policies, (7) Open funding, (8) Open science evaluation, (9) Open science tools and (10) Open education.

A critical requirement of open science is transparency in methodology, observation and collection of data, open access and re-usability of research objects covering the entire research cycle, public accessibility and transparency of scientific communication – including the open peer review process (Fig. 1) – and the use of web-based open tools for scientific collaboration and communication. In brief, open science builds on collaboration rather than competition between researchers (European Commission 2016b). To achieve a more collaborative and transparent research cycle, several transformations are already taking place or still need to take place:

  • From open access to open science. Open access and open data models are quickly being transformed into open science practices that affect the whole ecosystem of producing, communicating and re-using research results.
  • From human-readable to machine-readable content. Machine readability of the content is now at least as important as human readability as it facilitates the automated harvesting, text mining and re-use of content.
  • From open data to data re-use. Implementation of technologies that integrate structured data into the narrative to the highest possible degree.
  • From traditional publishing to technology-driven service. Technological innovations become critical for the proper publishing and dissemination of scientific content.
  • From semantic enrichment of content to semantic publishing. Semantic tagging and enrichment of content is seen as a transitional step towards the next stage of transformation of content into Linked Open Data (LOD).

Figure 1 Novel and transparent models of peer review as used by the Research Ideas and Outcomes (RIO) journal


Data publishing in the digital age is the act of making data available on the Internet, so that they can be downloaded, analysed, re-used and cited by people and organisations other than the creators of the data. This can be achieved in various ways. In the broadest sense, any upload of a dataset onto a freely accessible website could be regarded as “data publishing”. There are, however, several issues to be considered during the process of data publication, including:

  • Data hosting, long-term preservation and archiving
  • Documentation and metadata
  • Citation and credit to the data authors
  • Licenses for publishing and re-use
  • Data interoperability standards
  • Format of published data
  • Software used for creation and retrieval
  • Dissemination of published data

Pensoft has a long history of trying to adhere to these principles by making data publishing, machine-readability and dissemination of content central priorities in the development of all tools and workflows. ARPHA stands for Authoring, Reviewing, Publishing, Hosting and Archiving, all in one place. ARPHA is the first publishing platform created by Pensoft to support the full life cycle of a manuscript within a single online collaborative environment. At the same time, it provides an environment for high-tech publishing, including possibilities to add a rich semantic layer to the published output; publish data in an automated or semi-automated manner; integrate with industry-leading technologies; and disseminate and archive widely to stimulate re-use and collaboration. A few examples of innovations made possible by ARPHA are given below.

From human- to machine-readable content

Non-machine readable PDFs, either digitally born or scanned from paper prints, require significant additional effort of post-publication markup and data extraction into a structured form, in order to address issues of interoperability and reuse of publications and data (Agosti 2006, Penev et al. 2010, Agosti 2016). To tackle such issues the open access journal ZooKeys was the first to implement both generic and domain-specific markup which was adopted thereafter by PhytoKeys, MycoKeys, Journal of Hymenoptera Research, Deutsche Entomologische Zeitschrift, Zoosystematics and Evolution and other Pensoft journals.
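To illustrate why marked-up content is easier to re-use than a PDF, the following minimal Python sketch harvests tagged entities from a small XML fragment. The element names (taxon-name, locality), their attributes and the sample text are hypothetical and chosen only for illustration; they are not the actual markup schema used by ZooKeys or other Pensoft journals.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment: tag and attribute names are illustrative only.
ARTICLE_XML = """
<article>
  <title>An illustrative taxonomic note</title>
  <sec>
    <p>We describe <taxon-name rank="species">Examplea nova</taxon-name> from
    <locality lat="42.70" lon="23.32">Sofia</locality>.</p>
    <p>It is compared with <taxon-name rank="species">Examplea vetus</taxon-name>.</p>
  </sec>
</article>
"""


def extract_taxon_names(xml_text):
    """Return the text of every tagged taxon name, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("taxon-name") if el.text]


def extract_localities(xml_text):
    """Return (name, latitude, longitude) for every tagged locality."""
    root = ET.fromstring(xml_text)
    return [
        (el.text.strip(), float(el.get("lat")), float(el.get("lon")))
        for el in root.iter("locality")
        if el.text
    ]


if __name__ == "__main__":
    print(extract_taxon_names(ARTICLE_XML))
    print(extract_localities(ARTICLE_XML))
```

With an untagged PDF, the same information would have to be recovered by post-publication markup or text mining, which is the additional effort described above.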

The Biodiversity Data Journal

The next stage of development of integrated narrative and data publishing was landmarked by the Biodiversity Data Journal (BDJ) and its associated authoring tool, ARPHA Writing Tool (AWT), launched within the ViBRANT EU Framework Seven (FP7) project. The Biodiversity Data Journal was the first ever journal that provided a fully Web- and XML-based life cycle of a manuscript, starting from authoring to submission, peer review, publishing and dissemination. Later, the BDJ workflow was upgraded to the "ARPHA-XML journal publishing workflow" which itself is a part of the ARPHA Journal Publishing Platform. The ARPHA-XML workflow came with several tools and workflows developed by Pensoft, such as ReFindit for discovery and import of literature and data references, import/export of tabular data and also of Darwin Core occurrence records, conversion of Ecological Metadata Language (EML) metadata into manuscripts, automated archiving of articles and sub-article elements in Zenodo and others.
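As a rough illustration of what importing tabular Darwin Core occurrence records can involve, the sketch below reads a CSV table whose column headers are Darwin Core terms and separates complete rows from rows with missing values. The column names are real Darwin Core terms, but the choice of which terms are treated as required, the sample records and the function name are assumptions made for this example and do not describe the actual ARPHA import logic.

```python
import csv
import io

# Assumption for this sketch: these three terms must be filled in every row.
REQUIRED_TERMS = ("occurrenceID", "scientificName", "basisOfRecord")

# Illustrative records only; not real occurrence data.
SAMPLE_CSV = """occurrenceID,scientificName,basisOfRecord,decimalLatitude,decimalLongitude,eventDate
occ-1,Examplea nova,PreservedSpecimen,42.70,23.32,2015-06-01
occ-2,,HumanObservation,41.90,12.50,2015-06-02
"""


def read_occurrences(csv_text):
    """Split rows into valid records and (line number, missing terms) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    valid, rejected = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        missing = [t for t in REQUIRED_TERMS if not (row.get(t) or "").strip()]
        if missing:
            rejected.append((line_no, missing))
        else:
            valid.append(row)
    return valid, rejected


if __name__ == "__main__":
    valid, rejected = read_occurrences(SAMPLE_CSV)
    print(len(valid), "valid record(s)")
    print("rejected:", rejected)
```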

Research Ideas and Outcomes (RIO) journal

The third stage of Pensoft's effort towards open science publishing was the launch of the Research Ideas and Outcomes (RIO) journal that publishes all outputs of the research cycle, beginning with research ideas; project proposals; data and software management plans; data; methods; workflows; software; and going all the way to project reports; research and review articles, using the most transparent, open and public peer review process (Mietchen et al. 2015). The RIO Journal publishes open science collections of various project or research cycle outcomes, with the EU BON project collection, entitled Building the European Biodiversity Observation Network (EU BON) Project Outputs, being a fine example.

The ARPHA-BioDiv workflow

Eventually, all these years spent in development of novel approaches to publication of biodiversity data resulted in a set of standards, guidelines, workflows, tools, journals and services which we define here as ARPHA-BioDiv: A Toolbox for Scholarly Publishing and Dissemination of Biodiversity Data (Fig. 2). The toolbox is designed to ease scholarly publishing of biodiversity and biodiversity-related data with special emphasis on the EU BON and GEO BON networks. Some of the features facilitating data publishing and dissemination within ARPHA-BioDiv are as follows:

  • Novel article formats (e.g. Data Paper, R Package, Species Conservation Profile)
  • Semantic tagging of the article content
  • Import of data into manuscripts
  • Content and Data Export from Published Articles
  • Data extraction and re-publishing workflow
  • Submission of manuscripts through an Application Programming Interface (API)
  • Creation and publication of Data Papers from Ecological Metadata Language (EML) metadata
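The last feature in the list, turning EML metadata into a Data Paper, can be sketched as a simple mapping from metadata elements to manuscript sections. The example below is a minimal Python illustration that assumes an EML 2.1.1 document with a dataset title, creators, abstract and keywords. The sample record, the function names and the layout of the resulting manuscript skeleton are hypothetical and are not the ARPHA implementation.

```python
import xml.etree.ElementTree as ET

# Illustrative EML record; the content is invented for this sketch.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
    packageId="example.1.1" system="example">
  <dataset>
    <title>Example occurrence dataset</title>
    <creator>
      <individualName><givenName>Ada</givenName><surName>Example</surName></individualName>
    </creator>
    <abstract><para>Illustrative abstract text.</para></abstract>
    <keywordSet><keyword>occurrence</keyword><keyword>biodiversity</keyword></keywordSet>
  </dataset>
</eml:eml>"""


def eml_to_manuscript(eml_text):
    """Map selected EML dataset fields onto a manuscript skeleton (a dict)."""
    root = ET.fromstring(eml_text)
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")
    authors = []
    for creator in dataset.findall("creator"):
        given = creator.findtext("individualName/givenName", default="").strip()
        family = creator.findtext("individualName/surName", default="").strip()
        authors.append(" ".join(part for part in (given, family) if part))
    paragraphs = [p.text.strip() for p in dataset.findall("abstract/para") if p.text]
    return {
        "title": dataset.findtext("title", default="").strip(),
        "authors": authors,
        "abstract": " ".join(paragraphs),
        "keywords": [k.text.strip() for k in dataset.iter("keyword") if k.text],
    }


def render(manuscript):
    """Render the skeleton as plain text for a quick visual check."""
    return "\n".join([
        manuscript["title"],
        ", ".join(manuscript["authors"]),
        "Abstract: " + manuscript["abstract"],
        "Keywords: " + ", ".join(manuscript["keywords"]),
    ])


if __name__ == "__main__":
    print(render(eml_to_manuscript(SAMPLE_EML)))
```

In such a mapping, the author only needs to complete the narrative parts of the manuscript, since the descriptive metadata is taken over from the EML record.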

Figure 2 ARPHA-BioDiv is a set of standards, guidelines, tutorials, tools, workflows, journals and services, designed to facilitate the scholarly publication and dissemination of biodiversity data.


Within AGINFRA+, the open data and open science workflows enumerated above were adapted. Additionally, new ARPHA-based tools and workflows have been developed with the goal of creating novel ways to automatically import and publish datasets in the field of agricultural and food sciences.

ReFindit tool enabled for AGRIS

To reflect the needs of the agro-sector the ARPHA-integrated ReFindit reference management and conversion tool was integrated with FAO’s AGRIS database. Sector-oriented templates were also created within the ARPHA Writing Tool (AWT) to enable the ARPHA-XML workflow for the needs of open agricultural and food research.

Novel, sector-oriented templates

Open science workflows, including API-enabled data import/export functionality as described above for the ARPHA-BioDiv module, were developed together with UoA for the needs of AGINFRA+. Examples include:

  • FSKX (Food Safety Knowledge)
  • Applied Study
  • Emerging Technique
  • Data Paper
  • R Package

Automated conversion and import of FSK-ML data

Early in the project, a guidance document was prepared within AGINFRA for software developers and project managers who want to enhance their software tools with import and export functions for food safety models, simulations or food safety data, or who want to develop new tools (Fig. 3).

Figure 3 Model metadata encoded in a standardized way within the FSK-ML file

Following in-depth discussions within the consortium, it was decided to embed the FSK-ML metadata format within the workflows of one of the two open science publication venues of the project: the recently launched Food Modelling Journal (FMJ). The vision is for the journals to 1) promote the use of the FSK-ML format as a standard and thus increase the reusability and interoperability of agricultural data; and 2) promote data publishing by providing straightforward and intuitive workflows for converting raw data into a data paper.

A dedicated FSK-ML data import and conversion workflow for FMJ was set up within the ARPHA publishing platform. The new workflow adheres to the standard, taking into account the described structure and requirements of FSK-ML within the journal article templates and metadata requirements. The workflow allows the re-use of data collected and deposited via the AGINFRA+ VRE and is also enabled for other data management tools based on KNIME, an open source software for creating data science applications and services (Fig. 4).

Figure 4 A workflow schema illustrating the newly introduced FSK-ML conversion functionality.
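The conversion step can be pictured as a mapping from model metadata fields to sections of the journal article template, with a check that mandatory fields are present. The Python sketch below is a simplified illustration under stated assumptions: the metadata is taken to be available as JSON, and the key names (modelName, creators, description, software), the section headings and the list of required fields are hypothetical. They do not reproduce the actual FSK-ML schema or the ARPHA conversion code.

```python
import json

# Hypothetical mapping from metadata keys to article template sections.
SECTION_MAP = {
    "modelName": "Title",
    "creators": "Authors",
    "description": "Abstract",
    "software": "Software",
}

# Assumption for this sketch: a manuscript needs at least a title and authors.
REQUIRED_KEYS = ("modelName", "creators")

# Illustrative metadata record; not taken from a real model file.
SAMPLE_METADATA = json.dumps({
    "modelName": "Example growth model",
    "creators": ["A. Example", "B. Sample"],
    "description": "Illustrative description of the model.",
    "software": "R",
})


def metadata_to_sections(metadata_json):
    """Convert a JSON metadata record into {section heading: text}."""
    meta = json.loads(metadata_json)
    missing = [key for key in REQUIRED_KEYS if not meta.get(key)]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    sections = {}
    for key, heading in SECTION_MAP.items():
        value = meta.get(key)
        if value is None:
            continue  # optional field not provided
        if isinstance(value, list):
            value = ", ".join(str(item) for item in value)
        sections[heading] = str(value)
    return sections


if __name__ == "__main__":
    for heading, text in metadata_to_sections(SAMPLE_METADATA).items():
        print(heading + ": " + text)
```

Rejecting incomplete metadata before a manuscript is created is one possible design choice; a production workflow might instead create the manuscript and flag the gaps for the author to fill in.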


A new open science journal, specialised in publishing non-conventional but important outcomes of the research cycle (such as datasets, software, models and methods) in agricultural and food sciences, was designed and launched using the workflows and tools available through the ARPHA Publishing Platform. Its aim is to catalyse the publication of data from these fields and to stimulate open science by increasing the use, transparency and efficiency throughout the research cycle.

The idea of the new journal was presented at the AGINFRA Project Meeting in Wageningen (Nov 2017) where names, focus and scope, templates and ideas for built-in tools and workflows were presented and discussed openly with project members.

The following names were suggested through brainstorming sessions:

  • Food and Agricultural Modelling Journal (FAMJ)
  • Agricultural and Food Modelling Journal (AFMJ)
  • Food, Agriculture and Supply Chain Modelling Journal (FASCMJ)
  • Food Modelling Journal (FMJ)

Of these, Agricultural and Food Modelling Journal (AFMJ) was initially agreed to best accommodate the goals and the various aspects of the journal’s focus and scope.

At a later stage, a decision was taken to create one more journal, so that two titles serve the needs of two different test studies within the project:

  • Food Modelling Journal (FMJ)
  • Viticulture Data Journal (VDJ)

Both journals already have strong Editorial Boards and growing communities around them, and are planned to remain self-sustainable open science publication venues for their respective communities. Individual websites have been created for both journals, reflecting a carefully prepared visual identity pack (logo, colour palettes, fonts etc.). Website content, including about pages, focus and scope, and terms and conditions, has been created and updated on the websites. Since their official start, a number of additional discussions have been held within the consortium to set priorities and define goals, such as the collection and publication of the journals' first issues and the promotional strategies towards sustainability of the two titles.

Another important point of discussion has been the implementation of various technologically advanced tools and workflows to ensure open data publishing and the right direction towards open science workflows. As an outcome of these discussions (as outlined in detail above), the ReFindit reference tool was updated to work with the specialised AGRIS database, and a special workflow has been set up to allow the conversion and direct import of FSK-ML metadata from the AGINFRA+ VRE and other KNIME-based data management tools.


The Food Modelling Journal (FMJ) is already fully functional and visible online (Fig. 5) on the ARPHA Publishing Platform here:

Figure 5 FMJ’s homepage, available online at

Focus and scope:

Food Modelling Journal (FMJ) is an innovative open access journal which facilitates the publication of mathematical models and data sets in the area of food science. The journal is focussed on submissions documenting the following outcomes of the research cycle: data, models, software, data analytics pipelines and visualization methods relevant for modelling in food science. The journal will consider manuscripts for publication related (but not limited) to the following topics: food safety, food quality, food control, food defence, food design.

FMJ will consider the following categories of papers for publication:

  • Models on food safety and defence, food quality and control, food properties, design and production
  • Data analytics methods, workflows / pipelines and standards
  • Model application and validation studies
  • Software descriptions including software design concepts, description of tools and services

Publication of first issue of FMJ

The first issue of the Food Modelling Journal is prepared to be published closely following the editorial explaining the value of open science and of sharing intermediate results and data in the field of food science.


Filter M, Candela L, Guillier L, Nauta M, Georgiev T, Stoev P, Penev L (2019) Open Science meets Food Modelling: Introducing the Food Modelling Journal (FMJ). Food Modelling Journal 1: e46561.

FSKX Food Safety Knowledge - Executable model:

Desvignes V, Buschhardt T, Guillier L, Sanaa M (2019) Quantitative microbial risk assessment for Salmonella in eggs. Food Modelling Journal 1: e39643.


The Viticulture Data Journal (VDJ) is fully functional, with an Editorial Board already in place (Fig. 6).

Figure 6 VDJ: homepage, available online at:

Focus and Scope:

Viticulture Data Journal (VDJ) is an innovative open access peer-reviewed journal which facilitates the publication of data, research articles and other research objects in the area of viticulture. The journal is focussed on submissions documenting the following outcomes of the research cycle: data, models, software, data analytics pipelines and visualisation methods in the viticulture research area.

Viticultural research covers a wide range of topics, from genetic research, food safety of viticultural products to climate change adaptation of grapevine varieties through grape specific research. The journal will consider manuscripts for publication related (but not limited) to the following topics:

  • Phenotyping and genotyping
  • Vine growth and development
  • Vine ecophysiology
  • Berry yield and composition
  • Genetic resources and breeding
  • Vine adaptation to climate change, abiotic and biotic stress
  • Vine propagation
  • Rootstock and clonal evaluation
  • Effects of field practices (pruning, fertilization etc.) on vine growth and quality
  • Sustainable viticulture and environmental impact
  • Ampelography
  • Plant pathology, diseases and pests of grapevine
  • Microbiology and microbiological risk assessment
  • Food safety related to table grapes, raisins, wine, etc.

VDJ will consider the following categories of papers for publication:

  • Data Paper
  • Methods
  • Emerging Technique
  • Applied Study
  • Software Description
  • R Package
  • Research Article
  • Opinion Paper

Publication of the first issue of VDJ

The first issue of the Viticulture Data Journal was published together with an editorial explaining the value of open science and of sharing intermediate results and data in the field of viticulture.


Penev L, Stavrakaki M, Georgiev T, Candela L, Poni S, Savé R, Rusjan D, Biniari K, Pezzotti M, Neveu P, Stoev P (2019) Opening data and research objects in viticulture: The Viticulture Data Journal (VDJ). Viticulture Data Journal 1: e49717.

Data paper:

Biniari K, Daskalakis I, Bouza D, Stavrakaki M (2019) Comparative study of qualitative and quantitative characters of grape cultivar 'Mavrodafni' (Vitis vinifera L.) and 'Renio' grown in different regions of the Protected Designation of Origin Mavrodafni Patras. Viticulture Data Journal 1: e37852.



  • Continue building a community of journal editors for the two new journals FMJ and VDJ
  • Develop a workflow to semantically enhance the published content using more ontologies
  • Deeper integration with VRE that facilitates the access of the users to the executable models directly from the published paper
  • Create new workflows and functionalities where necessary to reflect requirements within the domain
  • Add to the current domain-specific templates where necessary
  • Continue the integration of ARPHA with resources relevant to the domain
  • Integration with the AGINFRA Semantic API (documented in D2.3)
  • Integration with the AGINFRA Search API (documented in D2.3)


  • Marketing collateral

STATUS: Introductory flyers have been created for both journals to be used in promotional activities.

  • Presentation at conferences and meetings

Related events, where the target community to be developed around the journals will be present, will be sourced and a calendar will be prepared. The new journals will be represented at these events both through presentations and via sponsorship and stands. Preliminary discussions with the project members have already identified the biannual International Conference on Predictive Modelling in Food (ICPMF), FAO conferences and events organised by EEA and EFSA as possible means for outreach.

STATUS: The two journals were presented at a Pensoft stand during the Ecosystem Services Partnership (ESP) European Conference 2018 and the Ecosystem Services Partnership (ESP) World Conference 2019.

  • Social media outreach

STATUS: Dedicated accounts for FMJ and VDJ were created on both Facebook and Twitter to reach out to the community via alternative outlets. Accounts and persons with interests and professional orientation related to the focus and scope of the journals are actively sourced on an ongoing basis to make sure the accounts are reaching the right stakeholders.

  • PR Campaign

STATUS: Press releases were prepared and published via EurekAlert to promote the first issues of the Food Modelling Journal and the Viticulture Data Journal. The press releases will be published in CORDIS Wire as well.

In addition, a blog post promoting each of the journals was published on the Pensoft blog and the ARPHA blog.

  • Direct community outreach

STATUS: A series of email templates has already been prepared to reach out directly to the community via the contact lists of AGINFRA and of related projects and initiatives.

Food Modelling Journal

A detailed social media strategy was prepared in the pre-launch stage of the journal, with a vision to generate greater interest and build a community around the journals prior to the publication of the first articles. Detailed research and analysis of relevant accounts identified parties with a potential interest in receiving news from and engaging with the Food Modelling Journal. The resulting list includes the following groups of recipients:

  • Food product modellers
  • Food processing experts
  • Nutrition specialists
  • Dietetics specialists
  • Agrifood specialists
  • Crop biodiversity and breeding informatics specialists
  • Foodtech researchers
  • Rural sociology & geography experts
  • Food & agricultural policy experts
  • Food crime and incidents specialists
  • Food fraud inspectors
  • Ethnobotanists
  • Soil scientists
  • Food geographers
  • Agricultural economists
  • Nature conservation and landscape ecology specialists
  • Plant pathology experts
  • Insect farmers
  • Agronomists
  • Nutritional epidemiologists
  • Food & waste researchers
  • Sustainable food researchers
  • Nutritional psychiatry researchers

Not-for-profit and scholarly organisations focused on various food research topics were additionally identified.
Scientific news associated with the journal’s topic and published on the websites of reputable and well-known newspapers was selected to prepare a campaign with engaging tweets appealing to wider audiences. The topics discussed include agricultural innovation, nutrition, food revolution, future of food, urban farming, food security, food engineering, food systems etc.

Viticulture Data Journal

As part of the marketing strategy related to the dissemination of the Viticulture Data Journal outcomes, potential recipients of the journal’s news and outcomes were identified. Direct and indirect sources from industrial and scholarly backgrounds that could be approached were identified through detailed social media research, including:

  • Viticulturists
  • Oenology engineers
  • Oenology researchers
  • Oenology scientists
  • Plant ecophysiologists
  • Wine market experts
  • Winemakers
  • Wine farmers
  • Winery owners
  • Grapevine genealogists
  • Vineyard managers
  • Sommeliers
  • Flavor chemistry scientists
  • Horticulturists
  • Agronomists
  • Agricultural economists
  • Fruit crops specialists

Additionally, some industry organisations, faculties and associations engaged in oenological and viticultural science and industry were also identified.

We have prepared a number of social media posts for Twitter, sourced from non-scientific, topic-specific newspapers with global recognition. These news articles describe curious and high-tech innovations in the wine industry, as well as the climate change challenges that winemakers face, and were selected to engage the wider audience and create a community around the journal.


Agosti D (2006) Biodiversity data are out of local taxonomists’ reach. Nature 439 (7075): 392–392. doi: 10.1038/439392a

Agosti D (2016) Where Do We Come From, Where Do We Go To? 20 Years Of Open Access To Biodiversity Knowledge. Zenodo doi: 10.5281/ZENODO.165979

European Commission (2016a) H2020 Programme Guidelines on FAIR Data Management in Horizon 2020. Accession date: 2017 2 09.

Mietchen D, Mounce R, Penev L (2015) Publishing the research process. Research Ideas and Outcomes 1: e7547.

Nielsen M (2011) Reinventing Discovery: The New Era of Networked Science. Princeton University Press, Princeton, N.J. [ISBN 978-0-691-14890-8]

Penev L (2017) From Open Access to Open Science from the viewpoint of a scholarly publisher. Research Ideas and Outcomes 3: e12265.

Penev L, Agosti D, Georgiev T, Catapano T, Miller J, Blagoderov V, Roberts D, Smith V, Brake I, Rycroft S, Scott B, Johnson N, Morris R, Sautter G, Chavan V, Robertson T, Remsen D, Stoev P, Parr C, Knapp S, Kress W, Thompson F, Erwin T (2010) Semantic tagging of and semantic enhancements to systematics papers: ZooKeys working examples. ZooKeys 50: 1–16. doi: 10.3897/zookeys.50.538

Penev L, Georgiev T, Geshev P, Demirov S, Senderov V, Kuzmova I, Kostadinova I, Peneva S, Stoev P (2017) ARPHA-BioDiv: A toolbox for scholarly publication and dissemination of biodiversity data based on the ARPHA Publishing Platform. Research Ideas and Outcomes 3: e13088.

Pontika N, Knoth P, Cancellieri M, Pearce S (2015) Fostering open science to research using a taxonomy and an eLearning portal. Graz, Austria, 21 - 22 October 2015. Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business (i-KNOW '15)