Task #1461

Project WP #629: WP4 - VREs Deployment and Operation [Months: 1-30]

Project Task #630: T4.1 BlueBRIDGE Infrastructure Operation [Months: 1-30]

Project Activity #1441: Integrate .Net algorithms on the BlueBRIDGE computing platform

Design the interface between the Statistical Manager and EwE

Added by Jeroen Steenbeek over 3 years ago. Updated 9 months ago.

Status: Closed
Start date: Sep 06, 2016
Priority: Normal
Due date: Sep 22, 2016
Assignee: Jeroen Steenbeek
% Done: 100%
Category: -
Sprint: WP05
Infrastructure:
Milestones:
Duration: 13

Description

We must determine how EwE is going to be controlled through the statistical manager.

I can foresee different strategies:
* If EwE is to be integrated as a black box model that performs batch runs, EwE could be configured via a control file that is written by the Statistical Manager. EwE output is written to known locations, and EwE can inform the statistical manager of produced results via an output summary file.
* If EwE is to be integrated into a larger modelling complex that requires bi-directional information exchange between model components over time, EwE could be launched as a persistent server application that remains alive until a run is over. The Statistical Manager and EwE could communicate via sockets, which would form the foundation for exchanging data and controlling time stepping.
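
To make the first (batch) strategy concrete, here is a minimal Python sketch of the file hand-off: the Statistical Manager writes a control file, and later reads an output summary file. The element names "run", "output_dir", "summary" and "result" are assumptions made for illustration only; the "model_file" tag is the one mentioned later in this ticket. The real EwE file formats may differ.

```python
import xml.etree.ElementTree as ET


def write_control_file(path, model_file, output_dir):
    """Write a minimal, hypothetical control file for one batch run."""
    root = ET.Element("run")  # illustrative root element, not the real schema
    ET.SubElement(root, "model_file").text = model_file
    ET.SubElement(root, "output_dir").text = output_dir
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_output_summary(path):
    """Return the result file names listed in a hypothetical summary file."""
    root = ET.parse(path).getroot()
    return [item.text for item in root.findall("result")]
```

The point of the sketch is only that both sides agree on two small files at known locations; no live connection between the Statistical Manager and EwE is needed for this strategy.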

Perhaps a Skype call is in order.

EwECmdV3_64bit_results.tgz (428 KB) Daniele Pavia, Nov 26, 2015 12:26 PM

Christensen et al. - 2015 - The global ocean is an ecosystem Simulating marin.pdf (444 KB) Jeroen Steenbeek, Dec 09, 2015 11:34 AM

screenshot1.png - Main algorithm interface (98.6 KB) Gianpaolo Coro, Jan 20, 2016 04:02 PM

screenshot2.png - Algorithm output (138 KB) Gianpaolo Coro, Jan 20, 2016 04:02 PM


Subtasks

Task #4924: Connect EwE desktop software to the BB cloud for running ... (Rejected, Paolo Fabriani)

Task #4939: The authorization service has to return the email of the ... (Rejected, Lucio Lelii)


Related issues

Related to BlueBRIDGE - Project Task #688: T9.2 Data Analytics Facilities [Months: 1-29] Closed Dec 18, 2015 Jan 18, 2018
Related to D4Science Infrastructure - Task #1368: Install and configure "Mono" to run EwE stock assessment ... Closed Nov 16, 2015

History

#1 Updated by Pasquale Pagano over 3 years ago

I added GP and myself to this ticket, since I assume the discussion will mainly be held with GP and I wish to be involved in it.

#2 Updated by Jeroen Steenbeek over 3 years ago

A new EwE test application and execution instructions are available in my workspace for testing: http://goo.gl/mx1lYb

This new test application is controlled via a rudimentary run script that may soon be managed by the Statistical Manager. The test application also tries to create output directories, and fills these directories with output from the Ecosim temporal module of EwE. We will need such functionality in the near future.

When this application is tested on Mono I would like to see the contents of:
- The run_log.txt file
- The Georgia_Strait_log.xml file, if created
- The files in the Georgia_Strait directory, if created

#3 Updated by Gianpaolo Coro over 3 years ago

  • Assignee changed from Jeroen Steenbeek to Daniele Pavia

Daniele (@daniele.pavia@eng.it), could you please execute Jeroen's new program on Mono? You will find it in our shared folder.

#4 Updated by Daniele Pavia over 3 years ago

Hi Jeroen, you'll find the results you asked for in attachment.

#5 Updated by Gianpaolo Coro over 3 years ago

  • % Done changed from 100 to 10
  • Assignee changed from Daniele Pavia to Jeroen Steenbeek

#6 Updated by Jeroen Steenbeek over 3 years ago

Daniele Pavia wrote:

Hi Jeroen, you'll find the results you asked for in attachment.

Hi Daniele,

Thank you, the output and run log contain exactly what I was hoping for. EwE can read local files, can create output files in its own directories, can run in both 32 and 64 bit app mode on Unix, and can be configured via an external config file. The foundation for executing EwE batch runs through the Statistical Manager is in place.

Next, we need to discuss how we want to run and use EwE.

#7 Updated by Gianpaolo Coro over 3 years ago

Dear Jeroen (@jeroen.steenbeek@gmail.com), in ticket #1441 we are going to discuss the best way to do this integration, now that we have clarified that your test programs work. Stay tuned on that ticket; we will report the result here later.

#8 Updated by Jeroen Steenbeek over 3 years ago

For testing the integration of EwE into BB we will use a case study developed by Dr. Villy Christensen (paper attached). This model, the Global Ocean model, represents the global ocean at 1 degree resolution, and is set up to be driven by various spatial-temporal datasets.

The Global Ocean case study provides clear direction and data requirements for initial BB development. Later on, when integration is complete, we can expand the BB scope to include other case studies, connecting to other datasets. The final aim (as far as I am concerned) is open data infrastructure connectivity that can cater to any spatial resolution, temporal resolution, and research question.

Shall we open a specific ticket for implementing this case study in BB?

Paper reference: Christensen, V., Coll, M., Buszowski, J., Cheung, W.W.L., Frölicher, T., Steenbeek, J., Stock, C.A., Watson, R., and Walters, C.J. (2015). The global ocean is an ecosystem: Simulating marine life and fisheries. GEB 24, 507–517.

#9 Updated by Gianpaolo Coro over 3 years ago

Yes please, open another ticket for this.

#10 Updated by Gianpaolo Coro over 3 years ago

Dear @jeroen.steenbeek@gmail.com , we just had a Skype meeting with Engineering (Paolo Fabriani) to assess the best way to integrate EwE with the Statistical Manager (SM).

Two possible solutions we found are:
1 - developing one generic SM method that accepts a user-defined model file (e.g. Georgia_Strait.eiixml) and a configuration file (e.g. run_config.xml) and produces a zip file containing the complete output of EwE. The input files should be uploaded to the Statistical Manager and then passed to the algorithm through the web interface.
2 - developing one SM method for each EwE algorithm, by mapping each configuration file onto SM method inputs.

Solution 1 has the advantage of fast implementation, but the drawback that every algorithm run needs a new configuration file to be uploaded to the Statistical Manager.
Solution 2 has the advantage of nicer web interfaces and easier algorithm setup, but it has several drawbacks: an example configuration file would have to be provided for each EwE model; EwE inputs would have to be data-typed (currently they are not), otherwise we would need to develop heuristics to guess their types or nomenclature; and the required implementation time could be high.

Thus, we will first go for Solution 1, if you agree, and later decide to possibly refine it.
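
As a rough sketch of what Solution 1 amounts to on the SM side: run one command in a working directory that holds the uploaded model and configuration files, then zip whatever ends up there. This is Python for illustration only; the function name run_and_package is hypothetical, and the command to launch EwE is deliberately left as a parameter because the actual invocation is not defined in this ticket.

```python
import subprocess
import zipfile
from pathlib import Path


def run_and_package(command, work_dir, zip_path):
    """Run `command` inside work_dir, then zip every file found there.

    zip_path should live outside work_dir so the archive does not
    try to include itself.
    """
    work_dir = Path(work_dir)
    subprocess.run(command, cwd=work_dir, check=True)  # raises if the run fails
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(work_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(work_dir).as_posix())
    return zip_path
```

The archive also contains the uploaded inputs, which keeps each result zip self-describing; filtering them out would be a small change if that is not wanted.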
As a reminder, this integration will in any case bring the following benefits over the current desktop approach:

1 - Multi-users management
2 - Provisioning of the models as-a-Service
3 - Virtual Research Environment mechanisms enabled, to publish the algorithms for selected users only
4 - Storing output on a high-availability storage system
5 - Management of large request loads, via hosting on multiple services
6 - Automatic generation of a web user interface
7 - Standard xml-based Web Processing Service interface, to invoke the algorithms via REST communication and through other software (e.g. QGIS)
8 - Management of provenance information for the output, to allow other people to reproduce the same results
9 - Data sharing and publishing facilities

We are open to discussion and can clarify further via a Skype call.

#11 Updated by Jeroen Steenbeek over 3 years ago

Dear @gianpaolo.coro@isti.cnr.it, @julien.barde@ird.fr

Thanks for this. Solution 1 sounds like by far the fastest path to an implementation, so yes, let's take this path. Let's start simple by requiring every user to upload a model and run config file for every run, and to receive a zip file with results.

I would like to start designing this path on the EwE end, where the upload and download process is automated from the EwE desktop software. In the meantime I will start working on extending the run config logic to accept descriptions of spatial-temporal driver sets.

Two extensions of Solution 1 that we'll have to keep in mind:
* Under BB we want to make a series of famous models available for running. Instead of uploading their own models, users should also be able to target a pre-uploaded model in their run_config.xml file. This will require a system where BB distributes a list (XML) of famous models that it hosts, and where EwE can somehow find and load those models. As I described above, the Christensen Global Ocean Model should be the first famous model to use for testing;

* We have not yet run the full spatial model (Ecospace) of EwE on the infrastructure. Ecospace can be driven by any number of spatial-temporal data sets (e.g., series of maps). These drivers can be obtained from files that the user uploads, or from data connections provided by BB. The run_config.xml file will be extended to describe these connections. Uploading these files will take awfully long, especially if this needs to be done again for every run. We may want to extend Solution 1 by considering some kind of "reuse" system (akin to web sessions?) where users can re-order an older run with just new run instructions, using the same data and model that they uploaded before.

An unknown:
* EwE can be bi-directionally coupled to other models. If we need this on BB, we have to keep in mind that we will need to build support for centralized time stepping control. Just something to think about for the future.

#12 Updated by Gianpaolo Coro over 3 years ago

Dear @jeroen.steenbeek@gmail.com is it possible to separate the model_file variable from the run_config.xml file? In other words, is it possible to indicate the list of models to use as an input to the EwE program and keep only the experimental variables in the run_config.xml? E.g. executing EwE (in the case of two model files required) as

EwE.exe -m Georgia_Strait1.eiixml -m Georgia_Strait2.eiixml -c run_config.xml

This would make invocation more modular and would require the user to upload the models only once, and to later upload several run_config files to execute experiments on the same model.
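
For illustration, the proposed invocation (repeatable -m for model files, a single -c for the run configuration) could be parsed as in this Python sketch. It only demonstrates the shape of the command line suggested above; it is not the actual EwE console interface, and the option names are taken from the example command only.

```python
import argparse


def parse_args(argv):
    """Parse the command line proposed above: -m may repeat, -c is given once."""
    parser = argparse.ArgumentParser(prog="EwE.exe")
    parser.add_argument("-m", dest="models", action="append", required=True,
                        help="model file; repeat the option for several models")
    parser.add_argument("-c", dest="config", required=True,
                        help="run configuration file")
    return parser.parse_args(argv)


args = parse_args(["-m", "Georgia_Strait1.eiixml",
                   "-m", "Georgia_Strait2.eiixml",
                   "-c", "run_config.xml"])
print(args.models, args.config)
```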

#13 Updated by Jeroen Steenbeek over 3 years ago

Dear @gianpaolo.coro@isti.cnr.it

Theoretically yes, but the run config file will be closely tied to a particular model, especially if we start driving the model with spatial-temporal time series or allow users to tweak specific parameters. This must be specified in a config file that is 100% catered to a specific model; uncoupling model and parameters can lead to disaster.

I would rather say that:
1. Users select which model to run via the EwE Desktop (or via other UIs such as a BB web front-end). All configuration user interfaces should have access to a list of BB hosted models.
2. The list of hosted models also contains descriptive metadata information for each hosted model. This metadata explains how a model can be configured and driven (e.g., which spatial-temporal driver layers does the model accept, which parameters can be tweaked, which spatial and temporal resolution Ecospace scenarios does a model contain, which functional groups does the model have, which biomass time series are available, etc). Model configurability depends on the purpose for which a model was built, and will be determined per model in collaboration with the original model author.
3. Through a run configuration UI, users pick a model from the model list and then configure a BB EwE run based on the model metadata. This information produces a run config file that is sent to BB and is directly consumed by the EwE console exe upon execution.

What do you think? Perhaps I am misunderstanding your point, we can chat by Skype later today.

If we need to execute batch runs of multiple models, or slightly different runs of the same model for uncertainty analysis, we could incorporate that in the config file by including either multiple run blocks (model + parameter tweaks), or by adding a structure that requests X runs (100, 1000, 10.000, ?) specifying confidence intervals and sample distributions for parameter values.
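
To sketch the "X runs with sample distributions" idea in Python: given a per-parameter distribution, draw one value per parameter for each requested run. The parameter name "vulnerability", the dictionary layout, and the choice of uniform and normal distributions are all assumptions for illustration; how such a block would actually be expressed in run_config.xml is still open.

```python
import random


def sample_runs(parameters, n_runs, seed=0):
    """Draw n_runs parameter sets from the given per-parameter distributions."""
    rng = random.Random(seed)  # fixed seed so a batch can be reproduced
    runs = []
    for _ in range(n_runs):
        run = {}
        for name, spec in parameters.items():
            if spec["distribution"] == "uniform":
                run[name] = rng.uniform(spec["low"], spec["high"])
            elif spec["distribution"] == "normal":
                run[name] = rng.gauss(spec["mean"], spec["sd"])
            else:
                raise ValueError("unsupported distribution: " + spec["distribution"])
        runs.append(run)
    return runs


# Hypothetical example: 100 runs varying one parameter uniformly.
batch = sample_runs(
    {"vulnerability": {"distribution": "uniform", "low": 1.0, "high": 3.0}},
    n_runs=100,
)
print(len(batch))
```

Recording the seed alongside the results would let the same batch be regenerated later, which fits the provenance benefit listed earlier in this ticket.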

This leads me to believe that designing a structure for the run_config.xml file is essential IF we want to provide a web-based management UI under BB. If the run_config.xml file is only EwE's problem then this design is of less priority.

#14 Updated by Gianpaolo Coro over 3 years ago

Hi Jeroen,
what you are describing is very close to the Web Processing Service (WPS) standard. This OGC standard was created to (i) get a remote list of models available as-a-Service, (ii) get descriptions of their inputs and outputs, and (iii) execute them. WPS clients are already available for several programming languages and software packages (e.g. QGIS).

For example, here is one for Python: http://pywps.wald.intevation.org/
Here is one for Java: http://52north.org/communities/geoprocessing/wps/clients/index.html
Here is one for R: #1129

Generally speaking, it is a very simple REST interface, by means of which you can invoke an algorithm even just from the web browser. In BB we publish algorithms under this standard, and we will do so for EwE too.

Just to give you some examples:

Example of call for the list of algorithms available in the "devsec" VRE in BB:

http://dataminer1-d-d4s.d4science.org/wps/WebProcessingService?Request=GetCapabilities&Service=WPS&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a

Example of call for inputs and outputs descriptions of one of the algorithms (BiOnym for one species search=BiOnym Local):

http://dataminer1-d-d4s.d4science.org/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL

Execution of the algorithm:

http://dataminer1-d-d4s.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.BIONYM_LOCAL&DataInputs=Matcher_1=LEVENSHTEIN;Matcher_4=NONE;Matcher_5=NONE;Matcher_2=NONE;Matcher_3=NONE;Threshold_1=0.6;Threshold_2=0.6;Accuracy_vs_Speed=MAX_ACCURACY;MaxResults_2=10;MaxResults_1=10;Threshold_3=0.4;Taxa_Authority_File=FISHBASE;Parser_Name=SIMPLE;MaxResults_4=0;Threshold_4=0;MaxResults_3=0;MaxResults_5=0;Threshold_5=0;Use_Stemmed_Genus_and_Species=false;Activate_Preparsing_Processing=true;SpeciesAuthorName=Gadus%20morhua
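
The three example calls above follow one pattern, which can be sketched in Python with only the standard library. The helper name wps_url is hypothetical, and the parameter layout is inferred solely from the example URLs in this comment (including the "name=value;name=value" form of DataInputs); the token is a placeholder to be replaced with a real gcube-token.

```python
from urllib.parse import quote, urlencode

WPS_ENDPOINT = "http://dataminer1-d-d4s.d4science.org/wps/WebProcessingService"


def wps_url(request, token, identifier=None, data_inputs=None):
    """Build a WPS key-value-pair URL shaped like the examples above."""
    params = {"request": request, "service": "WPS", "gcube-token": token}
    if identifier is not None:
        params["Version"] = "1.0.0"
        params["Identifier"] = identifier
    if data_inputs:
        # Inputs are joined as name=value pairs separated by semicolons.
        params["DataInputs"] = ";".join(
            "{}={}".format(name, value) for name, value in data_inputs.items()
        )
    # Keep ';' and '=' readable inside DataInputs; spaces become %20.
    return WPS_ENDPOINT + "?" + urlencode(params, safe=";=", quote_via=quote)


print(wps_url("GetCapabilities", "YOUR-TOKEN"))
print(wps_url("Execute", "YOUR-TOKEN", "some.process.Identifier",
              {"SpeciesAuthorName": "Gadus morhua"}))
```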

Shall we have a call on Monday 14 at 11.00 CET?

#15 Updated by Jeroen Steenbeek over 3 years ago

Hi @gianpaolo.coro@isti.cnr.it

Ok, even better. It would be great if the entire process of remotely running EwE can be orchestrated via WPS (just as @julien.barde@ird.fr suggested for connecting EwE to taxonomy searches and species trait data). I am all ears. This would also facilitate using different User Interfaces to configure runs (using EwE, using a web interface on BB, etc) as long as the resulting run package consists of a run script + parameterization, optionally a model, and optionally a series of spatial-temporal time series.

This sounds nice, clean, and highly extensible to me, and building a WPS interface for EwE to configure and run models on BB should be doable. We just need to define the protocols well, but let's make sure that the EwE WPS integration for remote execution follows standards already in place at BlueBRIDGE.

Yes please to a Skype call, but can we do it a bit later, say 2pm GMT+2 / 1pm GMT+1?

#16 Updated by Gianpaolo Coro over 3 years ago

During the long Skype call of today some points were clarified:

  1. EwE is made up of a sequence of software components: Ecopath, Ecosim and Ecospace. These components cannot be factorized in the form of a workflow, since they are tightly combined,
  2. A "model" in EwE is a configuration of one or more of the above components. Building one of these configurations may take a long time (even one year or more),
  3. In the planned integration for BB, EwE will feed models with GIS data from the BB geospatial e-Infrastructure and with taxa information from SPD,
  4. Invoking models and retrieving information from the e-Infrastructure requires providing every process via WPS. The BB GIS information catalog (GeoNetwork) should be directly accessed by EwE-desktop,
  5. Models require geospatial information at a certain resolution and in ASCII format. A process to produce ASCII GIS data with provided resolution from other formats is required on StatMan,
  6. EwE models require running on the StatMan machines directly.

The integration plan should go through several phases with increasing difficulty:

  1. integration of the base software provided by Jeroen,
  2. evaluation of the proof-of-concept algorithm on StatMan resulting from step 1,
  3. identification of the parameters to display on the algorithm interface. These are the parameters that are shared between all the EwE algorithms,
  4. creation of separate interfaces (connectors) for "famous" EwE algorithms. This will require adding a new set of input parameters to the implementation of point 3. The rest of the implementation should remain the same,
  5. provisioning of all the above steps via WPS,
  6. feeding models with GIS and taxa data.

@pasquale.pagano@isti.cnr.it : since Jeroen (@jeroen.steenbeek@gmail.com) will be at the BB technical meeting (TCOM) in January, discussion on EwE may deserve one separate session, involving also a practical demonstration of EwE-desktop.

#17 Updated by Jeroen Steenbeek over 3 years ago

I am still waiting to find the time to select a working version of the Global Ocean Model with Villy Christensen. Hopefully we can start moving on this the first working week of the new year.

#18 Updated by Gianpaolo Coro about 3 years ago

#19 Updated by Gianpaolo Coro about 3 years ago

  • Related to Task #1368: Install and configure "Mono" to run EwE stock assessment algorithms added

#20 Updated by Gianpaolo Coro about 3 years ago

Dear Jeroen @jeroen.steenbeek@gmail.com, thanks to the great work of @paolo.fabriani@eng.it we now have an algorithm running EwE on the e-Infrastructure, endowed with the general interface we discussed. I attach two screenshots of the algorithm GUI and of the output. The algorithm is currently available on our development e-Infrastructure at this address:

https://dev.d4science.org/group/devvre/statistical-manager (search for "Ecopath with Ecosim")

Perhaps a quick Skype call could clarify how it works. Once you agree with the interface and the output, I can enable the algorithm as a WPS service too.
Behind the scenes, the algorithm is smart enough to (i) download the EwE exe directly from the Workspace (from this shared folder https://goo.gl/2HMf4j), (ii) execute the software on the input provided, (iii) save the output and delete the local software dump. This means that it directly uses the EwE version uploaded on the Workspace (i.e. there is no need to re-deploy the software on our machines). In other words, using a new EwE version is just a matter of overwriting a file on the Workspace and modifying the Version.txt file.
Finally, the name of the model is read from the run_config.xml file, thus the algorithm assumes the "model_file" tag always exists.
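
The dependency on the "model_file" tag can be illustrated with a short Python sketch. Only the tag name comes from this ticket; the surrounding XML layout of run_config.xml (the "run" root element in the example) is an assumption, so the lookup searches the whole document and fails loudly when the tag is missing or empty.

```python
import xml.etree.ElementTree as ET


def model_file_from_config(config_xml):
    """Return the text of the first model_file tag in a run config string."""
    root = ET.fromstring(config_xml)
    # The tag's position in the real file is unknown, so search everywhere.
    node = root if root.tag == "model_file" else root.find(".//model_file")
    if node is None or not (node.text or "").strip():
        raise ValueError("run_config.xml has no usable model_file tag")
    return node.text.strip()


# Hypothetical minimal config, for illustration only.
print(model_file_from_config(
    "<run><model_file>Georgia_Strait.eiixml</model_file></run>"
))
```

Failing early here gives the user a clear message instead of an EwE run that starts without a model.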

I think we have taken a step forward on this topic. We will discuss this at the TCOM, but your feedback is crucial.

#21 Updated by Jeroen Steenbeek about 3 years ago

Dear @gianpaolo.coro@isti.cnr.it and @paolo.fabriani@eng.it,

This is great work, a good step forward - especially the structure that obtains the executable on the fly is a major improvement. I do not have access to the gCube infrastructure to test this, but it looks promising.

The run_config.xml file indicates which model parameter set to load (e.g., Georgia_Strait.eiixml). Screenshot 1 shows that both files have to be provided to order a run, which is a logical start. We discussed that in the future users can also specify to run a 'famous model' that is already available on the BB infrastructure. In that case users will not be able to provide the .eiixml parameter file for upload; the file already exists and users must indicate in the run_config.xml file that the famous model should be run instead. Does that make sense? We can design this change at a later stage.

Perhaps we can also look into using versioning to our advantage. What if:
* the name of the EwE zip file indicates which version of EwE it contains? Right now the EwE zip file is called EwECmd.zip and the version.txt file contains a line "V3_64bit". We can combine this, and rename the zip file to V3_64bit.zip instead
* the run_config.xml file gets an additional XML tag V3_64bit, that informs the statistical manager which executable should be run.
This way we can even offer different console applications for different purposes through the same run interface.

Perhaps we should discuss by Skype soon. This week my availability is limited to mornings only because of daycare.

#22 Updated by Gianpaolo Coro about 3 years ago

Hi Jeroen, can we talk on Friday at 11.00 am CET?

#23 Updated by Jeroen Steenbeek about 3 years ago

Hi @gianpaolo.coro@isti.cnr.it, 11am on Friday will be perfect.

#24 Updated by Gianpaolo Coro about 3 years ago

OK for tomorrow morning.

#25 Updated by Jeroen Steenbeek about 3 years ago

@gianpaolo.coro@isti.cnr.it, looking at your impressive presentation about the Statistical Manager deployment tool for R scripts, I think we could/should do something similar for "famous" EwE models.

I think that famous model authors could be given some autonomy in uploading their own famous model to the infrastructure. From an uploaded model.xml file we can easily distill the configurable elements within a model. These configurable elements could be displayed in the EwE VRE. Via this VRE UI, users interested in executing the famous model can make choices, from which a run_config.xml file is generated.

I am making note of the idea here to lodge it into our discussion. Let's assess if this is how we want to continue.

Great, great work on the VREs and Statistical Manager. I am very impressed.

#26 Updated by Gianpaolo Coro about 3 years ago

Dear @jeroen.steenbeek@gmail.com, as a follow-up to your last post, here are my considerations: the WPS version of the EwE algorithm is going to accept both a configuration and a model file from a client. Alternatively, these files could be available at a public http URL (WPS accepts both). The interface you are describing, to build the run config file, would be ad hoc for EwE. Thus, I wonder whether you want this interface to be embedded into EwE or want a web interface as an alternative. Please consider that the current BlueBRIDGE workspace is already a means to store files and get public URLs. Overall, we are ready to discuss this; please let us know once you have a clearer view of the best approach.

#27 Updated by Jeroen Steenbeek about 3 years ago

Dear @gianpaolo.coro@isti.cnr.it,

You are correct that the interface to create the configuration file for running a famous model is somewhat ad hoc, but it is conceptually not any different from the interfaces generated for algorithms on the Statistical Manager.

Similar to these algorithms, each famous EwE model has the same subset of variables that can be tweaked through the configuration file, but the values of the parameters vary for each famous model. To illustrate: model A contains 1 Ecospace scenario, where external driver layers 'PP' and 'SST' can be driven with external data. Model B may have 2 Ecospace scenarios at different spatial resolutions, with only one driver layer 'PP'. Model C may have 2 Ecospace scenarios that each cover different time spans, with three driver maps 'PP', 'SST' and 'Salinity'.
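
The A/B/C illustration maps naturally onto a small metadata structure that any configuration UI could check user choices against. The following Python sketch is only one possible shape: the class and function names, the scenario labels, and the catalog layout are hypothetical, while the driver layer names and scenario counts are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    driver_layers: tuple  # names of layers that accept external data


# Hypothetical catalog mirroring the three illustrative models above.
CATALOG = {
    "Model A": (Scenario("scenario 1", ("PP", "SST")),),
    "Model B": (Scenario("scenario 1", ("PP",)),
                Scenario("scenario 2", ("PP",))),
    "Model C": (Scenario("scenario 1", ("PP", "SST", "Salinity")),
                Scenario("scenario 2", ("PP", "SST", "Salinity"))),
}


def unsupported_layers(model, scenario_index, requested_layers):
    """Return the requested driver layers that the chosen scenario lacks."""
    scenario = CATALOG[model][scenario_index]
    return sorted(set(requested_layers) - set(scenario.driver_layers))


print(unsupported_layers("Model B", 1, ["PP", "SST"]))
```

A UI built on such metadata would only offer the layers a scenario supports, so a check like this mainly guards hand-written configuration files.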

Building a user interface within EwE to provide the user with adequate choices for configuring a BB run is relatively straightforward. However, if we want to execute famous models from outside the EwE desktop application - directly from a web page - it will benefit usability if the users can configure an EwE run directly from a web page rather than having to manually construct and upload a configuration file, or having to install the EwE Desktop software.

Does that make sense?

I would consider this web-based configuration a low priority target. First we need to get the EwE Desktop Software path to work, and get the Global Ocean Model up and running. Once these two are operational we can perhaps start considering this web-based configuration UI. And as I offered at TCOM I would be more than happy to assist in its construction or take the coding upon myself.

#28 Updated by Pasquale Pagano over 2 years ago

  • Start date changed from Sep 04, 2016 to Sep 06, 2016
  • Due date changed from Sep 30, 2016 to Sep 22, 2016

due to changes in a related task

#29 Updated by Pasquale Pagano 9 months ago

  • Status changed from In Progress to Closed
