Task #8791

Project WP #654: WP6 - Supporting Blue Economy: VREs Development [Months: 1-30]

Project Task #656: T6.2 Strategic Investment analysis and Scientific Planning/Alerting VRE [Months: 1-30]

Project Activity #1633: Blue Economy VRE#2 Software Implementation, Integration and Deployment (Stage 2)

Task #1948: Design and implement algorithms for investment opportunity evaluation

Create a library that offers the sea surface temperature for a specific location/time

Added by Dimitris Katris almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Start date: May 30, 2017
Priority: Normal
Due date:
Assignee: Konstantinos Giannakelos
% Done: 100%
Category: -
Sprint: WP06
Infrastructure:
Milestones:
Duration:

Description

To construct a performance model for a given location, the simul fish service needs to know the sea surface temperature over the course of a year.

This library will be able to support this scenario by returning the temperature for a given location and period of time.


Related issues

Related to BlueBRIDGE - Project Activity #7598: Development of the data provider support library Closed Sep 06, 2017

History

#1 Updated by Dimitris Katris almost 2 years ago

  • Status changed from New to In Progress

#2 Updated by Dimitris Katris almost 2 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Closed

#3 Updated by Leonardo Candela almost 2 years ago

Rather than a library, what about realizing this as a dataminer process? This might improve the reuse potential.

Dunno where the data come from ...

#4 Updated by Dimitris Katris almost 2 years ago

We have used this dataset: http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=SST_MED_SST_L4_NRT_OBSERVATIONS_010_004

We want the data packed in a library for two reasons: first, it let us provide an implementation very quickly; second, we need these data in the parallel execution of fitness functions. To do that efficiently, our initial thought was to ship the data packed in a library instead of invoking another service from multiple Spark nodes. At a later stage we have also thought of adding these data to the geoanalytics platform and fetching them from there. I am not really aware of what a data mining process would offer us, because these data are already analysed and available, so we just need to read them from the library's resources and make them available to the simul fish growth library.

#5 Updated by Pasquale Pagano almost 2 years ago

It is unclear to me how you are imagining using this dataset. The dataset covers 2008 to the present, but it will soon become obsolete if you package it in a library and then distribute it. If you register it properly either in the geoanalytics platform or in the SDI, you can get the value you need for a specific location/time by using WCS (Web Coverage Service).
If you decide to do so by exploiting the SDI, then we can configure a job that periodically updates the data from Copernicus, and you will always get up-to-date information.
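To illustrate the WCS option, here is a minimal sketch of how a point-in-time request could be assembled as a WCS 2.0.1 GetCoverage key-value URL. The endpoint, the coverage identifier, the axis labels (`Lat`, `Long`, `time`) and the output format are all assumptions; the real values depend on how the dataset is registered and would have to be read from the server's DescribeCoverage response.

```python
from urllib.parse import urlencode


def build_getcoverage_url(endpoint, coverage_id, lat, lon, when):
    """Build a WCS 2.0.1 GetCoverage URL that slices a coverage at one point.

    Axis labels and the format below are assumptions, not taken from any
    particular server; check DescribeCoverage for the real ones.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # One "subset" parameter per axis; a single value slices the axis.
        ("subset", f"Lat({lat})"),
        ("subset", f"Long({lon})"),
        ("subset", f'time("{when}")'),
        ("format", "application/netcdf"),  # assumed output format
    ]
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and coverage id, for illustration only.
url = build_getcoverage_url(
    "https://example.org/wcs", "sst_med_l4", 38.0, 23.5, "2017-01-03T00:00:00Z"
)
print(url)
```

The function only builds the URL; fetching and decoding the response is left out because it depends on the format the server actually returns.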

#6 Updated by Gianpaolo Coro almost 2 years ago

Since the dataset was taken from Copernicus, it is a gridded NetCDF and could be remotely accessed through the OPeNDAP protocol. On DataMiner, we have a process to extract geospatial information from a NetCDF file hosted in the e-Infrastructure:

https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.XYEXTRACTOR

And obviously, another to publish a NetCDF in the e-Infrastructure (indeed on the Thredds service http://thredds.d4science.org/thredds/catalog/public/netcdf/catalog.html), so that you can later reuse it in the process above:

https://i-marine.d4science.org/group/biodiversitylab/data-miner?OperatorId=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.RASTER_DATA_PUBLISHER

Shipping a geospatial file together with a client is an outdated practice that has all but disappeared; it is now used only when a NetCDF file is too large, is unstructured, and has too high a 4D resolution (e.g. GEBCO bathymetry), which is not the case for SST.

#7 Updated by Dimitris Katris almost 2 years ago

Our ultimate goal is to do what Lino suggested: register the dataset in geoanalytics and fetch the data through WCS. We have recently added support for raster data (#7144), but we are still missing some steps before we can offer the dataset through the platform. Once we can, we will switch the implementation of the library to get the data from this source and perform some caching/prefetching rather than have them embedded. But we also needed to facilitate the implementation of the global performance model evaluation that I2S is currently developing, and to deliver this first version of the library quickly so that they can proceed with their tasks.
Finally, we do not actually intend to spend effort setting up an automatic procedure that refreshes the data. It was not very clear from the ticket description, but the simul fish growth library doesn't really need to receive data generated in real time or to search for data for a specific datetime (let's say 03/01/2017). What this library needs is the temperature or oxygen level per fortnight, computed as the mean value over three years. It is not crucial to have the latest measurements available, and automating this procedure is not in our short-term plans.
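The per-fortnight requirement described above can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be dated readings for one location, a fortnight is taken to be a 14-day block counted from 1 January (with the leftover days at the end of the year folded into the last block), and the function names are hypothetical, not the library's API.

```python
from collections import defaultdict
from datetime import date


def fortnight_index(day):
    """Return the 0-based fortnight of the year (0..25) for a date.

    Assumption: fortnights are 14-day blocks starting on 1 January;
    days 365 and 366 are folded into the last block.
    """
    return min((day.timetuple().tm_yday - 1) // 14, 25)


def fortnightly_means(readings):
    """Average readings per fortnight, pooling all years together.

    `readings` is an iterable of (date, value) pairs for one location.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in readings:
        idx = fortnight_index(day)
        sums[idx] += value
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sorted(sums)}


# Made-up values for three years, for illustration only.
sample = [
    (date(2014, 1, 1), 14.0),
    (date(2015, 1, 1), 15.0),
    (date(2016, 1, 1), 16.0),
    (date(2016, 1, 20), 13.0),
]
means = fortnightly_means(sample)
print(means)
```

Because the result is a small fixed-size table (at most 26 values per location), it can be computed once and shipped or cached, which fits the point that the latest measurements are not needed.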

#8 Updated by Denis Pyriochos almost 2 years ago
