|Status:||Closed||Start date:||Apr 18, 2016|
|Assignee:||Paolo Fabriani||% Done:|
#1 Updated by Julien Barde over 3 years ago
The Ichthyop model can be driven by various types of data: satellite products (e.g. OSCAR above) or model outputs (ROMS, MARS3D, Drakkar).
OSCAR is interesting because it is global and quite light compared to the others, but it is only 2D in its spatial dimensions (no data in the water column). At some point we may want to run experiments at a regional scale, where OSCAR data might not be sufficient.
Could you let us know how much storage can be made available to host these data within the infrastructure?
For example, the Drakkar model would require about a dozen TB.
#2 Updated by Pasquale Pagano over 3 years ago
D4Science assigns a storage quota to each VRE. Usually the quota for each VRE is limited to 5 TB, but we can easily increase that amount to 12 TB. However, we need to plan this activity and need details on the requirements, such as:
- total amount of storage required
- when the activity starts and how the storage will be consumed, e.g. all the 12 TB must be available since day 1 of the VRE
- any other information that may help us better provision the storage, e.g. the services used to access the data (is it Thredds? should the data be indexed by GeoNetwork?)
#3 Updated by Paolo Fabriani over 3 years ago
- Status changed from New to In Progress
- File geoexplorer-oscar.png added
We're proceeding with the script to import OSCAR NetCDF in the D4Science Thredds service.
Each file in the catalog is composed of 4 layers:
- Ocean Surface Zonal Currents (u)
- Ocean Surface Meridional Currents (v)
- Ocean Surface Zonal Currents Maximum Mask (um)
- Ocean Surface Meridional Currents Maximum Mask (vm)
We're importing each of them independently so that they will be visible as maps in the geoexplorer (see attached screenshot).
A preview of one of the imported files is available on our Thredds development instance.
Regarding the size, OSCAR is currently 25 GB, but the actual space required on the server is four times bigger, since each file contains 4 layers that are imported separately. This leads to 100 GB of needed space.
While we continue to work on the import script, please let us know any comment/suggestion you might have.
#4 Updated by Julien Barde over 3 years ago
At this stage we just need to check that we can make Ichthyop work within the infra with standard access to OSCAR data (instead of OPeNDAP access currently used).
As we successfully deployed a first version of Ichthyop on the infra yesterday, it would be nice to have all OSCAR data within the infra ASAP.
To try this, we need you to send us the path to access these data from an R script. Something like /home/gcube/.../OCAR/oscar_vel2009_180.nc ?
Splitting the original netCDF files into separate files is not going to help in our case. These variables are supposed to stay together because they describe the components of the sea surface current vectors. Having separate files might block the execution of the current Ichthyop Java code. So I anticipate that we will need the original netCDF files to be made available without any modification, as delivered by the NASA server.
We don't need the (meta)data to be displayed by GeoNetwork or OpenLayers for now. I still don't understand why you need one variable per netCDF file to visualize the data with OpenLayers, while Thredds and Godiva natively manage the visualization of multiple variables within OpenLayers:
So having all OSCAR data will be sufficient for now. There is no rush for additional data (to get Ichthyop driven by other data like Drakkar). We are not ready yet. You have samples of datasets below:
The storage for the Drakkar high-resolution model outputs should be around 12 TB, but if needed we can compress the dataset with the internal compression of netCDF files (by using the nccopy command, for example). It would then require only 4 TB, at the cost of longer read times. Anyway, it is too early for this now.
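As a rough illustration of the compression step mentioned above, the sketch below builds and runs an nccopy call from Python. The deflate level, the use of the shuffle filter, and the file names are assumptions chosen for the example, not values taken from this ticket.

```python
import shutil
import subprocess


def nccopy_command(src, dst, deflate_level=5, shuffle=True):
    """Build an nccopy command line that rewrites src with internal compression."""
    if not 0 <= deflate_level <= 9:
        raise ValueError("deflate_level must be between 0 and 9")
    cmd = ["nccopy", "-d", str(deflate_level)]
    if shuffle:
        cmd.append("-s")  # shuffle filter, often improves the deflate ratio
    return cmd + [src, dst]


def compress(src, dst, **options):
    """Run nccopy if it is installed on this machine; raise otherwise."""
    if shutil.which("nccopy") is None:
        raise RuntimeError("nccopy not found on PATH")
    subprocess.run(nccopy_command(src, dst, **options), check=True)


# Hypothetical file names, for illustration only.
print(" ".join(nccopy_command("drakkar_sample.nc", "drakkar_sample_deflated.nc")))
```

A higher deflate level usually gives smaller files but slower writes, and any internal compression adds decompression work on every read, which matches the trade-off described above.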
We still need to work on the integration and execution of Ichthyop with SAI/Statman and see with you how the current code could be parallelized to run multiple simulations.
Storing more data than OSCAR will only be needed once we can run multiple simulations from a single experiment in Statman. We would like to run thousands of simulations at different dates.
I am not sure about which VRE should be used for Ichthyop simulations. We can use the stock assessment VRE first.
#5 Updated by Gianpaolo Coro over 3 years ago
Let me explain the rationale of the proposed solution, so that we have common ground for discussion. Having the content of the files published as maps would satisfy the KPI requirements of WP10 (#972). On the other hand, each file will also contain all the other layers inside. Files will be duplicated for performance reasons, in order to speed up visualisation and data extraction by other processes. Nevertheless, for your purposes they will contain all the information.
Thus, you will be able to retrieve the OPeNDAP links to the files containing all the layers. Search can be done using GeoExplorer. I would advise you to keep using OPeNDAP for these files, which should be much faster when used via the infrastructure services. Further, you could even use a different file for each layer to further enhance access and retrieval performance.
In other words, the solution we would like to go for should benefit both the project and the performance of the scripts.
#7 Updated by Paolo Fabriani over 3 years ago
The script for importing the oscar dataset is ready. It currently imports both the yearly .nc files as they are, and the four layers individually.
I've tested the script using the dev infrastructure. Unfortunately the host ran out of disk space; so you can only find there a partial import for 1993 (please disregard data for 1992, resulting from previous tests - to be removed) :
As soon as we fix the problem with the disk space, I'll import the whole dataset.
#8 Updated by Pasquale Pagano over 3 years ago
We are abusing the development infrastructure. It is not designed to support end-user applications, only to test their functionality. So, if possible, we should import the full dataset on production rather than on the development infrastructure. If you need the full dataset in development, please estimate the amount of disk space you need and open a subtask to request the additional disk space.
Moreover, I wish to understand how you run this script. Has it been designed to be executed as a task of the Executor? Is it executed manually?
#9 Updated by Paolo Fabriani over 3 years ago
Before moving to the production infrastructure, I'd like some feedback on whether the data are OK. The whole dataset is probably not needed for that, but more than one year may be desirable. Julien, please let me know.
As of now, the dataset is imported manually with a Java client. If having it as a task of the Executor is valuable, we can work in this direction.
#10 Updated by Julien Barde over 3 years ago
The data should be OK, as we have already run some simulations locally with these datasets. However, it's worth trying on the infra, but I need you to tell me the path of the dataset that I can use. For example: /home/gcube/thredds_catalog/OSCAR/oscar_vel1993_180.nc ?
#11 Updated by Gianpaolo Coro over 3 years ago
Hi @firstname.lastname@example.org , can we organize a Skype call with Paolo to understand what's going on with this activity?
Paolo has imported some files on the infrastructure in the dev environment that you can use.
We currently have some constraints on this activity:
1 - the NetCDFs should preferably be hosted on Thredds. In order to save resources, we cannot keep the files permanently on all the StatMan machines. On the other hand, downloading the files before the execution of the algorithms may introduce too much overhead.
2 - publishing NetCDFs on Thredds directly from the scripts is discouraged, since this may saturate the infrastructure resources too. Files should be produced as local to the execution so that StatMan will later save them (automatically) on the Infra Storage System. Only a selection of the NetCDF files should be published, using another algorithm, but not from the scripts. Furthermore, the publication algorithm is going to be modified in the next weeks in order to meet publication policies that are necessary to build the infrastructure catalogue.
Thus, before debugging any R code, I would like to ask you to evaluate the above constraints and maybe do some tests with the files imported by Paolo.
#12 Updated by Julien Barde over 3 years ago
Yes, we can Skype this afternoon.
As I said, we need a path to use the netCDF files directly on the file system of the infra. We don't want to access the netCDF files through OPeNDAP. Could you provide it?
We need to publish at least some netCDF files directly. In the example above netCDF files are very small.
At this stage we don't understand how we can work as we usually do with thredds on the infra.
#13 Updated by Gianpaolo Coro over 3 years ago
- File WPS Connector for R.zip added
Hi Julien, we need to talk about this, also for the sustainability of the solution you want to implement.
I also attach one example of R code publishing the NetCDF file you found issues with. I think the issue was that you were passing the HTML page of the file on Thredds, instead of the HTTP link to the real file.
E.g. this is the page you indicated (click on it):
But this is the http link to the file (click to test):
that you find as the "HTTPServer" link in the first page.
#15 Updated by Gianpaolo Coro over 3 years ago
- % Done changed from 0 to 80
All the OSCAR files have been imported on Thredds (thredds-d-d4s.d4science.org/thredds/dodsC/public/netcdf/) in the dev environment through the Dataminer and Data Transfer services. The files are indexed on GeoNetwork too (http://geoserver-dev2.d4science-ii.research-infrastructures.eu/geonetwork/srv/en/main.home).
They are ready to be used for testing purposes in the Ichthyop model. Later we will move them to the production environment.
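The difference between a catalog HTML page, the "HTTPServer" download link, and the OPeNDAP link comes down to the service segment in the Thredds URL. The sketch below assumes the default Thredds path segments (`dodsC` for OPeNDAP, `fileServer` for HTTPServer), which a server can configure differently; the dataset path passed in the example is a guess based on the file naming used in this ticket, not a verified location.

```python
# Default THREDDS service path segments (assumption: the server keeps the defaults).
THREDDS_SERVICES = {
    "opendap": "dodsC",
    "http": "fileServer",
}


def thredds_service_url(host, dataset_path, service="opendap"):
    """Build the access URL of a dataset for the given Thredds service."""
    if service not in THREDDS_SERVICES:
        raise ValueError(f"unknown service: {service}")
    segment = THREDDS_SERVICES[service]
    return f"http://{host}/thredds/{segment}/{dataset_path.lstrip('/')}"


host = "thredds-d-d4s.d4science.org"
path = "public/netcdf/oscar_vel1993_180.nc"  # hypothetical dataset path
print(thredds_service_url(host, path, "opendap"))
print(thredds_service_url(host, path, "http"))
```

The OPeNDAP form is what a remote-access client would open, while the `fileServer` form downloads the whole file.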
#16 Updated by Julien Barde over 3 years ago
- File oscar-third_vel1992_2016-1.ncml added
As we can't access OSCAR files by using a usual file path, could you please aggregate all the files (one netCDF per year) into a single netCDF file, so that Ichthyop can access all OSCAR data through a single OPeNDAP link?
You can generate such a netCDF file for the whole series with an NCML file like the one attached.
There is an example of the expected result here:
Having all files packaged within a single netCDF file would also make it simpler to get a single metadata record for the whole series of images (and not one metadata sheet per netCDF file / per year).
I couldn't find OSCAR data on GeoNetwork.
#17 Updated by Gianpaolo Coro over 3 years ago
The amount of data we have uploaded on Thredds is now 131 GB: you will find both one NetCDF file for each variable and one NetCDF file containing all the variables for each year.
I will send you the credentials to access GeoNetwork, but you can already see the maps and the metadata using our GeoExplorer in the VRE (https://dev.d4science.org/group/devvre/geo1).
We are not familiar with NCML and NetCDF merging. I have to figure out how merging NetCDF files with an NCML works. It seems to be only a matter of building an NCML like the one you uploaded and making it available through Thredds, is that right?
#18 Updated by Julien Barde over 3 years ago
In this case we would like to use such an NCML file (a virtual file pointing to physical netCDF files) to create a physical one (because of some current limitations in Ichthyop's data access methods). Here is a tutorial for writing netCDF from NCML with Java:
You need to edit the NCML file that I sent to adapt the repository path where the netCDF files are stored (line 4). Let me know if it doesn't work (I can try to write a new NCML file pointing to the netCDF files to be merged through OPeNDAP instead of a file path).
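For readers unfamiliar with NCML, the sketch below generates a minimal aggregation descriptor of the kind discussed here. It is not the attached file: it assumes a `joinExisting` aggregation along a dimension named `time`, lists each yearly file explicitly, and uses a hypothetical directory; only the `oscar_velYYYY_180.nc` naming pattern comes from this ticket.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_ncml(locations, dim_name="time"):
    """Return NCML text that joins the given files along an existing dimension."""
    ET.register_namespace("", NCML_NS)  # emit NCML as the default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    aggregation = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": dim_name, "type": "joinExisting"},
    )
    for location in locations:
        ET.SubElement(aggregation, f"{{{NCML_NS}}}netcdf", {"location": location})
    return ET.tostring(root, encoding="unicode")


# Hypothetical directory; replace with the real repository path or OPeNDAP URLs.
base = "/path/to/oscar"
files = [f"{base}/oscar_vel{year}_180.nc" for year in range(1992, 2017)]
print(build_ncml(files))
```

The `location` attributes can be local paths or remote OPeNDAP URLs, which corresponds to the two variants mentioned above.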
#19 Updated by Julien Barde over 3 years ago
- File test_oscar_opendap_julien_BlueBridge.ncml added
In the attached file, you will find an example of an NCML file aggregating remote files (through OPeNDAP) from BlueBridge.
The same file is used in our thredds server to give access to all your files from a single virtual file / OPeNDAP access:
It should be possible to use this virtual file to generate a single physical netCDF file for all OSCAR datasets (with a command like java -Xmx1g -classpath netcdfAll-4.3.jar ucar.nc2.dataset.NetcdfDataset -in myFile.ncml -out myFile.nc).
#21 Updated by Gianpaolo Coro over 3 years ago
- File ICHTHYOP.png added
Just as a note, I have just executed the Ichthyop model from QGIS. Attached is a screenshot showing an example of the input window and the output. The output is delivered as an HTTP link, which QGIS does not download automatically but stores in a table (automatically imported in the Project space).
More information on how to configure QGIS to work with the infrastructure is here: #4070
#22 Updated by Julien Barde over 3 years ago
@email@example.com, please let me know if you need additional information to package all yearly files within a single netCDF file. The unavailability of this file is blocking us, as we have to run hundreds of simulations from 1993 to 2015. So we need all OSCAR images to be made available through a single file.
#24 Updated by Paolo Fabriani over 3 years ago
We've managed to create the merged netCDF file for all OSCAR datasets (year 1992 to 2016).
In particular, the command executed is:
java -Xmx1g -classpath log4j-1.2.17.jar:netcdfAll-4.6.5.jar:slf4j-log4j12-1.7.21.jar ucar.nc2.dataset.NetcdfDataset -in oscar-descriptor.ncml -out oscar-1999-2016.nc -isLargeFile -netcdf4
We had to add the last two options (-isLargeFile and -netcdf4) to cope with Java exceptions, probably due to the sizes and dimensions of the dataset.
Here is the console output, which is not much (also, no feedback is given during the execution, which took about one hour).
NetcdfDatataset read from oscar-descriptor.ncml write to /mnt/oscar-1992-2016.nc
NetCDF-4 C library loaded (jna_path='null', libname='netcdf').
Netcdf nc_inq_libvers='188.8.131.52 of Dec 10 2015 16:44:18 $' isProtected=false
finished=true
The file produced is about 4.4 GB. As a reference, we also ran the command on a few years (1992-1999; 1/3 of the whole dataset); the file size is 2.2 GB.
Are those the expected sizes? How can we check the produced file is ok?
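One rough plausibility check for the sizes quoted above is a linear extrapolation from the partial run. The sketch below assumes that file size grows proportionally with the number of years merged (reasonable only if the data are uncompressed or uniformly compressed); the 20% tolerance is an arbitrary choice for illustration.

```python
def extrapolated_size_gb(subset_size_gb, subset_fraction):
    """Estimate the full-dataset size from a subset, assuming linear growth."""
    if not 0 < subset_fraction <= 1:
        raise ValueError("subset_fraction must be in (0, 1]")
    return subset_size_gb / subset_fraction


def looks_truncated(actual_gb, expected_gb, tolerance=0.2):
    """Flag an output file that is much smaller than the extrapolated size."""
    return actual_gb < expected_gb * (1 - tolerance)


# Figures reported in this comment: 2.2 GB for about 1/3 of the dataset, 4.4 GB for all of it.
expected = extrapolated_size_gb(2.2, 1 / 3)
print(f"expected about {expected:.1f} GB, truncated? {looks_truncated(4.4, expected)}")
```

A size check like this only raises a flag; confirming that the merged file is complete would still require inspecting its header, for example the length of the time dimension.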
#26 Updated by Gianpaolo Coro over 3 years ago
When I run the script on the service I get the following:
NetcdfDatataset read from oscar_opendap_merger_test.ncml write to oscar-1999-2016.nc
Exception in thread "main" java.lang.UnsupportedOperationException: Couldn't load NetCDF C library (see log for details).
    at ucar.nc2.jni.netcdf.Nc4Iosp.create(Nc4Iosp.java:2249)
    at ucar.nc2.NetcdfFileWriter.create(NetcdfFileWriter.java:804)
    at ucar.nc2.FileWriter2.write(FileWriter2.java:196)
    at ucar.nc2.dataset.NetcdfDataset.main(NetcdfDataset.java:1888)
Please @firstname.lastname@example.org , clarify which package you installed to make it work.
#27 Updated by Gianpaolo Coro over 3 years ago
- File oscar_opendap_merger - Local.ncml added
I have run the following instruction locally on the service, after installing the packages reported in #4245
java -Xmx1g -classpath log4j-1.2.9.jar:netcdfAll-4.6.6.jar:slf4j-log4j12-1.7.9.jar ucar.nc2.dataset.NetcdfDataset -in "oscar_opendap_merger - Local.ncml" -out oscar-1999-2016.nc -isLargeFile -netcdf4
The NCML is the one in attachment. I will report about the final result.
#28 Updated by Gianpaolo Coro over 3 years ago
We installed the netcdf 4.1.3 libraries and ran your command, but we obtain the following error:
ucar.ma2.InvalidRangeException: Illegal Range for dimension 3: last requested 1080 > max 481
    at ucar.ma2.Section.fill(Section.java:179)
    at ucar.nc2.Variable.read(Variable.java:709)
    at ucar.nc2.Variable.read(Variable.java:683)
    at ucar.nc2.ncml.AggregationOuterDimension$DatasetOuterDimension.read(AggregationOuterDimension.java:772)
    at ucar.nc2.ncml.AggregationOuterDimension.reallyRead(AggregationOuterDimension.java:296)
    at ucar.nc2.dataset.VariableDS._read(VariableDS.java:537)
    at ucar.nc2.Variable.read(Variable.java:709)
    at ucar.nc2.Variable.read(Variable.java:655)
    at ucar.nc2.FileWriter2.copySome(FileWriter2.java:467)
    at ucar.nc2.FileWriter2.copyVarData(FileWriter2.java:386)
    at ucar.nc2.FileWriter2.write(FileWriter2.java:199)
    at ucar.nc2.dataset.NetcdfDataset.main(NetcdfDataset.java:1888)
java.io.IOException: Illegal Range for dimension 3: last requested 1080 > max 481
    at ucar.nc2.FileWriter2.copySome(FileWriter2.java:487)
    at ucar.nc2.FileWriter2.copyVarData(FileWriter2.java:386)
    at ucar.nc2.FileWriter2.write(FileWriter2.java:199)
    at ucar.nc2.dataset.NetcdfDataset.main(NetcdfDataset.java:1888)
Exception in thread "main" java.io.IOException: Illegal Range for dimension 3: last requested 1080 > max 481
    at ucar.nc2.FileWriter2.copySome(FileWriter2.java:487)
    at ucar.nc2.FileWriter2.copyVarData(FileWriter2.java:386)
    at ucar.nc2.FileWriter2.write(FileWriter2.java:199)
    at ucar.nc2.dataset.NetcdfDataset.main(NetcdfDataset.java:1888)
Paolo @email@example.com, we need to know exactly which packages (and versions) you installed in order to repeat your process on Thredds.
#29 Updated by Paolo Fabriani over 3 years ago
I ran the command on a CentOS 7 machine.
The three jars included in the command have been installed manually in the work directory:
log4j-1.2.17.jar slf4j-log4j12-1.7.21.jar netcdfAll-4.6.5.jar
The only package I installed from the centos repository is netcdf ver. 184.108.40.206
#32 Updated by Gianpaolo Coro over 3 years ago
- File oscar_opendap_merger.ncml added
Paolo, we have installed netcdf 4.4.0 and I'm using netcdfAll-4.6.5.jar but I still have the error.
Please, could you run your process using the ncml file I have just attached? This is supposed to be the same ncml you have used.
#33 Updated by Paolo Fabriani about 3 years ago
I've tried running the command again on different platforms (Fedora and Debian, but not CentOS, which is no longer available to me) and I always got the exception you reported. Evidently, the run I reported as successful above wasn't actually OK, although I noticed no exception.
Now, I finally managed to create a merged file for the years 1992-2015. The file oscar-1992-2015.nc (9.2 GB) is available at:
Note that the file does not contain year 2016 since, apparently, the exceptions above are related to 'oscar_vel2016_180.nc'. In fact, leaving this file out, the command completes correctly.
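An "Illegal Range" error during aggregation is consistent with one input file having a different grid from the others, although this ticket does not confirm that as the cause. The sketch below shows one way to spot such a file once the dimension sizes of each input are known (for instance from each file's header). The function name and the sample sizes are made up for illustration and are not the real OSCAR grid.

```python
from collections import Counter


def find_inconsistent_files(dims_by_file, joined_dim="time"):
    """Return files whose non-joined dimension sizes differ from the most common layout."""
    layouts = {
        name: tuple(sorted((dim, size) for dim, size in dims.items() if dim != joined_dim))
        for name, dims in dims_by_file.items()
    }
    # The layout shared by most files is taken as the reference.
    reference, _count = Counter(layouts.values()).most_common(1)[0]
    return sorted(name for name, layout in layouts.items() if layout != reference)


# Made-up dimension sizes, NOT the real OSCAR grid.
sample = {
    "year_a.nc": {"time": 72, "latitude": 10, "longitude": 20},
    "year_b.nc": {"time": 73, "latitude": 10, "longitude": 20},
    "year_c.nc": {"time": 40, "latitude": 10, "longitude": 30},
}
print(find_inconsistent_files(sample))
```

The joined dimension is ignored on purpose, since yearly files are expected to differ in their number of time steps.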
@firstname.lastname@example.org, can you start using this file and check it's ok? Although not complete, I think it can still be useful to you.
I've tried re-importing 'oscar_vel2016_180.nc' using the dataminer, as done in the past, with no success. Here is the exception I got:
Error while executing the embedded process for: org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.RASTER_DATA_PUBLISHER
org.n52.wps.server.ExceptionReport: Error while executing the embedded process for: org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.RASTER_DATA_PUBLISHER
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.web.ExecuteRequest.call(ExecuteRequest.java:638)
    at org.n52.wps.server.request.Request.call(Request.java:1)
    at org.gcube.common.authorization.library.AuthorizedTasks$1.call(AuthorizedTasks.java:32)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Name is null
    at org.n52.wps.algorithm.annotation.AnnotationBinding$ExecuteMethodBinding.execute(AnnotationBinding.java:96)
    at org.n52.wps.server.AbstractAnnotatedAlgorithm.run(AbstractAnnotatedAlgorithm.java:54)
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.web.ExecuteRequest.call(ExecuteRequest.java:607)
    ... 6 more
Caused by: java.lang.NullPointerException: Name is null
    at java.lang.Enum.valueOf(Enum.java:235)
    at org.gcube.dataanalysis.geo.utils.GeospatialDataPublicationLevel.valueOf(GeospatialDataPublicationLevel.java:1)
    at org.gcube.dataanalysis.geo.algorithms.RasterDataPublisher.process(RasterDataPublisher.java:54)
    at org.gcube.dataanalysis.ecoengine.interfaces.StandardLocalExternalAlgorithm.compute(StandardLocalExternalAlgorithm.java:61)
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mapping.AbstractEcologicalEngineMapper.run(AbstractEcologicalEngineMapper.java:418)
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.RASTER_DATA_PUBLISHER.run(RASTER_DATA_PUBLISHER.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.n52.wps.algorithm.annotation.AnnotationBinding$ExecuteMethodBinding.execute(AnnotationBinding.java:89)
    ... 8 more
java.lang.RuntimeException: Name is null
    at org.n52.wps.algorithm.annotation.AnnotationBinding$ExecuteMethodBinding.execute(AnnotationBinding.java:96)
    at org.n52.wps.server.AbstractAnnotatedAlgorithm.run(AbstractAnnotatedAlgorithm.java:54)
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.web.ExecuteRequest.call(ExecuteRequest.java:607)
    at org.n52.wps.server.request.Request.call(Request.java:1)
    at org.gcube.common.authorization.library.AuthorizedTasks$1.call(AuthorizedTasks.java:32)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: Name is null
    at java.lang.Enum.valueOf(Enum.java:235)
    at org.gcube.dataanalysis.geo.utils.GeospatialDataPublicationLevel.valueOf(GeospatialDataPublicationLevel.java:1)
    at org.gcube.dataanalysis.geo.algorithms.RasterDataPublisher.process(RasterDataPublisher.java:54)
    at org.gcube.dataanalysis.ecoengine.interfaces.StandardLocalExternalAlgorithm.compute(StandardLocalExternalAlgorithm.java:61)
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mapping.AbstractEcologicalEngineMapper.run(AbstractEcologicalEngineMapper.java:418)
    at org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.transducerers.RASTER_DATA_PUBLISHER.run(RASTER_DATA_PUBLISHER.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.n52.wps.algorithm.annotation.AnnotationBinding$ExecuteMethodBinding.execute(AnnotationBinding.java:89)
    ... 8 more
We'll continue investigating the issue.
#34 Updated by Julien Barde about 3 years ago
#35 Updated by Gianpaolo Coro about 3 years ago
@email@example.com the RASTER_DATA_PUBLISHER algorithm has changed and now has a new parameter, that's why it does not work with your previous calls.
Nevertheless, for the activity in this ticket, we need only to change the data on Thredds, since the metadata are OK on Geonetwork.
Thus, I have downloaded the file again from the NASA website directly on the server.
Please, could you retry to run the merging process?
#36 Updated by Paolo Fabriani about 3 years ago
No success with the new file.
Same exception reported in #3563#note-28.
The produced file is oscar-1992-2016.nc, available at http://thredds-d-d4s.d4science.org/thredds/catalog/public/netcdf/catalog.html
#37 Updated by Gianpaolo Coro about 3 years ago
Hi @firstname.lastname@example.org, I think there is a general issue that goes beyond us, when merging the OSCAR files with the latest vel. 2016 file.
I think it is related to the information representation inside the file.
As Paolo also suggests, I wonder whether you can live without it.
#39 Updated by Paolo Fabriani over 2 years ago
- % Done changed from 80 to 100
- Status changed from In Progress to Closed
I guess this ticket can be closed.
@email@example.com, if you think it is worth making a further attempt to import the data for 2016 ("oscar_vel2016_180.nc") and, maybe, initial data for 2017, please reopen it.