Feature #21995

open

GRSF public VRE - change behavior to access time-dependent data

Added by Aureliano Gentile 4 months ago. Updated about 1 month ago.

Status: Feedback
Priority: High
Target version:
Start date: Sep 13, 2021
Due date:
% Done: 90%
Estimated time:

Description

Following the agreed actions discussed at https://support.d4science.org/projects/stocksandfisherieskb/wiki/21-06-22-GRSF_webinar#Actions we would need to change the behavior of the public GRSF VRE (https://i-marine.d4science.org/web/grsf/data-catalogue):

  • no time-dependent data should be publicly displayed, i.e. the sections "Stock Data" and "Fishery Data" should be removed from the public access.

The current behavior for the section "Data and Resources" is already fine as it is. (A login is prompted when clicking on time series files)


Related issues

Related to StocksAndFisheriesKB - Bug #22310: Missing Scientific Advice Resource in some GRSF Records (Closed, Luca Frosini, Oct 27, 2021)

Actions #1

Updated by Pasquale Pagano 4 months ago

  • Status changed from New to In Progress
  • Assignee changed from Francesco Mangiacrapa to Aureliano Gentile

The Stock Data section reports metadata published in the GRSF record. We cannot manage access policies on portions of the metadata.

If Stock Data such as the following should not be shown, we need to republish all the records.

Stock Data
Field   Value
Abundance Level     1.13076923076923 [Unit: BdivBmsypref - Rep. Year or Assessment Id: WGHMM-SOLEVIIIab-1982-2013-ICESIMP2016 - Ref. Year: 2012 - Data Owner: WGHMM - DB Source: RAM]
Abundance Level     0.907692307692308 [Unit: BdivBmsypref - Rep. Year or Assessment Id: WGHMM-SOLEVIIIab-1982-2013-ICESIMP2016 - Ref. Year: 2009 - Data Owner: WGHMM - DB Source: RAM]

Please note that if this information is removed from the metadata, it will also disappear for registered users.

Actions #2

Updated by Aureliano Gentile 4 months ago

Dear all, a kind reminder, is it possible to go ahead and remove the "Stock Data" and "Fishery Data" sections from the records of the GRSF public VRE?
From Lino's comment above I understand it is a matter of republishing. I guess it can be done now, considering that the FIRMS TWG is over and the Steering Committee will be held on 18-21 October, so it would be good to proceed this week. With thanks in advance.

Actions #3

Updated by Pasquale Pagano 4 months ago

Francesco Mangiacrapa and Yannis Marketakis, please make a plan for the activities requested by Aureliano. If the requirement is not clear, please comment on this issue.

Actions #4

Updated by Yannis Marketakis 4 months ago

  • Assignee changed from Aureliano Gentile to Yannis Marketakis

We will start re-publishing the public records (in GRSF public VRE) without time-dependent information right away.
I'll let you know as soon as the update is completed.

Actions #5

Updated by Yannis Marketakis 4 months ago

All but one of the public records have been updated in GRSF VRE without their time-dependent resources.

The only one for which we faced an issue is https://data.d4science.org/ctlg/GRSF/71a730e8-4fbb-30a2-bef8-2c19ded17e73, for which an HTTP 500 error is returned by https://ckan-grsf.d4science.org/api/3/action/package_update.
The detailed log, as well as the JSON representation of the record, are attached.
Francesco Mangiacrapa, do you have any clue about this?

Actions #6

Updated by Francesco Mangiacrapa 4 months ago

Hi Yannis Marketakis

thanks for your feedback.

Yannis Marketakis wrote in #note-5:

All (but one) the public records have been updated in GRSF VRE without their time-dependent resources.

The only one for which we faced an issue is https://data.d4science.org/ctlg/GRSF/71a730e8-4fbb-30a2-bef8-2c19ded17e73, for which a HTTP 500 error occurs from https://ckan-grsf.d4science.org/api/3/action/package_update.
The detailed log, as well as the JSON representation of the record, are attached
Francesco Mangiacrapa do you have any clue about this?

Luca Frosini, could you check the logs? Using the attached JSON source, you can submit the update and inspect the logs.

Actions #7

Updated by Pasquale Pagano 4 months ago

Yannis Marketakis, I think that there is a big mistake here. The CSV files have to be published, while the time-dependent data in the metadata should not.
Checking the new status of GRSF, I saw that the resources, i.e. the CSV files, are no longer published, and this is wrong.
Could you check and confirm?

Actions #8

Updated by Yannis Marketakis 4 months ago

Pasquale Pagano from our side we cannot do it that way.
For each record, we are providing a single JSON object (with or without time-dependent data).
The CSV resources are constructed on the GRSF publisher side from these JSON objects.

Actions #9

Updated by Aureliano Gentile 4 months ago

I confirm there is a misunderstanding. When we created the ticket we meant: remove the sections "Stock Data" and "Fishery Data" from the record page.

The CSV files etc. under Data and Resources can stay. In fact, it is important that they remain visible, to show the kind and amount of data available for each record.

However, in this example https://data.d4science.org/ctlg/GRSF/2a636ae3-b8f8-3b6f-ac92-0abe6bfa3993 the Stock Data section is still there, with information on the assessment.

Thanks a lot for your patience in solving the matter.

Actions #10

Updated by Yannis Marketakis 4 months ago

Thanks, Aureliano Gentile, for further clarifying that.
As I described, we cannot separate the information shown under Stock Data from that shown under Data and Resources, since we provide both in a single JSON file.
The Publisher inspects that information from the JSON file and constructs the CSV resources. So, the only way to handle this is at the publisher side.
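
To picture the constraint described above, here is a minimal Python sketch. It assumes, hypothetically, that a time-dependent field is a list of objects inside the record's JSON (the shape quoted further down this thread for "fishing_pressure") and that one CSV resource is derived from each such list. The function name, the sample record and its values are illustrative only; this is not the actual grsf-publisher-ws code.

```python
import csv
import io


def build_csv_resources(record):
    """Return {field_name: csv_text} for every list-of-objects field.

    Assumption: a time-dependent field is a non-empty list of dicts.
    """
    resources = {}
    for name, value in record.items():
        is_series = (
            isinstance(value, list)
            and value
            and all(isinstance(entry, dict) for entry in value)
        )
        if not is_series:
            continue
        # Collect column names in first-seen order across all entries.
        columns = []
        for entry in value:
            for key in entry:
                if key not in columns:
                    columns.append(key)
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(value)
        resources[name] = buffer.getvalue()
    return resources


# Hypothetical record: the title, values and unit are invented for illustration.
record = {
    "title": "example record",
    "fishing_pressure": [
        {"year": 2014, "value": "0.8", "unit": "example-unit"},
        {"year": 2015, "value": "0.9", "unit": "example-unit"},
    ],
}
resources = build_csv_resources(record)
```

Under this assumption, removing a time-dependent field from the JSON before submission also removes the CSV derived from it, which matches the effect reported in comment #7.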

Actions #11

Updated by Pasquale Pagano 4 months ago

We need to change the implementation of the service to handle it then. We are evaluating the work to do and we will let you know asap.

Actions #12

Updated by Francesco Mangiacrapa 4 months ago

Hi Aureliano Gentile and all,

just to be sure of the new requirement: "no time-dependent data should be publicly displayed, i.e. the sections "Stock Data" and "Fishery Data" should be removed from the public access.".

By "...from the public access", do you mean that (all the time-dependent fields of) the sections "Stock Data" and "Fishery Data" should be removed only from the GRSF Catalogue (which is public)? Should they remain in the GRSF_ADMIN and GRSF_PRE Catalogues (which are private)?

Actions #13

Updated by Aureliano Gentile 4 months ago

Yes, no changes in GRSF_ADMIN and GRSF_PRE Catalogue VREs.

The sections "Stock Data" and "Fishery Data" should be removed from all record pages only in the public GRSF VRE.

The section "Data and Resources" stays as it was until yesterday, prompting a login when a user clicks on a CSV file, on a link, etc.

With many thanks
A.

Actions #14

Updated by Aureliano Gentile 3 months ago

Dear Francesco Mangiacrapa and colleagues, the FIRMS FSC12 meeting is over and the accomplishment of this ticket would be needed. Would you please provide an update of where we stand? Do you need any further clarifications?
With thanks in advance,
Aureliano

Actions #15

Updated by Yannis Marketakis 3 months ago

For your information, we've updated all the records in GRSF VRE with their time-dependent information attached.

Actions #16

Updated by Luca Frosini 3 months ago

Hi Yannis Marketakis

I have patched the service to skip publishing such data in the configured VRE.
I have also migrated the code that stores the dataset in the workspace to StorageHub (the service which replaced the old HomeLibrary, which is going to be decommissioned in the coming months).
You can validate the new version of the service by using the GRSF service available in the pre-production infrastructure (available to developers only).

Here is the pre-production portal
https://pre.d4science.org/

You have already been added to the GRSF_Pre VRE (in the preproduction infrastructure), which must not be confused with the GRSF_Pre VRE available in the production infrastructure.
https://pre.d4science.org/group/grsf_pre

The grsf-publisher-ws service to use is available at
https://smart-grsfpre.pre.d4science.org

I am going to send you privately, via email, the token of the grsf.publisher user to be used to contact the service.

Actions #17

Updated by Yannis Marketakis 3 months ago

Thanks Luca Frosini

I'll test it right away.
I'll keep you posted about the results.

Actions #18

Updated by Yannis Marketakis 3 months ago

Thanks Luca Frosini. I just tested it and I confirm that it works for most of the time-dependent types.
The only cases for which I noticed that the time-dependent data are still there (under the additional info section) are:

I guess we should hide those as well. Aureliano Gentile could you please confirm?

Actions #19

Updated by Aureliano Gentile 3 months ago

yes I confirm, many thanks.
All time-dependent data should be hidden, i.e. the sections "Stock Data" and "Fishery Data" should be made available only upon registration (no longer a self-registration but with authorization).

Actions #20

Updated by Luca Frosini 3 months ago

Aureliano Gentile cannot access the VRE in the preproduction infrastructure which is for developers only. I created some screenshots so he can check the records.

Yannis Marketakis, you are right: I did not consider such fields, only the fields inside "Stock Data" and "Fishery Data", as indicated by the following comment:

Aureliano Gentile wrote in #note-2:

Dear all, a kind reminder, is it possible to go ahead and remove the "Stock Data" and "Fishery Data" sections from the records of the GRSF public VRE?

Aureliano Gentile, just let me know whether I also have to hide these fields or not.

Actions #21

Updated by Luca Frosini 3 months ago

Aureliano Gentile wrote in #note-19:

All time-dependent data should be hidden, i.e. the sections "Stock Data" and "Fishery Data" should be made available only upon registration (no longer a self-registration but with authorization).

This is not possible

Actions #22

Updated by Aureliano Gentile 3 months ago

1- those fields were erroneously added outside the Stock Data and Fishery Data sections; all time-dependent data should be under those sections. Therefore they should be hidden/removed as well.

2- the laconic answer "this is not possible" does not help much. According to today's current behavior of the GRSF VRE, the access to data is upon user-login, so it is not clear why it should not be possible. What am I missing, what kind of additional clarifications should I provide? I stand ready for any assistance should you require to accomplish the task.

Actions #23

Updated by Pasquale Pagano 3 months ago

Let me clarify the overall picture.

The VRE is moderated and the users require authorization to become a member of it.

The catalogue is open access and it is accessible by anonymous users. Those users can access only the metadata and not the resources (files). Removing the information from the metadata satisfies the community requirements expressed by Aureliano.

We are going to change the configuration to also include Biomass and State and Trend, which are time-dependent.
Is Scientific Advice time-dependent?
If yes, why is the CSV not published as a resource?
If not, why should we remove it?

Actions #24

Updated by Aureliano Gentile 3 months ago

Is Scientific Advice time-dependent? => yes it is information changing across time, time-dependent

If yes, why is the CSV not published as a resource? => I do not know, maybe because it is narrative text and not purely data? But narrative text does have its own reporting year/reference year and can be included in a CSV file. I concur.

Thanks a lot Lino, Gianluca and Francesco. And sorry for the complexity, but these are sensitive data and we need to progress step by step.

Actions #25

Updated by Pasquale Pagano 3 months ago

Ok, Scientific Advice is time-dependent but it was not published as such. It is published as a string and not as a list such as, for example, "fishing_pressure": [{"year":2014, "value":"...", "unit": "..."}, {"year":2015, "value":"...", "unit": "..."}, ...]
Yannis Marketakis could you check it? We should decide how to deal with it.

Actions #26

Updated by Yannis Marketakis 3 months ago

I think we should treat scientific advice in the same way as state and trend, which are properly handled as resources.
Both these fields contain short narratives, and attributes like value and unit do not exist for them.

State and Trend is added as follows:

 "state_and_trend_of_marine_resources": [
    {
      "value": "Both fishery-independent and dependent data show similar trends...",
      "reference_year": 2018
    }
  ],

Does it make sense?

Actions #27

Updated by Pasquale Pagano 3 months ago

Hi Yannis, Luca Frosini changed the implementation of Scientific Advice and it now has the same type as State and Trend. You should change how this field is published in the JSON.

As soon as we have deployed the new service in PRE, we will notify you and you will make another submission.

Actions #28

Updated by Luca Frosini 3 months ago

Hi Yannis Marketakis

we have redeployed the service with the agreed changes.

I have tested the solution using, as an example, the following representation for scientific_advice, and it seems to work.

 "scientific_advice" : [ {"value": "ICES advises on the basis of the precautionary approach that catches in 2013 should be reduced to the lowest possible level and that effective technical measures should be implemented to reduce discards in the Nephrops (TR2) fleet. [Rep. Year or Assessment ID: 2012, Data Owner: ICES, DB Source: FIRMS]", "reference_year": 2012 } ]

Then, I have deleted all the Records you have published yesterday

Can you please retry?
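
For the publishing side, the change of shape discussed above could be sketched as follows. This is only an illustration in Python: the helper names are hypothetical, the sample text is shortened, and the reference year is assumed to be supplied by the caller rather than parsed from the narrative.

```python
def as_time_dependent(value, reference_year):
    """Wrap a plain-string narrative in the list-of-objects shape."""
    if isinstance(value, list):
        return value  # already in the new shape, leave untouched
    return [{"value": value, "reference_year": reference_year}]


def migrate_scientific_advice(record, reference_year):
    """Return a copy of the record with scientific_advice in list form."""
    migrated = dict(record)
    if "scientific_advice" in migrated:
        migrated["scientific_advice"] = as_time_dependent(
            migrated["scientific_advice"], reference_year
        )
    return migrated


# Old shape: a single string (sample text shortened for illustration).
old = {"scientific_advice": "ICES advises ..."}
new = migrate_scientific_advice(old, 2012)
```

Applying the helper twice leaves the record unchanged, so records already in the new shape can safely pass through the same code path.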

Actions #29

Updated by Yannis Marketakis 3 months ago

Thanks Luca Frosini and Pasquale Pagano

I just tested that and I confirm that it works as expected.

Especially as regards the update on scientific advice (the new json structure), I guess that this is something that needs to be applied in the service deployed and used from GRSF Admin VRE as well (so that scientific advice will be handled as resources from the GRSF Admin catalogue as well).

Actions #30

Updated by Pasquale Pagano 3 months ago

Thanks for your tests. Since everything is fine, I ask Luca Frosini to release the service to production. As soon as it becomes available in production, we should schedule the republication of the public GRSF, which will allow us to close this issue.

We will manage GRSF_Admin in another issue.

Actions #31

Updated by Aureliano Gentile 3 months ago

With many thanks, indeed the Admin VRE should have all time-dependent data under the same section. The issue was already noticed in the past but we did not give it priority at that time.

Actions #32

Updated by Yannis Marketakis 3 months ago

Thanks for the update and confirmation.
So, I'm waiting for the green light from Luca Frosini to republish records in GRSF VRE.

Actions #33

Updated by Luca Frosini 3 months ago

  • Related to Bug #22310: Missing Scientific Advice Resource in some GRSF Records added
Actions #34

Updated by Pasquale Pagano 2 months ago

Yannis Marketakis, everything is fine for the republication of the data in production.

Please let us know if

a) you need to update GRSF or we can proceed with the complete removal of all records (and files) first and then you will perform a bulk new publication. The former will require less effort but the latter will allow us to clean useless files from the storage. We opt for the latter but feel free to choose the one you need;

b) you need to upgrade also GRSF_Pre in production. In this case, the same question as above: update or new bulk publication.

Thanks

Actions #35

Updated by Yannis Marketakis 2 months ago

Thanks Pasquale Pagano for the update.

As regards the process, both options are perfectly fine for me. If you prefer purging everything from the VRE, I'm OK. So please go ahead and let me know as soon as I can start the publication process.

The same applies for GRSF PRE VRE as well

Actions #36

Updated by Luca Frosini 2 months ago

So, if you agree, I'm going to start cleaning the production GRSF_Pre VRE (i.e. https://blue-cloud.d4science.org/group/grsf_pre/ )

Then, if everything is properly published, we will do the same for GRSF.

As soon as I have the green light I'll start the cleaning process.

Actions #37

Updated by Yannis Marketakis 2 months ago

I agree. Let's proceed as you described.

Actions #38

Updated by Luca Frosini 2 months ago

The cleaning process is running

Actions #39

Updated by Aureliano Gentile 2 months ago

With many thanks.

Actions #40

Updated by Luca Frosini 2 months ago

Yannis Marketakis I'll advise you as soon as the cleaning process has been completed.

In the meantime, we have separated the service instance serving the production GRSF_Pre VRE from the instance serving GRSF and GRSF_Admin. This allows both of us to experiment with new features and bug fixes without compromising the GRSF and GRSF_Admin functionality.

The new host to perform the requests for the production GRSF_Pre is https://smart-grsfpre.d4science.org
(Please note that this is different from the host used to serve the preproduction GRSF_Pre VRE used for development tests, which is https://smart-grsfpre.pre.d4science.org)

Instead, for GRSF and GRSF_Admin the host is still https://smart-grsf-d4s.d4science.org
Please do not hesitate to contact me if something is not clear.
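
Since the three hosts are easy to mix up, a client could keep them in a small lookup table. The sketch below is in Python; the hosts are the ones listed in this comment, while the dictionary layout, the key names and the function are purely illustrative.

```python
# Hosts as stated in this comment; keys are (infrastructure, VRE) pairs.
PUBLISHER_HOSTS = {
    ("production", "GRSF"): "https://smart-grsf-d4s.d4science.org",
    ("production", "GRSF_Admin"): "https://smart-grsf-d4s.d4science.org",
    ("production", "GRSF_Pre"): "https://smart-grsfpre.d4science.org",
    ("preproduction", "GRSF_Pre"): "https://smart-grsfpre.pre.d4science.org",
}


def publisher_host(infrastructure, vre):
    """Return the publisher host for a VRE, failing loudly if unknown."""
    try:
        return PUBLISHER_HOSTS[(infrastructure, vre)]
    except KeyError:
        raise ValueError(
            f"no publisher host known for {vre} in {infrastructure}"
        ) from None
```

Failing on an unknown pair, rather than falling back to a default host, avoids publishing test records to the wrong catalogue by accident.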

Actions #41

Updated by Luca Frosini 2 months ago

Aureliano Gentile do we have to remove the time-series from metadata also in GRSF_Pre or just in GRSF?

Actions #42

Updated by Aureliano Gentile 2 months ago

The requested behavior is only for the public GRSF VRE.
In "Admin" and "Pre" the data can be displayed in full (with the fix you just made, i.e scientific advice...) with no restrictions (as those VREs are already restricted to selected users).
Thanks again

Actions #43

Updated by Luca Frosini 2 months ago

In Admin and Pre, 'Biomass', 'State and Trend' and 'Scientific Advice' must be displayed in the 'Stock Data' section and not in Additional Info, right?

Actions #44

Updated by Aureliano Gentile 2 months ago

Correct, thanks.

Actions #45

Updated by Luca Frosini 2 months ago

Yannis Marketakis you have the green light to publish in GRSF_Pre VRE.

Actions #46

Updated by Yannis Marketakis 2 months ago

Thanks Luca Frosini
Before starting, I noticed that there are still 2 records visible in GRSF PRE. Should I wait for them to be removed as well?

Actions #47

Updated by Luca Frosini 2 months ago

Yannis Marketakis, my fault, I was not clear.

I have cleaned the GRSF_Pre in production (which must not be confused with GRSF_Pre in preproduction which has the two records you mentioned):
https://blue-cloud.d4science.org/group/grsf_pre/data-catalogue

To publish in such a VRE you must use the following host:
https://smart-grsfpre.d4science.org/

The user is grsf.publisher and the token is still the same one you used in the past (feel free to contact me privately for the token if you have any trouble).

Actions #48

Updated by Yannis Marketakis 2 months ago

Thanks Luca Frosini

You were perfectly clear on this. I've started publishing on GRSF PRE.

Actions #49

Updated by Yannis Marketakis 2 months ago

Luca Frosini, I started the publication and, after a few hundred records were published, it began throwing errors (basically connection timeout errors).
Moreover, the GRSF PRE catalogue reports a Server Error (click on GRSF Pre Records Management --> Records).

Actions #50

Updated by Luca Frosini 2 months ago

It seems that CKAN crashed because it does not support such a workload.

Francesco Mangiacrapa has found that the following two records were present in Solr but not in the CKAN DB.

  • d30df0b5-a11e-3f45-a3d1-b43e47344519
  • d561c422-e5cb-33df-bb90-9dc05cc53930

We are going to restart CKAN and resync Solr.

Yannis Marketakis, I'll let you know when you can start the publishing process again, but could you please slow down the publishing rate?

Actions #51

Updated by Yannis Marketakis 2 months ago

Thanks.
I'll insert a few seconds of inactivity between record publications, as you suggested.
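
A pause between submissions could look roughly like the Python sketch below. The `publish` callable stands in for whatever performs the actual request and is supplied by the caller; the function name, the default delay and the error handling are assumptions for illustration, not the FORTH publishing code.

```python
import time


def publish_throttled(records, publish, delay_seconds=2.0, sleep=time.sleep):
    """Call publish(record) for each record, pausing between calls.

    Failures are collected and returned instead of aborting the run,
    so the failed records can be retried later.
    """
    failures = []
    for index, record in enumerate(records):
        if index > 0:
            sleep(delay_seconds)  # give the catalogue time between requests
        try:
            publish(record)
        except Exception as error:  # broad on purpose: this is only a sketch
            failures.append((record, error))
    return failures
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.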

Actions #52

Updated by Francesco Mangiacrapa 2 months ago

Resync from DB to Solr: done.

Now, the previous error has been fixed (https://i-marine.d4science.org/group/grsf_pre/data-catalogue?path=dataset)

Yannis Marketakis you can restart the publishing...

However, I just noticed a mismatch between:

  1. the statistics reported in the home page https://i-marine.d4science.org/group/grsf_pre/data-catalogue (796 records);
  2. the dataset published and reported in the dataset page https://i-marine.d4science.org/group/grsf_pre/data-catalogue (804 records).

Maybe the previous crash caused the issue... but this problem lies with CKAN and not with the new grsf_publisher service (which we are testing), so we can ignore this error for now and investigate it later.

Actions #53

Updated by Yannis Marketakis about 2 months ago

  • Status changed from In Progress to Feedback
  • Assignee changed from Yannis Marketakis to Aureliano Gentile

The GRSF republishing in GRSF PRE has been completed.
Aureliano Gentile you are more than welcome to perform a check.
As soon as everything is fine, we could do the same with GRSF Admin and GRSF VREs (we'll document that in a separate issue, as suggested by the CNR team).

Actions #54

Updated by Aureliano Gentile about 2 months ago

Dear CNR and FORTH colleagues, thanks, I checked (in a call with Yannis) and we spotted the following in this example https://data.d4science.org/ctlg/GRSF_Pre/69c20914-4411-3819-a5b7-f3500766833c :

  • BOX Data and Resources: all time dependent data seem properly displayed
  • BOX Additional Info: the content "Biomass" should not be displayed here but rather under the box "Stock Data", like any other time-dependent data. The box Additional Info should contain only the metadata related to the record identity (i.e. citation, domain, url, QR, flags, polygons, status etc.)

Thanks in advance for taking action accordingly. (We are almost there, it seems)

Actions #55

Updated by Pasquale Pagano about 2 months ago

  • Assignee changed from Aureliano Gentile to Francesco Mangiacrapa

Francesco Mangiacrapa and Luca Frosini, please change the configuration of the record to move Biomass into the box Data and Resources. Thanks.

Actions #56

Updated by Aureliano Gentile about 2 months ago

Dear all, please, I have been asked where we stand with this ticket regarding applying the new behavior in the public GRSF VRE (i.e. this ticket). Your kind actions to finalize the work would be appreciated.
With thanks
Aureliano

Actions #57

Updated by Luca Frosini about 1 month ago

Aureliano Gentile, apart from Biomass, which has to be moved into the 'Stock Data' "box", are there any other fields to move into another "box"? Can you please enumerate them?
Are there any other fields that have to be hidden from users who are not logged in? Can you please enumerate them?

Actions #58

Updated by Luca Frosini about 1 month ago

Please note that GRSF_Pre is a test for the data which will be visible in the GRSF VRE, so we need to fix everything before updating GRSF.

Actions #59

Updated by Aureliano Gentile about 1 month ago

(So we are back to the initial request of this ticket) ;-)

Please find hereafter the list of time-dependent data which should be within the box "DATA and RESOURCES" and the box "STOCK DATA".

  • Abundance Level
  • Abundance Level (FIRMS Standard)
  • Catches
  • FAO Categories
  • Fishing Pressure
  • Fishing Pressure (FIRMS Standard)
  • Landings
  • Biomass
  • Scientific advice
  • State and Trend

In this example in GRSF PRE https://data.d4science.org/ctlg/GRSF_Pre/51b7e41c-f37f-3fca-8864-4d04f8d50261 the Biomass is still wrongly in the box "Additional Info".

Hope it helps and thanks again.
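
As an aid to the discussion, the list above could be captured in a small Python sketch that separates what an anonymous visitor may see from what should be hidden. The labels are the ones enumerated in this comment; representing a record as a {label: value} dictionary, the function name and the sample values are assumptions, and the real service may key its fields differently.

```python
# Labels enumerated in this comment as time-dependent.
TIME_DEPENDENT_LABELS = frozenset({
    "Abundance Level",
    "Abundance Level (FIRMS Standard)",
    "Catches",
    "FAO Categories",
    "Fishing Pressure",
    "Fishing Pressure (FIRMS Standard)",
    "Landings",
    "Biomass",
    "Scientific advice",
    "State and Trend",
})


def split_for_public_view(fields):
    """Split {label: value} into (publicly visible, hidden) dictionaries."""
    public, hidden = {}, {}
    for label, value in fields.items():
        target = hidden if label in TIME_DEPENDENT_LABELS else public
        target[label] = value
    return public, hidden


# Hypothetical record fields; values are placeholders.
public, hidden = split_for_public_view(
    {"Domain": "Stock", "Biomass": "...", "Catches": "..."}
)
```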

Actions #60

Updated by Luca Frosini about 1 month ago

Sorry Yannis Marketakis if I use Italian in the following (rendered here in English), but I want to be clear with Aureliano Gentile.

Aureliano Gentile wrote in #note-59:

(So we are back to the initial request of this ticket) ;-)

Frankly, you could have spared us this part of the comment.
Given that we are at the 59th comment, things are evidently not clear.
Moreover, even though I asked you explicitly, not only do you answer like this, but you keep giving me partial and fragmented information.
The fields you list below refer to Stock, and I do not see "Data owner" among them (which had been hidden in dev).
If I now map these fields without "Data owner", we will find ourselves "back to the initial request of this ticket".

What about Fishery? In dev I have hidden only: Data owner, Catches and Landings. Is there anything else?

Please find hereafter the list of time-dependent data which should be within the box "DATA and RESOURCES" and the box "STOCK DATA".

  • Abundance Level
  • Abundance Level (FIRMS Standard)
  • Catches
  • FAO Categories
  • Fishing Pressure
  • Fishing Pressure (FIRMS Standard)
  • Landings
  • Biomass
  • Scientific advice
  • State and Trend

In this example in GRSF PRE https://data.d4science.org/ctlg/GRSF_Pre/51b7e41c-f37f-3fca-8864-4d04f8d50261 the Biomass is still wrongly in the box "Additional Info".

Hope it helps and thanks again.

Since it did not help and I am slow on the uptake, could you list them all again properly, divided by type?

Thanks

PS: I am Luca, not Gianluca

Actions #61

Updated by Luca Frosini about 1 month ago

  • Assignee changed from Francesco Mangiacrapa to Aureliano Gentile
Actions #62

Updated by Aureliano Gentile about 1 month ago

Dear Luca,

I am very sorry for the misunderstanding of the requirements and the misinterpretation of my words.
I think one of the issues is the misaligned language between the developers and the information managers.
Considering the whole thread, I would suggest having a call on Skype or equivalent with you and the colleagues to clarify all aspects, as well as to rebuild some mutual trust.
Looking forward to hearing from you
Aureliano
