Task #449

BlueBRIDGE - Project WP #629: WP4 - VREs Deployment and Operation [Months: 1-30]

BlueBRIDGE - Project Task #630: T4.1 BlueBRIDGE Infrastructure Operation [Months: 1-30]

BlueBRIDGE - Task #1367: Pre-production infrastructure creation

New root VO creation for pre production

Added by Massimiliano Assante over 4 years ago. Updated almost 4 years ago.

Status: Closed
Start date: Aug 03, 2015
Priority: Urgent
Due date: Dec 23, 2015
Assignee: Daniele Pavia
% Done: 100%
Category: Other
Sprint: PreProd Infrastructure
Infrastructure: Pre-Production
Milestones:
Duration: 103

Description

As a follow-up to #372, a new root VO named "d4s" should be created.

ServiceMap_d4s.xml (750 Bytes) Daniele Pavia, Dec 11, 2015 04:54 PM

ExceptionBadWsdlUrl.txt (5.65 KB) Roberto Cirillo, Jan 25, 2016 04:25 PM

History

#1 Updated by Pasquale Pagano over 4 years ago

  • Status changed from New to Rejected

Duplicate, see #451.

#2 Updated by Massimiliano Assante over 4 years ago

  • Status changed from Rejected to In Progress

This is not a duplicate ticket. This ticket tracks the creation of the new root VO, while the other one (https://support.d4science.org/issues/451) is for the VO under this root VO.

#3 Updated by Massimiliano Assante almost 4 years ago

  • Status changed from In Progress to Closed

In the BlueCommons Meeting of November 2015, we agreed that Engineering would be in charge of this activity. This ticket is no longer necessary.

#4 Updated by Massimiliano Assante almost 4 years ago

  • Parent task set to #1367
  • Assignee changed from Roberto Cirillo to Paolo Fabriani
  • Status changed from Closed to In Progress
  • Due date changed from Sep 04, 2015 to Dec 23, 2015

We reopened the ticket and assigned it to ENG; however, the root VO should be named /d4s.

#5 Updated by Massimiliano Assante almost 4 years ago

  • Description updated (diff)

#6 Updated by Paolo Fabriani almost 4 years ago

  • Assignee changed from Paolo Fabriani to Daniele Pavia

#7 Updated by Daniele Pavia almost 4 years ago

The new /d4s root VO has been deployed. We need to run a round of functional tests to check whether everything is working as intended. A new ServiceMap has been attached to this ticket.

#8 Updated by Roberto Cirillo almost 4 years ago

In the serviceMap file, I don't see the ports related to the service endpoints.
For example in dev we have:

<Service name ="ISICAllQueryPT" endpoint ="http://dlib01.isti.cnr.it:8080/wsrf/services/gcube/informationsystem/collector/XQueryAccess"/>

Which ports are used on "d4science1.esl.eng.it" and "d4science2.esl.eng.it"?

#9 Updated by Massimiliano Assante almost 4 years ago

  • Priority changed from Normal to Urgent

#10 Updated by Daniele Pavia almost 4 years ago

Hi, no port was specified in the ServiceMap file because we use the standard HTTP port (80). It should be fine unless the ServiceMap parser expects a port in the endpoint even when it's the standard port for the protocol.
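
For reference, an endpoint URL without an explicit port resolves to the default port of its scheme, so both forms address the same service as long as the parser applies that default. The sketch below illustrates this with Python's standard `urllib.parse`. The `effective_port` helper and the `DEFAULT_PORTS` table are illustrative names, not part of gCube or of the ServiceMap parser, whose actual behaviour is not confirmed here.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes relevant here (illustrative table).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(endpoint: str) -> int:
    """Return the explicit port of an endpoint URL, or the scheme default."""
    parts = urlsplit(endpoint)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    path = "/wsrf/services/gcube/informationsystem/collector/XQueryAccess"
    # No port given: falls back to the HTTP default.
    print(effective_port("http://d4science1.esl.eng.it" + path))
    # Explicit port, as in the dev ServiceMap entry quoted earlier.
    print(effective_port("http://dlib01.isti.cnr.it:8080" + path))
```

A parser that insists on an explicit port would need `:80` written into each endpoint; one that behaves like the sketch would not.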

#11 Updated by Roberto Cirillo almost 4 years ago

Port 80 is a privileged port. Have you also installed the gCore container under the root user? It is strongly recommended to install the gCore container under a non-privileged user.

#12 Updated by Daniele Pavia almost 4 years ago

  • % Done changed from 80 to 90

So,
"Port 80 is a privileged port."
The Apache reverse proxy standing in front of each web service listens on port 80 and then forwards requests to the actual service port.

"Have you also installed the gCore container under the root user? It is strongly recommended to install the gCore container under a non-privileged user."
No, each deployed GHN is running under its own unprivileged user.

As of now, the root VO should be running fine. The endpoints are:
IS-Collector (/d4s) - http://d4science1.esl.eng.it:80/
IS-Registry/Notifier (/d4s) - http://d4science2.esl.eng.it:80/
Resource Manager/Broker (/d4s) - http://d4science3.esle.eng.it:80/

The portal is available at http://d4science4.esl.eng.it:80/, although a few configuration issues are still pending.

Let's see if we can test it out.

#13 Updated by Roberto Cirillo almost 4 years ago

I've tried to perform a registration test on scope /d4s, but it fails because the root VO (/d4s) has 3 Registry GCoreEndpoints and 3 IS-Collector GCoreEndpoints in status ready, but only one works. The old GCoreEndpoints should be cleaned up.

#14 Updated by Daniele Pavia almost 4 years ago

The sweeper seems unable to clean up expired nodes that belong to a nonexistent VO (or rather, a previously configured VO, /d4science.org).

Here's the stack trace from catalina.out:

Applying sweep
2015-12-18 15:39:45,889 WARN resources.AbstractResourceManager [http-9000-10,warn:42] %[PORTAL] 90544902 [http-9000-10] WARN org.gcube.resourcemanagement.support.server.managers.resources.AbstractResourceManager - *** [RMP] [SCOPE-MGR] Using DEFAULT scope manager
Action->APPLY_GHN_MOVE_TO_UNREACHABLE
2015-12-18 15:39:45,890 INFO sweeper.Sweeper [http-9000-10,applySweep:144] %[PORTAL] 90544903 [http-9000-10] INFO org.gcube.resourcemanagement.support.server.sweeper.Sweeper - Cleaning up 07724f80-78d4-11e5-99e3-97a458518898 APPLY_GHN_MOVE_TO_UNREACHABLE
2015-12-18 15:39:45,900 INFO jaxws.StubCache [http-9000-10,get:70] %[PORTAL] 90652532 [http-9000-10] INFO org.gcube.common.clients.stubs.jaxws.StubCache - using cached stub for interface org.gcube.resources.discovery.icclient.stubs.CollectorStub
2015-12-18 15:39:45,904 INFO icclient.ICClient [http-9000-10,callService:75] %[PORTAL] 90652536 [http-9000-10] INFO org.gcube.resources.discovery.icclient.ICClient - executing query for $resource in collection('/db/Profiles/GHN')//Resource where ( $resource/ID/string() eq '07724f80-78d4-11e5-99e3-97a458518898') return $resource
2015-12-18 15:39:45,925 INFO icclient.ICClient [http-9000-10,callService:83] %[PORTAL] 90652557 [http-9000-10] INFO org.gcube.resources.discovery.icclient.ICClient - executed query for $resource in collection('/db/Profiles/GHN')//Resource where ( $resource/ID/string() eq '07724f80-78d4-11e5-99e3-97a458518898') return $resource in 21 ms
2015-12-18 15:39:45,951 INFO publisher.RegistryPublisher [http-9000-10,update:126] %[PORTAL] 90652583 [http-9000-10] INFO org.gcube.informationsystem.publisher.RegistryPublisher - update operation: updating resource07724f80-78d4-11e5-99e3-97a458518898 on scope: /d4science.org
java.lang.IllegalStateException: a map for /d4science.org is undefined
at org.gcube.common.scope.impl.ScopedServiceMap.currentMap(ScopedServiceMap.java:63)
at org.gcube.common.scope.impl.ScopedServiceMap.endpoint(ScopedServiceMap.java:44)
at org.gcube.resources.discovery.icclient.ICClient.getStub(ICClient.java:92)
at org.gcube.resources.discovery.icclient.ICClient.submit(ICClient.java:57)
at org.gcube.resources.discovery.client.impl.DelegateClient.submit(DelegateClient.java:50)
at org.gcube.informationsystem.publisher.utils.RegistryStubs.getEndPoints(RegistryStubs.java:49)
at org.gcube.informationsystem.publisher.utils.RegistryStubs.getStubs(RegistryStubs.java:62)
at org.gcube.informationsystem.publisher.RegistryPublisherImpl.registryUpdate(RegistryPublisherImpl.java:180)
at org.gcube.informationsystem.publisher.RegistryPublisherImpl.update(RegistryPublisherImpl.java:128)
at org.gcube.resourcemanagement.support.server.sweeper.Sweeper.applySweep(Sweeper.java:156)
at org.gcube.portlets.admin.resourcesweeper.server.SweeperServiceImpl.applySweep(SweeperServiceImpl.java:81)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.gwt.user.server.rpc.RPC.invokeAndEncodeResponse(RPC.java:561)
at com.google.gwt.user.server.rpc.RemoteServiceServlet.processCall(RemoteServiceServlet.java:208)
at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:248)
at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)

Is there a way to force the removal of such nodes?
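
The `IllegalStateException` above suggests that the publisher looks up a service map by the scope stored in the resource and fails when no map is configured for that scope. A minimal Python sketch of that failure mode follows, assuming a simple scope-to-map dictionary. `SERVICE_MAPS` and `endpoint` are hypothetical names and do not reproduce the actual `ScopedServiceMap` implementation.

```python
# Hypothetical stand-in for the configured service maps: one entry per scope
# that has a ServiceMap file. Only /d4s is configured in this sketch.
SERVICE_MAPS = {
    "/d4s": {
        "ISICAllQueryPT": (
            "http://d4science1.esl.eng.it/wsrf/services/gcube/"
            "informationsystem/collector/XQueryAccess"
        ),
    },
}


def endpoint(scope: str, service: str) -> str:
    """Resolve a service endpoint for a scope, failing if the scope has no map."""
    try:
        service_map = SERVICE_MAPS[scope]
    except KeyError:
        raise LookupError(f"a map for {scope} is undefined") from None
    return service_map[service]


if __name__ == "__main__":
    print(endpoint("/d4s", "ISICAllQueryPT"))
    try:
        # A resource still carrying the old scope cannot be resolved.
        endpoint("/d4science.org", "ISICAllQueryPT")
    except LookupError as exc:
        print(exc)
```

Under this assumption, any update of a node still tagged with the old scope would fail before reaching the Registry, which would match the sweeper behaviour seen here.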

#15 Updated by Roberto Cirillo almost 4 years ago

From the monitor I see a lot of resources registered on the IS-Collector with a bad scope (/d4science): GenericResource, ServiceEndpoint, etc.
I guess the only way to resolve this issue is to delete the eXist DB on the IS-Collector, so that the IS-Collector restarts with an empty DB. In addition, the Registry service needs a clean state before the IS-Collector is restarted. Before performing these steps, please take care to shut down or delete all the services that have a bad scope.
It is very difficult for me to follow these steps without remote access. Could I have remote access to these services?

#16 Updated by Pasquale Pagano almost 4 years ago

The deployment of a newly created infrastructure requires a proper configuration and a defined order in the starting up of the different nodes. This procedure is difficult and time consuming without CNR support. This support cannot really be offered without having access to the logs and configuration files. Is it possible to activate the procedure to grant access to @roberto.cirillo@isti.cnr.it ?

#17 Updated by Daniele Pavia almost 4 years ago

The root VO has been re-deployed and seems to be working fine. The previously released servicemap is still valid. The Liferay portal is still available at http://d4science4.esl.eng.it/ - I guess we can finally begin validating the d4s infrastructure@ENG.

#18 Updated by Roberto Cirillo almost 4 years ago

I've tried to register a generic resource on the root VO.
Unfortunately, I got an exception (attached). I guess we get this exception because this isn't the real IS-Collector address but the proxy address. Is there a way to redirect the URL "http://d4science1.esl.eng.it/wsrf/services/gcube/informationsystem/collector/XQueryAccess?wsdl" to the right URL?

#19 Updated by Daniele Pavia almost 4 years ago

Hi, it seems the issue was related to local ulimits; http://d4science1.esl.eng.it/wsrf/services/gcube/informationsystem/collector/XQueryAccess?wsdl seems to answer correctly now. Please let me know if other issues arise.

#20 Updated by Roberto Cirillo almost 4 years ago

I've performed another test, but I received a "SoapFaultException".
The same exception is now thrown by the monitor: "http://d4science4.esl.eng.it/web/guest/monitor". I guess there is a problem with the IS-Collector service. Could you check this service?

#21 Updated by Massimiliano Assante almost 4 years ago

Is the preproduction root VO working properly? Please provide an update on this activity @daniele.pavia@eng.it @roberto.cirillo@isti.cnr.it

#22 Updated by Massimiliano Assante almost 4 years ago

@daniele.pavia@eng.it or @paolo.fabriani@eng.it could you please update the status of this activity by reporting the issues (if any) that are still open and prevent this root VO from working properly?

#23 Updated by Paolo Fabriani almost 4 years ago

The reported issues seem to be solved.

@daniele.pavia@eng.it, is there any further issue blocking @roberto.cirillo@isti.cnr.it from checking that it works fine from CNR/outside?

#24 Updated by Daniele Pavia almost 4 years ago

@massimiliano.assante@isti.cnr.it @roberto.cirillo@isti.cnr.it @paolo.fabriani@eng.it we should be ready for another test, let's give it a go

#25 Updated by Roberto Cirillo almost 4 years ago

I'm glad to inform you that a new "GenericResource" resource has been correctly registered on preproduction infrastructure under the "/d4s" scope.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Resource version="0.4.x">
    <ID>29242d26-c168-4bbe-9a78-7562af5d6b08</ID>
    <Type>GenericResource</Type>
    <Scopes>
        <Scope>/d4s</Scope>
    </Scopes>
    <Profile>
        <SecondaryType>BotoxTest</SecondaryType>
        <Name>TestNewPreproduction</Name>
        <Description>TestNewPreproduction</Description>
        <Body>
            <SourceProperties>
                <creationTime>2016-01-26T14:33:33.116+02:00</creationTime>
                <user>true</user>
            </SourceProperties>
            <other>test</other>
        </Body>
    </Profile>
</Resource>
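
A registration result like the one above can also be checked programmatically. The following sketch, which assumes a trimmed copy of the profile, uses Python's standard `xml.etree.ElementTree` to extract the ID, type, name, and scopes. `summarize` is a hypothetical helper name, not a gCube API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the registered profile (Body omitted for brevity).
PROFILE = """
<Resource version="0.4.x">
    <ID>29242d26-c168-4bbe-9a78-7562af5d6b08</ID>
    <Type>GenericResource</Type>
    <Scopes>
        <Scope>/d4s</Scope>
    </Scopes>
    <Profile>
        <SecondaryType>BotoxTest</SecondaryType>
        <Name>TestNewPreproduction</Name>
    </Profile>
</Resource>
"""


def summarize(xml_text: str) -> dict:
    """Extract the identifying fields of a resource profile."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.findtext("ID"),
        "type": root.findtext("Type"),
        "name": root.findtext("Profile/Name"),
        "scopes": [scope.text for scope in root.findall("Scopes/Scope")],
    }


if __name__ == "__main__":
    info = summarize(PROFILE)
    # The resource should be visible in the /d4s scope only.
    print(info["id"], info["type"], info["scopes"])
```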

#26 Updated by Massimiliano Assante almost 4 years ago

That's very good news. However, from what I see in the monitor (http://d4science4.esl.eng.it/web/guest/monitor), it seems that all the nodes have expired (their last update time is more than 1 month ago). Any idea why?

#27 Updated by Gabriele Giammatteo almost 4 years ago

We also noticed that, but we couldn't find any clue in the logs. I do not see any exceptions, but I do see a few of these messages in the log:

 2016-02-22 08:09:43,068 ERROR generic.GCUBEGenericBulkPublisher [BulkPublisher,error:72] GCUBEGenericBulkPublisher: Unable to publish resources for Profiles/GHN in scope /d4s java.lang.ArrayIndexOutOfBoundsException 2

Could it be related? What if we stop and clean the GHN? Would that solve the problem?

#28 Updated by Roberto Cirillo almost 4 years ago

I'm sorry, but I don't know. One log line is not enough for an analysis. Which log is it from? Which container?

#29 Updated by Pasquale Pagano almost 4 years ago

Two months ago I was saying the following:

The deployment of a newly created infrastructure requires a proper configuration and a defined order in the starting up of the different nodes. This procedure is difficult and time consuming without CNR support. This support cannot really be offered without having access to the logs and configuration files. Is it possible to activate the procedure to grant access to @roberto.cirillo@isti.cnr.it ?

I did not get any answer. We are losing time and effort. It is not possible to continue this way. If it is not possible to configure those machines with CNR support (and it is apparently impossible without it), I believe we need to recognize that the work cannot be performed at the ENG site and reassign it to CNR.

#30 Updated by Paolo Fabriani almost 4 years ago

In the past we did not activate the procedure, since most (if not all) of the problems we had were related to network configuration, names, proxies, ports, etc., which are outside the scope of gCube. Now the issue seems to be related to gCube and, as you say, CNR support is needed.

We're activating the request for external access. In the meantime we've set up a TeamViewer server to allow Roberto/CNR to inspect the issue. We've already shared the setup with Roberto. Tomorrow we'll sync to look at the issue closely.

#31 Updated by Roberto Cirillo almost 4 years ago

The problem has been fixed.
Now all the containers are correctly registered to the infrastructure.
There is another, non-blocking problem: some old containers have been registered to the infrastructure with a bad scope: "/d4s/d4s".
For this particular scope, the servicemap file for the root VO "/d4s" has the same filename as the VO servicemap for "/d4s/d4s": "d4s.servicemap".
The portal is not able to delete the resources because it cannot find the VO servicemap.
Tomorrow we could delete these resources directly from the eXist DB with the help of @lucio.lelii@isti.cnr.it.
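
The filename clash described above can be illustrated with a small sketch. It assumes, hypothetically, that a servicemap file is named after the last segment of its scope. The real naming rule used by gCube may differ, and `servicemap_filename` and `collisions` are illustrative names.

```python
from collections import defaultdict


def servicemap_filename(scope: str) -> str:
    """Hypothetical rule: last path segment of the scope plus '.servicemap'."""
    return scope.rstrip("/").rsplit("/", 1)[-1] + ".servicemap"


def collisions(scopes: list[str]) -> dict[str, list[str]]:
    """Group scopes by filename and keep only filenames shared by several scopes."""
    by_name = defaultdict(list)
    for scope in scopes:
        by_name[servicemap_filename(scope)].append(scope)
    return {name: group for name, group in by_name.items() if len(group) > 1}


if __name__ == "__main__":
    # The root VO and the badly scoped VO map to the same file name.
    print(collisions(["/d4s", "/d4s/d4s"]))
```

Under that assumption a single file cannot serve both scopes, which would be consistent with the portal not finding a separate map for "/d4s/d4s".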

#32 Updated by Roberto Cirillo almost 4 years ago

I now see a SoapFaultException from the monitor. Has the IS-Collector been restarted?

#33 Updated by Roberto Cirillo almost 4 years ago

  • % Done changed from 90 to 100
  • Status changed from In Progress to Closed

The problem was due to an error in the eXist backup procedure that started at 00:00.
I've restored the DB and deleted the resources registered to the infrastructure with a bad scope.
The root VO is now ready to use. I'm going to close this ticket.
