Bug #6220

HomeLibrary: Accounting exception and all log blocked for severale minutes

Added by Roberto Cirillo about 3 years ago. Updated almost 3 years ago.

Status:ClosedStart date:Dec 14, 2016
Priority:NormalDue date:
Assignee:Alessandro Pieve% Done:

100%

Category:Other
Sprint:zz - UnSprintable
Milestones:
Duration:

Description

In HL webapp I've seen a not usual accounting exception and after that we have a time hole for several minutes (from 18:09 to 18:22. In all log files). I don't know if the exception is related but it's need to investigate. The exception is the following:

infrastructures.eu FINISHED IN 38874 millis
18:09:41.985 [http-bio-8080-exec-27269] DEBUG RequestManager: check wich handler should be excluded /GetDisplayNameServlet
18:09:41.910 [RxComputationScheduler-2] WARN  Endpoint: [couchbase01-d4s.d4science.org/146.48.122.169:11210][KeyValueEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS: 
com.couchbase.client.deps.io.netty.channel.ConnectTimeoutException: Connect callback did not return, hit safeguarding timeout.
        at com.couchbase.client.core.endpoint.AbstractEndpoint$3.call(AbstractEndpoint.java:317) ~[core-io-1.3.6.jar:na]
        at com.couchbase.client.core.endpoint.AbstractEndpoint$3.call(AbstractEndpoint.java:310) ~[core-io-1.3.6.jar:na]
        at rx.internal.operators.SingleOperatorOnErrorResumeNext$2.onError(SingleOperatorOnErrorResumeNext.java:69) [rxjava-1.1.8.jar:1.1.8]
        at rx.Single$18.onError(Single.java:1883) [rxjava-1.1.8.jar:1.1.8]
        at rx.observers.SafeSubscriber._onError(SafeSubscriber.java:152) [rxjava-1.1.8.jar:1.1.8]
        at rx.observers.SafeSubscriber.onError(SafeSubscriber.java:115) [rxjava-1.1.8.jar:1.1.8]
        at rx.observers.SerializedObserver.onError(SerializedObserver.java:153) [rxjava-1.1.8.jar:1.1.8]
        at rx.observers.SerializedSubscriber.onError(SerializedSubscriber.java:78) [rxjava-1.1.8.jar:1.1.8]
        at rx.internal.operators.OperatorTimeoutBase$TimeoutSubscriber$1.onError(OperatorTimeoutBase.java:187) [rxjava-1.1.8.jar:1.1.8]
        at rx.Single$1$1.onError(Single.java:85) [rxjava-1.1.8.jar:1.1.8]
        at rx.Single$3.call(Single.java:505) [rxjava-1.1.8.jar:1.1.8]
        at rx.Single$3.call(Single.java:501) [rxjava-1.1.8.jar:1.1.8]
        at rx.Single$1.call(Single.java:90) [rxjava-1.1.8.jar:1.1.8]
        at rx.Single$1.call(Single.java:70) [rxjava-1.1.8.jar:1.1.8]
        at rx.Observable.unsafeSubscribe(Observable.java:9861) [rxjava-1.1.8.jar:1.1.8]
        at rx.internal.operators.OperatorTimeoutBase$TimeoutSubscriber.onTimeout(OperatorTimeoutBase.java:200) [rxjava-1.1.8.jar:1.1.8]
        at rx.internal.operators.OperatorTimeout$1$1.call(OperatorTimeout.java:41) [rxjava-1.1.8.jar:1.1.8]
        at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$2.call(EventLoopsScheduler.java:189) [rxjava-1.1.8.jar:1.1.8]
        at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55) [rxjava-1.1.8.jar:1.1.8]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_51]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
18:22:29.607 [localhost-startStop-1] TRACE Bootstrap: smartgears is starting
18:22:29.644 [localhost-startStop-1] TRACE Provider: reading container configuration @ /home/gcube/SmartGears-Bundle/SmartGears/container.xml 
18:22:30.125 [localhost-startStop-1] TRACE Provider: setting persistence location for container @ /home/gcube/SmartGears-Bundle/SmartGears/state
18:22:30.191 [localhost-startStop-1] INFO  Provider: loading persisted state for container

In other containers I see usually another kind of exception:

2016-12-14 19:03:03,439 ERROR persistence.PersistenceBackend [AccountingScheduledThread-10,accountWithFallback:121] {minInvocationTime=18, serviceClass=InformationSystem, host=node66.p.d4science.research-infrastructures.eu:8080, scope=/d4science.research-infrastructures.eu/gCubeApps/BiOnym, operationCount=9, callerHost=146.48.122.93, endTime=1481717141875, maxInvocationTime=56, serviceName=IS-Collector, consumerId=dataminer6-p-d4s.d4science.org:80, startTime=1481715519513, id=40d43b12-8907-4a10-8ee7-2c7677ab9fb8, duration=22, operationResult=SUCCESS, calledMethod=execute, aggregated=true, creationTime=1481717141886, recordType=ServiceUsageRecord, callerQualifier=TOKEN} was not accounted succesfully from PersistenceCouchBase. Trying to use FallbackPersistenceBackend.
java.lang.RuntimeException: java.util.concurrent.TimeoutException
    at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
    at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:353)
    at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:348)
    at org.gcube.documentstore.persistence.PersistenceCouchBase.createItem(PersistenceCouchBase.java:137)
    at org.gcube.documentstore.persistence.PersistenceCouchBase.reallyAccount(PersistenceCouchBase.java:164)
    at org.gcube.documentstore.persistence.PersistenceBackend.accountWithFallback(PersistenceBackend.java:103)
    at org.gcube.documentstore.persistence.PersistenceBackendMonitor.elaborateFile(PersistenceBackendMonitor.java:50)
    at org.gcube.documentstore.persistence.PersistenceBackendMonitor.elaborateFallbackFile(PersistenceBackendMonitor.java:101)
    at org.gcube.documentstore.persistence.PersistenceBackendMonitor.run(PersistenceBackendMonitor.java:120)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.util.concurrent.TimeoutException
    ... 16 more


Related issues

Related to gCube - Feature #6277: Document store Lib - Handle the case of repeated timeout... Closed Dec 21, 2016

History

#1 Updated by Alessandro Pieve about 3 years ago

  • Status changed from New to In Progress

For the first execption I'm still investigating, but for now I can say that is correct because in this time (~18:07) couchbase01-d4s was actually down for an upgrade.
For more information, could you please tell me the java-sdk version and give me other logs before this block ?

For the second execption, it happened for too many active connections (error seen on machines with huge load and an instability of the Couchbase cluster(one node server go in pend or down
as happened yesterday (this error does not block the machine))).

In this condition the document-store-lib provided an additional check:
For three consecutive timeout execption it renew configuration (close a old connection, reload a configuration and prepare a new connection).

#2 Updated by Alessandro Pieve almost 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Closed

#3 Updated by Alessandro Pieve almost 3 years ago

  • Related to Feature #6277: Document store Lib - Handle the case of repeated timeouts exception from couchbase cluster added

Also available in: Atom PDF