Support #3811

Statistical manager refuses connection

Added by Kostas Hirikakis over 3 years ago. Updated over 3 years ago.

Status:ClosedStart date:May 06, 2016
Priority:UrgentDue date:
Assignee:_InfraScience Systems Engineer% Done:

100%

Category:-
Infrastructure:
Milestones:
Duration:

Description

Testing the Statistical Manager. Test from the tutorial. We are using portal bundle 4.0.0-SNAPSHOT with tomcat 7.0.62.

  1. starting tomcat
  2. launching a servlet (see catalina.out#415). The servlet is just invoking TestStatistical's main (SimulMedium.java#58). TestStatistical is an exact copy from the tutorial.
  3. At catalina.out#623 it says that we received a map.
  4. At catalina.out#630 it says "current scope found(ScopeProvider): /gcube/devsec/devVRE" which is correct
  5. At catalina.out#667 there is a mongo pooled connection and at #670 it cannot open it (connection refused)

Notes:
1. Running wget -qO- http://ipecho.net/plain we get 193.92.78.53, which is the ip address we already gave you to register.
2. The eclipse project is attached. It is only a servlet wrapper for TestStatistical
3. The full tomcat log is attached

servlet.tar.gz - Eclipse servlet mavenized project (5.14 KB) Kostas Hirikakis, Apr 27, 2016 10:57 PM

tomcat_logs.tar.gz - Full tomcat log (15.8 KB) Kostas Hirikakis, Apr 27, 2016 10:57 PM

tomcat_logs.tar.gz (15.8 KB) Kostas Hirikakis, Apr 30, 2016 12:48 PM

tomcat_logs.tar.gz (31.5 KB) Kostas Hirikakis, May 04, 2016 09:03 AM

netstat.out.tar.gz (916 KB) Kostas Hirikakis, May 04, 2016 09:19 PM

tomcat_logs.tar.gz (30.9 KB) Kostas Hirikakis, May 04, 2016 09:22 PM

tcpFlow.log (2.37 KB) Roberto Cirillo, May 05, 2016 02:59 PM

catalina(pamvotis_1).out (68.4 KB) Panagiota Koltsida, May 12, 2016 01:24 PM


Subtasks

Support #3911: Allow static ip to access mongodb serverClosed_InfraScience Systems Engineer


Related issues

Related to gCube - Support #4136: Cannot access mongo database server Closed May 26, 2016

History

#1 Updated by Andrea Dell'Amico over 3 years ago

  • Assignee changed from _InfraScience Systems Engineer to Panagiota Koltsida

#2 Updated by Kostas Hirikakis over 3 years ago

i2s troubleshooted this with @fabio.sinibaldi@isti.cnr.it via email. Since support has been moved here and the assignee changed, we will continue here.

Fabio's last response was at 28/4:

*Hi Kostas,
well, the good news is that actually it's not the same error.
On the previous logs you had
com.mongodb.MongoSocketOpenException: Exception opening socket
but now you have
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message.

We checked our logs and it seems that messages are correctly received by mongo instances, thus the problem might be that somehow you cannot receive the response. It might by because you're behind a firewall which block receiving messages on some ports.
Could you investigate on your side about that possibility?
The port used should be 27017.

Fabio*

We added simple http-get code just before invoking TestStatistical.main. As you can see at the attached logs at catalina.out#365-397, the app invokes the server at port 27017 and consumes the output just fine. Our target server [1] serves html rather than mongo but that doesn't affect the port issue: our gcube portal can receive messages at 27017 (or any other port, as we tested some random ones).

How we should proceed?

[1] dummy target server: a simple iis server behind a router. The router listens to port 27017 and redirects to iisserver:80. The router is at another ISP network.

#3 Updated by Kostas Hirikakis over 3 years ago

  • Status changed from New to In Progress

#4 Updated by Panagiota Koltsida over 3 years ago

  • Assignee changed from Panagiota Koltsida to Fabio Sinibaldi

#5 Updated by Fabio Sinibaldi over 3 years ago

  • Assignee changed from Fabio Sinibaldi to Roberto Cirillo

#6 Updated by Roberto Cirillo over 3 years ago

@kostashirikakis@i2s.gr this is very strange. The same code works properly from my machine.
Please, could you reproduce this issue? We are going to dump the tcp connection.

#7 Updated by Kostas Hirikakis over 3 years ago

In case you need a marked execution, this one ran at 4 May 2016, 8:40 CET.

#8 Updated by Roberto Cirillo over 3 years ago

Thank you Kostas.

From mongodb log I see

2016-05-04T08:39:54.456+0200 I NETWORK  [initandlisten] connection accepted from 193.92.78.53:55652 #1884081

where 55652 is the client's port. Every client has to use a different port and its assigned by the os.
This morning your client has tried to receive data from the following ports:

27017 > 55643
27017 > 55646
27017 > 55648
27017 > 55650
27017 > 55652

I guess you are not able to receive data from these ports. Could you try to set the port of your target server to 55643 and retry your test, please?

#9 Updated by Kostas Hirikakis over 3 years ago

I guess you are referring to the client tcp ports assigned during the communication. I am attaching the output of the

netstat -all -c

command. The capture started just before launching tomcat and stopped right after the connection timed out. I included everything so you can filter out as necessary.
I am also attaching tomcat logs, just in case.

#10 Updated by Roberto Cirillo over 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Feedback

Kostas Hirikakis wrote:

I guess you are referring to the client tcp ports assigned during the communication. I am attaching the output of the

Yes, I'm referring to the client tcp port assigned by your os.

From your netstat I see the following:

tcp        0     59 192.168.1.112:58042     146.48.123.72:27017     ESTABLISHED

where "146.48.123.72" is our mongodb server and "192.168.1.112" is your local address.
While in mongodb server log I see:

2016-05-04T19:45:32.460+0200 I NETWORK  [initandlisten] connection accepted from 193.92.78.53:58042 #1894689

but your ip address is not the same: from mongodb I see "193.92.78.53", from your netstat I see "192.168.1.112".

So, I guess it is your fw or your nat system that doesn't correctly manage the connection on that port and reset the connection. We cannot do anything about this by our side.

In attachment the tcp connection flow.

#11 Updated by Roberto Cirillo over 3 years ago

#12 Updated by Kostas Hirikakis over 3 years ago

  • Status changed from Feedback to In Progress

The sample project ran from a portal hosted at 195.134.66.105 produced the same problem. The ip is allowed to access the MongoDB server and (luckily) it is not behind any router.
Please advise.

#13 Updated by Roberto Cirillo over 3 years ago

  • Status changed from In Progress to Closed

I'm going to close this ticket. We have to try with the new ip, if the problem is the same we have to find a workaround for this issue, if possible.

#14 Updated by Tommaso Piccioli over 3 years ago

The behavior is alwais the same, after the first correct handshake for the tcp connection there is a reset of the connection coming from your network IP, after that mongo try to comunicate again but evidently the connection was already dropped, your side.

  7 2016-05-06 13:45:53.107239 195.134.66.105 -> 146.48.123.72 TCP 74 57276 > 27017 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=536420872 TSecr=0 WS=128
  8 2016-05-06 13:45:53.107297 146.48.123.72 -> 195.134.66.105 TCP 74 27017 > 57276 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSval=3880630795 TSecr=536420872 WS=128
  9 2016-05-06 13:45:53.172115 195.134.66.105 -> 146.48.123.72 TCP 66 57276 > 27017 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=536420889 TSecr=3880630795
 10 2016-05-06 13:45:53.172349 195.134.66.105 -> 146.48.123.72 MONGO 124 57276 > 27017 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=58 TSval=536420889 TSecr=3880630795
 11 2016-05-06 13:45:53.172388 146.48.123.72 -> 195.134.66.105 TCP 66 27017 > 57276 [ACK] Seq=1 Ack=59 Win=5888 Len=0 TSval=3880630811 TSecr=536420889
 12 2016-05-06 13:45:53.173386 146.48.123.72 -> 195.134.66.105 MONGO 625 27017 > 57276 [PSH, ACK] Seq=1 Ack=59 Win=5888 Len=559 TSval=3880630811 TSecr=536420889
 13 2016-05-06 13:45:53.238327 195.134.66.105 -> 146.48.123.72 TCP 66 57276 > 27017 [ACK] Seq=59 Ack=560 Win=30336 Len=0 TSval=536420906 TSecr=3880630811
 14 2016-05-06 13:45:53.238587 195.134.66.105 -> 146.48.123.72 MONGO 125 57276 > 27017 [PSH, ACK] Seq=59 Ack=560 Win=30336 Len=59 TSval=536420906 TSecr=3880630811
 15 2016-05-06 13:45:53.238811 146.48.123.72 -> 195.134.66.105 MONGO 889 27017 > 57276 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3880630828 TSecr=536420906
 16 2016-05-06 13:45:53.240033 195.134.66.105 -> 146.48.123.72 TCP 54 57276 > 27017 [RST, ACK] Seq=1 Ack=1 Win=29312 Len=0
 17 2016-05-06 13:45:53.503147 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 57276 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3880630894 TSecr=536420906
 18 2016-05-06 13:45:54.030144 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 57276 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3880631026 TSecr=536420906
...
 19 2016-05-06 13:45:55.086154 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 57276 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3880631290 TSecr=536420906

...
...
3275 2016-05-06 14:53:30.934149 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 33108 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881645252 TSecr=537433283
3966 2016-05-06 15:03:02.234149 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 33543 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881788077 TSecr=537577928
4955 2016-05-06 15:16:47.479604 195.134.66.105 -> 146.48.123.72 TCP 74 34149 > 27017 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=537784466 TSecr=0 WS=128
4956 2016-05-06 15:16:47.479674 146.48.123.72 -> 195.134.66.105 TCP 74 27017 > 34149 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 SACK_PERM=1 TSval=3881994388 TSecr=537784466 WS=128
4957 2016-05-06 15:16:47.572079 195.134.66.105 -> 146.48.123.72 TCP 66 34149 > 27017 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=537784483 TSecr=3881994388
4958 2016-05-06 15:16:47.572167 195.134.66.105 -> 146.48.123.72 MONGO 124 34149 > 27017 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=58 TSval=537784483 TSecr=3881994388
4959 2016-05-06 15:16:47.572193 146.48.123.72 -> 195.134.66.105 TCP 66 27017 > 34149 [ACK] Seq=1 Ack=59 Win=5888 Len=0 TSval=3881994411 TSecr=537784483
4960 2016-05-06 15:16:47.573418 146.48.123.72 -> 195.134.66.105 MONGO 625 27017 > 34149 [PSH, ACK] Seq=1 Ack=59 Win=5888 Len=559 TSval=3881994411 TSecr=537784483
4961 2016-05-06 15:16:47.638905 195.134.66.105 -> 146.48.123.72 TCP 66 34149 > 27017 [ACK] Seq=59 Ack=560 Win=30336 Len=0 TSval=537784506 TSecr=3881994411
4962 2016-05-06 15:16:47.639174 195.134.66.105 -> 146.48.123.72 MONGO 125 34149 > 27017 [PSH, ACK] Seq=59 Ack=560 Win=30336 Len=59 TSval=537784506 TSecr=3881994411
4963 2016-05-06 15:16:47.639371 146.48.123.72 -> 195.134.66.105 MONGO 889 27017 > 34149 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881994428 TSecr=537784506
4964 2016-05-06 15:16:47.640624 195.134.66.105 -> 146.48.123.72 TCP 54 34149 > 27017 [RST, ACK] Seq=1 Ack=1 Win=29312 Len=0
4965 2016-05-06 15:16:47.926156 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 34149 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881994500 TSecr=537784506
4966 2016-05-06 15:16:48.503180 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 34149 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881994644 TSecr=537784506
...
4967 2016-05-06 15:16:49.654153 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 34149 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881994932 TSecr=537784506
...

We have the same behavior running the app on our server but there is no reset and the connection continue with the data transmission and is correctly closed at the end.
You should try to understand why your network resets the connection, I have no way to investigate further on our side at the moment.

#15 Updated by Kostas Hirikakis over 3 years ago

  • Status changed from Closed to In Progress

I am sorry but I do not understand why this issue closed as the connection to the server is still refused.
We tried from two different static ip addresses,
* one behind a firewall (193.92.78.53) and
* one directly connected to the interne (195.134.66.105).
The behavior is the same: connection refused.
These two servers are connecting to a bunch of servers and ports, never failing, always completing the communication, data transmitted and closed at the end.
I don't see evidence why this side of the tcp connection drops the ball.
From what I see from the logs, the server refuses connection and times out, not the client.

#16 Updated by Panagiota Koltsida over 3 years ago

The colleagues from I2S asked me to try the to use the same client from my machine. It seems that I get the same error regarding the mongoDB connection.
My ip is: 195.134.66.42 and I tested the client in tomcat 7. Attached you can find the logs from catalina.out

Do you think it could be a problem related to the application server setup or maybe we miss any needed file?

#17 Updated by Roberto Cirillo over 3 years ago

  • Status changed from In Progress to Closed

Kostas Hirikakis wrote:

I am sorry but I do not understand why this issue closed as the connection to the server is still refused.
We tried from two different static ip addresses,
* one behind a firewall (193.92.78.53) and
* one directly connected to the interne (195.134.66.105).
The behavior is the same: connection refused.

Dear Kostas,
It is not true! The behavior is a ReadTimeOut and not a ConnectionRefused as you say. Where do you have seen ConnectionRefused?
In addition from tcp dump log the reset come from your network:

4964 2016-05-06 15:16:47.640624 195.134.66.105 -> 146.48.123.72 TCP 54 34149 > 27017 [RST, ACK] Seq=1 Ack=1 Win=29312 Len=0
4965 2016-05-06 15:16:47.926156 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 34149 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881994500 TSecr=537784506
4966 2016-05-06 15:16:48.503180 146.48.123.72 -> 195.134.66.105 MONGO 889 [TCP Retransmission] 27017 > 34149 [PSH, ACK] Seq=560 Ack=118 Win=5888 Len=823 TSval=3881994644 TSecr=537784506

In particular, in this line there is the Reset [RST]:

4964 2016-05-06 15:16:47.640624 195.134.66.105 -> 146.48.123.72 TCP 54 34149 > 27017 [RST, ACK] Seq=1 Ack=1 Win=29312 Len=0

These two servers are connecting to a bunch of servers and ports, never failing, always completing the communication, data transmitted and closed at the end.
I don't see evidence why this side of the tcp connection drops the ball.

Maybe you use other standard ports and not a dynamic tcp ports.
The evidence come from tcpdump log:

4964 2016-05-06 15:16:47.640624 195.134.66.105 -> 146.48.123.72 TCP 54 34149 > 27017 [RST, ACK] Seq=1 Ack=1 Win=29312 Len=0

We could find a workaround if you couldn't resolve this issue. I'm going to re-close this ticket because we can't do anything about this issue.

#18 Updated by Gianpaolo Coro over 3 years ago

Dear Kostas, I think the best way for you to exploit the processing tools of the infrastructure is to use WPS, where you embed the file to process in the request and you get http links to download the results from. This will make you avoid these networking issues. Please, check this example I created for you and try to either use it directly or reporduce it in Java (using the same basic http requests): #3987

#19 Updated by Roberto Cirillo over 3 years ago

  • Status changed from Closed to Feedback

#20 Updated by Kostas Hirikakis over 3 years ago

Dear Roberto
the java Statistical Manager library (including whichever infrastructure supports it) have three different, auxiliary clients to serve. I can assure you that one of them (mine) is a fresh pure Debian installation, nothing added, nothing touched. Every client fails the simple test found in the documentation page. We are talking about three auxiliary clients, one at i2s and two at NKUA.
Even if this a fault on the client, if something is missing or is misconfigured, please advise on how to solve it.
In other words, what are the sw and/or hw specifications for "portal bundle 4.0.0-SNAPSHOT with tomcat 7.0.62" in order to be able to run the demo servlet?

#21 Updated by Kostas Hirikakis over 3 years ago

Gianpaolo, we will switch to the WPS if we cannot overtake the java client difficulties, what else could we do? That means that we lost a respectable amount of time troubleshooting, which is not even the hard point. WPS instead of Java means we are loosing strong types, available methods and all this stuff. Instead we get into XML reading and parsing, with all the potential problems it bears (malform, invalidation, incompletion etc).

#22 Updated by Tommaso Piccioli over 3 years ago

Kostas, could you please do other test today, asap? from one or, better,from all your clients?
We have completely disabled our firewall on the mongo servers.
After the test, we will double-check if there are other filters on our network.

#23 Updated by Panagiota Koltsida over 3 years ago

Hi @tommaso.piccioli@isti.cnr.it . I have tried with Kostas client but now I get another exception

org.gcube.common.clients.exceptions.DiscoveryException: no endpoints found for org.gcube.resources.discovery.client.queries.impl.XQuery@1dedd78f=declare namespace ic = 'http://gcube-system.org/namespaces/informationsystem/registry'; for $resource in collection('/db/Profiles/RunningInstance')//Document/Data/ic:Profile/Resource, $entry in $resource/Profile/AccessPoint/RunningInstanceInterfaces/Endpoint where ($resource/Profile/ServiceClass/text() eq 'DataAnalysis') and ($resource/Profile/ServiceName/text() eq 'statistical-manager-gcubews') and ($resource/Profile/DeploymentData/Status/text() eq 'ready') and ($entry/@EntryName/string() eq 'gcube/data/analysis/statisticalmanager/statisticalmanagerfactory') return $entry/text()
at org.gcube.common.clients.delegates.DiscoveryDelegate.make(DiscoveryDelegate.java:118)
at org.gcube.common.clients.delegates.AsyncProxyDelegate.make(AsyncProxyDelegate.java:56)
at org.gcube.data.analysis.statisticalmanager.proxies.StatisticalManagerDefaultFactory.getAlgorithms(StatisticalManagerDefaultFactory.java:177)

Is the statistical manager running on devVRE?

#24 Updated by Fabio Sinibaldi over 3 years ago

Hi, yes the service wasn't available in that VRE, now it is.
Please, keep in mind that you can inspect scopes status on the resource monitor instances in our portal (in this case https://dev.d4science.org/web/guest/monitor).

#25 Updated by Kostas Hirikakis over 3 years ago

Now that the Statistical Manager service is up I tested and got timeout error. @tommaso.piccioli@isti.cnr.it on the other hand I don't know you current firewall status so this may be a void test.

#26 Updated by Tommaso Piccioli over 3 years ago

Firewall was disabled and the tcpdump shows always the same pattern, there is a connection reset coming from your network IP.

#27 Updated by Kostas Hirikakis over 3 years ago

@tommaso.piccioli@isti.cnr.it just to be clear, the connection reset comes from "our IPs", as already described (ie this is not a special case affecting one client).

#28 Updated by Denis Pyriochos over 3 years ago

  • Status changed from Feedback to In Progress

#29 Updated by Denis Pyriochos over 3 years ago

  • Related to Support #4136: Cannot access mongo database server added

#30 Updated by Roberto Cirillo over 3 years ago

Dear Kostas, please, could you perform another test from all your clients asap?

#31 Updated by Denis Pyriochos over 3 years ago

The difficult setup (portal behind router) seems to be working - no timeouts. We will be testing on more servers over the next couple of days.

#32 Updated by Tommaso Piccioli over 3 years ago

  • Assignee changed from Roberto Cirillo to _InfraScience Systems Engineer

Please let us know if you are done with this issue tests and everything is fine

#33 Updated by Tommaso Piccioli over 3 years ago

  • Status changed from In Progress to Feedback

#34 Updated by Denis Pyriochos over 3 years ago

  • Status changed from Feedback to Closed

Also available in: Atom PDF