Ticket #42 (closed defect: fixed)
Problem with probes? This is in the latest pre
Reported by: | bradshaw | Owned by: | desai |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | bcfg2-server | Version: | |
Keywords: | | Cc: | |
Description
Unknown failure
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/Bcfg2/Client/Proxy.py", line 38, in run_method
    ret = apply(method, self._authinfo + method_args)
  File "/usr/lib/python2.3/xmlrpclib.py", line 1032, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.3/xmlrpclib.py", line 1319, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.3/xmlrpclib.py", line 1065, in request
    self.send_content(h, request_body)
  File "/usr/lib/python2.3/xmlrpclib.py", line 1179, in send_content
    connection.endheaders()
  File "/usr/lib/python2.3/httplib.py", line 715, in endheaders
    self._send_output()
  File "/usr/lib/python2.3/httplib.py", line 600, in _send_output
    self.send(msg)
  File "/usr/lib/python2.3/httplib.py", line 567, in send
    self.connect()
  File "/usr/lib/python2.3/httplib.py", line 988, in connect
    ssl = socket.ssl(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.3/socket.py", line 73, in ssl
    return _realssl(sock, keyfile, certfile)
sslerror: (8, 'EOF occurred in violation of protocol')
Failed to download probes from bcfg2
The server process was running, but it wasn't logging anything for 2 days, so I can't tell you what happened on the server. All I can say is that a restart fixed the problem.
Change History
comment:2 Changed 17 years ago by bradshaw
- Reporter changed from anonymous to bradshaw
I just had this happen again with the 0.8.2pre3 code I am running on Chiba. So here is what I can show you.
An strace of the server looks normal; it is just running the select loop:
select(7, [4 6], [], [], {9, 384000}) = 0 (Timeout)
select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
New connections seem to run just fine, so my thought is that the congestion control or whatever is busted. I only seem to see this when I run the command across the cluster with a parallel shell.
Any thoughts on how to better troubleshoot this?
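One way to narrow this down without the whole cluster might be to fire a batch of client calls at the server at the same moment and tally what comes back. The sketch below is only an illustration in modern Python 3 (not the Python 2.3 from the traceback); `hammer` and its `call` argument are hypothetical names, and `call` stands for whatever issues a single client request.

```python
import ssl
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def hammer(call, n_clients=32):
    """Run `call` from n_clients threads at once and tally the outcomes.

    `call` is a hypothetical zero-argument callable that performs one
    client request; it is not part of Bcfg2.
    """
    # The barrier holds every thread until all of them are ready, so the
    # requests start at roughly the same time, like a parallel shell run.
    barrier = threading.Barrier(n_clients)

    def one_client(_):
        barrier.wait()
        try:
            call()
            return "ok"
        except ssl.SSLError as err:
            # SSL failures, such as an EOF during the handshake.
            return "ssl: %s" % type(err).__name__
        except OSError as err:
            # Other socket-level failures (refused, reset, timed out).
            return "os: %s" % type(err).__name__

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return Counter(pool.map(one_client, range(n_clients)))
```

If the tally shows SSL errors only above some number of simultaneous clients, that would point at connection handling under load rather than at the probes themselves.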
comment:3 Changed 17 years ago by desai
What is the client side doing when it hangs? (try stracing/lsofing it) Is this easily reproducible?
comment:4 Changed 17 years ago by desai
- Status changed from assigned to closed
- Resolution set to fixed
I think this is the same problem reported by Pedro today. Basically, SSL negotiation errors weren't explicitly handled in the xmlrpc code. This caused transient SSL error conditions to sink the client. (no retries, and the unexpected condition hits a fatal error path) I believe this is fixed in [055dd056560b7b05ec7c1e2e9d0c5dd699e4e71a] (SVN r1848)
We need to strace and lsof this if it happens again.
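For reference, the general shape of handling a transient SSL error with retries could look something like the sketch below. This is not the change made in r1848, just an illustration in modern Python 3; `call_with_retries`, its parameters, and the linear backoff are all assumptions made for the example.

```python
import ssl
import time


def call_with_retries(call, attempts=3, delay=1.0, sleep=time.sleep):
    """Invoke `call`, retrying when an SSL error interrupts it.

    `call` is a hypothetical zero-argument callable that performs one
    XML-RPC request. `sleep` is injectable so the wait can be skipped
    in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ssl.SSLError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            sleep(delay * attempt)  # linear backoff before the next try
```

With a wrapper like this, a single failed negotiation costs a short delay instead of ending the client run, while a persistent failure still surfaces once the attempts are used up.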