Ticket #42 (closed defect: fixed)

Opened 13 years ago

Last modified 12 years ago

Problem with probes? This is in the latest pre

Reported by: bradshaw
Owned by: desai
Priority: major
Milestone:
Component: bcfg2-server
Version:
Keywords:
Cc:

Description

Unknown failure
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/Bcfg2/Client/Proxy.py", line 38, in run_method
    ret = apply(method, self._authinfo + method_args)
  File "/usr/lib/python2.3/xmlrpclib.py", line 1032, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.3/xmlrpclib.py", line 1319, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.3/xmlrpclib.py", line 1065, in request
    self.send_content(h, request_body)
  File "/usr/lib/python2.3/xmlrpclib.py", line 1179, in send_content
    connection.endheaders()
  File "/usr/lib/python2.3/httplib.py", line 715, in endheaders
    self._send_output()
  File "/usr/lib/python2.3/httplib.py", line 600, in _send_output
    self.send(msg)
  File "/usr/lib/python2.3/httplib.py", line 567, in send
    self.connect()
  File "/usr/lib/python2.3/httplib.py", line 988, in connect
    ssl = socket.ssl(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.3/socket.py", line 73, in ssl
    return _realssl(sock, keyfile, certfile)
sslerror: (8, 'EOF occurred in violation of protocol')
Failed to download probes from bcfg2

The server process was running, but it wasn't logging anything for 2 days, so I can't tell you what happened on the server. All I can say is that a restart fixed the problem.

Change History

comment:1 Changed 13 years ago by desai

  • Status changed from new to assigned

We need to strace and lsof this if it happens again.

comment:2 Changed 13 years ago by bradshaw

  • Reporter changed from anonymous to bradshaw

I just had this happen again with the 0.8.2pre3 code I am running on Chiba. So here is what I can show you.

An strace of the server seems normal. Just running the select loop.

  select(7, [4 6], [], [], {9, 384000}) = 0 (Timeout)
  select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
  select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
  select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
  select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
  select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)
  select(7, [4 6], [], [], {15, 0}) = 0 (Timeout)

New connections also seem to run just fine, so my guess is that congestion control or something similar is busted. I only seem to see this when I run the command across the cluster with a parallel shell.

Any thoughts on how to better troubleshoot this?

comment:3 Changed 13 years ago by desai

What is the client side doing when it hangs? (Try stracing/lsofing it.) Is this easily reproducible?

comment:4 Changed 13 years ago by desai

  • Status changed from assigned to closed
  • Resolution set to fixed

I think this is the same problem reported by Pedro today. Basically, SSL negotiation errors weren't explicitly handled in the xmlrpc code. This caused transient SSL error conditions to sink the client (no retries, and the unexpected condition hits a fatal error path). I believe this is fixed in [055dd056560b7b05ec7c1e2e9d0c5dd699e4e71a] (SVN r1848).
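
For anyone hitting a similar failure, here is a rough sketch of the general idea of retrying on transient SSL errors. It is not the code from r1848. The helper name call_with_retries, the retry count, and the delay are all made up for illustration, and it is written against modern Python, where the error surfaces as ssl.SSLError rather than the socket.sslerror seen in the Python 2.3 traceback.

```python
import ssl
import time


def call_with_retries(method, args=(), retries=3, delay=1.0, sleep=time.sleep):
    """Call method(*args), retrying when an SSL-level error is raised.

    Hypothetical helper: the name, retry count, and delay are illustrative
    assumptions and are not taken from the Bcfg2 source.
    """
    for attempt in range(1, retries + 1):
        try:
            return method(*args)
        except ssl.SSLError:
            # Treat the SSL failure as transient until attempts run out,
            # then re-raise so the caller still sees the real error.
            if attempt == retries:
                raise
            sleep(delay)
```

The point is only that the SSL error gets its own handling instead of falling through to a generic fatal path. Any XML-RPC proxy method could be passed in as method. A real client would probably also want to log each failed attempt.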
