[ovs-dev] [PATCH v3] [python] Don't mix system poll/monkeypatched select
Ilya Maximets
i.maximets at ovn.org
Wed Jun 16 10:24:41 UTC 2021
On 6/15/21 10:06 PM, Ilya Maximets wrote:
> On 6/11/21 4:29 PM, Terry Wilson wrote:
>> This is a partial revert of c1aa16d19, but keeps its test.
>>
>> Applications that use eventlet cannot use poll() without blocking.
>> To fix an issue where the check_connection_completion() code would
>> not detect connection errors when using select(), we switched to
>> using the system poll() for checking the connection while leaving
>> the select-based poller the default. This mixes monkeypatched and
>> unpatched code and caused eventlet-based code to block every time
>> we connect.
One more question:
Why exactly this happens? I mean, we're calling poll(0) that should
not block, why eventlet is blocked in this case?
>>
>> According to the connect(2) man page, you can detect connect()
>> errors when using select():
>>
>> After select(2) indicates writability, use getsockopt(2) to read
>> the SO_ERROR option at level SOL_SOCKET to determine whether
>> connect() completed successfully (SO_ERROR is zero) or
>> unsuccessfully (SO_ERROR is one of the usual error codes listed
>> here, explaining the reason for the failure)
>>
>> so this patch does that instead.
>>
>> Signed-off-by: Terry Wilson <twilson at redhat.com>
>> ---
>> python/ovs/poller.py | 35 ++---------------------------------
>> python/ovs/socket_util.py | 9 ++++++---
>> 2 files changed, 8 insertions(+), 36 deletions(-)
>>
>> diff --git a/python/ovs/poller.py b/python/ovs/poller.py
>> index 3624ec865..7b3238d6d 100644
>> --- a/python/ovs/poller.py
>> +++ b/python/ovs/poller.py
>> @@ -31,22 +31,14 @@ except ImportError:
>> SSL = None
>>
>> try:
>> - from eventlet import patcher as eventlet_patcher
>> + import eventlet.patcher
>>
>> def _using_eventlet_green_select():
>> - return eventlet_patcher.is_monkey_patched(select)
>> + return eventlet.patcher.is_monkey_patched(select)
>> except:
>> - eventlet_patcher = None
>> -
>> def _using_eventlet_green_select():
>> return False
>>
>> -try:
>> - from gevent import monkey as gevent_monkey
>> -except:
>> - gevent_monkey = None
>> -
>> -
>> vlog = ovs.vlog.Vlog("poller")
>>
>> POLLIN = 0x001
>> @@ -265,26 +257,3 @@ class Poller(object):
>> def __reset(self):
>> self.poll = SelectPoll()
>> self.timeout = -1
>> -
>> -
>> -def get_system_poll():
>> - """Returns the original select.poll() object. If select.poll is
>> - monkey patched by eventlet or gevent library, it gets the original
>> - select.poll and returns an object of it using the
>> - eventlet.patcher.original/gevent.monkey.get_original functions.
>> -
>> - As a last resort, if there is any exception it returns the
>> - SelectPoll() object.
>> - """
>> - try:
>> - if _using_eventlet_green_select():
>> - _system_poll = eventlet_patcher.original("select").poll
>> - elif gevent_monkey and gevent_monkey.is_object_patched(
>> - 'select', 'poll'):
>> - _system_poll = gevent_monkey.get_original('select', 'poll')
>> - else:
>> - _system_poll = select.poll
>> - except:
>> - _system_poll = SelectPoll
>> -
>> - return _system_poll()
>> diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py
>> index 3faa64e9d..7dd000916 100644
>> --- a/python/ovs/socket_util.py
>> +++ b/python/ovs/socket_util.py
>> @@ -159,8 +159,8 @@ def make_unix_socket(style, nonblock, bind_path, connect_path, short=False):
>>
>>
>> def check_connection_completion(sock):
>> + p = ovs.poller.SelectPoll()
>> if sys.platform == "win32":
>> - p = ovs.poller.SelectPoll()
>> event = winutils.get_new_event(None, False, True, None)
>> # Receive notification of readiness for writing, of completed
>> # connection or multipoint join operation, and of socket closure.
>> @@ -170,12 +170,15 @@ def check_connection_completion(sock):
>> win32file.FD_CLOSE)
>> p.register(event, ovs.poller.POLLOUT)
>> else:
>> - p = ovs.poller.get_system_poll()
>> p.register(sock, ovs.poller.POLLOUT)
>> pfds = p.poll(0)
>> if len(pfds) == 1:
>> revents = pfds[0][1]
>> - if revents & ovs.poller.POLLERR or revents & ovs.poller.POLLHUP:
>> + if revents & ovs.poller.POLLOUT:
>> + # This is how select() indicates a connect() error
>
> This wording doesn't sound correct to me as it might be an error and might
> be not, IIUC. POLLOUT here just means that something happened. It might
> be an error or the actual POLLOUT.
> (Ugh... select()/poll() are so inconsistent across different platforms)
>
>> + return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
>
> Looking at the history, it seems that ESX doesn't support SO_ERROR (at least
> some old versions), so we need to avoid use of it.
> See commit d6cedfd9d29d ("socket-util: Avoid using SO_ERROR.")
>
> We, likely, need to treat this in the same path as error. i.e. invoke send()
> and catch the exception. However, for the POLLOUT case we should not
> log and return the error if send() succeeds. But this doesn't sound like
> a good idea to send NULL bytes to everyone on success...
>
> So, some options here:
>
> 1. Keep as is, i.e. system poll()
> --> python will be blocked by poll()
>
> 2. Fully revert c1aa16d19, i.e. just use select()
> --> connections will be treated as established even if they are not
> until the actual read/write.
>
> 3. Use select() + getsockopt(SO_ERRROR), effectively revert d6cedfd9d29d
> --> give up on systems that doesn't support SO_ERRROR. (ESX?)
>
> 4. Use select() + getsockopt(SO_ERRROR), keep returning 0 (success) on
> ENOPROTOOPT
> --> systems that doesn't support might think that connections are
> established while they are not.
>
> 5. Try to detect if SO_ERRROR supported or not and use system poll() or
> select() + getsockopt(SO_ERRROR) based on that fact.
> --> better performance on systems that supports and same as now on
> systems that doesn't.
>
> Option 3 is, probably, OK, but I don't know.
> Option 4 might be the best choice, but I'm not quiet sure how to detect.
> Is checking for ENOPROTOOPT enough (same actually applies to option 3
> as I'm not sure if ESX actually reports ENOPROTOOPT)?
>
> Also, we're using select() in C code for windows. Do we have the same
> issue there? Alin, do you know if check_connection_completion() correctly
> detects connection failure on windows, e.g. if 'Check stream open block'
> tests are working?
>
> Any thoughts?
>
> Ben, what do you think?
>
> Best regards, Ilya Maximets.
>
More information about the dev
mailing list