[ovs-dev] [PATCH v3] [python] Don't mix system poll/monkeypatched select

Wed Jun 16 10:24:41 UTC 2021

On 6/15/21 10:06 PM, Ilya Maximets wrote:
> On 6/11/21 4:29 PM, Terry Wilson wrote:
>> This is a partial revert of c1aa16d19, but keeps its test.
>>
>> Applications that use eventlet cannot use poll() without blocking.
>> To fix an issue where the check_connection_completion() code would
>> not detect connection errors when using select(), we switched to
>> using the system poll() for checking the connection while leaving
>> the select-based poller the default. This mixes monkeypatched and
>> unpatched code and caused eventlet-based code to block every time
>> we connect.

One more question:
Why exactly this happens?  I mean, we're calling poll(0) that should
not block, why eventlet is blocked in this case?

>>
>> According to the connect(2) man page, you can detect connect()
>> errors when using select():
>>
>>   After select(2) indicates writability, use getsockopt(2) to read
>>   the SO_ERROR option at level SOL_SOCKET to determine whether
>>   connect() completed successfully (SO_ERROR is zero) or
>>   unsuccessfully (SO_ERROR is one of the usual error codes listed
>>   here, explaining the reason for the failure)
>>
>> so this patch does that instead.
>>
>> Signed-off-by: Terry Wilson <twilson at redhat.com>
>> ---
>>  python/ovs/poller.py      | 35 ++---------------------------------
>>  python/ovs/socket_util.py |  9 ++++++---
>>  2 files changed, 8 insertions(+), 36 deletions(-)
>>
>> diff --git a/python/ovs/poller.py b/python/ovs/poller.py
>> index 3624ec865..7b3238d6d 100644
>> --- a/python/ovs/poller.py
>> +++ b/python/ovs/poller.py
>> @@ -31,22 +31,14 @@ except ImportError:
>>      SSL = None
>>  
>>  try:
>> -    from eventlet import patcher as eventlet_patcher
>> +    import eventlet.patcher
>>  
>>      def _using_eventlet_green_select():
>> -        return eventlet_patcher.is_monkey_patched(select)
>> +        return eventlet.patcher.is_monkey_patched(select)
>>  except:
>> -    eventlet_patcher = None
>> -
>>      def _using_eventlet_green_select():
>>          return False
>>  
>> -try:
>> -    from gevent import monkey as gevent_monkey
>> -except:
>> -    gevent_monkey = None
>> -
>> -
>>  vlog = ovs.vlog.Vlog("poller")
>>  
>>  POLLIN = 0x001
>> @@ -265,26 +257,3 @@ class Poller(object):
>>      def __reset(self):
>>          self.poll = SelectPoll()
>>          self.timeout = -1
>> -
>> -
>> -def get_system_poll():
>> -    """Returns the original select.poll() object. If select.poll is
>> -    monkey patched by eventlet or gevent library, it gets the original
>> -    select.poll and returns an object of it using the
>> -    eventlet.patcher.original/gevent.monkey.get_original functions.
>> -
>> -    As a last resort, if there is any exception it returns the
>> -    SelectPoll() object.
>> -    """
>> -    try:
>> -        if _using_eventlet_green_select():
>> -            _system_poll = eventlet_patcher.original("select").poll
>> -        elif gevent_monkey and gevent_monkey.is_object_patched(
>> -                'select', 'poll'):
>> -            _system_poll = gevent_monkey.get_original('select', 'poll')
>> -        else:
>> -            _system_poll = select.poll
>> -    except:
>> -        _system_poll = SelectPoll
>> -
>> -    return _system_poll()
>> diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py
>> index 3faa64e9d..7dd000916 100644
>> --- a/python/ovs/socket_util.py
>> +++ b/python/ovs/socket_util.py
>> @@ -159,8 +159,8 @@ def make_unix_socket(style, nonblock, bind_path, connect_path, short=False):
>>  
>>  
>>  def check_connection_completion(sock):
>> +    p = ovs.poller.SelectPoll()
>>      if sys.platform == "win32":
>> -        p = ovs.poller.SelectPoll()
>>          event = winutils.get_new_event(None, False, True, None)
>>          # Receive notification of readiness for writing, of completed
>>          # connection or multipoint join operation, and of socket closure.
>> @@ -170,12 +170,15 @@ def check_connection_completion(sock):
>>                                   win32file.FD_CLOSE)
>>          p.register(event, ovs.poller.POLLOUT)
>>      else:
>> -        p = ovs.poller.get_system_poll()
>>          p.register(sock, ovs.poller.POLLOUT)
>>      pfds = p.poll(0)
>>      if len(pfds) == 1:
>>          revents = pfds[0][1]
>> -        if revents & ovs.poller.POLLERR or revents & ovs.poller.POLLHUP:
>> +        if revents & ovs.poller.POLLOUT:
>> +            # This is how select() indicates a connect() error
> 
> This wording doesn't sound correct to me as it might be an error and might
> be not, IIUC.  POLLOUT here just means that something happened.  It might
> be an error or the actual POLLOUT.
> (Ugh... select()/poll() are so inconsistent across different platforms)
> 
>> +            return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
> 
> Looking at the history, it seems that ESX doesn't support SO_ERROR (at least
> some old versions), so we need to avoid use of it.
> See commit d6cedfd9d29d ("socket-util: Avoid using SO_ERROR.")
> 
> We, likely, need to treat this in the same path as error. i.e. invoke send()
> and catch the exception.  However, for the POLLOUT case we should not
> log and return the error if send() succeeds.  But this doesn't sound like
> a good idea to send NULL bytes to everyone on success...
> 
> So, some options here:
> 
> 1. Keep as is, i.e. system poll()
>    --> python will be blocked by poll()
> 
> 2. Fully revert c1aa16d19, i.e. just use select()
>    --> connections will be treated as established even if they are not
>        until the actual read/write.
> 
> 3. Use select() + getsockopt(SO_ERRROR), effectively revert d6cedfd9d29d
>    --> give up on systems that doesn't support SO_ERRROR. (ESX?)
> 
> 4. Use select() + getsockopt(SO_ERRROR), keep returning 0 (success) on
>    ENOPROTOOPT
>    --> systems that doesn't support might think that connections are
>        established while they are not.
> 
> 5. Try to detect if SO_ERRROR supported or not and use system poll() or
>    select() + getsockopt(SO_ERRROR) based on that fact.
>    --> better performance on systems that supports and same as now on
>        systems that doesn't.
> 
> Option 3 is, probably, OK, but I don't know.
> Option 4 might be the best choice, but I'm not quiet sure how to detect.
> Is checking for ENOPROTOOPT enough (same actually applies to option 3
> as I'm not sure if ESX actually reports ENOPROTOOPT)?
> 
> Also, we're using select() in C code for windows.  Do we have the same
> issue there?  Alin, do you know if check_connection_completion() correctly
> detects connection failure on windows, e.g. if 'Check stream open block'
> tests are working?
> 
> Any thoughts?
> 
> Ben, what do you think?
> 
> Best regards, Ilya Maximets.
>