[ovs-dev] [PATCH] netlink-socket: Check for null sock in nl_sock_recv__()

Murilo Opsfelder Araújo muriloo at linux.ibm.com
Thu Nov 18 19:56:59 UTC 2021


On 11/16/21 19:31, Ilya Maximets wrote:
> On 10/25/21 19:45, David Christensen wrote:
>> In certain high load situations, such as when creating a large number of
>> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
>> as null, resulting in a segmentation fault when 'sock' is later
>> dereferenced, such as when calling recvmsg().
> 
> Hi, David.  Thanks for the patch.
> 
> It's OK to check for a NULL pointer there, I guess.  However,
> do you know from where it was actually called?  This function,
> in general, should not be called without the actual socket,
> so we, probably, should fix the caller instead.
> 
> Best regards, Ilya Maximets.

Hi, Ilya Maximets.

When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():

(gdb) l
2701
2702        while (handler->event_offset < handler->n_events) {
2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
2704            struct dpif_channel *ch = &dpif->channels[idx];

(gdb) p idx
$26 = 4
(gdb) p *dpif->channels at 5
$27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
     last_poll = 61340749}}


The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().

The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.

I'm not enough experienced in Open vSwitch to explain why sock was nil at that given index.
The fix seems worth, though.

Cheers!

> 
>>
>> The ovs-vswitchd.log will display something like this:
>>
>>      fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)
>>
>> Tested this change under the same circumstances that originally generated
>> the segmentation fault and it ran successfully for four days without any
>> issue.
>>
>> Signed-off-by: Murilo Opsfelder Araujo <muriloo at linux.ibm.com>
>> Signed-off-by: David Christensen <drc at linux.vnet.ibm.com>
>> IBM-BZ: #193057
>> ---
>>   lib/netlink-socket.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
>> index 5867de564..3ab4d8485 100644
>> --- a/lib/netlink-socket.c
>> +++ b/lib/netlink-socket.c
>> @@ -653,6 +653,10 @@ nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, int *nsid, bool wait)
>>       int *ptr;
>>       int error;
>>   
>> +    if (sock == NULL) {
>> +        return ECONNRESET;
>> +    }
>> +
>>       ovs_assert(buf->allocated >= sizeof *nlmsghdr);
>>       ofpbuf_clear(buf);
>>   
>>
> 


-- 
Murilo


More information about the dev mailing list