[ovs-dev] [PATCH ovn] expr: Combine multiple ipv4 with wildcard mask.

Dumitru Ceara dceara at redhat.com
Wed Mar 24 09:53:29 UTC 2021


On 3/22/21 1:10 PM, gmingchen(陈供明) wrote:
> 
> 
> On 2021/3/19, 9:02 PM, "Dumitru Ceara" <dceara at redhat.com> wrote:
> 
>     On 3/10/21 2:29 PM, gmingchen(陈供明) wrote:
>     > From: Gongming Chen <gmingchen at tencent.com>
>     > 
>     > This patch merges ipv4 addresses with wildcard masks, and replaces this
>     > ipv4 addresses with the combined ip/mask. This will greatly reduce the
>     > entries in the ovs security group flow table, especially when the host
>     > size is large.
>     > 
>     > Analysis in the simplest scenario, a network 1.1.1.0/24 network,create 253
>     > ports(1.1.1.2-1.1.1.254).
>     > Only focus on the number of ip addresses, the original 253 addresses will
>     > be combined into 13 addresses.
>     > 1.1.1.2/31
>     > 1.1.1.4/30
>     > 1.1.1.8/29
>     > 1.1.1.16/28
>     > 1.1.1.32/27
>     > 1.1.1.64/26
>     > 1.1.1.128/26
>     > 1.1.1.192/27
>     > 1.1.1.224/28
>     > 1.1.1.240/29
>     > 1.1.1.248/30
>     > 1.1.1.252/31
>     > 1.1.1.254
>     > 
>     > Some scenes are similar to the following:
>     >     1.1.1.2, 1.1.1.6
>     > After the combine:
>     >     1.1.1.2/255.255.255.251
>     > You can use ovn-match-ip utility to match ip.
>     > such as:
>     >     ovs-ofctl dump-flows br-int | ovn-match-ip 1.1.1.6
>     >     1.1.1.2/255.255.255.251 will show.
>     > 
>     > Simple description of the algorithm.
>     > There are two situations
>     > 1. Combine once
>     > such as:
>     >     1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
>     >     Combined into: 1.1.1.0/31, 1.0.1.0/31
>     > 2. Combine multiple times
>     >     1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
>     >     Combined into: 1.0.1.0/255.254.255.254
>     > 
>     > Considering the actual scene and simplicity, the first case is used to
>     > combine once.
>     > 
>     > ...00...
>     > ...01...
>     > ...10...
>     > ...11...
>     > "..." means the same value omitted.
>     > Obviously, the above value can be expressed as ...00.../11100111. This
>     > continuous interval that can be represented by one or several wildcard
>     > masks is called a segment.
>     > Only if all 2<<n values from the minimum value 00...(n) to 11...(n)
>     > exist, can they be combined into 00...(n)/00...( n)
>     > 
>     > First sort all the values by size. Iterate through each value.
>     > 1. Find a new segment, where two values differ only by 1 bit, such as
>     > ...0... and ...1...
>     >     diff = ip_next ^ ip
>     >     if (diff & (diff-1)) == 0
>     >         new_segment = true
>     > The first non-zero place in the high direction of ip is the end of the
>     > segment(segment_end).
>     > For example...100... and...101..., the segment_end is ...111...
>     > 
>     > 2. Count the number of consecutive and less than continuous_size in the
>     > segment.
>     >     diff = ip_next - ip
>     >     if (diff & (diff-1)) == 0 && ip_next <= segment_end
>     >         continuous_size++
>     > 
>     > 3. Combine different ip intervals in the segment according to
>     > continuous_size.
>     > In continuous_size, from the highest bit of 1 to the lowest bit of 1, in
>     > the order of segment start, each bit that is 1 is a different ip interval
>     > that can be combined with a wildcard mask.
>     > For example, 000, 001, 010:
>     >     continuous_size: 3 (binary 11), segment_start: 000
>     >     mask: ~(1 << 1 - 1) = 110; ~(1 << 0 - 1) = 111;
>     >     Combined to: 000/110, 010/111
>     > 
>     > 4. The ip that cannot be recorded in a segment will not be combined.
>     > 
>     > Signed-off-by: Gongming Chen <gmingchen at tencent.com>
>     > ---
> 
>     Hi Gongming,
> 
>     Sorry for the delayed review.
> 
>     I have a few general remarks/concerns and some specific comments inline.
>      First, the general remarks.
> 
>     I'm wondering if it would make more sense for this wildcard combination
>     of IPs to be done outside OVN, in the CMS, for the following reasons:
> 
>     - it's IPv4 specific.
>     - the CMS usually has more information about when it is matching on IP
>     sets and can probably optimize the match it uses in the ACL.
>     - the code in expr_const_sets_add() used to be a relatively direct
>     translation from sets of constants to sets of port names/IPs; this
>     changes with the optimization your proposing.
>     - the new utility, ovn-match-ip, would be useful but I'm worried that
>     users will often forget to use it.
> 
>     I'm curious about your and the maintainers' opinion on these points.
> 
> Hi Dumitru,
> First, thanks for your review.
> 
> On the one hand, I agree with your point of view. The cms side is more
> clear about the details of the ip, and the ip can be better optimized.
> However, on the other hand, I think that ovn, as a network provider, should
> not assume that upper layers such as cms have already made these optimizations.
> In fact, some cms do not have such a function, such as openstack, k8s,
> and they hope that this specific and complex work will be realized by
> the network provider.
> In terms of better serving the upper layer or upper layer use, this
> optimization function of ovn simplifies the upper layer work, which
> will just make users more satisfied.
> This is some of my views from the perspective of cms.

I see, I'll let Numan, Mark, and others share their opinions on this too.

> 
>     Another general remark is to increment the patch version when submitting
>     a new revision.  From what I can see this specific patch is at v7.  When
>     sending a v8 you could add "-v8" when generating the patch, e.g.:
> 
>     git format-patch --subject-prefix="PATCH ovn" -v8 -M origin/master
> 
>     Would generate a patch with subject:
> 
>     "Subject: [PATCH ovn v8] ...."
> 
>     This usually helps when reviewing patches that go through multiple
>     iterations.
> 
> Thanks.
> 
>     More specific comments inline.
> 
>     >  debian/ovn-common.install |   1 +
>     >  lib/expr.c                | 219 ++++++++++++++++++++++++++++++++++++++
>     >  rhel/ovn-fedora.spec.in   |   1 +
>     >  tests/ovn.at              | 136 +++++++++++------------
>     >  tests/test-ovn.c          |   9 ++
>     >  utilities/automake.mk     |   5 +-
>     >  utilities/ovn-match-ip.in |  78 ++++++++++++++
>     >  7 files changed, 382 insertions(+), 67 deletions(-)
>     >  create mode 100755 utilities/ovn-match-ip.in
>     > 
>     > diff --git a/debian/ovn-common.install b/debian/ovn-common.install
>     > index 8e5915724..0095df918 100644
>     > --- a/debian/ovn-common.install
>     > +++ b/debian/ovn-common.install
>     > @@ -5,6 +5,7 @@ usr/bin/ovn-ic-nbctl
>     >  usr/bin/ovn-ic-sbctl
>     >  usr/bin/ovn-trace
>     >  usr/bin/ovn-detrace
>     > +usr/bin/ovn-match-ip
>     >  usr/share/ovn/scripts/ovn-ctl
>     >  usr/share/ovn/scripts/ovndb-servers.ocf
>     >  usr/share/ovn/scripts/ovn-lib
>     > diff --git a/lib/expr.c b/lib/expr.c
>     > index f061a8fbe..0a7203817 100644
>     > --- a/lib/expr.c
>     > +++ b/lib/expr.c
>     > @@ -1060,6 +1060,200 @@ expr_constant_set_destroy(struct expr_constant_set *cs)
>     >      }
>     >  }
>     >  
>     > +struct ip_in
>     > +{
>     > +    uint32_t *ip;
>     > +    size_t size;
>     > +};
> 
>     I don't think we really need this, we could just pass the IPv4 address
>     array to combine_ipv4_in_mask(), along with the number of items in the
>     array.
> 
> Yes, in combine_ipv4_in_mask(), size of ipv4 array is not necessary.
> But I think that the structure composed of *ip and size is more helpful for
> expr_const_sets_add to initialize an array memory of n_values size instead
> of accurately malloc the entry size of LEX_F_IPV4.
> Because expr_const_sets_add does not know the number of LEX_F_IPV4
> when initializing the ipv4 array.
> What do you think?

In expr_const_sets_add() you currently allocate 'n_values' entries in
ip_in_data.ip and then update ip_in_data.size to the exact number of
non-masked IPv4 tokens.

I don't see a need for 'size' to be stored in 'ip_in_data', it can just
be local to expr_const_sets_add() and passed as argument to
combine_ipv4_in_mask().

> 
>     > +
>     > +struct ip_out_entry
>     > +{
>     > +    uint32_t ip;
>     > +    uint32_t mask;
>     > +    bool masked;
>     > +};
>     > +
>     > +struct ip_out
>     > +{
>     > +    struct ip_out_entry *entries;
>     > +    size_t used;
>     > +    size_t size;
>     > +};
> 
>     AFAIU, we can never have more "out" entries than "in" entries.  This
>     means we could be conservative and always allocate as many output
>     entries as ip_in entries and pass the preallocated struct ip_out_entry
>     array to combine_ipv4_in_mask().
> 
>     This would also remove the need to resize the "out" array.
> 
> Yes,  good idea.
> 
>     > +
>     > +static int
>     > +compare_mask_ip(const void *a, const void *b)
> 
>     The implementation doesn't really use the fact that 'a' and 'b' are IPs.
>      They're just compared as integers.  We might want to use a different
>     name and move this function to a common module as it could potentially
>     be used in multiple places in the future.
> 
> How about adding compare_uint32s(const void *a_, const void *b_) to lib/util.c?

Sounds good to me.  I guess you mean lib/ovn-util.c though.

> 
>     > +{
>     > +    uint32_t a_ = *(uint32_t *)a;
>     > +    uint32_t b_ = *(uint32_t *)b;
>     > +
>     > +    return a_ < b_ ? -1 : a_ > b_;
>     > +}
>     > +
>     > +/* Function to check ip return data and xrealloc. */
>     > +static void
>     > +check_realloc_ip_out(struct ip_out *ip_out_data){
>     > +    if (ip_out_data->used >= ip_out_data->size) {
>     > +        ip_out_data->entries = x2nrealloc(ip_out_data->entries,
>     > +                                    &ip_out_data->size,
>     > +                                    sizeof *ip_out_data->entries);
>     > +    }
>     > +}
> 
>     As mentioned in the comment earlier, we can probably avoid using this
>     function.
> 
> Yes.
> 
>     > +
>     > +/* Simple description of the algorithm.
>     > + * There are two situations
>     > + * 1. Combine once
>     > + * such as:
>     > + *     1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
>     > + *     Combined into: 1.1.1.0/31, 1.0.1.0/31
>     > + * 2. Combine multiple times
>     > + *     1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1
>     > + *     Combined into: 1.0.1.0/255.254.255.254
>     > + *
>     > + * Considering the actual scene and simplicity, the first case is used to
>     > + * combine once.
>     > + *
>     > + * ...00...
>     > + * ...01...
>     > + * ...10...
>     > + * ...11...
>     > + * "..." means the same value omitted.
>     > + * Obviously, the above value can be expressed as ...00.../11100111. This
>     > + * continuous interval that can be represented by one or several wildcard
>     > + * masks is called a segment.
>     > + * Only if all 2<<n values from the minimum value 00...(n) to 11...(n)
>     > + * exist, can they be combined into 00...(n)/00...( n)
>     > + *
>     > + * First sort all the values by size. Iterate through each value.
>     > + * 1. Find a new segment, where two values differ only by 1 bit, such as
>     > + * ...0... and ...1...
>     > + *     diff = ip_next ^ ip
>     > + *     if (diff & (diff-1)) == 0
>     > + *         new_segment = true
>     > + * The first non-zero place in the high direction of ip is the end of the
>     > + * segment(segment_end).
>     > + * For example...100... and...101..., the segment_end is ...111...
>     > + *
>     > + * 2. Count the number of consecutive and less than continuous_size in the
>     > + * segment.
>     > + *     diff = ip_next - ip
>     > + *     if (diff & (diff-1)) == 0 && ip_next <= segment_end
>     > + *         continuous_size++
>     > + *
>     > + * 3. Combine different ip intervals in the segment according to
>     > + * continuous_size.
>     > + * In continuous_size, from the highest bit of 1 to the lowest bit of 1, in
>     > + * the order of segment start, each bit that is 1 is a different ip interval
>     > + * that can be combined with a wildcard mask.
>     > + * For example, 000, 001, 010:
>     > + *     continuous_size: 3 (binary 11), segment_start: 000
>     > + *     mask: ~(1 << 1 - 1) = 110; ~(1 << 0 - 1) = 111;
>     > + *     Combined to: 000/110, 010/111
>     > + *
>     > + * 4. The ip that cannot be recorded in a segment will not be combined. */
> 
>     Thanks for adding the detailed description here.  However, it's a bit
>     confusing because the names used in the description don't match the
>     variable names used in the implementation below.  Maybe it's better to
>     not use names in the description?
> 
> Ok, It would be better to replace it with a general description word.

Thanks.

> 
>     > +static void
>     > +combine_ipv4_in_mask(struct ip_in *ip_in_data, struct ip_out *ip_out_data){
> 
>     Please move the curly brace on a new line.
> 
> Ok,  thanks.
> 
>     > +    bool recording = false;
>     > +    uint32_t i, diff, connect, start, end, continuous_size, mask_base;
> 
>     Some of these can be moved in the specific blocks where they're used.
> 
> Ok.
> 
>     > +
>     > +    start = 0;
>     > +    continuous_size = 0;
>     > +    mask_base = 0;
>     > +    end = 0;
>     > +
>     > +    qsort(ip_in_data->ip, ip_in_data->size, sizeof(uint32_t),
> 
>     Please use "sizeof *ip_in_data->ip".
> 
> Ok, thanks.
> 
>     > +          compare_mask_ip);
>     > +    memset(ip_out_data, 0, sizeof(struct ip_out));
>     > +
>     > +    for (i = 0; i < ip_in_data->size; i++) {
>     > +        if (i + 1 >= ip_in_data->size) {
>     > +            if (!recording) {
>     > +                goto end_when_not_recording;
>     > +            } else {
>     > +                goto end_when_recording;
>     > +            }
> 
>     IMO it makes the code very hard to read if we have gotos jumping inside
>     if/else branches or inside loops.  This is also usually a sign that
>     we're trying to avoid code duplication.  A potentially simpler and more
>     readable way to do that is to add functions for the specific operations
>     we don't want to duplicate.
> 
> Ok,  I will use function instead of goto.

Cool.

> 
> Thanks,
> Gongming

Thanks,
Dumitru



More information about the dev mailing list