[ovs-dev] [iovisor-dev] [RFC PATCH 00/11] OVS eBPF datapath.

William Tu u9012063 at gmail.com
Thu Jun 28 14:19:35 UTC 2018


Hi Alexei,

Thanks a lot for the feedback!

On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov
<alexei.starovoitov at gmail.com> wrote:
> On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote:
>>
>> Discussion
>> ==========
>> We are still actively working on finishing the feature, currently
>> the basic forwarding and tunnel feature work, but still under
>> heavy debugging and development.  The purpose of this RFC is to
>> get some early feedbacks and direction for finishing the complete
>> features in existing kernel's OVS datapath (the net/openvswitch/*).
>
> Thank you for sharing the patches.
>
>> Three major issues we are worried:
>>   a. Megaflow support in BPF.
>>   b. Connection Tracking support in BPF.
>
> my opinion on the above two didn't change.
> To recap:
> A. Non scalable megaflow map is no go. I'd like to see packet classification
> algorithm like hicuts or efficuts to be implemented instead, since it can be
> shared by generic bpf, bpftiler, ovs and likely others.

We did try the decision tree approach using dpdk's acl lib. The lookup
speed is 6 times faster than the magaflow using tuple space.
However, the update/insertion requires rebuilding/re-balancing the decision
tree so it's way too slow. I think hicuts or efficuts suffers the same issue.
So decision tree algos are scalable only for lookup operation due to its
optimization over tree depth, but not scalable under
update/insert/delete operations.

On customer's system we see megaflow update/insert rate around 10 rules/sec,
this makes decision tree unusable, unless we invent something to optimize the
update/insert time or incremental update of these decision tree algo.
Now my backup plan is to implement megaflow in BPF.

> B. instead of helpers to interface with conntrack the way ovs did, I prefer
> a generic conntrack mechanism that can be used out of xdp too
>

OK. We will work on this direction.

>>   c. Verifier limitation.
>
> Not sure what limitations you're concerned about.
>

Mostly related to stack.  The flow key OVS uses (struct sw_flow_key)
is 464 byte. We trim a lot, now around 300 byte, but still huge, considering
the BPF's stack limit is 512 byte.

We can always break the large program then tail call, but sometimes
the register spills on the stack, and when restore, the states is gone
and verifier fails.  This is more difficult for us to work around.

Below is an example:
----
at 203: r7 is a const and store on stack (r10 - 248)
at 250: r2 reads (r10 - 248) back.
at 251: fails the verifier

from 27 to 201: R0=map_value(id=0,off=0,ks=4,vs=4352,imm=0)
R7=inv(id=0,umax_value=31,var_off=(0x0; 0x1f))
R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
201: (7b) *(u64 *)(r10 -256) = r0
202: (27) r7 *= 136
203: (7b) *(u64 *)(r10 -248) = r7
204: (bf) r6 = r0
205: (0f) r6 += r7
206: (b7) r8 = 2
207: (15) if r6 == 0x0 goto pc+93
 R0=map_value(id=0,off=0,ks=4,vs=4352,imm=0)
R6=map_value(id=0,off=0,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R7=inv(id=0,umax_value=4216,var_off=(0x0; 0x1ff8)) R8=inv2
R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 fp-256=map_value
208: (b7) r1 = 681061
209: (63) *(u32 *)(r10 -200) = r1
210: (18) r1 = 0x6b73616d20746573
212: (7b) *(u64 *)(r10 -208) = r1
213: (bf) r1 = r10
214: (07) r1 += -208
215: (b7) r2 = 12
216: (85) call bpf_trace_printk#6
217: (bf) r7 = r6
218: (07) r7 += 8
219: (61) r1 = *(u32 *)(r6 +8)
 R0=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R7_w=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value
220: (15) if r1 == 0x7 goto pc+82
 R0=inv(id=0) R1=inv(id=0,umax_value=4294967295,var_off=(0x0;
0xffffffff)) R6=map_value(id=0,off=0,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value
221: (55) if r1 != 0x4 goto pc+228
 R0=inv(id=0) R1=inv4
R6=map_value(id=0,off=0,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value
222: (61) r1 = *(u32 *)(r9 +80)
223: (7b) *(u64 *)(r10 -264) = r1
224: (61) r6 = *(u32 *)(r9 +76)
225: (b7) r1 = 0
226: (73) *(u8 *)(r10 -198) = r1
227: (b7) r1 = 2674
228: (6b) *(u16 *)(r10 -200) = r1
229: (18) r1 = 0x6568746520746573
231: (7b) *(u64 *)(r10 -208) = r1
232: (bf) r1 = r10
233: (07) r1 += -208
234: (b7) r2 = 11
235: (85) call bpf_trace_printk#6
236: (bf) r1 = r6
237: (07) r1 += 14
238: (79) r2 = *(u64 *)(r10 -264)
239: (2d) if r1 > r2 goto pc+61
 R0=inv(id=0) R1=pkt(id=0,off=14,r=14,imm=0)
R2=pkt_end(id=0,off=0,imm=0) R6=pkt(id=0,off=0,r=14,imm=0)
R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value fp-264=pkt_end
240: (71) r1 = *(u8 *)(r7 +10)
 R0=inv(id=0) R1_w=pkt(id=0,off=14,r=14,imm=0)
R2=pkt_end(id=0,off=0,imm=0) R6=pkt(id=0,off=0,r=14,imm=0)
R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value fp-264=pkt_end
241: (73) *(u8 *)(r6 +0) = r1
242: (71) r1 = *(u8 *)(r7 +11)
 R0=inv(id=0) R1_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))
R2=pkt_end(id=0,off=0,imm=0) R6=pkt(id=0,off=0,r=14,imm=0)
R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value fp-264=pkt_end
243: (73) *(u8 *)(r6 +1) = r1
244: (71) r1 = *(u8 *)(r7 +12)
 R0=inv(id=0) R1_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))
R2=pkt_end(id=0,off=0,imm=0) R6=pkt(id=0,off=0,r=14,imm=0)
R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value fp-264=pkt_end
245: (73) *(u8 *)(r6 +2) = r1
246: (71) r1 = *(u8 *)(r7 +13)
 R0=inv(id=0) R1_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff))
R2=pkt_end(id=0,off=0,imm=0) R6=pkt(id=0,off=0,r=14,imm=0)
R7=map_value(id=0,off=8,ks=4,vs=4352,umax_value=4216,var_off=(0x0;
0x1ff8)) R8=inv2 R9=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
fp-256=map_value fp-264=pkt_end
247: (73) *(u8 *)(r6 +3) = r1
248: (79) r4 = *(u64 *)(r10 -256)
249: (bf) r1 = r4
250: (79) r2 = *(u64 *)(r10 -248)
251: (0f) r1 += r2
math between map_value pointer and register with unbounded min value
is not allowed


More information about the dev mailing list