[ovs-dev] [unixctl_py 6/6] python: Port unixctl to Python.

Fri Mar 2 20:55:08 UTC 2012

On Fri, Mar 02, 2012 at 12:44:12PM -0800, Ethan Jackson wrote:
> > I noticed that there's some code that insists on "string" objects in
> > various places, which makes me a little nervous because "string" and
> > "unicode" objects in Python seem almost interchangeable but they are
> > not the same and a test for one will not accept the other.  There's a
> > lot of "if type(json) in [str, unicode]" type code in the OVS python
> > directory for that reason.
> 
> "string" in the unixctl python directory is defined as
> types.StringTypes which is specified as follows:
> 
> "A sequence containing StringType and UnicodeType used to facilitate
> easier checking for any string object. Using this is more portable
> than using a sequence of the two string types constructed elsewhere
> since it only contains UnicodeType if it has been built in the running
> version of Python. For example: isinstance(s, types.StringTypes)."

Oh wow, I didn't notice any of that.  I read "string" as "str".  The
difference is a bit subtle.  It would be a lot less subtle if we wrote
out "types.StringTypes" each place (but maybe I'll get used to the
distinction over time?).

Maybe the other code in the OVS python directory should use this too,
with whatever spelling we choose.
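
For reference, a minimal sketch of the kind of check under discussion.
The names "string_types" and "is_string" are made up for illustration
and are not taken from the patch.  On Python 2 the tuple is meant to
play the same role as types.StringTypes; the fallback branch is only
there so the sketch also runs on an interpreter with no separate
unicode type.

```python
# Hypothetical spelling of the "any string object" check.
try:
    string_types = (str, unicode)  # Python 2: both string types
except NameError:
    string_types = (str,)          # no separate unicode type available


def is_string(obj):
    """Return True if obj is any kind of string object."""
    return isinstance(obj, string_types)


# A plain string and a unicode literal are both accepted; an int is not.
print(is_string("abc"), is_string(u"abc"), is_string(42))
```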

> > I keep hearing that string concatenation in Python is really slow.
> > How about this (untested):
> 
> Yah it certainly is slower, but I think it's more readable.  This
> feels like premature optimization to me.  I don't feel strongly about
> it though.  If you think it's important I'll change it.

It's probably not important.
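
To make the trade-off concrete, here is a generic sketch of the two
styles.  It is not the code from the patch or from my earlier
suggestion; the function names and the idea of building a reply from a
list of lines are assumptions for illustration only.

```python
def build_reply_concat(lines):
    """Build the reply by repeated concatenation (simple to read)."""
    reply = ""
    for line in lines:
        reply += line + "\n"
    return reply


def build_reply_join(lines):
    """Build the same reply with a single join over the pieces."""
    return "".join(line + "\n" for line in lines)


# Both styles produce identical output for the same input.
sample = ["first", "second", "third"]
print(build_reply_concat(sample) == build_reply_join(sample))
```

For the short replies unixctl commands usually produce, I would not
expect the difference to be measurable.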

> > I don't quite understand the code below, in
> > UnixctlConnection._process_command().  First we check that all of the
> > arguments are "string" objects, then we convert them to strings with
> > "str"?
> 
> So in the first loop we're checking that they are either str or
> unicode, and in the second loop, we convert them to str if they happen
> to be unicode.

OK, I understand now.

> I struggled with whether or not this is the correct
> thing to do as I wrote this.  Since unixctl commands are sort of
> supposed to feel like a command line argc, argv main function, it
> seems more correct to pass in strings than unicodes. But one could
> argue that this isn't correct too I suppose.  Do you feel strongly
> about it?  I'm not really sure what's correct.

Is it safe to do str() on an arbitrary Unicode string without catching
exceptions?  When I type str(u'冯全树') at an interactive Python
prompt on my system (with LC_ALL=en_US.UTF-8), I get:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

but maybe we set stuff up so this doesn't happen inside our code?
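
If we don't, one option is to do the conversion explicitly and catch
the failure.  A rough sketch, untested against the actual unixctl code:
"to_ascii_or_none" is a made-up name, and it assumes the argument is a
unicode object, which is the case I'm worried about.

```python
def to_ascii_or_none(arg):
    """Return arg encoded as ASCII bytes, or None if it cannot be encoded.

    Assumes arg is a unicode string.
    """
    try:
        return arg.encode("ascii")
    except UnicodeEncodeError:
        # The caller can turn this into an error reply instead of
        # letting the exception escape.
        return None


# A pure-ASCII argument converts; three non-ASCII characters do not.
print(to_ascii_or_none(u"list") is not None)
print(to_ascii_or_none(u"\u51af\u5168\u6811") is None)
```

Encoding as UTF-8 instead of ASCII would avoid the error entirely, at
the cost of handing command callbacks non-ASCII byte strings.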

Thanks,

Ben.