wnd's weblog, random thoughts on software, technology, and stuff

Python (2.5) and me

28 Dec 2009 13:53:17 rant, software

Python was sold to me as the best thing since sliced bread. I can appreciate its principles such as “explicit is better than implicit”, “one way to do a thing”, and even “whitespace matters”. However, to me, there is a lot that offsets the good stuff.

I have always thought that the fastest, the easiest, and the most convenient way to learn a new (programming) language is to start using it. Not little by little, but just using it as much as you can. Now that I have spent about a month writing Python as Python-like as I can, I have to get a few things out of my head. Obviously I’m still a total newbie to Python, so don’t take my rant too seriously. After all, so far I’ve only been rewriting one of my (unpublished) 5000-line Perl scripts in Python from scratch. Rewriting things from scratch is another great way to improve your stuff, but that’s another story.

Variable declaration not required

Python does not expect you to declare variables. In fact, as far as I can tell, variables cannot be declared in the first place. I don’t like this because I make typos. I make a lot of typos. Well, maybe not that many. In any case, I hate hunting for typos in my code only because some greater power decided that declaring variables is for losers. Perl doesn’t require declaration either, but at least in Perl I can opt in to it. Nobody has been able to tell me how to do that in Python.

found = False
for item in get_stuff():
    if item.name() == the_one:
        founs = True    # typo: silently creates a new variable
        break

if found and foobar:
    # and no, the point isn't in "why can't you call do_stuff while still
    # in the loop?"
    do_stuff()
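
A typo like founs above can be caught mechanically before the code ever runs. The following is a minimal sketch, not a real linter: it uses the standard ast module (available from Python 2.6 on, so not in 2.5) to list names that are assigned somewhere but never read. The helper name assigned_but_never_read is made up for this illustration, and the check ignores scoping entirely.

```python
import ast

def assigned_but_never_read(source):
    """Return the names that are assigned in `source` but never read."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return sorted(stored - loaded)

# The snippet is only parsed, never executed, so the undefined
# names in it (get_stuff, the_one, foobar, do_stuff) do not matter.
SNIPPET = """
found = False
for item in get_stuff():
    if item.name() == the_one:
        founs = True
        break
if found and foobar:
    do_stuff()
"""

suspects = assigned_but_never_read(SNIPPET)
print(suspects)   # ['founs']
```

Dedicated static checkers do the same kind of analysis far more thoroughly; this only shows that the information is there to be found.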

No real encapsulation

There is no real encapsulation. The lack of data encapsulation combined with variables that need no declaration means that typos in assignments such as foo.bar = 3 can do real damage – without a single warning. If I want to prevent that, I need to have identifiers begin with an underscore or two. I just love how the code looks with all those underscores around.

foo.py:
class Foo(object):
    s = True
    def a(self):
        pass

    _hidden = True
    def get_hidden(self):
        return self._hidden

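bar.py:
from foo import *
x = Foo()
x.a = False    # replaces method a() on this instance, no warning
x.d = False    # typo for x.s: silently creates a new attribute
print x.get_hidden()
x.a()          # TypeError: 'bool' object is not callable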
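
One way to make a mistyped attribute assignment fail loudly is __slots__, which exists for new-style classes. This is a sketch of the idea rather than a recommendation: __slots__ was designed to save memory, and a subclass that does not define its own __slots__ gets an ordinary instance dictionary back, so the protection is easy to lose.

```python
class Foo(object):
    # Only the names listed here may be set on instances.
    __slots__ = ('s',)

    def __init__(self):
        self.s = True

x = Foo()
x.s = False           # fine: 's' is a declared slot

try:
    x.d = False       # typo for x.s
    typo_caught = False
except AttributeError:
    typo_caught = True

print(typo_caught)    # True
```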

Python properties aren’t the thing either. When I asked around whether I should access class instance variables directly (thanks to the lack of encapsulation) or through get_*() and set_*(), I was basically told that properties were the next big thing. At first properties looked like a great idea, but then I tried using them. To begin with, using properties means that you practically must use those ugly _underscore variables. Also, using properties only makes your code more vulnerable to nearly undetectable typos.

foo.py:
class Foo(object):
    def __init__(self):
        self._s = 0

    def get_s(self):
        return self._s
    def set_s(self, s):
        self._s = s
    def del_s(self):
        del self._s
    s = property(get_s, set_s, del_s, 'a')

bar.py:
from foo import *
x = Foo()
x.a = 3    # typo for x.s; x.set_a(3) would at least have raised an error!
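
Properties and strict attribute names can be combined by overriding __setattr__. The sketch below is one possible approach, with made-up names (Strict, _fields) that are not part of Python: assignment is allowed only for names listed in _fields or already defined on the class, which includes properties such as s.

```python
class Strict(object):
    """Hypothetical base class that refuses to create unknown attributes."""
    _fields = ()

    def __setattr__(self, name, value):
        if name not in self._fields and not hasattr(type(self), name):
            raise AttributeError('cannot create attribute %r' % name)
        object.__setattr__(self, name, value)

class Foo(Strict):
    _fields = ('_s',)

    def __init__(self):
        self._s = 0

    def get_s(self):
        return self._s

    def set_s(self, s):
        self._s = s

    s = property(get_s, set_s)

x = Foo()
x.s = 3               # goes through the property setter

try:
    x.a = 3           # typo for x.s
    typo_caught = False
except AttributeError:
    typo_caught = True
```

The cost is an extra method call on every attribute assignment, and the underscore variable is still there.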

Scope of variables

I just don’t like the way the scope is defined. I haven’t studied the inner workings of Python, so my rant is probably all wrong here. It seems that variables inside functions are always bound to that function, not to the block they are used in. Even though I would prefer to be able to reuse variables in nested blocks like in C, I can live with this. However, this all gets much more interesting when you try to use the same identifier inside and outside a function.

a = True

def x():
    if a:
        print "ok"

print a
x()

--- vs. ---

# this example does /not/ work: calling x() raises UnboundLocalError

a = True

def x():
    if a:
        a = False

print a
x()

--- vs. ---

a = True

def x():
    global a
    if a:
        a = False

print a
x()
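
The reason the second variant fails is that an assignment anywhere in a function body makes that name local for the whole function, including the lines before the assignment. The sketch below restates the three cases with made-up function names and peeks at the compiled function to show the decision is made up front. It uses the __code__ attribute, which is spelled func_code in Python 2.5.

```python
a = True

def reads_only():
    # No assignment to a here, so a refers to the module-level name.
    return a

def assigns():
    # The assignment below makes a local for the WHOLE function,
    # so the test reads a local variable that has no value yet.
    if a:
        a = False

def assigns_global():
    global a          # opt in to rebinding the module-level name
    if a:
        a = False

# Local names are fixed when the function is compiled:
print('a' in reads_only.__code__.co_varnames)   # False
print('a' in assigns.__code__.co_varnames)      # True

try:
    assigns()
    raised = False
except UnboundLocalError:
    raised = True

assigns_global()      # succeeds and sets the module-level a to False
```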

Coarse-grained exceptions

The exception system is too coarse-grained for my taste. This was one of my first issues with Python. I was writing a parser to extract information from a not well-formed text file and create objects using that information. In practice I was using regular expressions to get the data and then calling object.set_foo() to save it. Naturally I added exception handlers to deal with errors, but in the end this turned out to be a bad idea.

import re

class Foo(object):
    a = None
    def set_a(self, a):
        self.a = a

data = 'fooxyzzybar'    # in reality this contains entire source file
a = Foo()
try:
    x = re.search(r'foo(.*?)bar', data)
    a.set_s(x.group(1))    # set_S, not set_A
except AttributeError, e:
    print 'oops'

If the regexp doesn’t match the given data, search() returns None. If I’m too lazy to check the return value against None, None.group(1) throws an AttributeError. This can then be easily dealt with. The problem is that if I make a typo in a method call inside the try-except block and call a method that does not exist, that also throws an AttributeError. Sane languages would die at such an atrocity – but not Python. I guess this is just another way Python encourages you to write nice and cleanly structured code. In any case, I think the exception system could be improved.
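
One way around this, sketched below, is to test the match against None explicitly and keep the method call outside any exception handler, so that a mistyped method name still blows up. The helper name extract is made up for the example.

```python
import re

class Foo(object):
    a = None
    def set_a(self, a):
        self.a = a

def extract(data):
    """Return the text between 'foo' and 'bar', or None if there is none."""
    match = re.search(r'foo(.*?)bar', data)
    if match is None:
        return None
    return match.group(1)

obj = Foo()
value = extract('fooxyzzybar')
if value is None:
    print('oops')
else:
    # Not wrapped in try/except: calling obj.set_s(value) by mistake
    # would raise AttributeError instead of printing 'oops'.
    obj.set_a(value)
```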

UTF-8

This complaint is probably based on completely wrong assumptions and such, but dealing with UTF-8 seems to be unnecessarily complicated. I don’t know what’s going on under the hood, but whatever it is, it caused me a lot of trouble. I want my application to output UTF-8. Most of my source HTML data is encoded in UTF-8, but it also contains HTML entities such as &auml; and charrefs such as &#65; and &#9742;. I use HTMLParser to parse the data, and htmlentitydefs to convert the entities. The problem is that htmlentitydefs.entitydefs has the characters in ISO-8859-1(5) and Python doesn’t like mixing two character sets. It also seemed overly complicated to deal with charrefs. In the end it all worked out, but what I ended up doing wasn’t pretty. From what I’ve read, Python 3 should help with these issues.

def handle_entityref(self, name):
    if name == 'nbsp':
        self.handle_data(' ')
    else:
        try:
            self.handle_data(htmlentitydefs.entitydefs[name].
                    decode('ISO-8859-1').encode('UTF-8'))
        except KeyError:
            self.handle_data(name)

def handle_charref(self, name):
    if name[0] == 'x':
        v = int(name[1:], 16)
    else:
        v = int(name)

    if 128 <= v <= 255:
        try:
            self.handle_data(chr(v).decode('ISO-8859-1').encode('UTF-8'))
        except (UnicodeDecodeError, UnicodeEncodeError):
            self.handle_data('0x%x' % v)
    elif 0x100 <= v <= 0xffff:
        try:
            self.handle_data(unichr(v).encode('UTF-8'))
        except (UnicodeDecodeError, UnicodeEncodeError):
            self.handle_data('0x%x' % v)
    else:
        self.handle_data('0x%x' % v)
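
For comparison, here is a sketch of the same two conversions written for Python 3, where text is Unicode throughout and is encoded to UTF-8 only once, on output. It is not the code from my script: the function names are made up, and it uses the name2codepoint table (html.entities in Python 3, htmlentitydefs in Python 2) rather than entitydefs. Out-of-range code points are not handled.

```python
from html.entities import name2codepoint

def entity_to_text(name):
    """Return the character for a named entity, or the raw reference if unknown."""
    try:
        return chr(name2codepoint[name])    # Python 2 would use unichr()
    except KeyError:
        return '&%s;' % name

def charref_to_text(name):
    """Return the character for a numeric reference such as '65' or 'x260e'."""
    if name[:1] in ('x', 'X'):
        code = int(name[1:], 16)
    else:
        code = int(name)
    return chr(code)

pieces = [
    entity_to_text('auml'),      # a with diaeresis, U+00E4
    charref_to_text('65'),       # A
    charref_to_text('x260e'),    # telephone sign, U+260E
]

# Encode once, at the output boundary.
output = ''.join(pieces).encode('UTF-8')
```

Working on code points instead of ISO-8859-1 byte strings removes the decode/encode round trip and the special case for values between 128 and 255.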

Random complaints

No function overloading. Function overloading can be very useful, but as a C programmer I didn’t grow up using it. However, with some Java and C++ experience, I’ve learned to like it. Somehow I had come up with the idea that object-oriented languages generally support overloading, so I expected Python to support that too. Nope.
There are other rants about Python’s super, too.
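
On the overloading complaint: the usual substitute is a single function with default arguments, with isinstance checks added when the variants differ by type rather than by argument count. A small sketch follows; area is a made-up example, not something from my script.

```python
def area(width, height=None):
    """Stand-in for the two overloads area(side) and area(width, height)."""
    if height is None:
        # Called with one argument: treat it as the side of a square.
        height = width
    return width * height

square = area(3)
rectangle = area(3, 4)
```

It works, but both "overloads" end up sharing one body, which is exactly what overloading would have avoided.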