Tip: Speed up Python dictionary lookups

Posted on . Reading time: 2 mins. Tags: python.

Last week at work, there was a conversation about managing a fairly large key-value data structure using Python. A colleague recommended checking sys.intern to improve performance. I can't add much beyond what's already covered in the documentation:

sys.intern(string)

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.

I thought it would be nice to not only share the existence of that, but to provide code examples of how to use it. I concocted these examples, but when I measured the performance of the code using "interned" strings and the code using "regular" strings, the timings were the same. I iterated a few times to see if I was misunderstanding something, but long story short, I gave up and decided to post on Stak Overflow.

The answer to this enigma was humbling. I wasn't aware of Python's optimizations on string handling. Nick does a good job explaining them, and then offering an actual example where you can see how to use interning effectively.

In conclusion, my advice would be this: whenever you come across claims of a performance boost, verify for yourself that it's actually effective in your specific use case!