Blog
0

Python: Working in Unicode

While working on unicode based characters in python, you’ll often come across this type error message (this cost me a while to fix).


UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

This will happen if you do not set your character encoding in your python file to UTF-8. First you need to make sure the first few lines of your programm looks like this.


#!/usr/bin/python
# coding=utf-8
# -*- encoding: utf-8 -*-

This enables you to write unicode characters in your source code. But this does not enable you to print them in console and you’ll still keep getting the same error I previously mentioned.

To solve this you need to add the following code in your /usr/lib/python2.5/sitecustomize.py file (This might change depending your installed python version)


import sys;
sys.setdefaultencoding('utf-8')

The interesting part is, if you try to do that in your source file, it won’t work, I kept getting this error

AttributeError: 'module' object has no attribute 'setdefaultencoding'

I’m not sure why python does this, but a reasonable explanation would be not to change the character encoding in runtime, that could pose vulnerability to the system.

utf-8 isn’t default in python, maybe they’ll fix this in python 3.0

Post comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>