UnicodeEncodeError and UnicodeDecodeError
UnicodeEncodeError and UnicodeDecodeError belong to the UnicodeError class (which itself is a subclass of ValueError). Unicode errors are an annoyance in Python 2.X that has largely been remedied by Python 3.X. We encounter these errors when encoding or decoding data with a codec that cannot handle it. For example, if we parse files that contain Unicode characters, we may need to encode that data to ASCII in order to process it properly with Python. When encoding the data to ASCII, we can decide how to handle unsupported Unicode characters. We can perform a number of actions, including ignoring and replacing unsupported characters, as follows:
>>> unicode_str = u'\xe1\x93\x88my unicode string'
>>> unicode_str.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u...
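The ignore and replace options mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive approach: the to_ascii helper is a hypothetical name introduced here for convenience, and the sample string is the one from the session above. It relies only on the standard errors argument of the encode method.

```python
# Sample text from the session above; its first three characters are non-ASCII.
unicode_str = u'\xe1\x93\x88my unicode string'


def to_ascii(text, errors='strict'):
    """Hypothetical helper: encode text to ASCII with the given error handler."""
    return text.encode('ascii', errors)


# 'ignore' silently drops every character that ASCII cannot represent.
ignored = to_ascii(unicode_str, 'ignore')

# 'replace' substitutes a question mark for each unsupported character.
replaced = to_ascii(unicode_str, 'replace')

# The default 'strict' handler raises UnicodeEncodeError. Because that class
# derives from UnicodeError (and ValueError), catching either parent also works.
try:
    to_ascii(unicode_str)
    strict_failed = False
except UnicodeError:
    strict_failed = True

print(ignored)
print(replaced)
print(strict_failed)
```

With ignore, the three unsupported characters disappear from the result; with replace, each one becomes a question mark, which preserves the original length and makes the data loss visible. Which handler is appropriate depends on whether we need to know that characters were lost.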