Outputting Unicode Text To An RTF File In Python

I am trying to output Unicode text to an RTF file from a Python script. For background, Wikipedia says: For a Unicode escape the control word \u is used, followed by a 16-bit sig...

Solution 1:

Based on the information in your latest edit, I think this function will work properly, although the improved version further down is the one to use.

def rtf_encode(unistr):
    return ''.join([c if ord(c) < 128 else u'\\u' + unicode(ord(c)) + u'?' for c in unistr])

>>> test_unicode = u'\xa92012'
>>> print test_unicode
©2012
>>> test_utf8 = test_unicode.encode('utf-8')
>>> print test_utf8
©2012
>>> print rtf_encode(test_utf8.decode('utf-8'))
\u169?2012

Here's another version that's broken down a little to be easier to understand. I also made it consistent in returning an ASCII string rather than keeping Unicode and flubbing it at the join. It also incorporates a fix based on the comments.

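def rtf_encode_char(unichar):
    code = ord(unichar)
    # Plain ASCII passes through unchanged.
    if code < 128:
        return str(unichar)
    # \u takes a signed 16-bit decimal, so values above 32767 wrap to negative.
    return '\\u' + str(code if code <= 32767 else code-65536) + '?'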

def rtf_encode(unistr):
    return ''.join(rtf_encode_char(c) for c in unistr)
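
The functions above are written for Python 2. In Python 3, ord() can return values above 0xFFFF, and subtracting 65536 from those no longer yields a signed 16-bit number. As a rough sketch only (the name rtf_encode_py3 is made up here, and Python 3 is an assumption the answer itself doesn't cover), one option is to encode each non-ASCII character to UTF-16 and emit one \u escape per 16-bit code unit:

```python
import struct

def rtf_encode_py3(text):
    """Escape non-ASCII characters as RTF \\uN? control words (Python 3 sketch)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged, as in the original.
            out.append(ch)
            continue
        # UTF-16 turns characters above U+FFFF into a surrogate pair;
        # 'surrogatepass' lets lone surrogates through instead of raising.
        data = ch.encode('utf-16-le', 'surrogatepass')
        # '<h' reads each 2-byte unit as a signed 16-bit integer,
        # which is the form the \u control word expects.
        for (unit,) in struct.iter_unpack('<h', data):
            out.append('\\u%d?' % unit)
    return ''.join(out)

print(rtf_encode_py3('\xa92012'))      # \u169?2012
print(rtf_encode_py3('\U0001F600'))    # \u-10179?\u-8704?
```

Like the original, this leaves ASCII control characters, backslashes and braces untouched.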

Solution 2:

Mark Ransom's answer isn't quite correct: it won't encode codepoints over U+7FFF correctly, nor will it escape characters below 0x20 as recommended by the RTF standard.

I've created a simple module called rtfunicode that encodes Python unicode to RTF control codes, and wrote about the subject on my blog.

In summary, my method uses a regular expression to map the right codepoints to RTF control codes suitable for inclusion in either PyRTF or pyrtf-ng.
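
To give a feel for the regular-expression idea, here is a minimal Python 3 sketch. It is not the rtfunicode source; the names _NEEDS_ESCAPE, _escape and rtf_escape are invented for illustration, and treating backslash and braces the same way as other escaped characters is an assumption made here for simplicity.

```python
import re

# Match the RTF special characters \ { } plus anything outside
# printable ASCII (control characters, DEL, and all non-ASCII).
_NEEDS_ESCAPE = re.compile(r'[\\{}]|[^\x20-\x7e]')

def _escape(match):
    code = ord(match.group(0))
    if code > 0xFFFF:
        # Split characters beyond the BMP into a UTF-16 surrogate pair.
        code -= 0x10000
        units = [0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)]
    else:
        units = [code]
    # \u takes a signed 16-bit decimal, so wrap values above 0x7FFF.
    return ''.join('\\u%d?' % (u - 0x10000 if u > 0x7FFF else u)
                   for u in units)

def rtf_escape(text):
    """Return text with every matched character replaced by \\uN? (sketch)."""
    return _NEEDS_ESCAPE.sub(_escape, text)

print(rtf_escape('\xa92012'))   # \u169?2012
print(rtf_escape('a{b}'))       # a\u123?b\u125?
```

Using a single substitution callback keeps the escaping rules in one place, so changing which characters get escaped only means editing the pattern.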

