Outputting Unicode Text To An RTF File In Python

April 02, 2023 Post a Comment

I am trying to output unicode text to an RTF file from a python script. For background, Wikipedia says For a Unicode escape the control word \u is used, followed by a 16-bit sig

Solution 1:

Based on the information in your latest edit, I think this function will work properly. Except see the improved version below.

def rtf_encode(unistr):
    return ''.join([c if ord(c) < 128 else u'\\u' + unicode(ord(c)) + u'?' for c in unistr])

>>> test_unicode = u'\xa92012'
>>> print test_unicode
©2012
>>> test_utf8 = test_unicode.encode('utf-8')
>>> print test_utf8
©2012
>>> print rtf_encode(test_utf8.decode('utf-8'))
\u169?2012

Here's another version that's broken down a little to be easier to understand. I also made it consistent in returning an ASCII string rather than keeping Unicode and flubbing it at the join. It also incorporates a fix based on the comments.

def rtf_encode_char(unichar):
    code = ord(unichar)
    if code < 128:
        return str(unichar)
    return '\\u' + str(code if code <= 32767 else code-65536) + '?'

def rtf_encode(unistr):
    return ''.join(rtf_encode_char(c) for c in unistr)

Solution 2:

Mark Ransom's answer isn't quite correct as it'll not encode codepoints over U+7fff correctly, nor will it escape characters below 0x20 as recommended by the RTF standard.

I've created a simple module that encodes python unicode to RTF control codes called rtfunicode, and wrote about the subject on my blog.

Baca Juga

In summary, my method uses a regular expression to map the right codepoints to RTF control codes suitable for inclusion in either PyRTF or pyrtf-ng.

Python Manual

Outputting Unicode Text To An RTF File In Python

Solution 1:

Solution 2:

Post a Comment for "Outputting Unicode Text To An RTF File In Python"