Outputting Unicode Text To An RTF File In Python
Solution 1:
Based on the information in your latest edit, I think this function will work properly, though see the improved version below.
def rtf_encode(unistr):
    return ''.join([c if ord(c) < 128 else u'\\u' + unicode(ord(c)) + u'?' for c in unistr])
>>> test_unicode = u'\xa92012'
>>> print test_unicode
©2012
>>> test_utf8 = test_unicode.encode('utf-8')
>>> print test_utf8
©2012
>>> print rtf_encode(test_utf8.decode('utf-8'))
\u169?2012
Here's another version, broken down a little to make it easier to understand. It also consistently returns an ASCII string, rather than keeping Unicode and flubbing it at the join, and it incorporates a fix based on the comments.
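def rtf_encode_char(unichar):
    code = ord(unichar)
    if code < 128:
        # Plain ASCII passes through unchanged.
        return str(unichar)
    # RTF expects a signed 16-bit decimal, so values above 32767 wrap negative.
    return '\\u' + str(code if code <= 32767 else code-65536) + '?'
def rtf_encode(unistr):
    # Encode each character in turn and join the pieces into one ASCII string.
    return ''.join(rtf_encode_char(c) for c in unistr)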
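For anyone on Python 3, the same idea can be sketched roughly as follows. This is not part of the original answer: the name rtf_encode_py3 is made up for illustration, and the sketch assumes each character above U+FFFF should be written as its two UTF-16 surrogate code units, each as a signed 16-bit decimal. Like the functions above, it does not escape backslashes or braces.

```python
import struct

def rtf_encode_py3(text):
    """Escape non-ASCII characters in a str as RTF \\uN? control words.

    Hypothetical helper for illustration; assumes Python 3.4+.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
            continue
        # UTF-16 turns characters above U+FFFF into a surrogate pair.
        units = ch.encode('utf-16-le')
        # '<h' reads each 16-bit unit as a signed little-endian integer,
        # which is the signed decimal form used after \u.
        for (unit,) in struct.iter_unpack('<h', units):
            out.append('\\u%d?' % unit)
    return ''.join(out)

print(rtf_encode_py3('\xa92012'))
```

Reading the code units as signed shorts replaces the manual "subtract 65536" step, so the 32767 threshold does not have to be spelled out.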
Solution 2:
Mark Ransom's answer isn't quite correct: it won't encode codepoints over U+7FFF correctly, nor will it escape characters below 0x20 as recommended by the RTF standard.
I've created a simple module called rtfunicode that encodes Python unicode to RTF control codes, and wrote about the subject on my blog.
In summary, my method uses a regular expression to map the right codepoints to RTF control codes suitable for inclusion in either PyRTF or pyrtf-ng.
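To give a feel for the regular-expression approach, here is a rough Python 3 sketch. It is not the rtfunicode source. The names _NEEDS_ESCAPE, _escape_match and rtf_escape are invented for this example, and the choice of which characters to match (backslash, braces, everything below 0x20, everything above 0x7F) is an assumption based on the description above.

```python
import re

# Assumed set of characters to escape: backslash, braces, control
# characters below 0x20, and anything outside 7-bit ASCII.
_NEEDS_ESCAPE = re.compile(r'[\\{}\x00-\x1f\x80-\U0010ffff]')

def _escape_match(match):
    code = ord(match.group(0))
    if code > 0xFFFF:
        # Split into a UTF-16 surrogate pair, since \u takes 16-bit values.
        code -= 0x10000
        units = (0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF))
    else:
        units = (code,)
    # Values above 32767 are written as negative signed 16-bit numbers.
    return ''.join('\\u%d?' % (u - 65536 if u > 32767 else u) for u in units)

def rtf_escape(text):
    """Hypothetical helper: replace every matched character with \\uN?."""
    return _NEEDS_ESCAPE.sub(_escape_match, text)

print(rtf_escape('\xa92012 {draft}'))
```

A single substitution pass leaves ordinary ASCII text untouched and only calls the replacement function for the characters that actually need escaping.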