
Unicode Arabic String To Use It

I have a variable holding a value like x='مصطفى' and I want to convert it to the form u'مصطفى' so that I can use it again in some functions. When I try u''+x, it always

Solution 1:

You have to know what encoding those bytes are in, and then call .decode(encoding) on them to get a Unicode string. If you received them from some API, utf8 is a good guess. If you read the bytes from a file typed in Windows Notepad, it is more likely a Windows Arabic code page (cp1256, for example).

PythonWin 2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)] on win32.
>>> x='مصطفى' # "Just bytes" in whatever encoding my console uses
>>> x         # Looks like UTF-8.
'\xd9\x85\xd8\xb5\xd8\xb7\xd9\x81\xd9\x89'
>>> x.decode('utf8')  # Success
u'\u0645\u0635\u0637\u0641\u0649'
>>> print(x.decode('utf8'))
مصطفى
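When the encoding is only a guess, one option is to try a short list of candidates in order. The sketch below is an illustration only: decode_with_candidates is a hypothetical helper, and the candidate list (utf-8, then cp1256) is an assumption based on the two cases mentioned above. Strict UTF-8 goes first because a single-byte code page will accept almost any byte sequence without complaint.

```python
def decode_with_candidates(raw, encodings=('utf-8', 'cp1256')):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError('none of the candidate encodings matched')


# The name from the question, written with escapes so the source
# file's own encoding does not matter.
name = u'\u0645\u0635\u0637\u0641\u0649'

utf8_bytes = name.encode('utf-8')
cp1256_bytes = name.encode('cp1256')

text_a, used_a = decode_with_candidates(utf8_bytes)
text_b, used_b = decode_with_candidates(cp1256_bytes)
```

Both calls should give back the original text. A successful decode only means the bytes were valid in that encoding, not that the guess was right, so knowing the real encoding is still the better option.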

Solution 2:

Thanks, I solved it :)

The solution is to do this:

u''.encode('utf-8')+x
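As a quick check of what that expression produces, here is a small sketch, assuming x holds the UTF-8 bytes shown in Solution 1. The variable names are made up for illustration. Encoding an empty Unicode string gives an empty byte string, so the result is still a byte string and decoding is still needed to get Unicode.

```python
# Assumed input: the UTF-8 bytes from the console session in Solution 1.
x = b'\xd9\x85\xd8\xb5\xd8\xb7\xd9\x81\xd9\x89'

prefix = u''.encode('utf-8')           # an empty byte string
result = prefix + x                    # bytes + bytes

is_bytes = isinstance(result, bytes)   # still a byte string
unchanged = (result == x)              # nothing was converted
as_text = result.decode('utf-8')       # this step yields Unicode
```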

Solution 3:

There are two things.

First the meaning of x='مصطفى' is ill-defined, and changes if you save your source file in another encoding. On the other hand x=u'مصطفى'.encode('utf-8') unambiguously means “the bytes you get when you encode that text with UTF-8”.
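To illustrate the first point, this sketch encodes the same text with two encodings and compares the results. The choice of cp1256 as the second encoding is an assumption made for the example. The text is written with escapes so that it does not depend on how the source file is saved.

```python
# The name from the question, written with Unicode escapes.
name = u'\u0645\u0635\u0637\u0641\u0649'

as_utf8 = name.encode('utf-8')      # two bytes per Arabic letter
as_cp1256 = name.encode('cp1256')   # one byte per Arabic letter

print(len(as_utf8))     # 10
print(len(as_cp1256))   # 5
print(as_utf8 == as_cp1256)   # False
```

The same five letters become two different byte strings, which is why a bare 'مصطفى' literal in Python 2 has no fixed meaning on its own.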

Second, use either byte strings ('abc' or b'abc') or Unicode strings (u'abc'), but don't mix them. Mixing them in Python 2.x triggers an implicit conversion with the interpreter's default encoding, so the result depends on where you execute the code. In Python 3.x it raises an error (for good reasons).

So given a byte string x, either:

# bytes
'' + x

or:

# unicode, so decode the byte string
u'' + x.decode('utf-8')
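The two options, and what happens when they are mixed, can be sketched as follows. This assumes x holds the UTF-8 bytes from Solution 1, and the variable names are made up for illustration. The except clause lists both exception types because Python 3 raises TypeError while Python 2 raises UnicodeDecodeError for non-ASCII bytes under its usual ASCII default.

```python
# Assumed input: the UTF-8 bytes from the console session in Solution 1.
x = b'\xd9\x85\xd8\xb5\xd8\xb7\xd9\x81\xd9\x89'

# Option 1: stay in bytes.
bytes_result = b'' + x

# Option 2: decode first, then stay in Unicode.
text_result = u'' + x.decode('utf-8')

# Mixing the two kinds of string fails for non-ASCII data.
try:
    u'' + x
    mixing_failed = False
except (TypeError, UnicodeDecodeError):
    mixing_failed = True
```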
