SQLAlchemy: Dealing With CP-1252 Data When Python Is Expecting It To Be UTF-8
I am working with an existing SQLite database and experiencing errors due to the data being encoded in CP-1252, when Python is expecting it to be UTF-8. >>> import sqlite3
Solution 1:
SQLAlchemy and SQLite are behaving normally. The solution is to fix the non-UTF-8 data in the database.
I wrote the below, drawing inspiration from https://stackoverflow.com/a/2395414/1191425 . It:
- loads up the target SQLite database
- lists all columns in all tables
- if the column is a
text
,char
, orclob
type - including variants likevarchar
andlongtext
- it re-encodes the data from theINPUT_ENCODING
to UTF-8.
INPUT_ENCODING = 'cp1252' # The encoding you want to convert from
import sqlite3
db = sqlite3.connect('dnd_fixed.sqlite')
db.create_function('FIXENCODING', 1, lambda s: str(s).decode(INPUT_ENCODING))
cur = db.cursor()
tables = cur.execute('SELECT name FROM sqlite_master WHERE type="table"').fetchall()
tables = [t[0] for t in tables]
for table in tables:
columns = cur.execute('PRAGMA table_info(%s)' % table ).fetchall() # Note: pragma arguments can't be parameterized.
for column_id, column_name, column_type, nullable, default_value, primary_key in columns:
if ('char' in column_type) or ('text' in column_type) or ('clob' in column_type):
# Table names and column names can't be parameterized either.
db.execute('UPDATE "{0}" SET "{1}" = FIXENCODING(CAST("{1}" AS BLOB))'.format(table, column_name))
After this script runs, all *text*
, *char*
, and *clob*
fields are in UTF-8 and no more Unicode decoding errors will occur. I can now Faerûn
to my heart's content.
Solution 2:
If you have a connection URI then you can add the following options to your DB connection URI:
DB_CONNECTION = mysql+pymysql://{username}:{password}@{host}/{db_name}?{options}
DB_OPTIONS = {
"charset": "cp-1252",
"use_unicode": 1,
}
connection_uri = DB_CONNECTION.format(
username=???,
...,
options=urllib.urlencode(DB_OPTIONS)
)
Assuming your SQLLite driver can handle those options (pymysql can, but I don't know 100% about sqllite), then your queries will return unicode strings.
Post a Comment for "SQLAlchemy: Dealing With CP-1252 Data When Python Is Expecting It To Be UTF-8"