Background
I'm in a real mess with Unicode and Python. It seems to be a common source of angst, and I've tried other solutions out there, but I just can't get my head around it.
Setup
MySQL Database Setup
- collation_database: utf8_general_ci
- character_set_database: utf8
SQLAlchemy Model
class Product(Base):
    id = Column('product_id', Integer, primary_key=True)
    name = Column('product_name', String(64))  # Tried using Unicode() but it didn't help
Pyramid View
@view_config(renderer='json', route_name='products_search')
def products_search(request):
    json_products = []
    # Wrap the search term in SQL wildcards for the LIKE query
    term = "%%%s%%" % request.params['term']
    products = dbsession.query(Product).filter(Product.name.like(term)).all()
    for prod in products:
        json_prod = {'id': prod.id, 'label': prod.name, 'value': prod.name,
                     'sku': prod.sku, 'price': str(prod.price[0].price)}
        json_products.append(json_prod)
    return json_products
Problem
I get encoding errors reported from the json module (which is called because it's the renderer for this route), like so:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 37: invalid start byte
The culprit is a "-" (dash symbol) in the prod.name value. Full stack trace here. If none of the returned products have a "-" in the name, it all works fine!
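The error can be reproduced outside Pyramid and the json renderer. This is only a minimal sketch, assuming the name comes back from the database as raw bytes containing 0x96; the product name below is made up for illustration.

```python
# Hypothetical product name as raw bytes; 0x96 is the byte named in the traceback.
raw_name = b"Widget \x96 Large"

try:
    raw_name.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# 0x96 can never begin a UTF-8 sequence, so decoding stops right at that byte.
print(error.reason)  # invalid start byte
print(error.start)   # 7
```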
Tried
I've tried encoding and decoding with various codecs before returning the json_products variable.
The dash (an en dash –, not a hyphen -) is getting encoded in cp1252 (which gives byte 0x96). JSON always deals with unicode, so it tries to decode it using UTF-8, and fails. So somewhere you will need a .decode("cp1252").
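A minimal sketch of that decode step, assuming the value arrives as cp1252-encoded bytes. The helper name to_text and the sample product name are hypothetical, not part of the original code.

```python
import json

def to_text(value, encoding="cp1252"):
    """Hypothetical helper: decode byte strings to text, pass other values through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Made-up sample value; in cp1252 the byte 0x96 is the en dash (U+2013).
raw_name = b"Widget \x96 Large"
name = to_text(raw_name)

# Once the value is text, the json module can serialize it without a decode error.
payload = json.dumps({"label": name})
```

In the view this would be applied to prod.name before it goes into json_prod. It only works around the symptom; where the cp1252 bytes come from (the stored data or the connection settings) is not something the post establishes.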