WordPress, DreamHost, UTF-8 encoding and international characters


If you work on WordPress blogs or sites hosted on DreamHost and written in a language with accented characters, you have probably noticed character encoding problems. For instance, the new auto-save feature in WordPress and the Ajax field for adding categories in the category list can both display improperly encoded characters.

If you are seeing things like ActualitÃ©s where you expected Actualités, read on, brave shared hoster…

Your database should use UTF-8 for its character set and connection encoding. By default it does not: the WordPress installer doesn’t specify a UTF-8 collation for its tables and fields, and DreamHost uses the MySQL default of latin1_swedish_ci. WordPress itself is set to use UTF-8 on the WordPress > Options > Reading page. Unfortunately, there is a problem with DreamHost’s setup: the MySQL client running on DreamHost’s servers has been set to (or left at) latin-1 encoding. This means that what you type into WordPress starts as UTF-8 text, is treated as latin-1 text as it passes through the MySQL client, and then lands in a UTF-8 database. This can cause a whole host of problems, especially for the Ajax components of WordPress 2.1, since the Ajax connector uses UTF-8 even if your blog is set to latin-1.
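To make the failure concrete, here is a minimal PHP sketch of what that round trip does to the bytes. It assumes the mbstring extension is available, and the function name simulate_latin1_client is made up for this illustration. It only imitates the conversion and does not talk to MySQL.

```php
<?php
// Hypothetical helper: imitates a latin-1 client handing UTF-8 text to a
// UTF-8 database. Each incoming byte is treated as one latin-1 character
// and re-encoded as UTF-8, so multi-byte characters get encoded twice.
function simulate_latin1_client(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'UTF-8', 'ISO-8859-1');
}

// "Actualités" as UTF-8 bytes: the é is the two bytes C3 A9.
$typed  = "Actualit\xC3\xA9s";
$stored = simulate_latin1_client($typed);

echo bin2hex($typed), "\n";   // 41637475616c6974c3a973
echo bin2hex($stored), "\n";  // 41637475616c6974c383c2a973
echo $stored, "\n";           // shows as ActualitÃ©s on a UTF-8 page
```

The single é (C3 A9) comes out as two characters (C3 83 and C2 A9), which is the garbled text you see in the browser.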

Old content will need to be converted to the correct UTF-8 encoding, but for new content you can prevent the problem by forcing the MySQL client to use UTF-8 character encoding. Open the file wp-includes/wp-db.php in your WordPress directory. Locate the following line:

$this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);

On the next line paste this text:

$this->query("SET NAMES UTF8");

This instructs the MySQL client to use UTF-8 for data going to and from the database. Everything should now be working correctly as your UTF-8 posts go through a UTF-8 MySQL client into a UTF-8 database. It would be nice if DreamHost offered a control panel option to somehow manage the character encoding of new databases and the MySQL client.
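For the old content mentioned earlier, one possible approach is to reverse the double encoding in PHP. The sketch below is an illustration under several assumptions, not a tested migration script: repair_double_encoded is a hypothetical helper, it needs the mbstring extension, and it only handles text whose damaged characters fit in ISO-8859-1.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 read as latin-1" damage.
// Returns the input unchanged when it does not look double-encoded.
function repair_double_encoded(string $text): string
{
    // Map each character back to the single byte it came from.
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Characters outside latin-1 become "?" above; if the round trip does
    // not reproduce the input, the text was not simple double encoding.
    if (mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') !== $text) {
        return $text;
    }

    // Correctly encoded accented text usually turns into invalid UTF-8
    // here, so in that case keep the original.
    if (!mb_check_encoding($candidate, 'UTF-8')) {
        return $text;
    }

    return $candidate;
}

$damaged = "Actualit\xC3\x83\xC2\xA9s";                 // displays as ActualitÃ©s
echo repair_double_encoded($damaged), "\n";             // Actualités
echo repair_double_encoded("Actualit\xC3\xA9s"), "\n";  // already fine, unchanged
```

Text that was deliberately written as Ã© would also be "repaired" by this check, so it is worth reviewing the results on a copy of the data first.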

Querying the database with “SET NAMES UTF8” immediately after opening your DB connection is also useful if you are developing your own PHP applications on DreamHost servers or using one-click installed software like phpBB, Gallery, Joomla/Mambo, MediaWiki or activeCollab.
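In your own PHP code, the same idea might look like the sketch below. It uses the mysqli extension rather than the older mysql_connect call that WordPress uses above, and the function name, host, user, password and database name are all placeholders invented for the example.

```php
<?php
// Hypothetical helper: open a connection and switch it to UTF-8 right away.
// All connection details passed in are placeholders for your own settings.
function open_utf8_connection(string $host, string $user, string $password, string $database): mysqli
{
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_error) {
        throw new RuntimeException('Connection failed: ' . $db->connect_error);
    }

    // Same statement as the WordPress fix: tell the server that this client
    // sends and expects UTF-8.
    if (!$db->query("SET NAMES UTF8")) {
        throw new RuntimeException('SET NAMES failed: ' . $db->error);
    }

    return $db;
}

// Usage (not executed here, since it needs a real database):
// $db = open_utf8_connection('mysql.example.com', 'db_user', 'secret', 'my_blog');
```

mysqli also offers $db->set_charset('utf8'), which the PHP manual recommends over sending SET NAMES by hand.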
