Hash to encode ISO-8859-1 (Western) to HTML entities

For the full list of entities, see http://www.elizabethcastro.com/html/extras/entities.html
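
The hash itself is not reproduced here, so the following is only a minimal sketch of the idea in Python, not the original code. It assumes the standard library table html.entities.codepoint2name as the source of entity names; LATIN1_ENTITIES and encode_entities are names made up for this illustration.

```python
from html.entities import codepoint2name

# Build the lookup table ("hash") for the non-ASCII half of ISO-8859-1.
# Code points 160-255 all have a named HTML entity.
LATIN1_ENTITIES = {
    chr(cp): f"&{name};"
    for cp, name in codepoint2name.items()
    if 160 <= cp <= 255
}


def encode_entities(text):
    """Replace Latin-1 characters that have a named entity; leave the rest alone."""
    return "".join(LATIN1_ENTITIES.get(ch, ch) for ch in text)


print(encode_entities("café © 2010"))  # caf&eacute; &copy; 2010
```

ASCII characters such as & and < are deliberately left untouched in this sketch; escaping them is a separate step.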

Database migration from cp1251 -> utf8

It was time to migrate the http://www.cenite.com database from cp1251 to utf8. I have prepared a script that shows, step by step, the commands you should run to migrate your database.

Copy the content to cp1251toUTF8.sh, or run the commands manually.

The script echoes the commands instead of running them, which gives you a chance to fix an error if one occurs.
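
The echo-only approach can be sketched in Python as follows. This is not the cp1251toUTF8.sh script. It is an illustration that assumes one common dump, convert, reload sequence; the database name mydb and the dump file names are hypothetical.

```python
import shlex


def migration_commands(db, dump="dump.sql", converted="dump_utf8.sql"):
    """Return the shell commands for a dump / convert / reload migration."""
    q = shlex.quote
    return [
        # 1. Dump the data in the charset it was written in.
        f"mysqldump --default-character-set=cp1251 {q(db)} > {q(dump)}",
        # 2. Re-encode the dump and rewrite the charset declarations in it.
        #    The sed expression is blunt: it also rewrites any "cp1251"
        #    that happens to appear in the data itself.
        f"iconv -f CP1251 -t UTF-8 {q(dump)} | sed 's/cp1251/utf8/g' > {q(converted)}",
        # 3. Load the converted dump over a utf8 connection.
        f"mysql --default-character-set=utf8 {q(db)} < {q(converted)}",
    ]


# Dry run: only print the commands, never execute them.
for command in migration_commands("mydb"):
    print(command)
```

Printing first means each command can be reviewed, and rerun by hand, before anything touches the database.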

As a bonus, here is a command that converts all your HTML pages to UTF-8 as well:

find . -name '*.html' -exec recode -v -f windows-1251..UTF-8 \{\} +

This recursively finds every .html file under the current directory and converts it in place.
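
If recode is not available, the same in-place conversion can be approximated in Python. This is a minimal sketch, assuming every matched file really is windows-1251; convert_tree is a name made up for this illustration.

```python
from pathlib import Path


def convert_tree(root, src="cp1251", dst="utf-8"):
    """Re-encode every *.html file under root in place; return the converted paths."""
    converted = []
    for path in sorted(Path(root).rglob("*.html")):
        # Decoding raises UnicodeDecodeError if a byte is not defined in src.
        text = path.read_bytes().decode(src)
        path.write_bytes(text.encode(dst))
        converted.append(path)
    return converted
```

Calling convert_tree(".") mirrors the find command above. Run it only once per tree: a file that is already UTF-8 usually decodes as cp1251 without an error, so a second pass would silently garble it.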

How to detect character sets

enca detects (and can convert) the encoding of text files: http://linux.die.net/man/1/enca
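
As a much cruder check than enca, and not a replacement for it, a strict UTF-8 decode already separates UTF-8 text from cp1251 text in most cases. The sketch below assumes nothing beyond the Python standard library; looks_like_utf8 is a name made up for this illustration.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "привет"
print(looks_like_utf8(sample.encode("utf-8")))   # True
print(looks_like_utf8(sample.encode("cp1251")))  # False
```

Pure ASCII input also passes this check, because ASCII is a subset of UTF-8.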

Migrating Latin1 -> UTF8

We made the mistake of entering all the data into the (utf8) database without setting the right connection encoding (set names utf8). As a result, the content is stored as latin1 characters in the utf8 database: each byte of the original UTF-8 text was treated as a separate latin1 character.

Here is the magic that fixes the encoding, found by my bright colleague bl8cki:

-- Convert the table back to latin1, which restores the original bytes.
alter table articles convert to character set latin1;
-- Switch each text column to blob so the bytes lose their charset label.
alter table articles change content content blob;
alter table articles change title title blob;
alter table articles change author author blob;
-- Declare the same bytes as utf8 text.
alter table articles change content content text character set utf8;
alter table articles change title title text character set utf8;
alter table articles change author author text character set utf8;

Here is an example:

As the example shows, it works with Cyrillic (pasted in utf8)!
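
Why the latin1, blob, utf8 round trip works can be sketched outside MySQL. The Python below only mimics the byte handling; it assumes Python's latin-1 codec as a stand-in for MySQL's latin1, and the sample string is made up.

```python
original = "кирилица"

# What happened on insert: the client sent UTF-8 bytes, the server treated the
# connection as latin1, and stored each byte as a separate utf8 character.
utf8_bytes = original.encode("utf-8")
stored = utf8_bytes.decode("latin-1").encode("utf-8")  # double-encoded

# Step 1 (convert to character set latin1): utf8 -> latin1 restores the bytes.
as_latin1 = stored.decode("utf-8").encode("latin-1")
# Step 2 (blob): the bytes are kept, the charset label is dropped.
# Step 3 (text character set utf8): the same bytes are now read as UTF-8.
repaired = as_latin1.decode("utf-8")

print(repaired == original)  # True
```

The blob step matters because changing a column directly from latin1 to utf8 would convert the characters again instead of relabelling the bytes.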