Don’t use strtoupper with Japanese characters

It’s not that you can’t uppercase text containing Japanese, Chinese, Korean or similar characters, but strtoupper is certainly not the proper method for it.

I have spent the last 48 hours trying to find an error that was killing one of my view scripts. No exception was thrown or displayed, and no error or warning whatsoever was shown. Chasing a bug like this is one of the hardest things you can do.

After a lot of sledgehammer debugging (echo mktime(); die;) and help from co-workers and friends, I finally found where the bug was.

The problem was that I was trying to display a Japanese-encoded page, and simply calling strtoupper on the title was failing and killing the whole script.

Once this was found, the logical way out was to use mb_strtoupper to uppercase the title, but it also failed. This time it was entirely my fault. When no encoding is passed, mb_strtoupper uses the internal encoding, which mb_internal_encoding can read and set. If you don’t specify the encoding, the function simply takes that default, and in my case the call failed with it, which was the origin of my whole ghost bug.
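
A minimal sketch of that fallback, assuming the mbstring extension is loaded and the source file is saved as UTF-8. The sample string and variable names are only illustrative, not taken from the original project:

```php
<?php
// Called with no argument, mb_internal_encoding() returns the current setting.
$previous = mb_internal_encoding();

// Called with an argument, it sets the default encoding used by mb_* functions.
mb_internal_encoding('UTF-8');

// No encoding argument here, so mb_strtoupper() uses the internal encoding.
$title = 'café 大文字';   // illustrative sample, not the real page title
$upper = mb_strtoupper($title);
echo $upper, PHP_EOL;

// Put the previous setting back so the rest of the script is unaffected.
mb_internal_encoding($previous);
```

Relying on the internal encoding works only as long as nothing else in the request changes it, which is why passing the encoding explicitly on each call is the safer habit.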

The simple and complete solution was to pass UTF-8 as the encoding when calling the mb_strtoupper function.

The point is, if you are running a multi-language system that is most likely set up for Latin-1, then whenever you use the mb_* functions you must set them to UTF-8, otherwise you will be chasing a ghost bug (and it’s not fun).

Setting the encoding for the mb_* functions is easy: most of them accept the encoding as one of their parameters. For the mb_strtoupper function, for example, the call would be:

echo mb_strtoupper('大文字mcloide wordpressのドットドットコム', 'UTF-8');
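
As a slightly fuller sketch of the same idea (assuming the mbstring extension is available and the file is saved as UTF-8), the call above leaves the Japanese characters untouched and uppercases only the Latin letters. The extra mb_substr and mb_strlen calls are my own illustration of passing the encoding explicitly to other mb_* functions:

```php
<?php
$title = '大文字mcloide wordpressのドットドットコム';

// Explicit encoding: the result no longer depends on mb_internal_encoding().
$upper = mb_strtoupper($title, 'UTF-8');
echo $upper, PHP_EOL; // 大文字MCLOIDE WORDPRESSのドットドットコム

// Other mb_* functions take the encoding as their last parameter too.
$first = mb_substr($title, 0, 3, 'UTF-8'); // first three characters: 大文字
$chars = mb_strlen($first, 'UTF-8');       // 3 characters
$bytes = strlen($first);                   // 9 bytes, since each is 3 bytes in UTF-8

echo $first, ' ', $chars, ' ', $bytes, PHP_EOL;
```

The gap between the character count and the byte count is the reason byte-oriented functions such as strtoupper and strlen are a poor fit for this kind of text.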

Have fun …

About mcloide

Making things simpler, just check: http://www.mcloide.com

6 responses to “Don’t use strtoupper with Japanese characters”

  • Felicia

    No offense, but does Japanese have uppercase characters? Maybe I’m losing it but I didn’t think so. Or is it just b/c it’s mixed with English?

    Sounds like an interesting project! 😉 I wish I could use my Japanese on a work proj.

    • mcloide

      Hi Felicia, that’s the whole problem. I believe they don’t, just like the Arabic languages, and that’s why you should use the mb_strtoupper function to “uppercase” them.

      The system supports about 21 languages, among them English, Spanish, Japanese, Chinese and Korean.

      In one of the modules we display the title in uppercase, and the strtoupper function fails for the Japanese, Chinese and Korean languages (I believe for more, but those are the ones I have tested).

      One way to correct the problem is to use the mb_strtoupper function. It will work for any language, but you have to set the encoding. Using mb_strtoupper with UTF-8 encoding, the system correctly displays the title in uppercase for English and Spanish, and displays the title correctly for the Japanese, Chinese and Korean languages.

      Imagine having to create a new conditional statement (if, switch) for every single new language you support. That can turn into major maintenance work if not done correctly. Besides, if you have the correct functions to use, why not use them?

      All this mess is why I wrote this post. It took me 48 hours to figure out that it was strtoupper that was failing in the system. I could see everything working correctly on Latin-based languages, but not on languages that use other encodings.

      • mcloide

        btw … the internet is worldwide, so why not use your Japanese knowledge in your projects, blogs, etc.? Kanji are so cool… besides, that’s one of the most technologically developed countries in the world… I bet if you write you will get their attention.

        I’m helping http://www.goal.com with the new Mobile platform (m.goal.com) and that’s where I see a lot of Japanese, Korean, Chinese, Farsi, Portuguese, Spanish, English, … (these are the ones I remember)….
