Discussion:
convert string from Unicode to ISO-2022-JP
w***@gmail.com
2006-03-23 13:41:35 UTC
Hi,

I've got a string which contains Japanese characters and is stored in
the form of UTF-8.

I would like to know if there would be any character loss when
converting the string from Unicode to ISO-2022-JP. In other words, is
ISO-2022-JP a subset of Unicode?

Someone told me that some common Japanese characters would become ?
when converted to ISO-2022-JP (I'm using Perl). Is that true??

Thanks in advance,
Wing
Jim Kingdon
2006-03-24 05:18:28 UTC
Post by w***@gmail.com
Someone told me that some common Japanese characters would become ?
when converted to ISO-2022-JP (I'm using Perl). Is that true??
Don't happen to know about Perl in particular. But if ISO-2022-JP is
just what is defined by http://www.ietf.org/rfc/rfc1468.txt then it is
missing the characters from JIS X 0212. On the other hand, there is
http://www.ietf.org/rfc/rfc1554.txt (ISO-2022-JP-2), which includes the
0212 characters.
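
To make the difference concrete, here is a minimal sketch in Python
rather than Perl (since I can't vouch for the Perl side). It assumes
that Python's built-in "iso2022_jp" codec corresponds to RFC 1468 and
"iso2022_jp_2" to RFC 1554. The helper name unencodable is made up for
illustration, and U+4E02 is used as a sample JIS X 0212 character.

```python
def unencodable(text, codec):
    """Return the characters of text that the given codec cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# U+65E5 U+672C U+8A9E ("nihongo") are in JIS X 0208.
# U+4E02 is assumed here to be a JIS X 0212-only character.
sample = "\u65e5\u672c\u8a9e\u4e02"

# Characters lost under plain ISO-2022-JP (RFC 1468, by assumption).
print(unencodable(sample, "iso2022_jp"))

# Characters lost under ISO-2022-JP-2 (RFC 1554, by assumption).
print(unencodable(sample, "iso2022_jp_2"))

# With errors="replace", anything unencodable is substituted instead of
# raising an exception.
print(sample.encode("iso2022_jp", errors="replace"))
```

With errors="replace" the unencodable character should come out as a
literal ?, which is probably the same effect as what was reported for
Perl. Running a check like this over your real data would tell you
whether the loss matters in practice.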

One good introduction to all this is
http://bibliofile.mc.duke.edu/gww/fonts/Charsets.html (it is slightly
dated but not in terms of the basics).

As for what characters are in 0212, my impression is that they are
mainly used in things like people's names and place names (although
I'm far from a Japanese expert). So they are "common" for some
definition of common.

In general, you will be happier with Unicode than with choices like
ISO-2022-JP. The latter operates by having escape sequences to switch
from one character set to another, and is probably less widely
supported (at least in newer software). But of course there may be
cases where you want/need to use it...
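
As a rough illustration of the escape-sequence point (again a Python
sketch, assuming its "iso2022_jp" codec follows RFC 1468), you can
look at the raw bytes. In RFC 1468, ESC $ B selects JIS X 0208-1983
and ESC ( B switches back to ASCII.

```python
ESC = b"\x1b"

# "abc" followed by U+65E5 U+672C U+8A9E ("nihongo").
text = "abc\u65e5\u672c\u8a9e"

jis = text.encode("iso2022_jp")
utf8 = text.encode("utf-8")

# The ISO-2022-JP bytes contain escape sequences marking each switch
# between ASCII and JIS X 0208.
print(jis)
print(jis.count(ESC))

# UTF-8 is stateless, so no escape bytes appear.
print(utf8)
print(utf8.count(ESC))
```

The practical consequence is that you cannot safely cut an
ISO-2022-JP byte string at an arbitrary offset, because the meaning
of a byte depends on the most recent escape sequence before it.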
