Re: UTF-8 fonts

On 25 Sep, Peter Stephenson wrote:
> Borzenkov Andrey wrote:
> > Just to make it clear. Is the aim to use UTF-8 internally or to support
> > (arbitrary) multibyte encoding?
> The first with as much of the second as we can get in without too much

So is your aim to use UTF-8 internally in all cases, or only when it is
the selected character set? I would have thought it would be easier to
just use whatever encoding LC_CTYPE selects internally and use the mb*
functions, so that things work regardless of whether or not LC_CTYPE is
a multibyte encoding. I don't know much about the other multibyte
encodings that can be used for the input/output locale, but I had
gathered that they are at least compatible enough with basic ASCII to
let you use ASCII characters in string literals. To convert everything
to UTF-8 internally, you would have to either use iconv or do messy
stuff: the mb* functions deal with whatever LC_CTYPE is, not UTF-8
(unless that's what LC_CTYPE happens to be, of course).
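
To give an idea of what the iconv route involves, here is a rough sketch,
not a proposal for the real code. The helper name to_utf8() and its
fixed-size output buffer are my own assumptions; only iconv_open(),
iconv() and iconv_close() are real. In practice the "from" name would
presumably come from the locale rather than being passed in by hand.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in`, encoded
 * as `from`, to UTF-8 in `out`. Returns the number of bytes written
 * (not counting the NUL), or (size_t)-1 on any error. */
size_t to_utf8(const char *from, const char *in, char *out, size_t outsize)
{
    assert(from != NULL && in != NULL && out != NULL && outsize > 0);

    iconv_t cd = iconv_open("UTF-8", from);   /* (to, from) order */
    if (cd == (iconv_t)-1)
        return (size_t)-1;                    /* unknown encoding name */

    char *inp = (char *)in;        /* iconv() wants a non-const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;  /* keep room for the terminating NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;         /* bad input or output buffer too small */

    *outp = '\0';
    return (size_t)(outp - out);
}
```

Every string entering or leaving the shell would need a round trip like
this, which is the overhead I would rather avoid. On some systems this
also needs linking with -liconv.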

> We are going to assume that bytes without the top-bit set are ASCII, and
> the remainder require mb* handling.

Isn't it easier to just do mb* handling on everything and not go around
checking the top bit? The mb*() functions should do that sort of thing
for us. For example, mbrtowc() can be used to consume one character of
a string, discarding the resulting wchar_t. It then takes care of the
top bit of each byte, or whatever else the underlying multibyte
encoding requires.
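
Something along these lines is what I have in mind. It is only a sketch:
char_len() and count_chars() are names I made up, and treating an invalid
byte as a single character is just one possible policy. The caller is
assumed to have done setlocale(LC_CTYPE, "") beforehand.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: length in bytes of the character at the start of
 * s, in whatever encoding LC_CTYPE selects. Returns 0 for a NUL and
 * (size_t)-1 for an invalid or incomplete sequence. */
size_t char_len(const char *s, size_t avail, mbstate_t *st)
{
    assert(s != NULL && st != NULL);

    /* Passing NULL as the first argument discards the wchar_t. */
    size_t n = mbrtowc(NULL, s, avail, st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        memset(st, 0, sizeof *st);   /* reset to the initial shift state */
        return (size_t)-1;
    }
    return n;
}

/* Hypothetical helper: count characters (not bytes) in a string. */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s), count = 0, n;

    memset(&st, 0, sizeof st);
    while (left > 0 && (n = char_len(s, left, &st)) != 0) {
        if (n == (size_t)-1)
            n = 1;                   /* assumed policy: skip one bad byte */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

Note there is no test of the top bit anywhere; the same loop should work
for a single-byte locale and for a multibyte one.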

> > Impossible. Local names are just arbitrary chosen strings; there is no
> > "character set code" defined in any locale definition, at least on Unix.

As has been mentioned: nl_langinfo(CODESET)
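
For what it's worth, a minimal sketch of how that could be used. The two
helper names are hypothetical, and the set of spellings accepted for
UTF-8 is my assumption, since the exact string returned varies between
systems. As before, setlocale(LC_CTYPE, "") is assumed to have been
called first; otherwise you get the codeset of the "C" locale.

```c
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: name of the current locale's character encoding,
 * as reported by nl_langinfo(CODESET). */
const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: does this codeset name denote UTF-8?
 * Accepting both "UTF-8" and "UTF8", in any case, is an assumption. */
int codeset_is_utf8(const char *name)
{
    assert(name != NULL);
    return strcasecmp(name, "UTF-8") == 0 || strcasecmp(name, "UTF8") == 0;
}
```

A call such as codeset_is_utf8(current_codeset()) would then answer the
question without parsing the locale name at all.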

> Read the document at the link I gave which suggests otherwise.  However,
> I now think we can in any case leave this to the mb* suite to decide.

Yes, I think we can.

I'm sure you can all use Google, but other possibly useful links I had
in my bookmarks are these:

  IBM's patches to various GNU stuff:
  IBM article that serves as a basic intro:


