Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: UTF-8 support

X-seq: zsh-workers 20439
From: Oliver Kiddle <okiddle@xxxxxxxxxxx>
To: David Gómez <david@xxxxxxxxxxxx>
Subject: Re: UTF-8 support
Date: Fri, 01 Oct 2004 21:46:05 +0200
Cc: Zsh-workers <zsh-workers@xxxxxxxxxx>
In-reply-to: <20041001184122.GA9094@fargo>
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <20041001184122.GA9094@fargo>

--------
David Gómez wrote:
> So i conclude from your response that nobody is working on it ;).
> I understand the time problem, everybody is short on time, including

Nothing has been done. A few people may have done some work that was
never posted. I got as far as reading up, thinking about what the right
approach would be, and adding support for things like the following,
which prints a character given its Unicode code point:
  echo '\u20ac'
It seemed a good place to start because it'll be useful for testing.
Unfortunately, I'm very short on time for the rest of this year.
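
To illustrate what printing '\u20ac' involves when the output encoding
is UTF-8, here is a minimal sketch. The helper name utf8_encode is
hypothetical and is not taken from the zsh source; a real implementation
that supports other multibyte encodings would more likely convert
through the C library (e.g. wcrtomb) rather than hard-code UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the zsh source): encode the Unicode code
 * point cp as UTF-8 into out, which must have room for 4 bytes.
 * Returns the number of bytes written, or 0 if cp cannot be encoded.
 */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

For U+20AC (the euro sign) this produces the three bytes E2 82 AC.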

> But i need help to know where to start. What parts of zsh would need 
> to be worked on, only zle? Is there already, some kind of, although

Most parts of the source will need work but it is possible to add
support in individual areas. So don't start with completion, find
something simple like the print builtin (in particular -c and -C
options). Builtins in general are simple because they are relatively
self-contained. If you try to attack zle first, you'll just get fed up
with it being too hard. Once you've got something simple like print
done, another idea for something simple would be to add a Test/U01 test
and add code to make it search for a UTF-8 locale ($langinfo[CODESET] in
the langinfo module will help) and use it for LC_CTYPE.
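
The test itself would be a zsh script using $langinfo[CODESET], but the
same check can be sketched in C with nl_langinfo(CODESET), which is
assumed here to be the C-level counterpart of that module. The function
names and the candidate locale list are hypothetical; which locales
exist, and the exact codeset spelling ("UTF-8" is assumed), vary between
systems.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if LC_CTYPE can be switched to the named
 * locale and that locale reports "UTF-8" as its codeset, otherwise 0.
 * If the switch succeeds, LC_CTYPE is left set to the named locale.
 */
int locale_is_utf8(const char *name)
{
    assert(name != NULL);

    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;                         /* locale not installed */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

/*
 * Try a few candidate names and return the first UTF-8 one, or NULL.
 * The candidate list is only a guess; a real test would probably
 * examine the output of "locale -a" instead.
 */
const char *find_utf8_locale(void)
{
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "en_GB.UTF-8"
    };
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (locale_is_utf8(candidates[i]))
            return candidates[i];
    return NULL;                          /* LC_CTYPE may have changed */
}
```

If find_utf8_locale returns NULL, the test would presumably be skipped
rather than failed.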

> minimal, support for utf-8? Also, if you know from some documentation
> about zsh internals, besides from source ;), please point me to it.

The source and comments are the only documentation I know of but you can
always ask on the list. Do you know much about Unicode/UTF-8? For the
minimum, read http://www.joelonsoftware.com/articles/Unicode.html
and then read http://www.cl.cam.ac.uk/~mgk25/unicode.html

In my opinion it would be sensible to support multibyte encodings in
general and not just UTF-8. Doing this isn't much effort beyond handling
UTF-8 if we assume basic ASCII compatibility and don't worry about
stateful encodings. There are a few characters which are defined to
display as double width even in fixed-width fonts so keep that in mind.
You can detect whether UTF-8 is enabled with the C library's locale
functions but we shouldn't need to: functions such as mbrlen do all the
work for us.
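
As a rough sketch of letting mbrlen do the work, the following counts
characters rather than bytes in whatever encoding the current LC_CTYPE
locale uses, without checking for UTF-8 specifically. The name
mb_char_count is hypothetical, and a stateless, ASCII-compatible
encoding is assumed, as above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters in the first n bytes of s
 * according to the current LC_CTYPE locale. Returns (size_t)-1 if the
 * bytes contain an invalid or truncated multibyte sequence.
 */
size_t mb_char_count(const char *s, size_t n)
{
    mbstate_t st;
    size_t count = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);            /* initial conversion state */

    while (n > 0) {
        size_t len = mbrlen(s, n, &st);

        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete */
        if (len == 0)
            len = 1;                      /* NUL is one byte, one char */
        s += len;
        n -= len;
        count++;
    }
    return count;
}
```

After setlocale(LC_CTYPE, "") in a UTF-8 locale, the three-byte euro
sign should count as a single character. Display width is a separate
question and would need something like wcwidth.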

Once we've got a few basic areas working, we might want to think about
whether there are any common constructs we should create general
functions for in utils.c.

Oliver

Follow-Ups:
- Re: UTF-8 support
  - From: Peter Stephenson
- Re: UTF-8 support
  - From: David Gómez

References:
- Re: UTF-8 support
  - From: David Gómez
