Re: Bug#451382: i18n is NOT so easy!

X-seq: zsh-workers 24198
From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
To: 451382@xxxxxxxxxxxxxxx, zsh-workers@xxxxxxxxxx
Subject: Re: Bug#451382: i18n is NOT so easy!
Date: Sun, 9 Dec 2007 18:01:27 +0000
In-reply-to: <200712071726.lB7HQv76016517@xxxxxxxxxxxxxx>
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <20071205200825.148710@xxxxxxx> <20071206155436.GA6034@xxxxxxxxxxx> <200712061808.56054.ismail@xxxxxxxxxxxxx> <20071206161022.GA6960@xxxxxxxxxxx> <20071207104413.74da4ef6@news01> <200712071411.lB7EBf2U014439@xxxxxxxxxxxxxx> <20071207171511.GA2937@xxxxxxxxxxx> <200712071726.lB7HQv76016517@xxxxxxxxxxxxxx>

On Fri, 07 Dec 2007 17:26:57 +0000
Peter Stephenson <pws@xxxxxxx> wrote:
> Clint Adams wrote:
> > On Fri, Dec 07, 2007 at 02:11:41PM +0000, Peter Stephenson wrote:
> > > Found it: see thread around
> > > 
> > > http://www.zsh.org/mla/workers/2006/msg00753.html
> > 
> > I think it would be easier to do something like bash's $"" interface to
> > gettext and co-opt that for completion translations.
> 
> As far as I understood it (it doesn't seem to be well documented) that
> only does translations which are pre-compiled into the shell (or rather
> its libraries).  We need something which can be updated with completion
> functions.  It's OK if the definitions are in another file (though we
> could presumably have an interface which adds translations from the
> completion function itself) but it needs to be added at run time.
> 
> Possibly we can still do this with $"...", but I don't like the idea
> that if you change the original message you can no longer find the
> translation, which seems to me to be asking for trouble.

Further thoughts after groping through the gettext documentation for a
bit... this is not a definitive answer (though rather closer to one than
when I originally wrote that two hours and counting ago) but unless I post
it now I'll forget it.  A summary is that I believe we can use the
internationalization functions in the library behind gettext(), to avoid
reinventing the wheel and maintain some compatibility, but it'll take a
bit more care to get this right than simply $"<msgid>" plus
gettext("<msgid>").

I think we have two basic problems with the simplest $"..." / gettext()
interface.

1.

The problem in the last paragraph quoted.  I'm convinced this is a real
problem:  unlike with C programmes, the urge to tinker with strings in
shell functions is strong and if there's no visual cue that this has bad
side effects then the interface is, in my view, fundamentally broken.
To put it another way, only programmers tinker with C programmes while
users are actively encouraged to tinker with shell functions, so the
whole nature of the interface needs to be rethought to make it clear and
robust rather than minimal.

However, this isn't insuperable.  The "msgid" is only by convention the
original string and could be anything; it was designed to be simple in
the case of having many calls to gettext() throughout a programme.  As
we essentially have only one point of entry for translations in shell
functions (the shell's C code is a separate and much simpler problem
since this isn't fundamentally different from any other C programme), we
can do it how we like.  We can, for example, have translation strings
like:

$"_mount_nfs_access_acregmin:specify cached file attributes minimum hold time"

and have the following rule:

- If the string is in the form
    <identifier_character> * ":" . *
  (we might need to make this more complicated eventually), first
  attempt a look-up with the identifier characters alone.  If the look-up
  doesn't simply return the string it was given, this is the text we want.
- Otherwise look up with the whole string.  This is for compatibility.  Use
  of this in zsh functions would be deprecated.
- If it still returns the original string but there is an identifier
  part, return the string after the ":".
- Maybe we want some rule about aliasing, it's not clear (we can leave
  it until a use becomes obvious). 

This scheme has various merits:  (i) it is robust about changes to the
English text (ii) the explicit msgid serves as a visual cue that
there's something here that shouldn't be monkeyed with without good
reason (and that even if you change the English text it should mean the
same thing) (iii) the msgid in the catalogues is compact.
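
As a rough illustration of the look-up rule above, here is a minimal sketch
in Python.  It is not shell code and not a proposal for the implementation:
the catalogue is modelled as a plain dict (a missing key stands for a look-up
that hands back its argument untranslated), the function name translate() is
made up, and the identifier is assumed to be one or more letters, digits or
underscores before the first colon.

```python
import re

# Assumption: an identifier is one or more letters, digits or underscores
# before the first colon.  The real rule may need to be more complicated.
_MSGID = re.compile(r"([A-Za-z0-9_]+):(.*)", re.DOTALL)


def translate(string, catalogue):
    """Apply the look-up rule to the contents of a $"..." string.

    `catalogue` is a dict standing in for a real message catalogue; a
    missing key models a look-up that returns its argument unchanged.
    """
    match = _MSGID.fullmatch(string)
    if match:
        ident, english = match.groups()
        # First attempt: look up using the identifier part only.
        if ident in catalogue:
            return catalogue[ident]
    # Compatibility: look up using the whole string as the msgid.
    if string in catalogue:
        return catalogue[string]
    # Nothing found: return the text after the colon if there was an
    # identifier, otherwise the string as written.
    return english if match else string
```

With an empty catalogue,
translate("_mount_nfs_access_acregmin:specify cached file attributes minimum hold time", {})
gives back just the English text after the colon, so the identifier never
reaches the user.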

2.

Unfortunately there's also the problem of finding message catalogues.
For the same reason that it's designed for simplicity with pre-compiled
programmes, gettext() itself appears to require them to be in a
particular hierarchy the top of which is determined at compile time.

This isn't good enough in our case.  We have functions that are
installed at different places in the function path.  The path can change
and the only clean way of finding message catalogues is using the same
path.  We *could* collect all translations at shell installation and
simply shrug our shoulders saying "that's your lot", but in my view this
is too botched to consider.  (As far as I can tell this is what happens
in bash.)  It's a key part of the way the completion system works that
people can customize it themselves just by writing functions, and even
if adding translations to your own functions is unusual I still don't
think being limited to a predefined set is acceptable.  I don't mind
users (which includes administrators) having to run some utility to add,
or add to, a message catalogue, but I do mind them having to modify the
shell configuration and reinstall; even updating the shell libraries
with something like one of Clint's out-of-tree modules seems a bit over
the top.

However, it seems like we can get something better by interfacing to the
library at a lower level, in particular to catopen() (strictly this is a
different family of interfaces).  That accepts an absolute path to a
catalogue and also uses the environment variable NLSPATH to search for
files.  It's currently unclear to me how to mix use of a shell-specified
directory (determined, in ways we'll need to discuss, from $fpath) with
a user-specified language (since I presume the library has an
intelligent system of fallbacks we don't want to have to imitate).
Unfortunately it looks like these absolute paths aren't portable, either.
If the worst comes to the worst, we may need to alter the environment
variable directly: for example, temporarily either appending or
prepending the zsh directories to it.  (I don't think requiring the user
to modify NLSPATH as well as $fpath is a good idea; I think the shell
should "just find" the right catalogues associated with functions, as
with .zwc files.)
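
To make the "temporarily prepend" idea concrete, here is a small sketch in
Python of the save / modify / restore pattern.  NLSPATH is a colon-separated
list of templates in which %N is replaced by the catalogue name and %L by the
message locale; everything else here is an assumption for illustration: the
helper name nlspath_with(), the "<dir>/%L/%N.cat" layout, and the choice to
prepend rather than append.  In the shell itself this would be done in C
around the catopen() call.

```python
import os
from contextlib import contextmanager


@contextmanager
def nlspath_with(directories, environ=os.environ):
    """Temporarily prepend catalogue templates for `directories` to NLSPATH."""
    # Hypothetical layout: one subdirectory per locale, one file per name.
    templates = [f"{d}/%L/%N.cat" for d in directories]
    saved = environ.get("NLSPATH")
    parts = templates + ([saved] if saved else [])
    environ["NLSPATH"] = ":".join(parts)
    try:
        yield environ["NLSPATH"]
    finally:
        # Put back exactly what the user had, including "not set at all".
        if saved is None:
            environ.pop("NLSPATH", None)
        else:
            environ["NLSPATH"] = saved
```

The point of restoring the old value is that the user's own NLSPATH keeps
working as a fallback and is never permanently altered by the shell.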

Comments on this are obviously welcome.

To proceed I think we need the following.  The second and third parts
should wait until after 4.3.5 (which I'll make before Christmas, despite
the open bugs, since I haven't seen anything which is obviously worse
than in 4.3.4).  They should also wait until after the first part is
reasonably clear.

I. Design:
- finalize the rule for $"..." (or equivalent)
- invent rules for finding the catalogue which should probably be
flexible, ideally allowing both per-fpath-directory and
per-autoloadable-function files while still allowing the user to have
all their own translations collected in one place.  For the last case it
would probably be OK to fall back on NLSPATH.  (I'm not implying
people will use all the mechanisms, just that at this stage we should
plan on flexibility.)
- decide if we want strings in the source to use a similar scheme
or (perhaps better) just normal gettext() rules.

II. Shell source:
- add parsing for $"..."
- add config support for locating libraries for language catalogues and
(where necessary) determining their abilities
- also (a separate job) we should prepare the C code for use of
gettext() --- as I said, this is conceptually simpler but still a lot
of work.  Someone needs to look at gettextize:  this is really part of
the previous point except that we won't want to rely just on the GNU
version; a quick look suggests it assumes a bit too much of a standard
GNU interface in some areas, but I haven't gone into any detail.
- add some trial mechanism behind $"..." using catopen() / catgets() /
catclose().  This is where we're going to need the most fiddling to get
the interface right.

III. Shell functions etc.:
- add a few trial translation files for the completion system and
possibly other files to test the water
- ditto translations for strings in the shell's source code
- write a whole set of utilities that
  - create bare catalogues
  - update catalogues with untranslated strings
  - check for uniqueness of the zsh msgid (needs some subtlety since
  obviously reuse is a good thing:  presumably we need to check that
  the English text after the colon is the same in both cases)
  - install catalogues
  - manipulate (e.g. agglomerate) catalogues
  - list or query what translations are available
  - check catalogues for redundant translations
This is probably the biggest chunk of work.  It would be OK at least
initially to rely on the gettext utilities where possible, but I suspect
that in many areas we're on our own:  it looks like this hasn't been
done before in a way that takes into account end user requirements
adequately (obviously I'd be interested in hearing otherwise).
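
As an example of the sort of utility meant, the msgid uniqueness check could
start out along the following lines.  This is a sketch in Python under
assumptions: strings are written as $"identifier:English text", the
identifier is letters, digits and underscores, escaped quotes inside the text
are not handled, and the function name conflicting_msgids() is made up.
Reuse of a msgid with identical English text is allowed; only differing text
is reported.

```python
import re
from collections import defaultdict

# Matches $"identifier:English text".  Escaped double quotes inside the
# text are not handled in this sketch.
_TAGGED = re.compile(r'\$"([A-Za-z0-9_]+):([^"]*)"')


def conflicting_msgids(sources):
    """Return {msgid: sorted texts} for msgids used with differing text.

    `sources` is an iterable of strings, each the contents of one
    shell function file.
    """
    texts = defaultdict(set)
    for source in sources:
        for ident, english in _TAGGED.findall(source):
            texts[ident].add(english)
    # A msgid seen with exactly one English text is fine, however often
    # it is reused.
    return {ident: sorted(seen) for ident, seen in texts.items() if len(seen) > 1}
```

An empty result means every msgid is paired with one English text across all
the files scanned.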

-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/
