Zsh Mailing List Archive

Re: PATCH: autoconf test for multibyte support



Peter Stephenson <pws@xxxxxxx> wrote:
> If this works I will need to change some of the installation
> documentation.

This changes some documentation.

I'm only guessing that it works on Cygwin; all I know is that it compiles
with the same code that works everywhere else.

Index: INSTALL
===================================================================
RCS file: /cvsroot/zsh/zsh/INSTALL,v
retrieving revision 1.25
diff -u -r1.25 INSTALL
--- INSTALL	16 Feb 2006 14:28:54 -0000	1.25
+++ INSTALL	4 Aug 2006 14:44:06 -0000
@@ -264,37 +264,32 @@
 ---------------------------
 
 Support for multibyte character sets that extend ASCII, such as UTF-8, is
-under development but the code in the line editor is sufficiently stable to
-be turned on by default in environments that provide full ISO 10646 support
-including the preprocessor definition __STDC_ISO_10646__.  In principle
-this definition does not guarantee the full environment, but in practice
-systems with this defined also provide suitable library support.  The shell
-does not probe for all the features, so on other systems use of multibyte
-support must be explicitly enabled when it is available.
+now reasonably close to complete, except that combining characters are not
+handled properly (some assistance with this problem would be appreciated).
+The configuration script should turn on multibyte support on all systems
+where it can be compiled successfully.
 
 The support can be explicitly enabled or disable with --enable-multibyte or
---disable-multibyte.  Reports of systems where multibyte support was not
-enabled by default but --enable-multibyte resulted in a usable shell would
-be appreciated.  The developers are not aware of any need to use
+--disable-multibyte.  The developers are not aware of any need to use
 --disable-multibyte and this should be reported as a bug.  Currently
-multibyte mode is believed to work automatically on:
+multibyte mode is believed to work on at least the following:
 
   - All(?) current GNU/Linux distributions
-
-and to work when configured with --enable-multibyte on:
-
   - OS X 10.4.3 (problems have been reported with multibyte characters
     in HFS file names)
   - NetBSD 2.0.2
   - Solaris 8+ (inputting multibyte characters from the keyboard doesn't
     work in some installations).
+  - Cygwin (though use of multibyte characters is somewhat non-standard).
 
-The main shell is not yet aware of multibyte characters, so for example the
-length of a scalar parameter will return the number of bytes, not
-characters, and pattern tests likewise treat single bytes as if they were
-characters.  This means that pattern tests such as ? and [[:alpha:]] do not
-work correctly with characters in multibyte character sets beyond the ASCII
-subset.
+The corresponding shell option MULTIBYTE is now on by default in all
+emulation modes when multibyte support is enabled.  Turning it off is not
+recommended unless there is a particular need to examine single bytes
+regardless of the locale.  As the line editor bases its behaviour on the
+locale regardless of the option (in order to correspond to the displayed
+character set), the option should be left on during the execution of
+user-defined editor and completion widgets so that the behaviour
+corresponds to that of builtin widgets.
 
 See chapter 5 in the FAQ for some notes on multibyte input.
 
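The INSTALL text above contrasts examining single bytes with examining characters as the locale defines them. The C sketch below illustrates that difference; it is not zsh's code. It counts bytes with strlen() and characters with the standard mbrtowc(). The helper use_utf8_locale() is hypothetical, and the locale names it tries ("C.UTF-8", "en_US.UTF-8") are assumptions that may not be installed on every system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale.  The locale
 * names tried here are assumptions; returns 1 if one was accepted. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Length in bytes, ignoring the locale entirely. */
size_t count_bytes(const char *s)
{
    return strlen(s);
}

/* Length in characters under the current LC_CTYPE locale.  Bytes that
 * do not form a valid character are counted one at a time (a choice
 * made for this sketch, not necessarily what the shell does). */
size_t count_chars(const char *s)
{
    size_t len = strlen(s), n = 0;
    mbstate_t st;

    memset(&st, 0, sizeof st);
    while (len > 0) {
        size_t r = mbrtowc(NULL, s, len, &st);
        /* (size_t)-1: invalid sequence; (size_t)-2: truncated sequence.
         * r == 0 cannot happen because len stops before the NUL, but
         * treating it the same way guarantees progress. */
        if (r == (size_t)-1 || r == (size_t)-2 || r == 0) {
            memset(&st, 0, sizeof st);   /* state is unspecified after an error */
            r = 1;                       /* step over a single byte */
        }
        s += r;
        len -= r;
        n++;
    }
    return n;
}
```

In a UTF-8 locale, count_bytes("caf\xc3\xa9") is 5 while count_chars() on the same string is 4; this is the byte-versus-character distinction that, for example, the length of a scalar parameter depends on.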
Index: MACHINES
===================================================================
RCS file: /cvsroot/zsh/zsh/MACHINES,v
retrieving revision 1.3
diff -u -r1.3 MACHINES
--- MACHINES	21 Mar 2006 19:19:07 -0000	1.3
+++ MACHINES	4 Aug 2006 14:44:07 -0000
@@ -180,9 +180,7 @@
 SGI: IRIX 6.5
 	Should build `out-of-the-box'; however, if using the native
 	compiler, "cc" rather than "c99" is recommended.  Compilation
-	with gcc is also reported to work.  Multibyte is supported,
-	for example:
-           CC=cc ./configure --enable-multibyte
+	with gcc is also reported to work.  Multibyte is supported.
 
 	On 6.5.2, zsh malloc routines are reported not to work; also
 	full optimization (cc -O3 -OPT:Olimit=0) causes problems.
Index: NEWS
===================================================================
RCS file: /cvsroot/zsh/zsh/NEWS,v
retrieving revision 1.10
diff -u -r1.10 NEWS
--- NEWS	28 Feb 2006 12:20:43 -0000	1.10
+++ NEWS	4 Aug 2006 14:44:08 -0000
@@ -5,27 +5,31 @@
 Major changes between versions 4.2 and 4.3
 ------------------------------------------
 
-- There is support for multibyte character sets in the line editor,
-  though not the main shell.  See Multibyte Character Support in INSTALL.
+- There is support for multibyte character sets.  This is now reasonably
+  close to complete, although Unicode combining characters don't work
+  properly.  See Multibyte Character Support in INSTALL.
 
 - The shell can now run an installation function for a new user
-  (one with no .zshrc, .zshenv, .zprofile or .zlogin file) without
-  any additional setting up by the administrator.
+  (a user with no .zshrc, .zshenv, .zprofile or .zlogin file) without
+  any additional setting up by the administrator.  See "THE ZSH/NEWUSER
+  MODULE" in the zshmodules manual page.
 
 - The manual now has a Roadmap section (manual page zshroadmap) to
   give new users an indication of the most interesting parts of the
   manual.
 
-- New option PROMPT_SP, on by default, to work around the problem that the
-  line editor can overwrite output with no newline at the end.
+- New option PROMPT_SP (on by default): works around the problem that the
+  line editor can overwrite output with no newline at the end.  See the
+  zshoptions manual page.
 
 - New option HIST_SAVE_BY_COPY (on by default): history is saved by
-  copying and renaming instead of directly overwriting.
+  copying and renaming instead of directly overwriting.  See the
+  zshoptions manual page.
 
 - New redirection syntax e.g. {myfd}>file opens a new file descriptor
   and stores the number in $myfd, so that >&$myfd will work.  Chosen
   not to break existing code (and to be compatible with proposals for the
-  Korn shell).
+  Korn shell).  See the section REDIRECTION in the zshmisc manual page.
 
 - Substitutions of the form ${var:-"$@"}, ${var:+"$@"} and similar where
   word-splitting is applied to the text after the :- or :+ (in particular,
@@ -36,20 +40,28 @@
 - New Posix-style zsh-specific tests [[:IDENT:]], [[:IFS:]],
   [[:IFSSPACE:]], [[:WORD:]] test if character can appear in identifier,
   is an IFS character, is an IFS whitespace character, or is considered
-  as part of a word (is alphanumeric or appears in $WORDCHARS).  Note
-  the pattern code doesn't yet handle multibyte characters.
+  as part of a word (is alphanumeric or appears in $WORDCHARS).  These
+  work correctly on multibyte characters if the appropriate support
+  is present.  See the section FILENAME GENERATION in the zshexpn
+  manual page.
 
 - The idiom =(<<<...) is optimised so that the shell internally turns
   the ... into the contents of a file whose name is then substituted.
+  The syntax has always been usable by means of the NULLCMD feature,
+  but previously it generated an intermediate process; it has now
+  been rewritten along the same lines as the optimisation for $(<...)
+  that inserts a file into the command line without the use of an
+  external programme.
 
 - Supplied functions catch and throw provide limited support for
   exception handling using the `{ ... } always { ... }' syntax.
+  See the section EXCEPTION HANDLING in the zshcontrib manual page.
 
 - Signals now accept the SIG as part of the name for compatibility with
   other shells.
 
 - Editor function argument-base allows non-decimal arguments for
-  editor widgets.
+  editor widgets.  See the entry in the zshzle manual page.
 
 - As always, there are many enhancements to completion functions.
 
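The four new pattern classes in NEWS are each defined by a simple character test. The C sketch below expresses those tests with the standard wide-character functions iswalnum() and wcschr(), so that they operate on whole characters rather than bytes. The function names are hypothetical, and two details are assumptions rather than statements about zsh: that an identifier character is an alphanumeric or '_', and that IFS whitespace means space, tab or newline. This is not zsh's pattern code.

```c
#include <wchar.h>
#include <wctype.h>

/* Nonzero if c is one of the characters in set (never matches L'\0'). */
static int in_set(wint_t c, const wchar_t *set)
{
    return c != L'\0' && wcschr(set, (wchar_t)c) != NULL;
}

/* [[:IDENT:]]: can appear in an identifier (assumed: alnum or '_'). */
int is_ident_char(wint_t c)
{
    return c == L'_' || iswalnum(c);
}

/* [[:IFS:]]: appears in the IFS string. */
int is_ifs_char(wint_t c, const wchar_t *ifs)
{
    return in_set(c, ifs);
}

/* [[:IFSSPACE:]]: an IFS character that is also whitespace
 * (assumed here to mean space, tab or newline). */
int is_ifs_space(wint_t c, const wchar_t *ifs)
{
    return (c == L' ' || c == L'\t' || c == L'\n') && in_set(c, ifs);
}

/* [[:WORD:]]: alphanumeric, or listed in WORDCHARS. */
int is_word_char(wint_t c, const wchar_t *wordchars)
{
    return iswalnum(c) || in_set(c, wordchars);
}
```

Because iswalnum() consults LC_CTYPE, the same tests classify non-ASCII letters once a suitable multibyte locale is active.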
Index: README
===================================================================
RCS file: /cvsroot/zsh/zsh/README,v
retrieving revision 1.35
diff -u -r1.35 README
--- README	2 Aug 2006 17:16:38 -0000	1.35
+++ README	4 Aug 2006 14:44:09 -0000
@@ -54,7 +54,8 @@
 assumed all such octets were allowed in identifiers, however the POSIX
 standard does not allow such characters in identifiers.  The older
 behaviour is still obtained with --disable-multibyte in effect.
-With --enable-multibyte set there are three possible cases:
+With --enable-multibyte in effect (this is now the default anywhere
+it is supported) there are three possible cases:
   MULTIBYTE option unset:  only ASCII characters are allowed; the
     shell does not attempt to identify non-ASCII characters at all.
   MULTIBYTE option set, POSIX_IDENTIFIERS option unset: in addition
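
The README contrasts the older identifier handling with the POSIX rule, under which a name consists only of underscores, digits and letters from the portable character set, and does not begin with a digit. A minimal C sketch of that POSIX rule follows; the function names are hypothetical, it assumes an ASCII execution character set, and it is not zsh's own identifier check.

```c
#include <stddef.h>

/* ASCII letters only, independent of the current locale. */
static int is_ascii_alpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Returns 1 if s is a POSIX name: non-empty, only '_', ASCII letters
 * and digits, and not starting with a digit.  Any non-ASCII byte
 * (for example part of a UTF-8 sequence) makes it invalid. */
int is_posix_identifier(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (*p == '\0' || is_ascii_digit(*p))
        return 0;
    for (; *p != '\0'; p++) {
        if (!(is_ascii_alpha(*p) || is_ascii_digit(*p) || *p == '_'))
            return 0;
    }
    return 1;
}
```

Under this rule "foo_bar2" is a valid name, while "2x" and a name containing the UTF-8 bytes for an accented letter are not.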
-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070




