Zsh Mailing List Archive

Re: configure tests for iconv



Peter wrote:
> 
> No, that doesn't work either.  The error is from returning -1 from
>     	    	    cd = iconv_open(nl_langinfo(CODESET), "ISO-10646");

I tried downloading GNU libiconv and, sure enough, it doesn't like
"ISO-10646". I had imagined libiconv was the same code as glibc uses but
perhaps not. At least with this being the problem, I'm now fairly
confident that the configure tests are working.

Trying a few different systems, it seems UCS-4BE is a much more portable
name for the character encoding. Given that the endianness is explicit,
it might be a better choice anyway. So with the following patch it
should now work. If this breaks on any system, we can always try
multiple names for the encoding.
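
For illustration only, the "try multiple names" fallback could look
roughly like the sketch below. The helper name open_ucs4_to() and the
list of candidate names are assumptions made for this example and are
not part of the patch. Names without an explicit byte order (such as
"UCS-4") are not guaranteed to mean big endian on every system, so the
list would need checking per platform.

```c
#include <iconv.h>
#include <stddef.h>

/* Candidate source-encoding names, most portable first.
 * Illustrative list only; not taken from the zsh sources. */
static const char *const ucs4_names[] = {
    "UCS-4BE", "UCS-4", "ISO-10646", NULL
};

/* Hypothetical helper: return the first conversion descriptor that
 * iconv_open() accepts for converting UCS-4 to `tocode`, or
 * (iconv_t)-1 if none of the candidate names is recognised. */
static iconv_t open_ucs4_to(const char *tocode)
{
    size_t i;

    for (i = 0; ucs4_names[i] != NULL; i++) {
        iconv_t cd = iconv_open(tocode, ucs4_names[i]);
        if (cd != (iconv_t)-1)
            return cd;
    }
    return (iconv_t)-1;
}
```

In utils.c the call would presumably take nl_langinfo(CODESET) as the
target, as the existing iconv_open() call does.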

Incidentally, the patch also helps on Solaris 8. The Solaris machines I
have access to didn't previously have any of the UTF-8 iconv packages
installed so I had assumed it simply couldn't do the necessary
conversions. Below is also a patch against _iconv to pick up these
character encodings on Solaris.

> > Does /usr/bin/printf's \u work?
> 
> This fails too, but with the slightly odd error "invalid universal
> character name".  It's not a problem with the input format, however,

I think it's telling you that it refuses to convert characters in that
particular range, not that it *can't* convert them. It won't handle the
basic ASCII characters on any system. I think it also
prints that message for certain reserved or unallocated ranges. I really
can't see the point of that but it's a GNU coreutils issue.

> This gives US-ASCII, which might be part of the problem, though I really
> haven't the faintest idea.  A quick scan of the regional and language
> settings didn't suggest anything.

Well, with the patch below, it should hopefully now cope with stuff like
\u0061, which is as much as we can hope for in a US-ASCII locale. The
rest is obviously a Cygwin issue. Perhaps we should add an UNKNOWN_CHAR
variable or a similar mechanism to allow something else to be
substituted instead of an error message.
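
A minimal sketch of how such a substitution might work, assuming a
hypothetical helper ucs4_to_charset() whose `subst` argument stands in
for the proposed UNKNOWN_CHAR value. Neither name exists in zsh. The
target charset is passed in as a parameter here; the real code uses
nl_langinfo(CODESET). The byte-packing loop mirrors the one in the
utils.c hunk below.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one code point to `tocode`.
 * Writes the result to `out` (at most `outsize` bytes) and returns the
 * number of bytes written. If the character cannot be represented in
 * the target charset (EILSEQ), writes `subst` instead of failing.
 * Returns 0 on any other error. */
static size_t ucs4_to_charset(unsigned long wval, const char *tocode,
                              char subst, char *out, size_t outsize)
{
    char inbuf[4];
    char *inptr = inbuf;   /* some systems want const char ** here */
    char *outptr = out;
    size_t inbytes = 4, outbytes = outsize, rc;
    iconv_t cd;
    int i, saved_errno;

    if (outsize == 0)
        return 0;

    /* store value in big endian form */
    for (i = 3; i >= 0; i--) {
        inbuf[i] = (char)(wval & 0xff);
        wval >>= 8;
    }

    cd = iconv_open(tocode, "UCS-4BE");
    if (cd == (iconv_t)-1)
        return 0;

    rc = iconv(cd, &inptr, &inbytes, &outptr, &outbytes);
    saved_errno = errno;   /* iconv_close() may change errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        if (saved_errno == EILSEQ) {
            out[0] = subst;   /* substitute rather than report an error */
            return 1;
        }
        return 0;
    }
    return outsize - outbytes;
}
```

With this, a caller in a US-ASCII locale asking for U+00E9 would get
the substitute character back rather than an error, while U+0061 would
still convert to "a".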

Oliver

Index: Completion/Unix/Command/_iconv
===================================================================
RCS file: /cvsroot/zsh/zsh/Completion/Unix/Command/_iconv,v
retrieving revision 1.4
diff -u -r1.4 _iconv
--- Completion/Unix/Command/_iconv	17 Jun 2004 13:12:26 -0000	1.4
+++ Completion/Unix/Command/_iconv	3 Mar 2005 10:29:49 -0000
@@ -1,7 +1,8 @@
 #compdef iconv
 
-local expl curcontext="$curcontext" state line codeset ret=1
+local expl curcontext="$curcontext" state line ret=1
 local LOCPATH="${LOCPATH:-/usr/lib/nls/loc}"
+local -U codeset
 
 if _pick_variant gnu=GNU unix --version; then
 
@@ -40,6 +41,7 @@
   if [[ $state = codeset ]]; then
     if [[ -f /usr/lib/iconv/iconv_data ]]; then  # IRIX & Solaris
       codeset=( ${${(f)"$(</usr/lib/iconv/iconv_data)"}%%[[:blank:]]*} )
+      codeset+=( /usr/lib/iconv/*%*.so(Ne.'reply=( ${${REPLY:t}%%%*} ${${REPLY:r}#*%} )'.) )
     elif [[ -d $LOCPATH/iconv ]]; then  # OSF
       codeset=( $LOCPATH/iconv/*(N:t) )
       codeset=( ${(j:_:s:_:)codeset} )
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.75
diff -u -r1.75 utils.c
--- Src/utils.c	25 Feb 2005 10:21:01 -0000	1.75
+++ Src/utils.c	3 Mar 2005 10:29:50 -0000
@@ -3617,13 +3617,13 @@
 		    ICONV_CONST char *inptr = inbuf;
     	    	    inbytes = 4;
 		    outbytes = 6;
-		    /* assume big endian convention for UCS-4 */
+		    /* store value in big endian form */
 		    for (i=3;i>=0;i--) {
 			inbuf[i] = wval & 0xff;
 			wval >>= 8;
 		    }
 
-    	    	    cd = iconv_open(nl_langinfo(CODESET), "ISO-10646");
+    	    	    cd = iconv_open(nl_langinfo(CODESET), "UCS-4BE");
 		    if (cd == (iconv_t)-1) {
 			zerr("cannot do charset conversion", NULL, 0);
 			if (fromwhere == 4) {


