Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: utf-8

18.12.2014, 20:38, "Ray Andrews" <rayandrews@xxxxxxxxxxx>:
> On 12/18/2014 01:25 AM, Peter Stephenson wrote:
> Mikael, Peter:
>>  Chapter 5 of the FAQ is the best place to start. You can see this
>>  online at http://zsh.sourceforge.net/FAQ/zshfaq05.html#l52. The
>>  version in Etc of the source is newer but I don't think there are
>>  significant differences. pws
> Very nicely written. That's exactly what I wanted to learn. And though I
> knew it previously, I had half forgotten the difference between Unicode and
> UTF-8, which led to the fuzzy question. To ask it again more accurately,
> where are extended Unicode characters permitted? Or perhaps that's better
> reversed: where are they *not* permitted? Can a variable have a name beyond
> ASCII? I see that zsh is transparent to UTF-8 everywhere, but that does not
> imply that one has use of the entire Unicode charset in all situations.

They are permitted at least in variable and function names. I cannot find anything relevant in the manual, but the code implementing the `isident` function, which is used to check variable names (not function names; I do not know that part), indirectly uses the library function `iswalnum`, which in turn knows about Unicode character classes (depending on LC_CTYPE).
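A quick way to check this on a given system is to ask the running shell directly. The snippet below is a minimal sketch, not anything taken from the zsh sources: `accepts_name` is a hypothetical helper that attempts an assignment in a subshell and reports whether the shell parsed it as one. It is written in portable sh so the same lines can be pasted into zsh and into other shells for comparison. As far as I understand, zsh should accept `größe` only when LC_CTYPE names a UTF-8 locale and the MULTIBYTE option is set (and, I believe, POSIX_IDENTIFIERS is unset). No expected output is listed because it depends on those settings.

```sh
# Hypothetical helper (not part of zsh): succeed if the running shell
# accepts "$1" as a variable name. The assignment is evaluated in a
# subshell so it cannot leak into the caller's environment.
# Caution: eval runs its argument as code, so pass only trusted strings.
accepts_name() {
    ( eval "$1=1" ) 2>/dev/null
}

# Probe an ASCII name, a non-ASCII name, and a name that is never valid.
for name in plain_name 'größe' 'bad-name'; do
    if accepts_name "$name"; then
        printf '%s: accepted\n' "$name"
    else
        printf '%s: rejected\n' "$name"
    fi
done
```

Running the loop once with a UTF-8 locale and once with LC_ALL=C should show whether the answer for the non-ASCII name really follows LC_CTYPE.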

AFAIK a function name can be anything that is not parsed as something else. The following definitions work:

    '()' () {
        echo Test
    }
    '()'
    # Outputs Test.


    $PATH () {
        echo Test
    }
    # Invoking this function outputs Test as well.

It looks like the zsh code was intentionally modified to use `iswalnum` in `itype_end`, which is called from `isident`. It also appears that UTF-8 characters in IFS are recognized: `itype_end` handles them as well, and I do not think such handling was added without a reason. In any case everything is locale-bound, because libc functions are used rather than something like ICU.
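The IFS part can be probed in the same spirit. This is only a sketch under assumptions: `split_first_field` is a hypothetical helper, and it goes through `read`, which I assume applies the same IFS handling as ordinary word splitting (I have not checked that `read` reaches `itype_end`). It sets IFS to a single separator for one `read` call and prints the first field.

```sh
# Hypothetical helper: print the first field of "$2" when split on the
# separator "$1". IFS is changed only for the duration of the read call.
split_first_field() {
    local first rest
    printf '%s\n' "$2" | {
        IFS=$1 read -r first rest
        printf '%s\n' "$first"
    }
}

# ASCII separator, as a baseline.
split_first_field ':' 'left:right'

# Multibyte separator (U+2192, three bytes in UTF-8).
split_first_field '→' 'left→right'

# U+2190 shares its first two UTF-8 bytes with U+2192. A shell that treats
# IFS byte by byte may cut this string at the wrong place, while a
# multibyte-aware shell should leave it whole.
split_first_field '→' 'a←b'
```

The last call is the interesting one: comparing its output under a UTF-8 locale and under LC_ALL=C should show whether the shell treats the separator as one character or as three separate bytes.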
