Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: utf-8

17.12.2014, 21:37, "Ray Andrews" <rayandrews@xxxxxxxxxxx>:
> When we talk about utf-8 and zsh, what is the relevance of that?  I mean
> what/when/where is zsh concerned with character encoding?  Filenames I
> guess, and inside strings too, perhaps? Not in zsh syntax itself I
> presume.  I guess that any data stream would/could be utf-8erized.
> Anywhere else?  Or is this something where I'm not even asking the right
> question?

You can check out explicit `utf-8` support by searching for `(?i)utf-?8|unicode` in `man zshall`.

From the documentation, the explicit support appears to be the following:

- Explicit support in regular-expression patterns.
- COMBINING_CHARS option, which tells zsh that the terminal is able to display combining characters correctly (i.e. when calculating width, zsh should assume that combining characters are joined with the preceding non-combining character and are thus effectively zero cells wide).
- MULTIBYTE option, which affects string indexing and string length calculations, and also the `${(#)SOME_INTEGER_THAT_IS_GREATER_THAN_127}` parameter expansion flag.
- `$'\uXXXX'` and `$'\UXXXXXXXX'`.
- Width calculations for Unicode characters whose East Asian Width property is F or W (i.e. fullwidth or double-width characters).
- `insert-unicode-char` widget.
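
To make the MULTIBYTE item above concrete, here is a minimal sketch of the byte-versus-character distinction. It assumes a UTF-8 encoded `é`, written as octal escapes so that it does not rely on `\u` support or on the current locale. The variable names `s` and `bytes` are made up for illustration. The first part is portable shell; the last part runs only under zsh, and what it prints depends on your locale, so no output is claimed for it.

```shell
# é (U+00E9) is the two bytes 0xC3 0xA9 in UTF-8.
s=$(printf '\303\251')

# wc -c counts bytes, whatever the locale says.
bytes=$(printf '%s' "$s" | wc -c | tr -d ' ')
echo "bytes: $bytes"

# ${#s} counts characters or bytes depending on the locale
# (and, in zsh, on the MULTIBYTE option), so the result is not fixed.
echo "length: ${#s}"

# zsh only: compare the length with MULTIBYTE set and unset.
if [ -n "${ZSH_VERSION:-}" ]; then
  setopt multibyte
  echo "MULTIBYTE set:   ${#s}"
  unsetopt multibyte
  echo "MULTIBYTE unset: ${#s}"
  setopt multibyte   # assumption: restore the usual default
fi
```

The byte count is always 2; the reported length is what the option and the locale change.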

Otherwise zsh supports whatever encoding the system locale specifies (which may or may not be UTF-8), rather than UTF-8 specifically.

// Note: I did not actually check the code, I only checked the documentation.

Also note that it would be very, very strange if zsh assumed that filenames are in any particular encoding. File systems usually store filenames as plain byte strings that merely cannot contain certain bytes: on a POSIX filesystem only `/` (because it is the directory separator) and `\0` (because supporting it is nearly impossible given the legacy of zero-terminated C strings). Any sane language knows that a filename is a zero-terminated, `/`-separated byte string (with some additional assumptions if it is meant to run on Windows), and that it is *just* that *and nothing beyond that*. It cannot even assume that `abc/./../def` can be transformed to `def`: that is not true in general (for example, when `abc` is a symlink to a directory elsewhere), so such normalization is only ever done explicitly. (Note: Python-3 is *not* sane.)
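
A small sketch of why `abc/./../def` is not generally the same as `def`, assuming `abc` is a symlink (the directory layout and the variable names `tmp`, `lexical`, and `actual` are invented for the illustration, and `mktemp -d` is assumed to be available):

```shell
# Scratch tree: work/abc is a symlink pointing into elsewhere/target.
tmp=$(mktemp -d)
mkdir -p "$tmp/work" "$tmp/elsewhere/target"
echo "next to work/abc" > "$tmp/work/def"
echo "next to the symlink target" > "$tmp/elsewhere/def"
ln -s ../elsewhere/target "$tmp/work/abc"

# What a purely textual normalization (abc/./../def -> def) would read.
lexical=$(cat "$tmp/work/def")

# What the kernel resolves: abc is followed first, then `..` is applied
# to the symlink's target, landing in elsewhere/ rather than work/.
actual=$(cat "$tmp/work/abc/./../def")

echo "def          -> $lexical"
echo "abc/./../def -> $actual"

rm -rf "$tmp"
```

The two lines printed differ, which is why a shell or library should not collapse `..` components behind the user's back.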
