Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: In POSIX mode, ${#var} measures length in bytes, not characters



07.06.2015, 03:29, "Martijn Dekker" <martijn@xxxxxxxx>:
> When in 'emulate sh' mode, ${#var} substitutes the length of the
> variable in bytes, not characters. This is contrary to the standard; the
> length in characters is supposed to be substituted.[*]
>
> Oddly enough, zsh is POSIX compliant here in native mode, but
> non-compliant in POSIX mode.

Do you have a reference for where “character” is defined? posh and dash behave the same way and report the length in bytes:

    % posh -c 'VAR="«»"; echo ${#VAR}'
    4
    % dash -c 'VAR="«»"; echo ${#VAR}'
    4
    % zsh -c 'VAR="«»"; echo ${#VAR}' # Non-POSIX (native) mode, for comparison: length in Unicode code points
    2
    % locale
    LANG=ru_RU.UTF-8
    LC_CTYPE="ru_RU.UTF-8"
    LC_NUMERIC="ru_RU.UTF-8"
    LC_TIME="ru_RU.UTF-8"
    LC_COLLATE="ru_RU.UTF-8"
    LC_MONETARY="ru_RU.UTF-8"
    LC_MESSAGES="ru_RU.UTF-8"
    LC_PAPER="ru_RU.UTF-8"
    LC_NAME="ru_RU.UTF-8"
    LC_ADDRESS="ru_RU.UTF-8"
    LC_TELEPHONE="ru_RU.UTF-8"
    LC_MEASUREMENT="ru_RU.UTF-8"
    LC_IDENTIFICATION="ru_RU.UTF-8"
    LC_ALL=
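
To compare the two notions of length without relying on any one shell's ${#var}, something like the following rough sketch could be used. The function names byte_len and char_len are made up for illustration; the sketch assumes a POSIX wc, where -c counts bytes and -m counts characters according to the current LC_CTYPE.

```shell
# Hypothetical helpers; the names are illustrative, not part of any shell.

# Length in bytes: wc -c counts bytes regardless of the locale.
byte_len() {
    # The arithmetic expansion strips the padding some wc implementations emit.
    echo "$(( $(printf '%s' "$1" | wc -c) ))"
}

# Length in characters: wc -m counts characters as defined by LC_CTYPE.
char_len() {
    echo "$(( $(printf '%s' "$1" | wc -m) ))"
}

mot=arrêté
echo "bytes: $(byte_len "$mot")  characters: $(char_len "$mot")"
```

In a UTF-8 locale, and with the precomposed spelling of the word, this should report 8 bytes and 6 characters for arrêté, the same two numbers that appear in the quoted report. In the C locale both helpers are expected to agree, since every byte is then a character.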

>
> Confirmed in zsh 4.3.11 (Mac OS X), 5.0.2 (Linux) and 5.0.8 (Mac OS X).
>
> $ zsh
> % locale
> LANG="nl_NL.UTF-8"
> LC_COLLATE="nl_NL.UTF-8"
> LC_CTYPE="nl_NL.UTF-8"
> LC_MESSAGES="nl_NL.UTF-8"
> LC_MONETARY="nl_NL.UTF-8"
> LC_NUMERIC="nl_NL.UTF-8"
> LC_TIME="nl_NL.UTF-8"
> LC_ALL=
> % mot=arrêté
> % echo ${#mot}
> 6
> % emulate sh
> % echo ${#mot}
> 8
>
> - Martijn
>
> [*] Reference:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
>>  ${#parameter}
>>      String Length. The length in characters of the value of parameter
>>      shall be substituted. [...]


