Zsh Mailing List Archive

Re: utf-8

X-seq: zsh-users 19577
From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
To: "zsh-users@xxxxxxx" <zsh-users@xxxxxxx>
Subject: Re: utf-8
Date: Thu, 18 Dec 2014 18:04:33 -0800
In-reply-to: <5493695B.3010702@eastlink.ca>
List-help: <mailto:zsh-users-help@zsh.org>
List-id: Zsh Users List <zsh-users.zsh.org>
List-post: <mailto:zsh-users@zsh.org>
Mailing-list: contact zsh-users-help@xxxxxxx; run by ezmlm
References: <5491C5E7.1070207@eastlink.ca> <20141218092544.01495a40@pwslap01u.europe.root.pri> <549310A1.4080602@eastlink.ca> <1024051418925912@web2o.yandex.ru> <54931FE4.2050100@eastlink.ca> <1182711418928721@web19g.yandex.ru> <54933331.2000709@eastlink.ca> <97181418935928@web17h.yandex.ru> <5493440A.5010908@eastlink.ca> <256161418938688@web19g.yandex.ru> <5493695B.3010702@eastlink.ca>

This has gone way off topic for the zsh-users list.  I don't recall if
the ietf-charsets list is still active, but that might be a better place
to go looking if it is.

On Dec 18,  3:55pm, Ray Andrews wrote:
} Subject: Re: utf-8
}
} On 12/18/2014 01:38 PM, ZyX wrote:
} > `\n`. Escapes are defined by the zsh parser, not by anything else. The
} > same goes for any other language. There is not much reasoning behind
} > translating characters after `\`, and I have never seen them translated
} > in any language, whether or not it allows Unicode identifiers.
} 
} Sorry, I don't understand.  Of course this is defined by zsh, but what 
} char in Cyrillic will be used for '\n' in Latin?  See what I mean?

Do you mean for '\n' to be interpreted as newline, or to be interpreted
as "a literal 'n'"?

} Or some even more different alphabet that has nothing like 'n' at all?
} Or do you have 'n' available to you exactly as in Latin?

Unicode is intended to be a "universal" character encoding, meaning that
all characters in all character sets are included.  So a literal 'n' is
always a literal 'n', and a newline is always a newline.
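A quick way to see this is to ask Python's `unicodedata` module about a few characters. Latin `n`, newline, and the Cyrillic letters `н` (EN) and `п` (PE) each have their own code point, and in UTF-8 the ASCII ones stay single bytes. The choice of characters is purely illustrative.

```python
import unicodedata

# Each character has exactly one code point, whatever script surrounds it.
for ch in ["n", "\n", "н", "п"]:
    # Control characters such as newline have no Unicode name, hence the default.
    name = unicodedata.name(ch, "<control>")
    print(f"U+{ord(ch):04X} {name:28} utf-8={ch.encode('utf-8')!r}")
```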

Unlike, say, the ISO 8859 family of character encodings, which all have
ASCII in common but may use the same code point for different characters
in different languages, Unicode has only one code point for each possible
character regardless of language.  Unicode (of which UTF-8 is the most
common encoding) is the classic compromise that made almost no one happy,
because to fit everything into its original code point range a number of
visually similar but semantically distinct ideograms in various languages
had to be combined on the same code points (the so-called Han unification).
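The contrast can be sketched with Python's built-in codecs. The byte 0xE9 decodes to different letters under ISO 8859-1 and ISO 8859-5, while in Unicode each letter has its own code point and UTF-8 merely serializes it. The last line shows a unified CJK ideograph (U+76F4, picked as an example) that carries a single name shared by Chinese, Japanese and Korean text.

```python
import unicodedata

# ISO 8859: the same byte means different things in different parts.
raw = bytes([0xE9])
latin = raw.decode("latin-1")      # ISO 8859-1 (Western European)
cyrillic = raw.decode("iso8859_5")  # ISO 8859-5 (Cyrillic)
print(latin, cyrillic, latin == cyrillic)

# Unicode: each character gets its own code point; UTF-8 is only an
# encoding of those code points, and it round-trips losslessly.
for ch in (latin, cyrillic):
    encoded = ch.encode("utf-8")
    assert encoded.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} -> {encoded!r}")

# Han unification: one code point for the same ideograph across CJK
# languages; regional glyph differences must come from fonts or language tags.
print(unicodedata.name("\u76f4"))
```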

Follow-Ups:
- Re: utf-8
  - From: Ray Andrews

References:
- utf-8
  - From: Ray Andrews
- Re: utf-8
  - From: Peter Stephenson
- Re: utf-8
  - From: Ray Andrews
- Re: utf-8
  - From: ZyX
- Re: utf-8
  - From: Ray Andrews
- Re: utf-8
  - From: ZyX
- Re: utf-8
  - From: Ray Andrews
- Re: utf-8
  - From: ZyX
- Re: utf-8
  - From: Ray Andrews
- Re: utf-8
  - From: ZyX
- Re: utf-8
  - From: Ray Andrews
