Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: utf-8



18.12.2014, 23:04, "Ray Andrews" <rayandrews@xxxxxxxxxxx>:
> On 12/18/2014 10:52 AM, ZyX wrote:
> You are missing the main point. Identifiers consist of the characters for which `iswalnum` is true
>
> ...
> “☠” is U+2620 SKULL AND CROSSBONES which does *not* have unicode
> category “Letter” or “Number” and thus cannot be used in an identifier.
>
> Ok, I see what you are saying.  So 'anything' can be data, but an
> identifier must be a 'letter' or 'number'.  Where can I see a table of
> what iswalnum() accepts out of unicode?

See http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt, third column. Read http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values for an explanation of the values; you need L* and N*. Note that testing shows not all N* categories count: No does not (tests: CIRCLED DIGIT ONE, VULGAR FRACTION ONE QUARTER), while Nd (DIGIT ONE, FULLWIDTH DIGIT ONE) and Nl (RUNIC ARLAUG SYMBOL) do. I strongly suggest looking for the answer in the libc sources if you need better precision.
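If you do not want to read UnicodeData.txt by hand, here is a minimal sketch that prints the General_Category of each character named in this thread. It assumes Python 3 and its standard unicodedata module, which uses the Unicode version bundled with the interpreter, so it reports the Unicode category only. It does not call iswalnum(), whose answer depends on your libc and locale. The helper names general_category and is_letter_or_number are made up for illustration.

```python
import unicodedata

# Characters mentioned in this thread, by their Unicode names.
NAMES = [
    "SKULL AND CROSSBONES",
    "DIGIT ONE",
    "FULLWIDTH DIGIT ONE",
    "CIRCLED DIGIT ONE",
    "VULGAR FRACTION ONE QUARTER",
    "RUNIC ARLAUG SYMBOL",
]


def general_category(name):
    """Return the General_Category (third column of UnicodeData.txt)."""
    return unicodedata.category(unicodedata.lookup(name))


def is_letter_or_number(name):
    """True for categories L* and N*.

    This is only the Unicode-side check; iswalnum() may be stricter
    (for example, it may reject some N* categories).
    """
    return general_category(name)[0] in "LN"


if __name__ == "__main__":
    for name in NAMES:
        ch = unicodedata.lookup(name)
        print(f"U+{ord(ch):04X}  {general_category(name)}  {name}")
```

Comparing this output with what zsh actually accepts in an identifier should show which N* categories your libc treats as alphanumeric.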


