Re: vcs_info and locales

On 2010-04-25 at 10:38 +0200, Frank Terbeck wrote:
> Anyway, could you try the following patch for the locale problem? I
> think it should solve the issue once and for all.

I have one concern, which leads to the question: is it really necessary
to set LC_ALL instead of LC_MESSAGES?

The main problem is that when you override LC_CTYPE to C, you lose any
potential UTF-8 support, unless the tool just passes the raw bytes
through unchanged.

I think the safest algorithm is not to set LC_ALL but instead:
 * if LC_ALL is set and is not C, set LANG=$LC_ALL, unset LC_ALL

Make sense?
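To make the proposal concrete, here is a minimal POSIX-sh sketch. The function name normalize_locale_for_vcs is hypothetical and this is not the vcs_info code. The final LC_MESSAGES=C step is an assumption: it is the LC_MESSAGES alternative raised at the top of this mail, not an item from the list above.

```shell
# Hypothetical helper (not part of vcs_info): sketch of the proposed
# locale handling before running a VCS command.
normalize_locale_for_vcs() {
    # If LC_ALL is set to something other than C, demote it to LANG so
    # that individual LC_* categories can still be overridden below.
    if [ -n "${LC_ALL-}" ] && [ "$LC_ALL" != C ]; then
        LANG=$LC_ALL
        export LANG
        unset LC_ALL
    fi
    # Assumption: force only the message catalog to C, leaving LC_CTYPE
    # (and therefore any UTF-8 handling) alone.
    LC_MESSAGES=C
    export LC_MESSAGES
}
```

Calling it in a subshell, e.g. `( normalize_locale_for_vcs; svn info )`, keeps the caller's own locale settings untouched. If LC_ALL is already C it is left as-is, so everything stays C, matching the rule as stated.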

The rest of this email is just exploration and can be skipped.

I don't have NLS support on my main box, or I could do more testing
myself.  With { svn log }, which is where most of my UTF-8 shows up,
LC_CTYPE=C makes it show the content as escapes instead of as clean
characters.  I know VCS_Info doesn't use that; I mention it by way of
example.  { svn info }, by contrast, always percent-encodes those
characters.  This works anyway, because VCS_Info walks back up the
dir-tree to find the top of the svn checkout, and so gets the relative
path by comparing the realpath'd filesystem root of the repo checkout
to the current dir.

URL: https://svn.spodhuis.org/ksvn/scratch/Fran%C3%A7ois
 -> VCS_Info %S == François
 (yes, I picked the OP's name as example testdata)
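The walk-up-and-compare approach described above can be sketched in POSIX sh as follows. This is a hypothetical approximation, not the actual VCS_INFO_get_data_svn code: svn_wc_root and svn_subdir are made-up names, and the sketch assumes the pre-1.7 Subversion working-copy layout, with a .svn directory in every directory of the checkout. Because only filesystem paths are compared, the percent-encoding in the { svn info } output never comes into play.

```shell
# Hypothetical helpers (not the real vcs_info code).

# Walk up from DIR while the parent still has a .svn directory (assumes
# the pre-1.7 layout with a .svn in every checkout directory) and print
# the topmost checkout directory, with symlinks resolved.
svn_wc_root() (
    d=$(cd "$1" && pwd -P) || exit 1
    [ -d "$d/.svn" ] || exit 1
    while [ "$d" != / ] && [ -d "$(dirname "$d")/.svn" ]; do
        d=$(dirname "$d")
    done
    printf '%s\n' "$d"
)

# Print DIR relative to ROOT after resolving both with pwd -P.  No URL is
# involved, so nothing needs percent-decoding.
svn_subdir() (
    root=$(cd "$1" && pwd -P) || exit 1
    dir=$(cd "$2" && pwd -P) || exit 1
    case $dir in
        "$root") printf '.\n' ;;
        "$root"/*) printf '%s\n' "${dir#"$root"/}" ;;
        *) exit 1 ;;
    esac
)
```

Run from a directory named ☺ directly under the checkout root, `svn_subdir "$(svn_wc_root .)" .` prints ☺ as-is, whereas the URL would carry %E2%98%BA.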

For experimentation, I created a repo with a UTF-8 character in its
name.  Apache/mod_dav_svn won't serve it:
(20014)Internal error: Can't convert string from 'UTF-8' to native encoding: [...]

but I can use file:/// access instead.  A repo named foo-☺ appears in my
prompt as <foo-%E2%98%BA:0> (<name:version>).
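For comparison, turning the %HH escapes in such a URL back into bytes needs an explicit decoding step. The sketch below is a hypothetical POSIX-sh helper (percent_decode is a made-up name; neither vcs_info nor svn provides it) that converts each %HH into an octal escape and lets printf %b emit the byte.

```shell
# Hypothetical helper: decode %HH escapes (as seen in { svn info } URLs)
# back into raw bytes.  Any other character is copied through unchanged.
percent_decode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        case $s in
            %[[:xdigit:]][[:xdigit:]]*)
                rest=${s#%}
                hex=${rest%"${rest#??}"}          # the two hex digits
                # Append a %b-style octal escape (\0ooo) for this byte.
                out="$out\\0$(printf '%03o' "0x$hex")"
                s=${rest#??}
                ;;
            *)
                c=${s%"${s#?}"}                   # first character
                # Double a backslash so printf %b emits it literally.
                [ "$c" = '\' ] && c='\\'
                out="$out$c"
                s=${s#?}
                ;;
        esac
    done
    printf '%b' "$out"
)
```

On a UTF-8 terminal, `percent_decode 'foo-%E2%98%BA'` prints foo-☺.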

And still VCS_Info works:
 URL: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA/fred
 Repository Root: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA
 -> VCS_Info %S == fred
    pwd -> ..../T/foo-☺/fred

 URL: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA/%E2%99%A1
 Repository Root: file:///home/pdp/tmp/T/ROOT/foo-%E2%98%BA
 -> VCS_Info %S == ♡
    pwd -> ..../T/foo-☺/♡

I ♡ VCS_Info for just working, but it's still juju.  It also works as
an accidental artifact of the VCS_INFO_get_data_svn implementation.  I
get to say "accidental" because apparently I wrote that code.
*scratches head*

(Through all this, cd gets interesting when xtitle updates to iTerm
 silently drop the UTF-8 characters through to the display)

