Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: PATCH: parse from even deeper in hell



On Fri, 20 Feb 2015 11:12:39 +0100
Mikael Magnusson <mikachu@xxxxxxxxx> wrote:
> > The question is where to put this in on history read.  I think it's
> > going to affect non-lexical history, too, but the error on reading won't
> > be flagged up.
> 
> I don't think so, unmetafy() doesn't care about the table. And as I
> checked earlier, both the old and new versions of the string in my
> history file are unmetafied to the correct UTF-8 string. The 'only'
> problem is that the lexer is looking at some bytes before it's
> unmetafied and some stuff that should have been metafied to avoid
> being parsed as tokens, isn't, because they weren't special in the old
> version. That's why I think running unmetafy before lexing is
> needed... And if the lexer wants metafied text then we'd just have to
> metafy it again right away.
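
Mikael's alternative (unmetafy without consulting any table, then metafy again with the current one) can be sketched as below. Everything here is an illustrative assumption: Meta is taken as 0x83 (meta characters start there, per the reply), the current range is assumed to be 0x83..0xa2, the older range ending at 0x9f is hypothetical, and `unmetafy_bytes()`/`metafy_bytes()` are hypothetical names, not zsh's functions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not zsh's definitions: Meta is 0x83 and a
 * metafication table is modelled as "every byte from 0x83 to last". */
#define META_BYTE 0x83u
#define OLD_LAST  0x9fu   /* hypothetical end of an older, smaller range */
#define NEW_LAST  0xa2u   /* assumed end of the current 32-byte range */

static int is_meta(unsigned char c, unsigned last)
{
    return c >= META_BYTE && c <= last;
}

/* Undo metafication in place: drop each Meta and XOR the byte after it
 * with 32.  No table is consulted, so the result does not depend on which
 * shell version wrote the line.  Returns the new length. */
static size_t unmetafy_bytes(unsigned char *s, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        if (s[in] == META_BYTE && in + 1 < len) {
            s[out++] = (unsigned char)(s[in + 1] ^ 32);
            in += 2;
        } else {
            s[out++] = s[in++];
        }
    }
    return out;
}

/* Metafy len raw bytes from src into dst with the table ending at last.
 * dst needs room for 2 * len bytes.  Returns the metafied length. */
static size_t metafy_bytes(const unsigned char *src, size_t len,
                           unsigned char *dst, unsigned last)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (is_meta(src[i], last)) {
            dst[out++] = META_BYTE;
            dst[out++] = (unsigned char)(src[i] ^ 32);
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

For example, the raw bytes 0x41 0x90 0xa0 written with the older table come out as 0x41 0x83 0xb0 0xa0 (0xa0 left bare); unmetafying restores the raw bytes, and metafying with the current table yields 0x41 0x83 0xb0 0x83 0x80.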

See if this fixes the problems, then.

Note we're almost out of meta characters with this limitation: we
can't expand beyond the range of 32 we currently reserve if we need to
keep compatibility with history.  We're only just getting away with it
for 0xa0 because 0x80 isn't a meta character: for historical reasons,
meta characters start at 0x83.
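
One way to read the 32-byte limit (an interpretation, not something stated above): metafication stores a reserved byte c as Meta followed by c ^ 32 (the `*ptr ^ 32` in the patch), and two values that differ by exactly 32 cannot both lie in a run of 32 consecutive bytes. The check below assumes the reserved range is the 32 bytes starting at 0x83; `in_range()` and `escapes_stay_outside()` are hypothetical helpers.

```c
#include <assert.h>

/* Assumed layout: the reserved bytes are the 32 values starting at
 * Meta = 0x83, i.e. 0x83..0xa2.  Names here are illustrative only. */
#define META_FIRST 0x83u
#define META_COUNT 32u

static int in_range(unsigned c, unsigned first, unsigned count)
{
    return c >= first && c < first + count;
}

/* A metafied byte c is stored as Meta followed by c ^ 32.  Return 1 if no
 * such escaped byte falls back inside the reserved range [first,
 * first + count), so a scan that tests every byte against the range never
 * mistakes the second half of a Meta pair for an unescaped meta byte. */
static int escapes_stay_outside(unsigned first, unsigned count)
{
    for (unsigned c = first; c < first + count; c++)
        if (in_range(c ^ 32u, first, count))
            return 0;
    return 1;
}
```

Under this assumption, 0xa0 escapes to 0x80, which lies just below the range. Adding a 33rd byte (0xa3) would break the property, since 0x83 ^ 32 is 0xa3.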

pws

diff --git a/Src/hist.c b/Src/hist.c
index 381c7e2..acc4259 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -3377,11 +3377,46 @@ histsplitwords(char *lineptr, short **wordsp, int *nwordsp, int *nwordposp,
     char *start = lineptr;
 
     if (uselex) {
-	LinkList wordlist = bufferwords(NULL, lineptr, NULL,
-					LEXFLAGS_COMMENTS_KEEP);
+	LinkList wordlist;
 	LinkNode wordnode;
-	int nwords_max;
+	int nwords_max, remeta = 0;
+	char *ptr;
+
+	/*
+	 * Handle the special case that we're reading from an
+	 * old shell with fewer meta characters, so we need to
+	 * metafy some more.  (It's not clear why the history
+	 * file is metafied at all; some would say this is plain
+	 * stupid.  But we're stuck with it now without some
+	 * hairy workarounds for compatibility).
+	 *
+	 * This is rare so doesn't need to be that efficient; just
+	 * allocate space off the heap.
+	 *
+	 * Note that it's currently believed this all comes out in
+	 * the wash in the non-uselex case owing to where unmetafication
+	 * and metafication happen.
+	 */
+	for (ptr = lineptr; *ptr; ptr++) {
+	    if (*ptr != Meta && imeta(*ptr))
+		remeta++;
+	}
+	if (remeta) {
+	    char *ptr2, *line2;
+	    ptr2 = line2 = (char *)zhalloc((ptr - lineptr) + remeta + 1);
+	    for (ptr = lineptr; *ptr; ptr++) {
+		if (*ptr != Meta && imeta(*ptr)) {
+		    *ptr2++ = Meta;
+		    *ptr2++ = *ptr ^ 32;
+		} else
+		    *ptr2++ = *ptr;
+	    }
+	    *ptr2 = '\0';
+	    lineptr = line2;
+	}
 
+	wordlist = bufferwords(NULL, lineptr, NULL,
+			       LEXFLAGS_COMMENTS_KEEP);
 	nwords_max = 2 * countlinknodes(wordlist);
 	if (nwords_max > nwords) {
 	    *nwordsp = nwords = nwords_max;
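
For readers without the zsh tree to hand, the re-metafication pass added to histsplitwords() can be restated as a standalone function. This is a sketch, not the zsh code: Meta and imeta() are replaced by stand-ins with an assumed range of 0x83..0xa2, zhalloc() by malloc(), and remetafy() is a hypothetical name. As in the patch, the byte after an existing Meta is not skipped; that is only safe because its XORed value never falls inside the meta range.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed for illustration: Meta is 0x83 and the current meta range is
 * 0x83..0xa2.  These stand in for zsh's Meta and imeta(). */
#define META_BYTE 0x83u
#define META_LAST 0xa2u

static int is_meta(unsigned char c)
{
    return c >= META_BYTE && c <= META_LAST;
}

/* Escape every byte that is now a meta character but was written bare by
 * an older shell.  Returns a new malloc'd NUL-terminated copy, or NULL if
 * nothing needed escaping or allocation failed (the caller then keeps
 * using the original line). */
static char *remetafy(const char *line)
{
    const unsigned char *p;
    size_t extra = 0;
    char *out, *q;

    for (p = (const unsigned char *)line; *p; p++)
        if (*p != META_BYTE && is_meta(*p))
            extra++;
    if (!extra)
        return NULL;

    /* Original length, one extra byte per escape, plus the terminator. */
    out = q = malloc((size_t)(p - (const unsigned char *)line) + extra + 1);
    if (!out)
        return NULL;
    for (p = (const unsigned char *)line; *p; p++) {
        if (*p != META_BYTE && is_meta(*p)) {
            *q++ = (char)META_BYTE;
            *q++ = (char)(*p ^ 32);
        } else {
            *q++ = (char)*p;
        }
    }
    *q = '\0';
    return out;
}
```

Given the old-style line "A\x83\xb0\xa0" (an escaped 0x90 followed by a bare 0xa0), remetafy() returns the five bytes 'A', 0x83, 0xb0, 0x83, 0x80. In the patch the copy comes from zhalloc(), which allocates from zsh's own heap, so the patch does not free it individually; malloc() is used here only to keep the sketch standalone.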


