Tuesday, June 23, 2015

Re: Search for character that doesn't have a combining character?

On Tuesday, June 23, 2015 at 4:20:32 PM UTC-5, ZyX wrote:
> 2015-06-24 0:02 GMT+03:00 Ben Fritz <fritzophrenic@gmail.com>:
> > On Tuesday, June 23, 2015 at 3:35:50 PM UTC-5, Ben Fritz wrote:
> >> I'm working on a custom command to add strikethrough to text, using the Unicode COMBINING LONG STROKE OVERLAY, 0x0336.
> >>
> >> In this command, I want to apply a strikethrough to a character, only if it is not already present.
> >>
> >> This pattern fails in several ways: with regexpengine set to 2 it doesn't match *anything*; it does not match an unadorned character immediately before a struck-through base character; and it *does* match the last combining character in a word for some reason:
> >>
> >> [^\u0336]\%u0336\@!
> >>
> >> This pattern also fails, because it matches already struck-through base characters for some reason (although it does the same thing in both engines):
> >>
> >> [^\u0336][^\u0336]\@=
> >>
> >> What is the correct way to do this?
> >>
> >> Full command (attempted):
> >>
> >> '<,'>s;\%#=1\%V[^\u0336]\%u0336\@!;\=submatch(0)."\u0336";g
> >>
> >> Note that I'm also limiting this to a visual selection, so I'm trying to use the :s command for simplicity.
> >
> > My next attempt is to do two passes, first to remove the combining character from everywhere in the visual selection, and then to add it to the entire visual selection.
> >
> > But my patterns for this task either don't match at all, or they remove the base character along with the combining character! Even this doesn't work; it removes the base character as well:
> >
> > echo join(split(getline('.'), "\u0336"),"")
>
> Though there is always one hack to get exactly one Unicode code point
> from a *valid* UTF-8 string:
>
> echo nr2char(char2nr(string[position :]))
>
> You can use `len(nr2char(…))` to get the byte length of the first
> character and thus get to the second. I think this will allow you to
> construct the needed \= expression, but it will most likely be complex
> enough that it ends up as a new function.
>

Thanks! I agree this needs to be better supported in Vim's regex engine and tr() function. For my purposes I can pretty much always assume that any combining characters are strikethrough characters, which makes the replacement function trivial to implement with a nr2char(char2nr(submatch(0))) hack. Obviously this is not a good general solution, since it strips off all other combining characters when adding or removing the one character I'm actually interested in. If I want a general solution at some point, I guess I could loop through the input string as you suggest.
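
For the record, here is a rough sketch of what that loop could look like. It assumes 'encoding' is utf-8, that the text is valid UTF-8, and that "." in the pattern hands each base character to submatch(0) together with its combining characters (which is what the nr2char(char2nr(submatch(0))) hack relies on). The function names StrikeCell() and UnstrikeCell() are made up for this example, and this is a sketch rather than something checked against every case:

    " Hypothetical helper: append U+0336 to one base character (plus any
    " combining characters it already carries) unless it is already there.
    function! StrikeCell(cell) abort
      let i = 0
      while i < len(a:cell)
        " char2nr() only looks at the first code point of its argument.
        let nr = char2nr(a:cell[i :])
        if nr == 0x0336
          return a:cell
        endif
        " Step over that code point by its byte length.
        let i += len(nr2char(nr))
      endwhile
      return a:cell . "\u0336"
    endfunction

    " Hypothetical helper: drop every U+0336 from one base character but
    " keep all of its other combining characters.
    function! UnstrikeCell(cell) abort
      let out = ''
      let i = 0
      while i < len(a:cell)
        let c = nr2char(char2nr(a:cell[i :]))
        if c !=# "\u0336"
          let out .= c
        endif
        let i += len(c)
      endwhile
      return out
    endfunction

With those defined, the visual-selection commands would be something like:

    :'<,'>s/\%V./\=StrikeCell(submatch(0))/g
    :'<,'>s/\%V./\=UnstrikeCell(submatch(0))/g

Unlike the hack, this leaves any other combining characters on the base character alone in both directions.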

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

