Saturday, April 23, 2016

Re: Capture column numbers of matches ending with double byte chars

2016-04-21 22:03 GMT+03:00 rameo <raiwil@gmail.com>:
> Ever since I started using Vim I have had trouble with double-byte characters.
>
> I want to capture every matched string together with the start column and end column of the match (line by line). I need not only the strings but also the column numbers, for use in other functions.
>
> For the last few years I used match()/matchend(), then I noticed that they did not correctly capture double-byte characters within the string.
> Then I adapted everything to use searchpos(), but today I found out that it has trouble with strings that have a double-byte character at the end.
>
>
> "mylist = list with all linenrs having matches
> for n in range(0, len(mylist)-1)
>   let idx = []
>   let edx = []
>   let matches_between_cols = []
>
>   "FIND ALL IDX MATCHES
>   "idx --> forward search
>   call cursor(mylist[n],1)
>   while line(".") == mylist[n]
>     let S= searchpos(@/, '')
>     if S[0] == mylist[n]
>       call add(idx, S[1]-1)
>     endif
>   endwhile
>   "idx --> backward search (to include matches on first column)
>   call cursor(mylist[n],len(getline(mylist[n])))
>   while line(".") == mylist[n]
>     let S= searchpos(@/, 'b')
>     if S[0] == mylist[n]
>       call add(idx, S[1]-1)
>     endif
>   endwhile
>
>   "FIND ALL EDX MATCHES
>   "edx --> forward search
>   call cursor(mylist[n],1)
>   while line(".") == mylist[n]
>     let E= searchpos(@/, 'e')
>     if E[0] == mylist[n]
>       call add(edx, E[1])
>     endif
>   endwhile
>   "edx --> backward search (to include matches on first column)
>   call cursor(mylist[n],len(getline(mylist[n])))
>   while line(".") == mylist[n]
>     let E= searchpos(@/, 'eb')
>     if E[0] == mylist[n]
>       call add(edx, E[1])
>     endif
>   endwhile
>
>   if len(idx) > 0
>     for i in range(0,len(idx)-1)
>       let r = strpart(getline(mylist[n]),idx[i], edx[i]-idx[i])
>       call add(matches_between_cols, r)
>     endfor
>   endif
> endfor
>
> -----------------------------------
> Buffer:
> city | Felicità
> whatever | Peach
> pmg00000001 | Perché
> text| Céline
> bMgbXuEWo | Université
>
>
> @/ = "| \zs\S\+"
> it captures:
> Felicit<c3>
> Peach
> Perch<c3>
> Céline
> Universit<c3>
>
> Expected:
> Felicità
> Peach
> Perché
> Céline
> Université
>
> Can you please tell me what I did wrong?

I cannot say what you did wrong, but calling `searchpos()` multiple
times for each occurrence of a pattern is rather wasteful. Check how I
did this in [formatvim][1]: I collect matches there to highlight them
later, so the code gets both the start and end positions of each match.

[1]: https://bitbucket.org/ZyX_I/formatvim/src/a00edc4c7032bde5c7e970bca7871e9317ee2265/autoload/format.vim#format.vim-1457
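
A likely explanation for the truncated `<c3>` bytes, sketched in Python
rather than Vim script because it is only meant to show the byte
arithmetic: with the `e` flag, `searchpos()` reports the column of the
first byte of the last matched character, so `edx - idx` comes out one
byte short whenever that character occupies two bytes. The helper names
below are hypothetical, the regular expression is only a Python
stand-in for `| \zs\S\+`, and none of this is the formatvim code.

```python
import re

# Python stand-in for the Vim pattern '| \zs\S\+' (group 1 is the part after \zs).
PATTERN = re.compile(rb"\| (\S+)")


def match_spans(line: bytes):
    """Return (start, end) byte offsets of every match: 0-based, end exclusive."""
    return [(m.start(1), m.end(1)) for m in PATTERN.finditer(line)]


def cursor_style_end(line: bytes, start: int, end: int) -> int:
    """1-based column of the first byte of the last matched character."""
    last_char = line[start:end].decode("utf-8")[-1]
    return end - len(last_char.encode("utf-8")) + 1


line = "city | Felicit\u00e0".encode("utf-8")  # columns in Vim count bytes
start, end = match_spans(line)[0]
edx = cursor_style_end(line, start, end)

truncated = line[start:edx]  # same bytes as strpart(line, idx, edx - idx)
full = line[start:end]       # exclusive byte end keeps the whole character
```

Here `truncated` is `b'Felicit\xc3'`, while `full` decodes to
`Felicità`. For an ASCII ending such as `Peach` the two slices
coincide, which would explain why only some lines were affected.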

> (Is it not possible to let every character be a single-byte char, as in languages such as Python?)

This has been discussed many times. No, it is not possible, because of
backward compatibility. There are special functions for working with
characters, but they are of little use here: a column is a byte index,
not a character index, and a virtual column counts screen cells, which
does not match characters either.

Also, any character above U+00FF in Python 3 is *not* a single byte. It
is a single-byte character only in the sense that 0xFFFF is a single
byte in `[0xFFFF, 0xFFFE, 0xFFFD][0]`. It is simply a different way of
storing and indexing strings.
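
To make the comparison concrete, here is a small Python 3 sketch (the
sample word is only an illustration): a `str` is indexed by code point,
much as a list of integers is indexed by element, while the encoded
`bytes` are indexed by byte, which is what a Vim column counts.

```python
word = "Perch\u00e9"            # "Perché": 6 code points
encoded = word.encode("utf-8")  # UTF-8 bytes; the last character takes two

char_count = len(word)          # counts code points
byte_count = len(encoded)       # counts bytes

last_char = word[5]             # indexing a str yields a whole character
byte_at_5 = encoded[5]          # indexing bytes yields one byte (an int)

# The list analogy: each index selects one item, whatever its size.
code_points = [ord(c) for c in word]
first_item = [0xFFFF, 0xFFFE, 0xFFFD][0]
```

The character count is 6 and the byte count is 7, so a byte column and
a character index disagree from the first non-ASCII character onward.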

>
> --
> --
> You received this message from the "vim_use" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google Groups "vim_use" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
