Rebbe Malachi: Re: how to rearrange text file

Wednesday, May 12, 2010

Re: how to rearrange text file

On 05/12/2010 04:18 PM, esquifit wrote:
> On 11 May, 20:31, Tim Chase <v...@tim.thechases.com> wrote:
>>> - Every space that's in a line must be counted, placed upfront the
>>> line, and by the number + 1 needs to be done.
>>
>> You can use the following:
>>
>> :%s/.*/\=strlen(substitute(submatch(0), '\S\+', '', 'g')).' '.submatch(0)
>>
>> to prepend the space-counts.
>
> How would be the performance of such a formula when processing a big
> file? I came upon another more or less obvious solution -again, with
> some simplifications like not making distinction among spaces and
> tabs. I'm using a macro:
>
> yyP:s/\S//g<CR>"=col('$')<CR>pJ

Just a little back-of-the-envelope thinking:

- copy the line into the scratch register, purging that
register's previous contents and updating the "0" (yank)
register: O(1)

- switch to command-line mode: O(1)

- perform a substitute across the line: O(len(line))

- switch back to normal mode: O(1)

- switch into the expression-register entry mode: O(1)

- pull the length of the line: O(1)

- switch back to normal mode: O(1)

- insert the preserved contents of the line: O(1)

- join the two lines: either O(1) or O(len(line))

> I run this macro on all lines (99999 times or so)

So this comes out to

((k + len(line)) * number_of_lines)

where k = "mode-switching" time + "copying to 2 registers" time +
"inserting a line" time + "joining a line" time + "removing the
leading space with your following substitute" time
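
To put rough numbers on that sum, the two file-dependent terms can be read straight from the buffer. This is only a sketch: it assumes a Vim built with the +byte_offset feature, and it uses the buffer's byte size (newlines included) as a stand-in for the summed line lengths.

```vim
" number_of_lines in the formula above
echo line('$')
" approximate sum of len(line) over the file: buffer size in bytes,
" newlines included (needs the +byte_offset feature)
echo line2byte(line('$') + 1) - 1
```

The constant k is the part that cannot be read off this way; it has to be measured.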

Recording the macro (and burning a register to contain it) and
ensuring that your "99999" covers enough lines (if you choose
fewer than the actual number of lines, you have to re-execute
your macro) also take a bit more time.

> and at the end I suppress the leading spaces with
>
> :%s/\s//

and then doing a second pass on the entire file is
O(number_of_lines) which adds into the above sum.

> Without having made any benchmarking, I *suspect* that this can be
> quicker than using strlen, and submatches. I'd like to hear your
> opinion about this.

My suggestion is a single pass: it touches each line once and
only once, performs the substitute on it as it's touched, finds
the length (which may be O(1) or O(len) depending on
implementation), and joins the space-count together with the
line content. If speed is important, you might be able to tweak
my original to

:%s/^/\=strlen(substitute(getline('.'), '\S\+', '', 'g')).' '

which lets the search regexp match right away (a zero-width "^"
rather than ".*" consuming the whole line) and does a little less
work to get the results. Both my original solution and my 2nd
suggestion have the additional benefits of

- not switching between various modes multiple times
- not tromping the contents of your scratch register
- not tromping the contents of your "0" (yank) register
- not burning a register for the macro
- not having to guess how many executions to replay
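
As a small worked example of what the "^" variant produces, here is the command applied to three made-up lines. The sample text is purely illustrative and is not taken from the original poster's file.

```vim
" Before (three lines in the buffer):
"   alpha beta gamma
"   one
"   a b c d
:%s/^/\=strlen(substitute(getline('.'), '\S\+', '', 'g')).' '
" After:
"   2 alpha beta gamma
"   0 one
"   3 a b c d
```

Each prefix is the number of whitespace characters left once every run of non-whitespace has been removed from the line.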

Feel free to benchmark if it matters to you :)
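
One possible way to do that, as a rough sketch rather than a rigorous benchmark: wrap each approach in reltime() calls. This assumes a Vim built with the +reltime feature. The register name q and the variable name t0 are placeholders of mine, and the macro is assumed to leave the cursor on the next line to process.

```vim
" Work on a scratch copy of the file: both runs modify the buffer.

" Time the one-pass substitute.
let t0 = reltime()
silent %s/^/\=strlen(substitute(getline('.'), '\S\+', '', 'g')).' '
echo 'one-pass :s took ' . reltimestr(reltime(t0)) . ' seconds'

" Undo (u) to restore the buffer, then time the macro.
" Assumes it was recorded into register q (hypothetical name).
let t0 = reltime()
silent! normal 99999@q
echo 'macro took ' . reltimestr(reltime(t0)) . ' seconds'
```

The macro figure leaves out the final :%s/\s// pass, so that would need to be timed and added separately.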

It just goes to show that Vim accommodates a variety of solutions
and has plenty of room for tweaking solutions if one doesn't work
for you.

-tim

