Tuesday, September 1, 2009

RE: improving the :join command

>>No disrespect intended, but *why* in B'Harni's Dark Name would you
>>want to join >10000 lines into 1?!?

>There might be usecases. Data is growing rapidly today, and I myself
>had to manage automatically generated text-files of several hundred MB
>of size. Plus there have occasionally been questions on this list
>regarding joining lines.

Even so, something which I can understand, eg, a logfile, should be
delimited by linebreaks. Raw xml/sgml/etc., should be edited as a
sequence of modestly-sized lines, then if necessary, joined to a single
line after saving (or before saving, if you have the time :D ).


>Well just one simple test:

>#v+
>~$ for i in 1 2 4 8 16 32 64 128; do
> seq 1 $(($i*1000)) >tempfile
> echo "joining $i kilo lines"
> time vim -u NONE -N -c ':%join|:q!' tempfile;
>done
>#v-

>and compare the timings yourself. Doesn't this look like a bug to you?

I have no idea, as I haven't run it yet. Offhand, an exponential
increase wouldn't be out of the question, ie, e^n.
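
For anyone who does run it, here's a rough way to read the numbers. It's
only a sketch: 'ratios' is a helper name I made up, and the timings fed
to it below are invented for illustration, not measurements. Since the
loop doubles the line count on each pass, the ratio between consecutive
timings hints at the growth: about 2 means linear, about 4 means
quadratic, and a ratio that keeps climbing means something worse.

```shell
# Hypothetical helper: reads one timing (in seconds) per line and prints
# the ratio of each timing to the one before it.
ratios() {
    LC_ALL=C awk 'NR > 1 && prev > 0 { printf "%.1f\n", $1 / prev }
                  { prev = $1 }'
}

# Invented timings, purely for illustration (not real measurements).
printf '%s\n' 0.1 0.4 1.6 6.4 | ratios
```

With those invented numbers every ratio comes out as 4.0, which is what
quadratic growth would look like, not e^n.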

Don't forget *physical* limits such as available memory. Once you bang
your head on that memory-ceiling and start having to swap to disk, all
bets are off, and processing time can increase by order*s* of magnitude,
depending how bad it is. Hell, I run into that in *perl*, let alone
'gvim', when intentionally joining huge files to a single line to c&p
whole sections of the file! And I'm not even dealing with syntax
highlighting, colorschemes, and the like.


>>Any 'vi' variant is a *line*-based editor, which presumed a modest
>>line-size for each. Juggling lines back and forth is easy, but heaving
>>huge MB-sized chunks o' text is just obscene. Add to that syntax-based
>>highlighting, multiple colors, etc., and all the processing required for
>>just *1* line adds exponentially to the amount of work involved, let
>>alone cursor motions, etc.

>Well Vim is an editor. Shouldn't it be able to join properly millions
>of lines, even if that sounds strange? The power of vim comes from

Sure, it should be able to be pushed to its limits and do so, but not
necessarily *efficiently*. Ie, it may hit that aforementioned ceiling
and then start hitting the disk to do so, and pretty much require you to
leave it running overnight to go and join a brazillion lines into 1
Uberline. That's not necessarily a "bug", just an unexpected excursion
outside its performance envelope. The fact that it can create a huge
Uberline without *crashing* is a testament to the robustness of the
code. An old version of 'vi' I had would vomit on lines >300chars or
so.

Point being, *line*-editors are meant to be used with *lines*, and lines
of a modest size. The fact that it *can* handle Uberlines is great, but
you can't expect it to be handled "efficiently". The kind of advice I
might give would be along the lines of the guy who sees his doctor:

guy: "Doc, it hurts when I do this."
doc: "So don't do that."


>the fact, that you can do many different manipulations very
>efficiently and does not limit you.

Absolutely, but again, recall that it's intended to be a *line*-editor.
Not to appear facetious in repeating that again, but that's what
'vim'/'gvim' happens to be, a *line*-editor. You yank and put *lines*.
You add *lines*. You delete *lines*. Hell, syntax highlighting becomes
so painful for overly-long lines that people wrote add-ons to
stop highlighting after N columns! That should be Clue #1 that
overly-long lines are not "natural" to a *line*-editor.


>Plus :h limits does not talk about joining only a couple of lines ;)

Of course not. I can 'ls' a list of filenames into a file, do a ':%j'
to get them into a single line, then prepend a command to run (with the
filelist as the list of files to operate on) and make an instant
batchfile. Works great. But there's a huge difference between a
batch-/shell-command that's 1000chars long, and a 1-line file with a
100Mchar Uberline.
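
Roughly what I mean, as a sketch you can run from a shell. The file
names and the 'wc -l' command are placeholders I picked for the example,
and the whole thing is skipped if 'vim' isn't on the PATH:

```shell
# Scratch directory with two placeholder files (names are made up).
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"
list=$(mktemp)
ls "$dir" > "$list"

if command -v vim >/dev/null 2>&1; then
    # -es runs Vim in silent Ex mode, so no screen is needed.
    # %join puts every filename on one line; s/^/wc -l / prepends the
    # (placeholder) command; wq saves the result back to the list file.
    vim -u NONE -N -es -c '%join' -c 's/^/wc -l /' -c 'wq' "$list" </dev/null
    batch=$(cat "$list")
    echo "$batch"
fi

rm -rf "$dir" "$list"
```

Interactively it's just ':%j' followed by typing the command at the
start of the line.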


>>Dunno, but to me, that seems like using a text editor to edit a .jpg or
>>.gif or something, ie, not the right tool for the job, even if, through
>>herculean contortions and torturing the editor's functionality, it *can*
>>be done.

>Exactly. It can. And it might be done by someone.

And if he has the luxury of letting it run overnight, great. :D


>>I'd, if anything, edit the file as needed, save it, then use 'sed',
>>'tr', etc., to post-process it accordingly. No overhead for syntax,
>>colorschemes, etc. Ie, use the right tool for the job.
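
To make that concrete, a minimal sketch of the post-processing step,
assuming a POSIX 'paste' is available. The temp file and the numbers in
it are just stand-ins for the real data:

```shell
# Build a small sample file; a real run would use the saved file instead.
tmp=$(mktemp)
seq 1 5 > "$tmp"

# paste -s joins all lines of the file into one;
# -d ' ' uses a single space as the separator.
joined=$(paste -s -d ' ' "$tmp")
echo "$joined"

rm -f "$tmp"
```

"tr '\n' ' '" does much the same job, but it turns the final newline
into a trailing space as well.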

>Yeah, but sed, tr, awk, perl, $language is not always available. And
>Vim should be able to do it right.

>What was the reason again to add :vimgrep to vim when grep is
>available?

I have no idea, as I don't recall ever using it. <shrug/>


To reiterate, I *don't* want to appear to be argumentative, but I'm just
saying that handling Uberlines is something that's *possible* in
'vim'/'gvim', but don't expect it to be handled "efficiently", not if
it's well outside the usual performance envelope of file-editing.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
