Thursday, June 7, 2012

Re: What is the largest file I can edit using vim?

Gary Johnson <garyjohn@spocom.com> wrote:

> On 2012-06-07, Dominique Pellé wrote:
>> Marc Weber wrote:
>>
>> > forgett about vim, on linux just do:
>> > tail -n +10 file.sql  | head -n +10 > trimmed.sql
>>
>> Many people posted solutions with head and tail that don't work.
>>
>> Here is one that works:
>>
>> $ sed 1,10d < input.txt | tac | sed 1,10d | tac > output.txt
>
> Are you sure that tac works in this case?  I thought that tac pushed
> all the input lines onto a stack in memory, then popped each line as
> it was output.  That means having to put the entire file into
> memory, which we were trying to avoid.

Yes, I was wondering about that too. But I checked, and tac uses
very little memory even on huge files.

Actually, tac behaves differently with an input file than with
an input stream:

* With an input file, it outputs results immediately. I suppose
tac reads the file in blocks from the end and reverses each block.
* With an input stream it can't do that, so it has to read the
full stream before it can output anything. Yet it still uses very
little memory: it spills to temporary files (confirmed by looking
at /proc/<pid>/fd; a quick check is sketched below).
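Here is one way to reproduce that check (a rough sketch, assuming Linux
with GNU coreutils; the line count, the sleep and the pgrep call are
arbitrary and only there to catch tac while it is still running):

$ seq 1 20000000 | tac > /dev/null &              # tac on a pipe: it cannot seek
$ sleep 1; ls -l /proc/"$(pgrep -n -x tac)"/fd    # one fd points to a (deleted) temp file
$ wait

With a regular file argument instead of a pipe, no such temporary
file shows up.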

So the tac solution works. But given that tac is more efficient on
an input file than on an input stream, swapping the order of the
commands should be better. In other words, this...

$ tac input.txt | sed 1,10d | tac | sed 1,10d > output.txt

... should be faster than this:

$ sed 1,10d input.txt | tac | sed 1,10d | tac > output.txt

I measured it on a big file to confirm:

the first solution took 9.2 sec, the second one 12.2 sec.
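This kind of comparison can be reproduced along these lines (a sketch;
the actual test file and machine are not given here, so the numbers
will of course differ):

$ seq 1 50000000 > input.txt   # some sufficiently large test file
$ time (tac input.txt | sed 1,10d | tac | sed 1,10d > output.txt)   # tac first
$ time (sed 1,10d input.txt | tac | sed 1,10d | tac > output.txt)   # sed first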

In any case, the Perl solution that I gave, which uses a rotating
buffer, makes only one pass, does not use much memory, and does not
use temporary files either.

$ perl -ne 'print $l[$.%10] if ($. > 10*2); $l[$.%10] = $_' input.txt > output.txt
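Spelled out with comments (the same rotating-buffer idea, just
reformatted; N=10 is hard-coded as above):

$ perl -ne '
    # Keep the 10 most recent lines in @l, indexed by line number modulo 10.
    # The slot about to be overwritten holds the line read 10 lines ago:
    # printing it only now guarantees the last 10 lines are never printed,
    # and the $. > 20 test additionally skips the first 10 lines.
    print $l[$. % 10] if $. > 10*2;
    $l[$. % 10] = $_;
  ' input.txt > output.txt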

Yet this Perl solution is slower than tac. It takes 14.8 sec on
the same input file.

The strange-looking solution...

$ sed -e :a -e '$d;N;2,10ba' -e 'P;D' input.txt > output.txt

... takes 6.0 sec.
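For what it's worth, here is my reading of what each command in that
sed script does (the comments are only my interpretation; the command
itself is unchanged):

# :a       label for the initial fill loop
# $d       once the last input line has been read, discard the pattern
#          space, so the lines still buffered there are never printed
# N        append the next input line to the pattern space
# 2,10ba   on input lines 2 through 10, branch back to :a to keep filling
# P        print the first (oldest) line of the pattern space
# D        delete that line and restart the cycle with the rest
$ sed -e :a -e '$d;N;2,10ba' -e 'P;D' input.txt > output.txt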

Tim Chase wrote:

> I think you're reading it backwards, as head/tail (at least GNU
> versions; for other flavors, YMMV) allow for a "+" in front of the
> number so
>
> tail -n +20
>
> chops off the first 19 lines in the file; similarly, "-" in front of
> the number with head does all but the N last lines of the file. The
> example above should likely read something like
>
> tail -n +11 file.sql | head -n -10 > trimmed.sql

Right, my apologies. That indeed works and it's much faster. I suppose
it does not have to parse the input line by line with this solution.
With the same large input as above, it took only 2.6 sec.
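For completeness, timed the same way as in the sketches above (on the
same hypothetical input.txt):

$ time (tail -n +11 input.txt | head -n -10 > output.txt)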

-- Dominique

