Tuesday, February 22, 2011

Re: Including into the current buffer from an external source

On Tue, February 22, 2011 4:10 pm, ZyX wrote:
> by Christian Brabandt:
>> I am not sure, why you insist on NULL handling. Traditionally, shell
>> scripts couldn't handle binary NULL.
> In arguments only. grep, sed, awk, less, cat, cut, ... handle NULLs in
> files just fine.

This is not guaranteed if you want to stay portable. GNU may have
extended its utilities to handle binary files just fine, but POSIX
requires most of the above utilities to handle only "text" files:

Quoting The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_392

3.392 Text File
A file that contains characters organized into one or
more lines. The lines do not contain NUL characters and none can
exceed {LINE_MAX} bytes in length, including the <newline>. Although
IEEE Std 1003.1-2001 does not distinguish between text files and
binary files (see the ISO C standard), many utilities only produce
predictable or meaningful output when operating on text files. The
standard utilities that have such restrictions always specify "text
files" in their STDIN or INPUT FILES sections.


Now look at the definitions of those commands:

sed
http://pubs.opengroup.org/onlinepubs/009695399/utilities/sed.html

INPUT FILES
The input files shall be text files. The script_files named by the -f
option shall consist of editing commands.

grep
http://pubs.opengroup.org/onlinepubs/009695399/utilities/grep.html

INPUT FILES
The input files shall be text files.

awk:
http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html

INPUT FILES
Input files to the awk program from any of the following sources shall
be text files...

less (not portable).
more
http://pubs.opengroup.org/onlinepubs/009695399/utilities/more.html

INPUT FILES
The input files being examined shall be text files.

cat
http://pubs.opengroup.org/onlinepubs/009695399/utilities/cat.html

INPUT FILES
The input files can be any file type.


cut
http://pubs.opengroup.org/onlinepubs/009695399/utilities/cut.html

INPUT FILES
The input files shall be text files, except that line lengths shall
be unlimited.

So cat must handle NULLs correctly; I didn't even know that. The only
utility I knew to handle NULL correctly was tr(1). I also know that sed
used to have problems with multibyte characters (I don't know whether
that has changed), so I wouldn't expect sed to handle binary files.
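
A quick way to check this on a given system is the sketch below. It is
only an illustration, not something taken from the standard: the
three-byte test string and the variable names are made up. It relies on
cat accepting any file type and on tr understanding the octal escape
\000.

```shell
# Three bytes: "a", NUL, "b". The octal escape \000 in the printf
# format string emits the NUL byte.
bytes=$(printf 'a\000b' | cat | wc -c | tr -d ' ')
echo "bytes after cat: $bytes"

# tr can translate the NUL into a newline, which turns the stream into
# something the text-only utilities are specified to handle.
lines=$(printf 'a\000b\n' | tr '\000' '\n' | wc -l | tr -d ' ')
echo "lines after tr: $lines"
```

If cat passes the NUL through untouched, the first count is 3; the
second pipeline then yields two lines. The "tr -d ' '" only strips the
padding that some wc implementations print.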

>> The only command, I can think of, that produces NULL delimited output is
>> find -print0 together with xargs -0 and even these parameters are not
>> portable and in fact not even necessary.
> Why it is not portable? I do not see that -print0 is claimed to be a GNU
> extension in find manual.

As far as I know, the -print0 primary of find and the -0 option of xargs
are not portable. Look in the standard quoted above: you won't find those
parameters. And as I said before, you don't even need them. You can work
around it using -exec sh -c '...' sh {} +
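
To make the workaround concrete, here is a minimal sketch. The scratch
directory and the two file names are invented for the example, and the
loop body is only a stand-in for whatever you would really do with each
file. find passes the names to sh as separate arguments, so nothing ever
has to be split on whitespace, newlines, or NULL bytes.

```shell
# Scratch directory (hypothetical name) so no real files are touched.
dir=${TMPDIR:-/tmp}/find-exec-demo.$$
mkdir "$dir"

# One file name with a space, one with an embedded newline.
: > "$dir/plain name.txt"
: > "$dir/odd
name.txt"

# The trailing "sh" becomes $0 of the inline script; the file names
# collected by "{} +" arrive as "$@".
count=$(find "$dir" -type f -exec sh -c '
    n=0
    for f in "$@"; do
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
' sh {} +)

echo "files seen: $count"
rm -rf "$dir"
```

With only two files, find runs the inline script once and it reports 2.
For a very large number of files, find may run the script several times,
so a real script should not assume it sees every name in one call.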

find
http://pubs.opengroup.org/onlinepubs/009695399/utilities/find.html
A feature of SVR4's find utility was the -exec primary's + terminator.
This allowed filenames containing special characters (especially
<newline>s) to be grouped together without the problems that occur if
such filenames are piped to xargs. Other implementations have added
other ways to get around this problem, notably a -print0 primary that
wrote filenames with a null byte terminator. This was considered here,
but not adopted. Using a null terminator meant that any utility that was
going to process find's -print0 output had to add a new option to parse
the null terminators it would now be reading.

Also see Sven's page:
http://www.in-ulm.de/~mascheck/various/find/#xargs

>> So handling NULL is a corner case, that you can't expect to work
>> reliably anyway

> You can expect to work reliably if you give up VimL. Perl, awk,
> python, zsh (but not bash), ruby and many other scripting languages
> that I do not know about are able to hold NULL in a string variable
> just like any other byte.

Yes. But we are talking about a text editor, right? I wouldn't expect a
text editor to handle NULLs correctly and that's why I called it a
corner case (and see above about awk).

> For VimL there is a simple rule: if command/function accepts/outputs a
> list of strings (consider range of lines to be a list of strings),
> then each NL in each string is NULL. That is true for
> edit/write/read/... as well as readfile/writefile/getline/setline/....
> If it does not accept or output a list, then you cannot have a NULL
> there. Another thing is that normal-mode commands that use registers
> also handle null correctly, but, for example, you cannot distinguish
> NL which represents NULL and NL which represents NL if you yank
> something and then parse @{register} variable in a script.

Good point. That should be put into the FAQ.

For a deeper discussion of NULL handling in the shell command language,
I refer you to the newsgroup comp.unix.shell, because this is getting a
little off topic here.

regards,
Christian

