Saturday, April 29, 2023

Re: surprising glob() result on Windows

Stan Brown
Tehachapi, CA, USA
https://BrownMath.com

On 2023-04-29 08:28, Mike wrote:
> On 04/29/2023 10:51 AM, Mike wrote:
>> On 04/28/2023 9:32 PM, Mike wrote:
>>> Briefly, I have a case where glob("*.ext") returns more files than I
>>> expect.
>>>
>>> To give an example, in a directory of your choice create two files
>>> named "test.any" and "zest.anyother".  The important detail is that
>>> the second filename's extension be prefixed by the first filename's
>>> extension.
>>>
>>> Then launch Vim in that directory and run the command
>>>     :echo glob("*.any")
>>> Both files are returned, not just "test.any".
>>>
>>> I see this on Windows running vim 9.0.1240 with normal features built
>>> with Visual C.  On the other hand, Vim on my linux box returns only
>>> "test.any", as I would expect, so I don't think this a feature. :)
>>
>> I've since rebuilt Vim to include patches up to 1494 and still see the
>> same results on my Windows 10 system.  I thought that patches 1400 and
>> 1458 might help but they did not.
>
> More potatoes for the stew.
>
> Create 5 files: test.a, test.ab, test.abc, test.abcd and test.abcde.
> Then, using gvim -u NONE -U NONE --noplugin or gvim --clean:
>     glob("*.a") returns test.a
>     glob("*.ab") returns test.ab
>     glob("*.abc") returns test.abc, test.abcd and test.abcde
>     glob("*.abcd") returns test.abcd
>
> So the problem occurs when the glob pattern has a 3-character extension.

Mike, I saw someone answered this, but maybe their answer didn't reach you?

Short version: Windows is doing what it's supposed to, and so is Vim.

The original MS-DOS and MS-Windows file system, in the 1980s, allowed up
to 8 characters, and then optionally a dot (period, full stop) plus up
to 3 characters. Even if the file was created with lower-case characters
in its name, Windows would change those characters to upper case. We can
call these "8.3 filenames" for short.

In the mid-1990s (with Windows NT 3.5 and Windows 95), Windows added
so-called long filenames (LFNs), which could be longer than 8.3 and
could contain lower-case characters.

Rather than start with a completely new file system (which would then
make floppy disks and other interchangeable media unreadable on the
previous generation of computers), Microsoft gave any filename that
exceeded 8.3 _two_ entries in the directory: one for the actual
filename, and one for an 8.3 "short filename" (SFN). If the new file's
name fit within 8.3, then it would get only that one entry, an SFN, in
the directory. Thus _every_ file had an SFN, but not every file had an
LFN. The graphical interface (called File Explorer, Windows Explorer, or
Explorer) would show an LFN if one existed, otherwise the SFN.

Some time after that, I'm not sure when but certainly by the release of
Windows 10, it became possible to disable SFNs for any particular disk
partition. And sometime after that, "LFNs only" became the default. But
your disk is obviously set to create SFNs from longer filenames.

Your test.a, test.ab, and test.abc all fit in the 8.3 paradigm, and
therefore they have only SFNs. Your test.abcd exceeds 8.3, so when you
created it, Windows set up an SFN for it. How is the SFN formed? Windows
keeps at most the first 6 characters before the dot and the first 3
after it (6.3, not 8.3), and appends ~1 to the part before the dot.
Therefore your test.abcd has two names, test.abcd and test~1.abc
(probably ~1, but it might be ~ and some other number). test.abcde is
probably test~2.abc. When you glob *.abc, the SFNs test~1.abc and
test~2.abc are caught in that net. But since Windows prefers to show an
LFN when one exists, you see them as test.abcd and test.abcde.
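
To make the naming rule concrete, here is a rough Vim script sketch of
it. ApproxShortName() is a name I made up for illustration, and the
tilde number is passed in by hand. The real Windows algorithm does more
(it strips spaces and illegal characters, and switches to a hash after
a few collisions), so treat this as an approximation only.

```vim
" Hypothetical helper: approximate the 8.3 short name Windows would
" generate for a:name, using a:n as the number after the tilde.
function! ApproxShortName(name, n) abort
  let l:root = fnamemodify(a:name, ':r')
  let l:ext = fnamemodify(a:name, ':e')
  if strlen(l:root) <= 8 && strlen(l:ext) <= 3
    " The name already fits 8.3, so it needs no separate short name.
    return toupper(a:name)
  endif
  " Keep at most 6 characters of the root, then add ~N.
  let l:short = strpart(l:root, 0, 6) . '~' . a:n
  if l:ext !=# ''
    " Keep at most 3 characters of the extension.
    let l:short .= '.' . strpart(l:ext, 0, 3)
  endif
  return toupper(l:short)
endfunction

" Prints TEST.ABC
echo ApproxShortName('test.abc', 1)
" Prints TEST~1.ABC
echo ApproxShortName('test.abcd', 1)
" Prints TEST~2.ABC
echo ApproxShortName('test.abcde', 2)
" Prints 1: the short name satisfies *.abc
echo ApproxShortName('test.abcd', 1) =~? glob2regpat('*.abc')
" Prints 0: the long name does not
echo 'test.abcd' =~? glob2regpat('*.abc')
```

To see the short names Windows actually assigned, run "dir /x" in a
Command Prompt in that directory; it lists them beside the long names.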

None of the SFN/LFN business exists on Linux, and since glob() is a
Unix thing in origin, it doesn't seem unreasonable to me that it doesn't
handle this.

If you really need to have more than three characters after the dot in
filenames, then the simplest thing would be for you to create a wrapper
function that calls glob and then in its return filters out anything
that doesn't match the input expression.
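
A minimal sketch of such a wrapper follows. StrictGlob() is a name I
made up, and it returns a List instead of the newline-separated string
that glob() gives by default. It re-checks only the last path component
of the pattern, and it compares case-insensitively, which suits Windows;
I have not tried it against every kind of pattern.

```vim
" Hypothetical wrapper: call glob(), then keep only the results whose
" visible (long) file name really matches the pattern.
function! StrictGlob(pattern) abort
  " Second argument is {nosuf}; third argument 1 asks for a List.
  let l:files = glob(a:pattern, 0, 1)
  " Turn the file-name part of the glob pattern into a regexp.
  let l:regpat = glob2regpat(fnamemodify(a:pattern, ':t'))
  " Drop entries that matched only through their 8.3 short name.
  return filter(l:files, 'fnamemodify(v:val, ":t") =~? l:regpat')
endfunction
```

With your five test files, :echo StrictGlob('*.abc') should then list
only test.abc.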

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/2b0898b6-90f3-0fe8-077e-8761a263c0ca%40fastmail.fm.
