Saturday, April 29, 2023

Re: surprising glob() result on Windows

On 04/29/2023 3:37 PM, Stan Brown wrote:
>
>
> Stan Brown
> Tehachapi, CA, USA
> https://BrownMath.com
>
> On 2023-04-29 08:28, Mike wrote:
>> On 04/29/2023 10:51 AM, Mike wrote:
>>> On 04/28/2023 9:32 PM, Mike wrote:
>>>> Briefly, I have a case where glob("*.ext") returns more files than I
>>>> expect.
>>>>
>>>> To give an example, in a directory of your choice create two files
>>>> named "test.any" and "zest.anyother".  The important detail is that
>>>> the second filename's extension be prefixed by the first filename's
>>>> extension.
>>>>
>>>> Then launch Vim in that directory and run the command
>>>>     :echo glob("*.any")
>>>> Both files are returned, not just "test.any".
>>>>
>>>> I see this on Windows running vim 9.0.1240 with normal features built
>>>> with Visual C.  On the other hand, Vim on my linux box returns only
>>>> "test.any", as I would expect, so I don't think this is a feature. :)
>>>
>>> I've since rebuilt Vim to include patches up to 1494 and still see the
>>> same results on my Windows 10 system.  I thought that patches 1400 and
>>> 1458 might help but they did not.
>>
>> More potatoes for the stew.
>>
>> Create 5 files: test.a, test.ab, test.abc, test.abcd and test.abcde.
>> Then, using gvim -u NONE -U NONE --noplugin or gvim --clean:
>>     glob("*.a") returns test.a
>>     glob("*.ab") returns test.ab
>>     glob("*.abc") returns test.abc, test.abcd and test.abcde
>>     glob("*.abcd") returns test.abcd
>>
>> So the problem occurs when the glob pattern has a 3-character extension.
>
> Mike, I saw someone answered this, but maybe their answer didn't reach you?

If you're referring to Bram's answer, it did. However, his link
primarily referenced FAT-based systems, not NTFS, and so the light-bulb
remained off.

>
> Short version: Windows is doing what it's supposed to, and so is Vim.
>
> The original MS-DOS and MS-Windows file system, in the 1980s, allowed up
> to 8 characters, and then optionally a dot (period, full stop) plus up
> to 3 characters. Even if the file was created with lower-case characters
> in its name, Windows would change those characters to upper case. We can
> call these "8.3 filenames" for short.
>
> In the mid-1990s (with Windows 95, if I recall correctly), Windows
> added so-called long filenames (LFNs), which could be longer than 8.3
> and could contain lower-case.
>
> Rather than start with a completely new file system (which would then
> make floppy disks and other interchangeable media unreadable on the
> previous generation of computers), Microsoft gave any filename that
> exceeded 8.3 _two_ entries in the directory: one for the actual
> filename, and one for an 8.3 "short filename" (SFN). If the new file's
> name fit within 8.3, then it would get only that one entry, an SFN, in
> the directory. Thus _every_ file had an SFN, but not every file had an
> LFN. The graphical interface (called File Explorer, Windows Explorer, or
> Explorer) would show an LFN if one existed, otherwise the SFN.
>
> Some time after that, I'm not sure when but certainly by the release of
> Windows 10, it became possible to disable SFNs for any particular disk
> partition. And sometime after that, "LFNs only" became the default. But
> your disk is obviously set to create SFNs from longer filenames.

Thank you, now I understand.

Motivated by your answer, I've looked up the NTFS article on Wikipedia,
https://en.wikipedia.org/wiki/NTFS
and it says that short filenames are implemented as "hard links". I,
unthinkingly, did not realize this.

>
> Your test.a, test.ab, and test.abc all fit in the 8.3 paradigm, and
> therefore they have only SFNs. Your test.abcd exceeds 8.3, so when you
> created it Windows set up an SFN for it. How is the SFN formed? Windows
> ignores any characters beyond the 6.3 limits (6.3, not 8.3), and for the
> 7th and 8th characters before the dot it adds ~1. Therefore your
> test.abcd has two names, test.abcd and test~1.abc (probably ~1, but it
> might be ~ and some other number). test.abcde is probably test~2.abc.
> When you glob *.abc, the SFN name test~1.abc is caught in that net. But
> since Windows prefers to show an LFN when one exists, you see them as
> test.abcd and test.abcde.
>
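
The alias mechanism described above can be sketched in Python. This is
a simplified model under stated assumptions, not the actual Windows
algorithm: the helper names (needs_alias, short_alias, assign_aliases,
windows_like_glob) are made up for illustration, the alias rule is
reduced to "first 6 base characters, ~n, first 3 extension characters",
and a file is assumed to match when either its long name or its short
alias fits the pattern.

```python
from fnmatch import fnmatchcase


def split_name(name):
    """Split 'base.ext' at the last dot; a name without a dot has no extension."""
    base, dot, ext = name.rpartition(".")
    return (base, ext) if dot else (name, "")


def needs_alias(name):
    """True if the name does not fit the 8.3 limits and so gets a short alias."""
    base, ext = split_name(name)
    return len(base) > 8 or len(ext) > 3


def short_alias(name, n):
    """Simplified alias: up to 6 base chars + '~n', extension cut to 3 chars."""
    base, ext = split_name(name)
    alias = base[:6].upper() + "~" + str(n)
    return alias + ("." + ext[:3].upper() if ext else "")


def assign_aliases(names):
    """Give every over-long name the lowest-numbered alias that is still free."""
    used = {n.upper() for n in names if not needs_alias(n)}
    aliases = {}
    for name in names:
        if needs_alias(name):
            n = 1
            while short_alias(name, n) in used:
                n += 1
            aliases[name] = short_alias(name, n)
            used.add(aliases[name])
    return aliases


def windows_like_glob(pattern, names):
    """Return long names whose long name OR short alias matches the pattern."""
    aliases = assign_aliases(names)
    pat = pattern.upper()  # compare case-insensitively
    hits = []
    for name in names:
        candidates = [name.upper()]
        if name in aliases:
            candidates.append(aliases[name])
        if any(fnmatchcase(c, pat) for c in candidates):
            hits.append(name)
    return hits


files = ["test.a", "test.ab", "test.abc", "test.abcd", "test.abcde"]
print(assign_aliases(files))
print(windows_like_glob("*.abc", files))
print(windows_like_glob("*.abcd", files))
```

In this model test.abcd and test.abcde receive the aliases TEST~1.ABC
and TEST~2.ABC, so "*.abc" picks up three names while "*.abcd" picks up
only test.abcd, which is the pattern seen in the five-file experiment.
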
> None of the SFN/LFN business exists on Linux, and since glob() is a
> Linux thing in origin it doesn't seem unreasonable to me that it doesn't
> handle this.
>
> If you really need to have more than three characters after the dot in
> filenames, then the simplest thing would be for you to create a wrapper
> function that calls glob and then in its return filters out anything
> that doesn't match the input expression.
>
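
The wrapper idea suggested above amounts to re-checking each returned
name against the original pattern. A minimal sketch in Python, assuming
the pattern applies to the file name only (no directory part) and that
names compare case-insensitively as they do on Windows; strict_filter
is a hypothetical helper name, not an existing Vim or Python function:

```python
import os
from fnmatch import fnmatchcase


def strict_filter(pattern, paths):
    """Keep only paths whose visible file name itself matches the pattern.

    Names that were returned only because a hidden short alias matched
    are dropped, since the alias never appears in the visible name.
    """
    pat = pattern.lower()
    return [
        p for p in paths
        if fnmatchcase(os.path.basename(p).lower(), pat)
    ]


# Names as an over-eager glob might return them.
returned = ["test.abc", "test.abcd", "test.abcde"]
print(strict_filter("*.abc", returned))
print(strict_filter("*.vim", ["pack.vim", "pack.vim9"]))
```

In Vim script the same post-filtering could presumably be built on the
list form of glob() together with filter(); that variant is not shown
here because it has not been tested.
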
Actually, I discovered this not because of glob() but because ":packadd"
was sourcing two files, one named pack.vim and the other named
pack.vim9, and both defined global-scope functions with the same name.
When I looked at the vim source code it appeared that it relied on glob
and so I chose to post the issue using glob as it seemed more fundamental.

Again, thanks for taking the time to provide a detailed answer.

-mike




--
You received this message from the "vim_use" maillist.
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/u2kdv1%24l82%241%40ciao.gmane.io.
