From: cr88192@gmail.com   
      
   On 7/22/2024 3:14 PM, Scott Lurndal wrote:   
   > BGB writes:   
   >> On 7/22/2024 9:51 AM, John Ames wrote:   
   >>> On Fri, 19 Jul 2024 23:21:22 GMT   
   >>> scott@slp53.sl.home (Scott Lurndal) wrote:   
   >>>   
   >>>> Poor performance, silly filename length limitations.   
   >>>   
   >>> I dunno, 8.3 is downright spacious compared to a number of actual   
   >>> mainframe operating systems...   
   >>>   
   >>   
   >> Looking some, it seems:   
   >> MS-DOS: 8.3   
   >> Commodore: 15.0   
   >> Apple ProDOS: 16.0   
   >> Apple Macintosh: 31.0 (HFS)   
   >> Early Unix: 14 (~ N.M where N+M+1 <= 14)   
   >   
   > Although file suffixes had no intrinsic meaning   
   > for Unix, and were seldom more than a single   
   > character.   
   >   
      
   There were/are lots of 3 or 4 character file extensions, like ".cpp" or   
   ".html", ...   
      
   In Linux, there are lots of multi-part extensions, like ".tar.gz", etc.   
      
   Though, I guess in traditional Unix, single-character extensions (like
   ".c", ".h", or ".o") were common.
      
      
      
   >>   
   >> Whereas TENEX and some others were 6 character.   
   >> OS4000: 8 character   
   >> VAX/VMS (and others): 6.3   
   >   
   > VMS filenames were 17 characters originally; OpenVMS
   > allows much longer names.
   >   
      
   When I was looking at it, VAX/VMS was listed as 6.3, whereas OpenVMS was
   longer. I could be wrong, though; it was a fairly quick-and-dirty search.
      
      
   >   
   >> Others:   
   >> ISO 9660 30 (variable format, similar to Unix)   
   >> UDF: 255   
   >> FAT32 and NTFS: 256 (UTF-16)   
   >> EXT2/3/4: 256 (UTF-8)   
   >   
   > POSIX defines the minimum path length (generally 1024),   
   > but any implementation of POSIX can choose to support   
   > longer filenames; most filesystems are limited to 255
   > or 256 characters for a path component.   
   >   
      
   OK.   
      
   Windows has a filename limit of 256 characters, but a total path-length
   limit (MAX_PATH) of 260, so a full-length filename will only fit in the
   root directory, and putting a long-named file in a long-named directory
   is likely to run into the limit.
      
   Video downloaders, for example, seem to limit the first part of the
   filename to around 120 characters (typically using the video title as
   the filename and truncating it at that point).
      
      
   But, yeah, 1024 for an overall path limit makes more sense than 260.   
   For my own project, I had assumed 512, but either way...   
      
   Well, except for AF_UNIX sockets, which as-is have a 104-character name
   limit. Though, this is mostly due to the layout of "sockaddr_un" (where
   "sockaddr_storage" generally supports up to 128 bytes in total).
      
   Internally, though, the actual path isn't used for these sockets;
   rather, it is mashed into a 128-bit hash (internally, pretty much
   everything can be treated as if it were IPv6).
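   A minimal sketch of folding a socket path into a 16-byte, IPv6-shaped
   key. The post doesn't say which hash is used; two 64-bit FNV-1a passes
   with different starting values are an assumption here, and the function
   names are hypothetical.

```c
#include <stdint.h>

/* 64-bit FNV-1a over a NUL-terminated string, starting from state h. */
static uint64_t fnv1a64(const char *s, uint64_t h)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;              /* 64-bit FNV prime */
    }
    return h;
}

/* Hypothetical: map an AF_UNIX path to a 128-bit key laid out like an
   IPv6 address (big-endian bytes). The hash choice is an assumption. */
void unix_path_to_addr128(const char *path, uint8_t out[16])
{
    uint64_t hi = fnv1a64(path, 0xcbf29ce484222325ULL); /* standard FNV offset basis */
    uint64_t lo = fnv1a64(path, 0x84222325cbf29ce4ULL); /* arbitrary second start value */
    for (int i = 0; i < 8; i++) {
        out[i]     = (uint8_t)(hi >> (56 - 8 * i));
        out[8 + i] = (uint8_t)(lo >> (56 - 8 * i));
    }
}
```

   Two sockets bound to the same path always map to the same key, so they
   can go through whatever tables already handle IPv6 addresses; distinct
   paths can still collide in principle.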
      
      
   >>   
   >> For most uses, a 32 character limit would probably be fine.   
   >   
   > In your use cases, perhaps.   
   >   
      
   IME, the vast majority of "normal" files tend to have names shorter than   
   32 characters.   
      
   Video sites like YouTube seem to mostly identify videos by short
   alphanumeric IDs, but video downloaders tend to use the title as the
   filename (so they may generate much longer names...).
      
      
   >   
   >> Basically using free-form names following Unix-like conventions, albeit   
   >> with semi-mandatory file extensions more like in Windows land (binaries   
   >> typically use '.exe' and '.dll' extensions; however, unlike Unix style   
   >> shells, the file extension is not usually given when invoking a command;   
   >> and the extension will be inferred when loading the program).   
   >   
   > Extensions were, and are, a pile of steaming stuff. They're   
   > completely unnecessary as a component of a filesystem. As   
   > a user-selected convention they're ok (for example, the gcc   
   > driver program selects which language to compile for from   
   > the extension (but it's optional anyway)), but the operating   
   > system knows nothing of extensions.   
   >   
      
   In my case, the filesystem driver and VFS don't really know much about
   file extensions, but the shell and program loader do.
      
      
   So, for things like opening files or "readdir()", it doesn't care. The
   VFS doesn't know about LFNs either (these are local to the FAT driver).
   Internally, names are normalized to UTF-8 and treated as case-sensitive
   (with FAT 8.3 names generally normalized to lower case).
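   As an illustration of the lower-case normalization, here is a sketch
   that turns the 11-byte, space-padded name field of a FAT directory entry
   into a "name.ext" string. The helper names are hypothetical, and only
   ASCII is handled (OEM code pages and the 0x05 first-byte escape are
   ignored).

```c
#include <stddef.h>

static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Convert e.g. "README  TXT" (8 + 3 bytes, space padded) to "readme.txt".
   out must hold at least 13 bytes (8 + '.' + 3 + NUL). */
void sfn_to_lower_name(const char raw[11], char out[13])
{
    size_t n = 0;
    int base_len = 8, ext_len = 3;

    while (base_len > 0 && raw[base_len - 1] == ' ')    /* strip base padding */
        base_len--;
    while (ext_len > 0 && raw[8 + ext_len - 1] == ' ')  /* strip ext padding */
        ext_len--;

    for (int i = 0; i < base_len; i++)
        out[n++] = ascii_lower(raw[i]);
    if (ext_len > 0) {                                   /* no '.' if ext is empty */
        out[n++] = '.';
        for (int i = 0; i < ext_len; i++)
            out[n++] = ascii_lower(raw[8 + i]);
    }
    out[n] = '\0';
}
```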
      
   The handling for generating SFNs from LFNs on FAT32 differs slightly
   from Windows:
   Windows: "Program Name.txt" => "PROGNA~1.TXT"
   TestKern: "~HHHHHHH.~~~", where HHHHHHH is a hash of the LFN.
      
   This is mostly because the "~1" convention requires figuring out which
   names already exist and advancing a sequence number (and what happens
   when 10+ names conflict?...). Simply hashing the LFN is easier, and if an
   LFN exists, there is little need to care about the SFN, since mostly no
   one will see it.
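   A sketch of the hashed-SFN idea, writing the result straight into the
   11-byte on-disk field (so "~HHHHHHH.~~~" is stored as "~HHHHHHH~~~").
   The actual hash and digit alphabet aren't given; 32-bit FNV-1a truncated
   to 7 upper-case hex digits is an assumption, and the function names are
   hypothetical.

```c
#include <stdint.h>

/* 32-bit FNV-1a (assumed hash; the real one is not specified). */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;                /* FNV-1a 32-bit offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                      /* FNV 32-bit prime */
    }
    return h;
}

/* Fill the 11-byte 8.3 field with '~', 7 hex digits, and "~~~".
   Hash collisions between different LFNs are not checked here. */
void lfn_to_hashed_sfn(const char *lfn, char raw[11])
{
    static const char hex[] = "0123456789ABCDEF";
    uint32_t h = fnv1a32(lfn) & 0x0FFFFFFFu; /* keep 28 bits = 7 hex digits */

    raw[0] = '~';
    for (int i = 7; i >= 1; i--) {           /* fill digits right to left */
        raw[i] = hex[h & 0xF];
        h >>= 4;
    }
    raw[8] = raw[9] = raw[10] = '~';         /* extension "~~~" */
}
```

   Unlike the "~1" scheme, this needs no directory scan; the trade-off is
   that two LFNs could in principle hash to the same SFN, which this sketch
   does not check for.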
      
   It will just use an 8.3 name in cases where the filename matches an 8.3   
   pattern (and the case can be encoded using WinNT rules).   
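   The "fits 8.3 and the case can be encoded" test might look roughly like
   the sketch below. It relies on the WinNT convention of two flag bits in
   byte 0x0C of the directory entry (0x08: base name lower case, 0x10:
   extension lower case), so each part may be all upper or all lower case,
   but not mixed. The allowed character set is a conservative ASCII subset
   (an assumption), and the function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define NT_LOWER_BASE 0x08  /* dir entry byte 0x0C: base name is lower case */
#define NT_LOWER_EXT  0x10  /* dir entry byte 0x0C: extension is lower case */

/* Conservative subset of valid SFN characters (assumption). */
static int sfn_char_ok(char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return 1;
    return c != '\0' && strchr("!#$%&'()-@^_`{}~", c) != NULL;
}

/* -1 if the part is empty, too long, has a bad char, or mixes case;
   otherwise 1 if it contains lower-case letters, else 0. */
static int part_case(const char *s, size_t len, size_t max)
{
    int upper = 0, lower = 0;
    if (len == 0 || len > max)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (!sfn_char_ok(s[i])) return -1;
        if (s[i] >= 'A' && s[i] <= 'Z') upper = 1;
        if (s[i] >= 'a' && s[i] <= 'z') lower = 1;
    }
    if (upper && lower) return -1;        /* mixed case cannot be encoded */
    return lower;
}

/* Returns NT case flags (>= 0) and fills raw[11] if the name fits 8.3,
   or -1 if an LFN is needed. */
int name_to_sfn(const char *name, char raw[11])
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t ext_len = dot ? strlen(dot + 1) : 0;
    int flags = 0, c;

    if (dot && strchr(dot + 1, '.'))      /* more than one dot */
        return -1;
    if ((c = part_case(name, base_len, 8)) < 0)
        return -1;
    if (c) flags |= NT_LOWER_BASE;
    if (dot) {                            /* "NAME." (empty ext) is rejected */
        if ((c = part_case(dot + 1, ext_len, 3)) < 0)
            return -1;
        if (c) flags |= NT_LOWER_EXT;
    }
    memset(raw, ' ', 11);                 /* on-disk form is upper case, space padded */
    for (size_t i = 0; i < base_len; i++)
        raw[i] = (char)toupper((unsigned char)name[i]);
    for (size_t i = 0; i < ext_len; i++)
        raw[8 + i] = (char)toupper((unsigned char)dot[1 + i]);
    return flags;
}
```

   With this sketch, "readme.txt" yields flags 0x18, "README.TXT" yields 0,
   and "Readme.txt" returns -1 (mixed case, so an LFN is needed).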
      
      
   There may also be some "$META.$$$" files, but these are used internally
   by the FS driver and not exposed to programs (though they would be
   visible if the drive were viewed from Windows). These are mostly part of
   a hacky scheme to add additional metadata (along vaguely similar lines to
   Linux UMSDOS, just using native VFAT LFNs for the filenames). Unlike
   UMSDOS, though, the table is keyed by SFN rather than by location in the
   directory (which makes it at least slightly less brittle).
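   To illustrate keying metadata by SFN rather than by directory position,
   here is a hypothetical record layout and lookup. Nothing about the real
   "$META.$$$" format is given beyond the SFN key, so the struct, its fields
   (including the mode bits), and the function name are invented for
   illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical "$META.$$$" record; only the SFN key comes from the post. */
struct meta_rec {
    char     sfn[11];   /* 8.3 directory name, space padded, as stored on disk */
    uint8_t  pad;
    uint32_t mode;      /* illustrative extra metadata, e.g. Unix-style mode bits */
};

/* Linear search by SFN. Keying on the name rather than on the entry's slot
   index means (presumably) that moving entries around within the directory
   does not break the mapping. */
const struct meta_rec *meta_find(const struct meta_rec *tab, size_t n,
                                 const char sfn[11])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(tab[i].sfn, sfn, 11) == 0)
            return &tab[i];
    return NULL;
}
```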
      
      
   With a new filesystem, the filesystem itself would not need to care   
   about file extensions, just encoding filenames (as a UTF-8 blob).   
      
   The general idea was a scheme like (name length in bytes):
    0- 48: 1 entry;
    49-100: 2 entries;
    101-220: 4 entries;
    221-256: 5 entries (though this has space for 280 bytes).
      
   Each extended entry adds 60 bytes, but using any extended entries cuts
   8 bytes off the base name (to hold the filename hash).
    "OverlyLongFileNameThatIsASentance_NeedTOFindMoreToStickOnHere.txt"   
   Has a base name like:   
    "OverlyLongFileNameThatIsASentance_NeedT~HHHHHHH"   
   Where 'H' is the hash of the full name; this suffix is cut off when the
   name is rebuilt from the LFN entries.
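   The entry-count table and the base-name truncation can be sketched as
   follows. The thresholds come straight from the list above, and the
   capacity formula reproduces them: (48 - 8) + 60 * (entries - 1) gives
   100, 220, and 280 bytes for 2, 4, and 5 entries. The 40-byte prefix, the
   FNV-1a hash, and the function names are assumptions (the example name
   above keeps 39 characters before the '~', so the exact split may differ).

```c
#include <stdint.h>
#include <string.h>

#define BASE_BYTES 48  /* name bytes in the base entry */
#define EXT_BYTES  60  /* name bytes added by each extended entry */
#define HASH_BYTES  8  /* "~HHHHHHH" suffix kept in the base entry */

/* Entries needed for a UTF-8 name of len bytes, per the table in the post
   (1, 2, 4, or 5 entries; a 3-entry size is not listed there). */
int entries_for_name(size_t len)
{
    if (len <= 48)  return 1;
    if (len <= 100) return 2;
    if (len <= 220) return 4;
    if (len <= 256) return 5;
    return -1;                              /* beyond the 256-byte limit */
}

/* Name capacity for n entries: once extended entries are used, 8 base
   bytes go to the hash, so capacity = (48 - 8) + 60 * (n - 1). */
size_t capacity_for_entries(int n)
{
    if (n <= 1)
        return BASE_BYTES;
    return (size_t)(BASE_BYTES - HASH_BYTES) + (size_t)EXT_BYTES * (size_t)(n - 1);
}

static uint32_t name_hash28(const char *s)  /* assumed hash: FNV-1a, 28 bits */
{
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h & 0x0FFFFFFFu;
}

/* Build the base-entry name: short names are stored as-is; long ones keep
   a prefix plus "~" and 7 hex hash digits. out must hold BASE_BYTES + 1. */
void build_base_name(const char *name, char out[BASE_BYTES + 1])
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(name);
    size_t cut = BASE_BYTES - HASH_BYTES;   /* 40-byte prefix (assumed) */
    uint32_t h = name_hash28(name);

    if (len <= BASE_BYTES) {                /* fits in one entry: no hash */
        memcpy(out, name, len + 1);
        return;
    }
    while (cut > 0 && ((unsigned char)name[cut] & 0xC0) == 0x80)
        cut--;                              /* don't split a UTF-8 sequence */
    memcpy(out, name, cut);
    out[cut] = '~';
    for (int i = 7; i >= 1; i--) { out[cut + i] = hex[h & 0xF]; h >>= 4; }
    out[cut + 8] = '\0';
}
```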
      
      
      
   Though, for the program loader, the extension doesn't really determine
   how the file is loaded, as the loader itself mostly goes by file magic,
   e.g.:
    'MZ': PE loader.   
    'PE': PE loader.   
    0x7F,'ELF': ELF Loader   
    '#!': Redirect ("#!pathname\n")   
      
   If it appears to be ASCII text, the extension is considered:   
    ".bas": BASIC interpreter.   
    Else: Shell Script   
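   A sketch of the magic-first dispatch described above, with the extension
   only consulted for text files. The enum and function names are
   hypothetical, and "appears to be ASCII text" is approximated here as
   "only printable ASCII plus tab, CR, and LF", which is an assumption.

```c
#include <string.h>

enum loader_kind { LOAD_PE, LOAD_ELF, LOAD_SHEBANG, LOAD_BASIC, LOAD_SHELL, LOAD_UNKNOWN };

/* Assumed heuristic for "appears to be ASCII text". */
static int looks_like_text(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c >= 0x80 || (c < 0x20 && c != '\t' && c != '\n' && c != '\r'))
            return 0;
    }
    return 1;
}

/* Pick a loader from the first n bytes of the file: magic numbers first,
   the extension only as a fallback for text files. */
enum loader_kind pick_loader(const unsigned char *buf, size_t n, const char *filename)
{
    if (n >= 2 && buf[0] == 'M' && buf[1] == 'Z') return LOAD_PE;
    if (n >= 2 && buf[0] == 'P' && buf[1] == 'E') return LOAD_PE;
    if (n >= 4 && buf[0] == 0x7F && memcmp(buf + 1, "ELF", 3) == 0) return LOAD_ELF;
    if (n >= 2 && buf[0] == '#' && buf[1] == '!') return LOAD_SHEBANG;
    if (looks_like_text(buf, n)) {
        const char *ext = strrchr(filename, '.');
        if (ext && strcmp(ext, ".bas") == 0)
            return LOAD_BASIC;                  /* BASIC interpreter */
        return LOAD_SHELL;                      /* default: shell script */
    }
    return LOAD_UNKNOWN;
}
```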
      
   The shell has a list of known executable extensions, and when a
   command is typed, it looks the command up in the following order:
    Check the current directory:
      Check first for no extension;
      Then try each known executable extension.
    Check everything in the PATH environment variable:
      Check first for no extension;
      Then try each known extension.
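   The lookup order can be sketched as a function that lists candidate
   paths in the order the shell would probe them; the caller would then
   test each one for existence and run the first match. The extension list,
   the ';' PATH separator, the length limit, and the function names are
   assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

#define PATH_SEP ';'   /* assumed PATH separator (not stated in the post) */
#define CAND_LEN 256   /* assumed max candidate length; longer paths are truncated */

/* Hypothetical list of known executable extensions. */
static const char *const exe_exts[] = { ".exe", ".bas" };
#define N_EXTS (sizeof exe_exts / sizeof exe_exts[0])

/* Append candidates for one directory: bare name first, then each known
   extension. dlen == 0 means the current directory. */
static size_t add_dir(const char *dir, size_t dlen, const char *cmd,
                      char out[][CAND_LEN], size_t n, size_t max)
{
    for (size_t i = 0; i <= N_EXTS && n < max; i++) {
        const char *ext = (i == 0) ? "" : exe_exts[i - 1];
        if (dlen > 0)
            snprintf(out[n], CAND_LEN, "%.*s/%s%s", (int)dlen, dir, cmd, ext);
        else
            snprintf(out[n], CAND_LEN, "%s%s", cmd, ext);
        n++;
    }
    return n;
}

/* Fill out[] with paths to probe, in search order (current directory,
   then each PATH entry); returns how many were written. */
size_t command_candidates(const char *cmd, const char *path_var,
                          char out[][CAND_LEN], size_t max)
{
    size_t n = add_dir("", 0, cmd, out, 0, max);
    while (path_var && *path_var) {
        const char *sep = strchr(path_var, PATH_SEP);
        size_t len = sep ? (size_t)(sep - path_var) : strlen(path_var);
        if (len > 0)                        /* skip empty PATH entries */
            n = add_dir(path_var, len, cmd, out, n, max);
        path_var = sep ? sep + 1 : path_var + len;
    }
    return n;
}
```

   For "ls" with PATH set to "/bin;/usr/bin", this yields "ls", "ls.exe",
   "ls.bas", then the same three under /bin, then under /usr/bin.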
      
   [continued in next message]   
      