From: bohannonindustriesllc@gmail.com   
      
   On 12/26/2024 3:46 PM, Waldek Hebisch wrote:   
   > BGB wrote:   
   >> Granted, this group isn't very active, but it is one of the few groups   
   >> where this sort of thing is on-topic, so...   
   >>   
   >> So, for my project I have made a new experimental filesystem I am   
   >> calling DFS2.   
   >>   
   >> Why?...   
   >> The existing filesystems didn't really fit what I wanted.   
   >>   
   >> Namely:   
   >> Can support Unix style features;   
   >> File ownership;   
   >> Symbolic links;   
   >> ...   
   >> Not excessively complicated;   
   >> Needs to be relatively low overhead.   
   >> Neither RAM nor CPU time are abundant.   
   >> Target is 50MHz with RAM measured in MB.   
   >> I also wanted things like file compression.   
   >> But, more for IO optimization than saving space.   
   >>   
   >> Currently I am using FAT32:   
   >> Does not natively support the desired features;   
   >> FAT chain walking and large clusters are not efficient to deal with;   
   >> No built-in compression.   
   >>   
   >> A few existing options:   
   >> Minix:   
   >> Too limited, missing features.   
   >> EXT2:   
   >> Not 1:1 with wanted features;   
   >> Some aspects of the design seem annoying.   
   >> NTFS:   
   >> Significantly over-complicated.   
   >> Most other Linux filesystems:   
   >> Tend to be over-complicated;   
   >> Most are trying to "aim higher" than EXT2.   
   >> Or, are specialized for specific cases, like "initrd".   
   >>   
   >>   
   >>   
   >> General design I went with:   
   >> Uses an inode table (kinda like EXT2)   
   >> Though, the table is itself represented with an inode (pros/cons);   
   >> Superblock gives the address of the inode table   
   >> with inode 0 as the inode-table inode.   
   >> Inode structure consists of multiple tagged members.   
   >> Sorta like the NTFS MFT entries.   
   >> Fixed size directory entries   
   >> They are 64 bytes, with a 48-byte name field;   
   >> Multiple entries are used for longer names (sorta like FAT).   
   >> Though, this should be infrequent,   
   >> as names longer than 48 bytes are rare.   
   >> Directory entries are organized into an AVL tree.   
   >>   
   >> Tradeoff for AVL:   
   >> An AVL tree is more complicated than one might hope. But, it is   
   >> less complicated than a B-Tree would have been. While a case could have   
   >> been made for using linear search (like FAT), it seemed like AVL was   
   >> probably worth the complexity (does mean directory lookups are roughly   
   >> log2 N).   
   >>   
   >> By my current estimates, the break-even point between an AVL tree   
   >> and a B-Tree (with k=16) would likely not be reached until around 1000   
   >> files, which is well over the size of most directories. If most   
   >> directories held fewer than around 8-16 files, a case could be made   
   >> for linear search; but AVL doesn't really add much cost beyond some   
   >> additional code complexity.   
   >>   
   >>   
   >> Where, each directory entry encodes:   
   >> A name field;   
   >> The inode number;   
   >> Left, Right, and Parent links;   
   >> The Z depth of this node;   
   >> Directory Entry Type.   
   >>   
   >> Initially, there was no parent-link in the tree, but the lack of a   
   >> parent link added significant complexity to tasks like walking the   
   >> directory or rebalancing nodes (it was necessary to keep track of this   
   >> externally), so I ended up adding a link. Though, at present this   
   >> reduces the maximum theoretical directory size to 2M files (unlikely to   
   >> be an issue).   
   >>   
   >> Note that the filesystem is case-sensitive and assumes UTF-8 for   
   >> filenames (with some wonk for names longer than 48 bytes).   
   >>   
   >>   
   >> Block management within inodes:   
   >> Uses a scheme similar to EXT2, namely (with 32-bit block numbers):   
   >> 16 entries, direct block number (16KB with 1K blocks)   
   >> 8 entries, indirect block (2MB)   
   >> 4 entries, double indirect block (256MB)   
   >> 2 entries, triple indirect block (16GB)   
   >> 1 entry, quad indirect block (4TB)   
   >> 1 entry, penta indirect block (1PB)   
   >> For larger volumes and compressed files, 64 bit numbers are used:   
   >> 8 entries, direct block number (8K / 128K, with 1K or 16K blocks)   
   >> 4 entries, indirect block (512K / 8MB)   
   >> 2 entries, double indirect block (32MB / 512MB)   
   >> 1 entry, triple indirect block (2GB / 32GB)   
   >> 1 entry, quad indirect block (256GB / 1TB)   
   >>   
   >> Blocks and inodes are allocated via bitmaps, which are themselves   
   >> assigned inodes (and store their data the same as normal files).   
   >>   
   >> Where reducing the number of entries avoids making the inode bigger.   
   >> Currently, the design can use 256 or 512 byte inodes.   
   >> The former layout, with 64-bit entries, requires a 512-byte inode.   
   >>   
   >> Had considered spans tables, but didn't seem worth it (though   
   >> conceptually simpler, spans are more complicated to deal with in terms   
   >> of code complexity).   
   >>   
   >> For the 64-bit block numbers, with compressed files, the high-order bits   
   >> are used for some additional metadata (such as how this logical block of   
   >> the file is stored, and how many disk-blocks were used).   
   >>   
   >>   
   >> For file compression:   
   >> A larger logical block size is used in this case:   
   >> Say, 16K/32K/64K vs 1K/2K/4K.   
   >> Currently I am using 16K and 1K.   
   >> This larger block is compressed,   
   >> and stored as a variable number of disk blocks.   
   >> Blocks are decompressed when loaded into a block-cache;   
   >> And, re-compressed when evicted (and dirty).   
   >> Underlying blocks are dynamically reallocated as needed.   
   >> Currently, using a non-entropy-coded LZ77 variant.   
   >> Say, along vaguely similar lines to LZ4.   
   >>   
   >>   
   >> Currently, with all of this, code size is still moderately smaller than   
   >> my existing FAT32 driver... Granted, this isn't saying much.   
   >>   
   >> Any thoughts?...   
   >   
   > For SD cards it is better to put critical data (like bitmaps or   
   > block lists) at the start. IIUC SD card manufacturers assume a FAT-like   
   > filesystem and make the first part (say 4 MB) better. So spreading out   
   > metadata over the whole card means a higher chance of losing   
   > critical data.   
   >   
      
   OK.   
      
   Some of this could likely be tunable when creating the filesystem (in an   
   "mkfs" type tool or similar).   
      
   My code currently initially reserves 16K for the starting inode table,   
   and a few K for the initial bitmaps (though, bigger may be better).   
      
   But, up-front creation/reservation based on the final volume size   
   (rather than allocating them incrementally) is still very much possible.   
      
      
   Say, initial block assignment at present, IIRC:   
    0: Superblock   
    1: Root (first 16 entries)   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   