From: running_man@writeable.com   
      
   On 11/12/2024 12:35 BGB wrote:   
   > Granted, this group isn't very active, but it is one of the few groups   
   > where this sort of thing is on-topic, so...   
   >   
   > So, for my project I have made a new experimental filesystem I am   
   > calling DFS2.   
   >   
   > Why?...   
   > The existing filesystems didn't really fit what I wanted.   
   >   
   > Namely:   
   > Can support Unix style features;   
   > File ownership;   
   > Symbolic links;   
   > ...   
   > Not excessively complicated;   
   > Needs to be relatively low overhead.   
   > Neither RAM nor CPU time are abundant.   
   > Target is 50MHz with RAM measured in MB.   
   > I also wanted things like file compression.   
   > But, more for IO optimization than saving space.   
   >   
   > Currently I am using FAT32:   
   > Does not natively support the desired features;   
   > FAT chain walking and large clusters are not efficient to deal with;   
   > No built-in compression.   
   >   
   > A few existing options:   
   > Minix:   
   > Too limited, missing features.   
   > EXT2:   
   > Not 1:1 with wanted features;   
   > Some aspects of the design seem annoying.   
   > NTFS:   
   > Significantly over complicated.   
   > Most other Linux filesystems:   
   > Tend to have a problem of most being over-complicated;   
   > Most are trying to "aim higher" than EXT2.   
   > Or, are specialized for specific cases, like "intitrd".   
   >   
   >   
   >   
   > General design I went with:   
   > Uses an inode table (kinda like EXT2)   
   > Though, the table is itself represented with an inode (pros/cons);   
   > Superblock gives the address of the inode table   
   > with inode 0 as the inode-table inode.   
   > Inode structure consists of multiple tagged members.   
   > Sorta like the NTFS MFT entries.   
   > Fixed size directory entries   
   > They are 64 byte with a 48 byte name field;   
   > Multiple entries are used for longer names (sorta like FAT).   
   > Though, this should be infrequent,   
   > as names longer than 48 bytes are rare.   
   > Directory entries are organized into an AVL tree.   
   >   
   > Tradeoff for AVL:   
   > An AVL tree is more complicated than could have been hoped. But, it is   
   > less complicated than a B-Tree would have been. While a case could have   
   > been made for using linear search (like FAT), it seemed like AVL was   
   > probably worth the complexity (does mean directory lookups are roughly   
   > log2 N).   
   >   
   > By my current estimates, likely the break-even point between an AVL tree   
   > and a B-Tree (with k=16) would likely not be reach until around 1000   
   > files, which is well over the size of most directories. If one could   
   > argue that most directories were smaller than around 8-16 files, a case   
   > could have been made for linear search; but AVL doesn't really add much   
   > cost beyond some additional code complexity.   
   >   
   >   
   > Where, each directory entry encodes:   
   > A name field;   
   > The inode number;   
   > Left, Right, and Parent links;   
   > The Z depth of this node;   
   > Directory Entry Type.   
   >   
   > Initially, there was no parent-link in the tree, but the lack of a   
   > parent link added significant complexity to tasks like walking the   
   > directory or rebalancing nodes (it was necessary to keep track of this   
   > externally), so I ended up adding a link. Though, at present this   
   > reduces the maximum theoretical directory size to 2M files (unlikely to   
   > be an issue).   
   >   
   > Note that the filesystem is case-sensitive and assumes UTF-8 for   
   > filenames (with some wonk for names longer than 48 bytes).   
   >   
   >   
   > Block management within inodes:   
   > Uses a scheme similar to EXT2, namely (with 32-bit block numbers):   
   > 16 entries, direct block number (16KB with 1K blocks)   
   > 8 entries, indirect block (2MB )   
   > 4 entries, double indirect block (256MB)   
   > 2 entries, triple indirect block (16GB )   
   > 1 entries, quad indirect block (4TB )   
   > 1 entries, penta indirect block (1PB )   
   > For larger volumes and compressed files, 64 bit numbers are used:   
   > 8 entries, direct block number (8K / 128K, with 1K or 16K)   
   > 4 entries, indirect block (512K / 8MB )   
   > 2 entries, double indirect block (32MB / 512MB)   
   > 1 entries, triple indirect block (2GB / 32GB )   
   > 1 entries, quad indirect block (256GB / 1TB )   
   >   
   > Blocks and inodes are allocated via bitmaps, which are themselves   
   > assigned inodes (and store their data the same as normal files).   
   >   
   > Where the reducing the number of entries avoids making the inode bigger.   
   > Currently, the design can use 256 or 512 byte inodes.   
   > The former layout with 64b entries requires a 512B inode.   
   >   
   > Had considered spans tables, but didn't seem worth it (though   
   > conceptually simpler, spans are more complicated to deal with in terms   
   > of code complexity).   
   >   
   > For the 64-bit block numbers, with compressed files, the high-order bits   
   > are used for some additional metadata (such as how this logical block of   
   > the file is stored, and how many disk-blocks were used).   
   >   
   >   
   > For file compression:   
   > A larger logical block size is used in this case:   
   > Say, 16K/32K/64K vs 1K/2K/4K.   
   > Currently I am using 16K and 1K.   
   > This larger block is compressed,   
   > and stored as a variable number of disk blocks.   
   > Blocks are decompressed when loaded into a block-cache;   
   > And, re-compressed when evicted (and dirty).   
   > Underlying blocks are dynamically reallocated as needed.   
   > Currently, using a non-entropy-coded LZ77 variant.   
   > Say, along vaguely similar lines to LZ4.   
   >   
   >   
   > Currently, with all of this, code size is still moderately smaller than   
   > my existing FAT32 driver... Granted, this isn't saying much.   
   >   
   > Any thoughts?...   
   >   
      
   Another useful feature to add could be a Reed-Solomon encoding   
   scheme for error-detection and recovery making it more resilient.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|