From: bohannonindustriesllc@gmail.com   
      
   On 12/26/2024 3:46 PM, Waldek Hebisch wrote:   
   > BGB wrote:   
   >> Granted, this group isn't very active, but it is one of the few groups   
   >> where this sort of thing is on-topic, so...   
   >>   
   >> So, for my project I have made a new experimental filesystem I am   
   >> calling DFS2.   
   >>   
   >> Why?...   
   >> The existing filesystems didn't really fit what I wanted.   
   >>   
   >> Namely:   
   >> Can support Unix style features;   
   >> File ownership;   
   >> Symbolic links;   
   >> ...   
   >> Not excessively complicated;   
   >> Needs to be relatively low overhead.   
   >> Neither RAM nor CPU time are abundant.   
   >> Target is 50MHz with RAM measured in MB.   
   >> I also wanted things like file compression.   
   >> But, more for IO optimization than saving space.   
   >>   
   >> Currently I am using FAT32:   
   >> Does not natively support the desired features;   
   >> FAT chain walking and large clusters are not efficient to deal with;   
   >> No built-in compression.   
   >>   
   >> A few existing options:   
   >> Minix:   
   >> Too limited, missing features.   
   >> EXT2:   
   >> Not 1:1 with wanted features;   
   >> Some aspects of the design seem annoying.   
   >> NTFS:   
   >> Significantly over-complicated.   
   >> Most other Linux filesystems:   
   >> Tend to be over-complicated;   
   >> Most are trying to "aim higher" than EXT2.   
   >> Or, are specialized for specific cases, like "initrd".   
   >>   
   >>   
   >>   
   >> General design I went with:   
   >> Uses an inode table (kinda like EXT2)   
   >> Though, the table is itself represented with an inode (pros/cons);   
   >> Superblock gives the address of the inode table   
   >> with inode 0 as the inode-table inode.   
   >> Inode structure consists of multiple tagged members.   
   >> Sorta like the NTFS MFT entries.   
   >> Fixed size directory entries   
   >> They are 64 bytes, with a 48-byte name field;   
   >> Multiple entries are used for longer names (sorta like FAT).   
   >> Though, this should be infrequent,   
   >> as names longer than 48 bytes are rare.   
   >> Directory entries are organized into an AVL tree.   
   >>   
   >> Tradeoff for AVL:   
   >> An AVL tree is more complicated than one might hope. But, it is   
   >> less complicated than a B-Tree would have been. While a case could have   
   >> been made for using linear search (like FAT), it seemed like AVL was   
   >> probably worth the complexity (does mean directory lookups are roughly   
   >> log2 N).   
   >>   
   >> By my current estimates, the break-even point between an AVL tree   
   >> and a B-Tree (with k=16) would likely not be reached until around 1000   
   >> files, which is well over the size of most directories. If most   
   >> directories held fewer than around 8-16 files, a case could be made   
   >> for linear search; but AVL doesn't really add much cost beyond some   
   >> additional code complexity.   
   >>   
   >>   
   >> Where, each directory entry encodes:   
   >> A name field;   
   >> The inode number;   
   >> Left, Right, and Parent links;   
   >> The Z depth of this node;   
   >> Directory Entry Type.   
   >>   
   >> Initially, there was no parent-link in the tree, but the lack of a   
   >> parent link added significant complexity to tasks like walking the   
   >> directory or rebalancing nodes (it was necessary to keep track of this   
   >> externally), so I ended up adding a link. Though, at present this   
   >> reduces the maximum theoretical directory size to 2M files (unlikely to   
   >> be an issue).   
   >>   
   >> Note that the filesystem is case-sensitive and assumes UTF-8 for   
   >> filenames (with some wonk for names longer than 48 bytes).   
   >>   
   >>   
   >> Block management within inodes:   
   >> Uses a scheme similar to EXT2, namely (with 32-bit block numbers):   
   >> 16 entries, direct block number (16KB with 1K blocks)   
   >> 8 entries, indirect block (2MB)   
   >> 4 entries, double indirect block (256MB)   
   >> 2 entries, triple indirect block (16GB)   
   >> 1 entry, quad indirect block (4TB)   
   >> 1 entry, penta indirect block (1PB)   
   >> For larger volumes and compressed files, 64 bit numbers are used:   
   >> 8 entries, direct block number (8K / 128K, with 1K or 16K blocks)   
   >> 4 entries, indirect block (512K / 8MB)   
   >> 2 entries, double indirect block (32MB / 512MB)   
   >> 1 entry, triple indirect block (2GB / 32GB)   
   >> 1 entry, quad indirect block (256GB / 1TB)   
   >>   
   >> Blocks and inodes are allocated via bitmaps, which are themselves   
   >> assigned inodes (and store their data the same as normal files).   
   >>   
   >> Where reducing the number of entries avoids making the inode bigger.   
   >> Currently, the design can use 256 or 512 byte inodes.   
   >> The former layout, with 64-bit entries, requires a 512-byte inode.   
   >>   
   >> Had considered spans tables, but didn't seem worth it (though   
   >> conceptually simpler, spans are more complicated to deal with in terms   
   >> of code complexity).   
   >>   
   >> For the 64-bit block numbers, with compressed files, the high-order bits   
   >> are used for some additional metadata (such as how this logical block of   
   >> the file is stored, and how many disk-blocks were used).   
   >>   
   >>   
   >> For file compression:   
   >> A larger logical block size is used in this case:   
   >> Say, 16K/32K/64K vs 1K/2K/4K.   
   >> Currently I am using 16K and 1K.   
   >> This larger block is compressed,   
   >> and stored as a variable number of disk blocks.   
   >> Blocks are decompressed when loaded into a block-cache;   
   >> And, re-compressed when evicted (and dirty).   
   >> Underlying blocks are dynamically reallocated as needed.   
   >> Currently, using a non-entropy-coded LZ77 variant.   
   >> Say, along vaguely similar lines to LZ4.   
   >>   
   >>   
   >> Currently, with all of this, code size is still moderately smaller than   
   >> my existing FAT32 driver... Granted, this isn't saying much.   
   >>   
   >> Any thoughts?...   
   >   
   > For SD cards it is better to put critical data (like bitmaps or   
   > block lists) at the start. IIUC SD card manufacturers assume a FAT-like   
   > filesystem and make the first part (say 4 MB) better. So spreading out   
   > metadata over the whole card means a higher chance of losing   
   > critical data.   
   >   
      
   OK.   
      
   Some of this could likely be tunable when creating the filesystem (in an   
   "mkfs" type tool or similar).   
      
   My code currently initially reserves 16K for the starting inode table,   
   and a few K for the initial bitmaps (though, bigger may be better).   
      
   But, up-front creation/reservation based on the final volume size   
   (rather than allocating them incrementally) is still very much possible.   
      
      
   Say, initial block assignment at present, IIRC:   
    0: Superblock   
    1: Root (first 16 entries)   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   