home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   alt.os.development      Operating system development chatter      4,255 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 4,195 of 4,255   
   The Running Man to cr88192@gmail.com   
   Re: Misc: experimental filesystem DFS2   
   23 Jan 25 23:18:35   
   
   From: running_man@writeable.com   
      
   On 11/12/2024 12:35 BGB  wrote:   
   > Granted, this group isn't very active, but it is one of the few groups   
   > where this sort of thing is on-topic, so...   
   >   
   > So, for my project I have made a new experimental filesystem I am   
   > calling DFS2.   
   >   
   > Why?...   
   >    The existing filesystems didn't really fit what I wanted.   
   >   
   > Namely:   
   >    Can support Unix style features;   
   >      File ownership;   
   >      Symbolic links;   
   >      ...   
   >    Not excessively complicated;   
   >    Needs to be relatively low overhead.   
   >      Neither RAM nor CPU time are abundant.   
   >      Target is 50MHz with RAM measured in MB.   
   >    I also wanted things like file compression.   
   >      But, more for IO optimization than saving space.   
   >   
   > Currently I am using FAT32:   
   >    Does not natively support the desired features;   
   >    FAT chain walking and large clusters are not efficient to deal with;   
   >    No built-in compression.   
   >   
   > A few existing options:   
   >    Minix:   
   >      Too limited, missing features.   
   >    EXT2:   
   >      Not 1:1 with wanted features;   
   >      Some aspects of the design seem annoying.   
   >    NTFS:   
   >      Significantly over complicated.   
   >    Most other Linux filesystems:   
   >      Tend to have a problem of most being over-complicated;   
   >      Most are trying to "aim higher" than EXT2.   
   >        Or, are specialized for specific cases, like "intitrd".   
   >   
   >   
   >   
   > General design I went with:   
   >    Uses an inode table (kinda like EXT2)   
   >      Though, the table is itself represented with an inode (pros/cons);   
   >      Superblock gives the address of the inode table   
   >        with inode 0 as the inode-table inode.   
   >      Inode structure consists of multiple tagged members.   
   >        Sorta like the NTFS MFT entries.   
   >    Fixed size directory entries   
   >      They are 64 byte with a 48 byte name field;   
   >      Multiple entries are used for longer names (sorta like FAT).   
   >        Though, this should be infrequent,   
   >          as names longer than 48 bytes are rare.   
   >      Directory entries are organized into an AVL tree.   
   >   
   > Tradeoff for AVL:   
   > An AVL tree is more complicated than could have been hoped. But, it is   
   > less complicated than a B-Tree would have been. While a case could have   
   > been made for using linear search (like FAT), it seemed like AVL was   
   > probably worth the complexity (does mean directory lookups are roughly   
   > log2 N).   
   >   
   > By my current estimates, likely the break-even point between an AVL tree   
   > and a B-Tree (with k=16) would likely not be reach until around 1000   
   > files, which is well over the size of most directories. If one could   
   > argue that most directories were smaller than around 8-16 files, a case   
   > could have been made for linear search; but AVL doesn't really add much   
   > cost beyond some additional code complexity.   
   >   
   >   
   > Where, each directory entry encodes:   
   >    A name field;   
   >    The inode number;   
   >    Left, Right, and Parent links;   
   >    The Z depth of this node;   
   >    Directory Entry Type.   
   >   
   > Initially, there was no parent-link in the tree, but the lack of a   
   > parent link added significant complexity to tasks like walking the   
   > directory or rebalancing nodes (it was necessary to keep track of this   
   > externally), so I ended up adding a link. Though, at present this   
   > reduces the maximum theoretical directory size to 2M files (unlikely to   
   > be an issue).   
   >   
   > Note that the filesystem is case-sensitive and assumes UTF-8 for   
   > filenames (with some wonk for names longer than 48 bytes).   
   >   
   >   
   > Block management within inodes:   
   >    Uses a scheme similar to EXT2, namely (with 32-bit block numbers):   
   >      16 entries, direct block number   (16KB with 1K blocks)   
   >      8 entries, indirect block         (2MB  )   
   >      4 entries, double indirect block  (256MB)   
   >      2 entries, triple indirect block  (16GB )   
   >      1 entries, quad indirect block    (4TB  )   
   >      1 entries, penta indirect block   (1PB  )   
   > For larger volumes and compressed files, 64 bit numbers are used:   
   >      8 entries, direct block number    (8K    / 128K, with 1K or 16K)   
   >      4 entries, indirect block         (512K  / 8MB  )   
   >      2 entries, double indirect block  (32MB  / 512MB)   
   >      1 entries, triple indirect block  (2GB   / 32GB )   
   >      1 entries, quad indirect block    (256GB / 1TB  )   
   >   
   > Blocks and inodes are allocated via bitmaps, which are themselves   
   > assigned inodes (and store their data the same as normal files).   
   >   
   > Where the reducing the number of entries avoids making the inode bigger.   
   >    Currently, the design can use 256 or 512 byte inodes.   
   >    The former layout with 64b entries requires a 512B inode.   
   >   
   > Had considered spans tables, but didn't seem worth it (though   
   > conceptually simpler, spans are more complicated to deal with in terms   
   > of code complexity).   
   >   
   > For the 64-bit block numbers, with compressed files, the high-order bits   
   > are used for some additional metadata (such as how this logical block of   
   > the file is stored, and how many disk-blocks were used).   
   >   
   >   
   > For file compression:   
   >    A larger logical block size is used in this case:   
   >      Say, 16K/32K/64K vs 1K/2K/4K.   
   >      Currently I am using 16K and 1K.   
   >    This larger block is compressed,   
   >      and stored as a variable number of disk blocks.   
   >      Blocks are decompressed when loaded into a block-cache;   
   >      And, re-compressed when evicted (and dirty).   
   >        Underlying blocks are dynamically reallocated as needed.   
   >    Currently, using a non-entropy-coded LZ77 variant.   
   >      Say, along vaguely similar lines to LZ4.   
   >   
   >   
   > Currently, with all of this, code size is still moderately smaller than   
   > my existing FAT32 driver... Granted, this isn't saying much.   
   >   
   > Any thoughts?...   
   >   
      
   Another useful feature to add could be a Reed-Solomon encoding   
   scheme for error-detection and recovery making it more resilient.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca