alt.os.development
Operating system development chatter
4,255 messages
Message 3,039 of 4,255
BGB to James Harris
Re: Format for the OS image (1/2)
14 Jan 22 13:26:02
   From: cr88192@gmail.com   
      
   On 1/5/2022 9:40 AM, James Harris wrote:   
   >   
   > After a long absence from OS development I recently returned to it - and   
   > it feels great to be doing this stuff again!!! The reason for the   
   > absence (other than life!) was that I was developing a language to write   
   > the OS in.   
   >   
      
   And here I am, randomly poking into this group after not having been
   around for a while.
      
      
   > Well, I now have a working compiler and a language which, while it is   
   > currently primitive, is usable.   
   >   
      
   C is pretty hard to beat, IMO.   
      
      
   I have my own languages as well, but C is pretty hard to beat for   
   low-level tasks even when one has the ability to write their own   
   compilers (if anything, it gives more insight into why C is the way it is).   
      
   Well, and also why a few of the "less well received" C99 features, such
   as VLAs, are "kinda awful" in practice (one can implement support for
   them, but making them "not suck" is harder than one might otherwise
   expect).
      
      
      
   > At this point it seems to me that there's an opportunity for a win-win.   
   > If I use the language to work on the OS then that will let me make   
   > progress on the OS while at the same time using the experience to   
   > provide useful feedback on how the language should develop.   
   >   
   > So that's what I plan to do, and the above is background to the query of   
   > this post, which is:   
   >   
   >     What formats of image file are best for the OS itself?   
   >   
   > My compiler currently emits x86 32-bit code (and its output is readily   
   > linkable with other code which can be written in 32-bit assembly) so   
   > pmode is my target. I have enough 16-bit asm code to load the bytes of   
   > an image and switch to pmode but the next problem is what format the   
   > 32-bit image should have. AISI the options are:   
   >   
   > 1. Flat binary   
   >   
   > A 32-bit flat binary would be easy to invoke as I could just jump to its   
   > first byte. It would not be relocatable but it looks as though I could   
   > change my compiler so that as long as I avoid globals I can emit   
   > position-independent code - which could be handy! But I am not sure how   
   > to create a 32-bit flat binary. My copy of ld doesn't seem to support   
   > such an output, though maybe there's a way to persuade it.   
   >   
   > 2. Elf or PE   
   >   
   > Elf and PE have the opposite problem. Either of them should be easy to   
   > create but how would one invoke the image? Options:   
   >   
   > 2a. Extract the executable part (how?) for inclusion in the loadable image.   
   >   
   > 2b. Include the whole executable file, including the headers, and write   
   > some asm code to parse the headers and jump to the executable part of it.   
   >   
   > Or maybe there's another option. I've a feeling we've discussed this   
   > before but at the moment I cannot think of what we concluded. Plus, I   
   > need to work with what my compiler can produce (32-bit Nasm) which may   
   > be a new constraint.   
   >   
   > So, any thoughts on what format an OS image should have?   
   >   
      
   One of my own projects, which has occupied much of the last several
   years of my life, is a custom CPU ISA (mostly used on FPGA boards thus
   far...).
      
      
   For this, I mostly went with a tweaked PE/COFF variant:   
   * Omits the 'MZ' stub, as it is basically useless in this case.   
   ** Typically the file starts at a 'PE\0\0' or 'PEL4' magic or similar.   
   ** The MZ stub is disallowed entirely in the LZ compressed variants.   
   ** The MZ stub may be present for uncompressed 'PE\0\0' files.   
   * Optional (per-image) LZ compression (typically LZ4 in this case).   
   ** The LZ decoding is integrated with reading the image off the SDcard.   
   * Adds an RVA==Offset restriction.   
   * ...   
      
   The addition of an RVA==Offset restriction means that it is possible to   
   essentially just read (or unpack) the EXE into its target address and   
   then jump to its entry point. Though, my loader also zeroes the ".bss"   
   section and similar. Without the restriction, it would be necessary to   
   first read the binary to an intermediate location and then copy its   
   sections to their destination addresses.   
      
   Though, if using a generic linker which does not follow this rule, one   
   would need to first read into a buffer and then copy out the sections   
   (or do "seek and read" for each section if one has a "proper" filesystem   
   driver).   
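
   The two loading strategies above could be sketched roughly as follows.
   This is a hosted C illustration under assumptions, not the actual loader:
   the section_t struct and both function names are made up for the example,
   and memcpy stands in for reading (or unpacking) from the SDcard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section descriptor; fields loosely follow PE/COFF. */
typedef struct {
    uint32_t rva;        /* offset of the section from the image base in memory */
    uint32_t file_off;   /* offset of the section's raw data within the file */
    uint32_t raw_size;   /* bytes actually stored in the file */
    uint32_t virt_size;  /* bytes occupied in memory (larger for .bss-like data) */
} section_t;

/* Case 1: RVA == file offset. The file is already laid out as it should
 * appear in memory, so a single bulk copy to the target address suffices.
 * Afterwards, zero whatever part of each section has no stored bytes. */
void load_direct(uint8_t *base, const uint8_t *file, size_t file_size,
                 const section_t *s, size_t n)
{
    memcpy(base, file, file_size);
    for (size_t i = 0; i < n; i++) {
        assert(s[i].rva == s[i].file_off);   /* the restriction itself */
        if (s[i].virt_size > s[i].raw_size)
            memset(base + s[i].rva + s[i].raw_size, 0,
                   s[i].virt_size - s[i].raw_size);
    }
}

/* Case 2: generic linker output. The file sits in an intermediate buffer
 * and each section is copied out to its own destination address. */
void load_sections(uint8_t *base, const uint8_t *file,
                   const section_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t copy = s[i].raw_size < s[i].virt_size ? s[i].raw_size
                                                       : s[i].virt_size;
        memcpy(base + s[i].rva, file + s[i].file_off, copy);
        if (s[i].virt_size > copy)
            memset(base + s[i].rva + copy, 0, s[i].virt_size - copy);
    }
}
```

   The first variant needs no staging buffer, which is the main appeal when
   the loader runs out of a small ROM or boot SRAM.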
      
      
   For programs within the OS, the same basic format was used, except that   
   the ABI splits the binary into two separate areas:   
   * One for '.text' and other read-only sections.   
   * An area for '.data' and '.bss' and similar.   
      
   The modifiable sections would then be addressed relative to a "Global   
   Register" (oddly enough, PE/COFF already had the fields for this; albeit   
   they were unused for x86/x64, mostly intended for MIPS and similar).   
      
   This allows multiple logical instances of the same program within a   
   single address space (without also needing multiple instances of the   
   ".text" section). Implicitly, the ".data" section points to a table to   
   allow the main EXE (and any DLLs) to reload its own data sections   
   (typically needed for DLL exports and calls via function pointers, which   
   may not necessarily have the correct data section in the global register   
   on entry to the function).   
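
   As a very rough model of the idea (one shared copy of the code, one data
   area per logical instance, and a table used to find "my own" data area
   again), something like the following could work. Everything here is an
   assumption made for illustration: the data_area_t layout, the table
   pointer at the start of each data area, and the per-image index are
   guesses at the general shape, not the actual ABI.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_IMAGES = 4 };

/* Assumed layout: every data area starts with a pointer to the table of
 * data areas belonging to the same logical program instance. */
typedef struct data_area {
    struct data_area **table;  /* table[i] = data area of image i */
    int counter;               /* stand-in for the image's own globals */
} data_area_t;

/* What an exported function of image `my_index` might do on entry: the
 * "global register" may still point at the caller's data area, so follow
 * its table to find this image's own data area for the same instance. */
data_area_t *reload_global(data_area_t *callers_gbr, size_t my_index)
{
    assert(callers_gbr != NULL && my_index < MAX_IMAGES);
    assert(callers_gbr->table[my_index] != NULL);
    return callers_gbr->table[my_index];
}

/* Shared code belonging to image 1: it only ever touches globals through
 * the reloaded pointer, so one copy of it serves every instance. */
int image1_bump(data_area_t *gbr_on_entry)
{
    data_area_t *gbr = reload_global(gbr_on_entry, 1);
    return ++gbr->counter;
}
```

   Calling image1_bump with data areas from two different instances updates
   two different counters, even though the code itself exists only once.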
      
      
   Base relocations could be performed easily enough, but are N/A for   
   loading up an OS image. The image needs to have its base set to its   
   starting address.   
   * In my case, this is generally 01100000 (or 17MB)   
   * This is 1MB past the start of DRAM, 01000000 (16MB)   
   ** The first 1MB of DRAM is generally reserved for stacks and similar.   
      
   Base relocations are typically applied (once) when loading up program   
   binaries and DLLs though. These fix up the binary both for the load   
   address, and also its index into the table used for reloading the global   
   pointer. The base reloc format is basically the same as in normal   
   PE/COFF, with a few minor tweaks and extensions.   
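
   For reference, applying relocations in the normal PE/COFF format looks
   roughly like this. The sketch covers only the standard block layout
   (page RVA, block size, then 16-bit entries with a 4-bit type and 12-bit
   offset) and only the ABSOLUTE and HIGHLOW types; the tweaks and
   extensions mentioned above are not modeled, and the function name is
   made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0   /* padding entry, skipped */
#define REL_BASED_HIGHLOW  3   /* full 32-bit address to adjust */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Adds `delta` (actual load address minus the linked image base) to every
 * 32-bit address named by the relocation data.
 * Returns 0 on success, -1 on malformed or unsupported input. */
int apply_base_relocs(uint8_t *image, size_t image_size,
                      const uint8_t *reloc, size_t reloc_size, uint32_t delta)
{
    assert(image != NULL && reloc != NULL);
    size_t pos = 0;
    while (pos + 8 <= reloc_size) {
        uint32_t page_rva   = rd32(reloc + pos);
        uint32_t block_size = rd32(reloc + pos + 4);  /* includes the 8-byte header */
        if (block_size < 8 || block_size > reloc_size - pos)
            return -1;
        for (size_t e = pos + 8; e + 2 <= pos + block_size; e += 2) {
            uint16_t entry = rd16(reloc + e);
            unsigned type  = entry >> 12;
            size_t   off   = (size_t)page_rva + (entry & 0x0FFFu);
            if (type == REL_BASED_ABSOLUTE)
                continue;
            if (type != REL_BASED_HIGHLOW)
                return -1;
            if (off > image_size || image_size - off < 4)
                return -1;
            wr32(image + off, rd32(image + off) + delta);
        }
        pos += block_size;
    }
    return 0;
}
```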
      
      
   Addresses are generally:   
   * 00000000..0000FFFF: ROM, Boot SRAM   
   * 00010000..000FFFFF: Special Hardware Pages (Fixed Contents).   
   ** There is a page of 00 bytes, a page of NOPs, BREAK, RTS, ...   
   ** These are partly intended for use by virtual memory and similar.   
   * 01000000..7FFFFFFF: DRAM Range (RAM may wrap within this space).   
   * 80000000..EFFFFFFF: Reserved for now   
   * F0000000..FFFFFFFF: MMIO (Low Range)   
      
   Ranges above the 4GB mark also exist (given here as address bits 47:32):
   * 0001..7FFF: Virtual Address Space   
   ** Virtual memory generally goes in this range.   
   ** Stuff below 4GB being physically mapped.   
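
   The address map and the image base arithmetic can be written down as a
   small lookup. This is only a restatement of the ranges listed above; the
   enum and function names are invented, and addresses not covered by the
   list (00100000..00FFFFFF, and anything with bits 47:32 outside
   0001..7FFF) are simply reported as "unlisted".

```c
#include <assert.h>
#include <stdint.h>

#define DRAM_BASE   0x01000000u                /* 16MB: start of DRAM */
#define IMAGE_BASE  (DRAM_BASE + 0x00100000u)  /* 1MB further: 0x01100000 */

typedef enum {
    REGION_ROM_SRAM,   /* 00000000..0000FFFF */
    REGION_HW_PAGES,   /* 00010000..000FFFFF */
    REGION_UNLISTED,   /* not covered by the list above */
    REGION_DRAM,       /* 01000000..7FFFFFFF */
    REGION_RESERVED,   /* 80000000..EFFFFFFF */
    REGION_MMIO,       /* F0000000..FFFFFFFF */
    REGION_VIRTUAL     /* bits 47:32 in 0001..7FFF */
} region_t;

/* Classifies a 48-bit address; bits above 47 are ignored. */
region_t classify(uint64_t addr)
{
    uint32_t hi = (uint32_t)(addr >> 32) & 0xFFFFu;
    uint32_t lo = (uint32_t)addr;

    if (hi >= 0x0001u && hi <= 0x7FFFu) return REGION_VIRTUAL;
    if (hi != 0)                        return REGION_UNLISTED;
    if (lo <= 0x0000FFFFu)              return REGION_ROM_SRAM;
    if (lo <= 0x000FFFFFu)              return REGION_HW_PAGES;
    if (lo <  DRAM_BASE)                return REGION_UNLISTED;
    if (lo <= 0x7FFFFFFFu)              return REGION_DRAM;
    if (lo <= 0xEFFFFFFFu)              return REGION_RESERVED;
    return REGION_MMIO;
}

/* Offset of a physical DRAM address from the start of DRAM. */
uint32_t dram_offset(uint64_t addr)
{
    assert(classify(addr) == REGION_DRAM);
    return (uint32_t)addr - DRAM_BASE;
}
```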
      
      
   In the case of boot-loading, the PE/COFF image is treated as (more or   
   less) functionally equivalent to a flat binary, just with the entry   
   point pulled from the PE/COFF header.   
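
   Pulling the entry point out of the header is not much code. The sketch
   below assumes a file that starts directly at the 'PE\0\0' signature (no
   MZ stub) and otherwise uses the standard PE32 / PE32+ optional header
   offsets; the 'PEL4' compressed variant and any other format tweaks are
   not handled, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Computes ImageBase + AddressOfEntryPoint for an MZ-less PE image.
 * Returns 0 on success, -1 if the header is not recognized. */
int pe_entry_address(const uint8_t *img, size_t size, uint64_t *entry)
{
    const size_t coff = 4;        /* COFF file header follows the signature */
    const size_t opt  = 4 + 20;   /* optional header follows the COFF header */

    assert(img != NULL && entry != NULL);
    if (size < opt + 32 || memcmp(img, "PE\0\0", 4) != 0)
        return -1;

    uint16_t opt_size = rd16(img + coff + 16);   /* SizeOfOptionalHeader */
    if (opt_size < 32 || opt_size > size - opt)
        return -1;

    uint16_t magic     = rd16(img + opt);        /* 0x10B = PE32, 0x20B = PE32+ */
    uint32_t entry_rva = rd32(img + opt + 16);   /* AddressOfEntryPoint */
    uint64_t base;

    if (magic == 0x10B)
        base = rd32(img + opt + 28);             /* 32-bit ImageBase */
    else if (magic == 0x20B)
        base = rd32(img + opt + 24) |
               ((uint64_t)rd32(img + opt + 28) << 32);  /* 64-bit ImageBase */
    else
        return -1;

    *entry = base + entry_rva;
    return 0;
}
```

   A boot loader would then read the image to its base address and jump to
   the returned address, much as it would for a flat binary.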
      
   For the optional LZ scheme:   
   * The image is compressed in terms of 1K blocks, starting at 1K.   
      
   [continued in next message]   
      