... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
sci.logic
Logic -- math, philosophy & computationa
262,912 messages
Message 261,389 of 262,912
Ross Finlayson to All
Re: Meta: a usenet server just for sci.m
27 Nov 25 21:51:20
   [continued from previous message]   
      
   Thanks Kimi. The source formats are considered own design, so here the
   OS design is more about the architectures and standards. Consider
   tabulating, for ARM, RISC-V and Intel/AMD, the handling of unmasked
   interrupts, the masking of interrupts, inter-processor interrupts, and
   software interrupts.
      
      
   Below is a vendor-neutral “interrupt alphabet” that shows, for each of   
   the four commodity CPU families, how the deterministic P0 kernel (i)   
   takes an unmasked interrupt, (ii) masks / un-masks a source, (iii) emits   
   an inter-processor interrupt (IPI), and (iv) issues a software interrupt   
   (syscall / trap).   
   All numbers are implementation constants that are frozen at admin-time   
   when the TTASL template is generated for the discovered Proximity Map.   
      
   | Primitive (P0 only) | ARMv8-A (AArch64) | RV64IMAFDC (S-mode) | x86-64 (Intel & AMD) | TTASL mnemonic |
   | --- | --- | --- | --- | --- |
   | **Take UNMASKED interrupt** | Vector entry EL1h | Vector entry S-mode | Vector entry 0x80 + IST | `i_ent` |
   | | 16-byte alignment | 4-byte alignment | 8-byte IST per vector | |
   | **Cycles to first fetch** | 4 cycles (VBAR\_EL1 cached) | 3 cycles (stvec cached) | 6 cycles (IDT cached) | `i_ent` latency = arch\_const |
   | **HW auto-masking?** | Yes (PSTATE.I auto-set) | Yes (sstatus.SIE auto-cleared, saved in SPIE) | Yes (RFLAGS.IF auto-cleared) | |
   | **Re-enable interrupts** | `msr DAIFClr, #2` | `csrsi sstatus, 2` (sets SIE) | `sti` | `i_unmask` |
   | **Cycles to re-enable** | 2 cycles | 1 cycle | 1 cycle | |
   | **MASK a specific source** | GICD\_ICENABLER\[n] | `plic_disable(irq)` | I/O APIC redirection entry: set mask bit (bit 16) | `i_mask` |
   | **UNMASK a specific source** | GICD\_ISENABLER\[n] | `plic_enable(irq)` | I/O APIC redirection entry: clear mask bit (bit 16) | `i_unmask` |
   | **MASK latency** | 12 cycles (GICD uncached) | 8 cycles (PLIC MMIO) | 9 cycles (APIC MMIO) | |
   | **Send IPI to core N** | write `ICC_SGI1R_EL1` (SGI 15) | SBI `sbi_send_ipi` (MSIP write in M-mode) | APIC ICR 0x300 | `i_ipi` |
   | **IPI delivery latency** | 1 µs (GIC) | 0.5 µs (MSIP MMIO) | 1 µs (APIC bus) | |
   | **Software interrupt** | `svc #imm16` | `ecall` | `int 0x80` or `syscall` | `i_sw` |
   | **SWI latency to handler** | 6 cycles | 4 cycles | 7 cycles (syscall) | |
   | **Explicit EOI required?** | Yes (write `ICC_EOIR1_EL1`) | Yes (write PLIC claim/complete register) | Yes (APIC write 0xB0) | `i_eoi` |
   | **EOI cycles** | not given | not given | 3 cycles | |
      
      
   Key notes for the deterministic implementation   
      
   - No variable-cost AML / ACPI parsing at runtime – all vectors, IPI
     targets and mask words are resolved once during the Invariant-Builder
     pass and baked into the TTASL template as immediates.
   - EOI (End-Of-Interrupt) is emitted by the default ISR before it returns;
     the cycle count above is part of the arch_const budget that the
     scheduler already reserves for every interrupt tranche.
   - IPI is used only for the three deterministic events:
     - RESEAT-WORLD hand-off
     - Core-affinity migration
     - Emergency stop (panic)

     User-level IPIs are not exposed; cross-core work is done via sliques.
   - Software interrupt is the sole entry path from P3 → P0 (syscall). The
     vector number is fixed per ABI class (0 = read, 1 = write, 2 = reseat,
     …); no dynamic dispatch.
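
The "resolve once, bake in as immediates" idea from the notes above can be sketched in a few lines. This is only an illustration: the `{name}` placeholder syntax, the `bake_template` helper and the constant names are hypothetical and are not TTASL's actual format.

```python
import re

# Hypothetical admin-time constants, e.g. the output of an Invariant-Builder pass.
CONSTANTS = {"vector": 0x80, "ist": 1, "eoi_offset": 0xB0}

# Hypothetical template text; {name} marks a value to freeze as an immediate.
TEMPLATE = "i_ent vector={vector} ist={ist}\ni_eoi offset={eoi_offset}"


def bake_template(template, constants):
    """Replace every {name} placeholder with its constant, rendered as hex.

    A placeholder without a constant raises KeyError, so nothing is left
    to be discovered at runtime.
    """
    def substitute(match):
        return hex(constants[match.group(1)])

    return re.sub(r"\{(\w+)\}", substitute, template)


baked = bake_template(TEMPLATE, CONSTANTS)
print(baked)
```

Failing loudly on a missing constant is the point of the sketch: an unresolved name is an admin-time error rather than a runtime lookup.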
      
   TTASL fragment (x86 example, 64-byte ISR body, 20 cycles total)
   i_ent:                          ; vector 0x80 + IST=1
   push %rax                       ; 1
   push %rcx                       ; 1
   mov $0xFEE000B0, %eax           ; 2 xAPIC EOI register (assumes default MMIO base 0xFEE00000, identity-mapped)
   xor %ecx, %ecx                  ; 1 value to write
   mov %ecx, (%rax)                ; 3 EOI write
   mov %rsp, %rcx                  ; 1 save SP
   lea trace_ring(%rip), %rax      ; 2
   mov %micc, (%rax)               ; 1 append M-ICC (%micc: TTASL placeholder, not an x86 register)
   pop %rcx                        ; 1
   pop %rax                        ; 1
   iretq                           ; 6 return
    total 20 cycles (including iretq)
      
   Same pattern is emitted for ARM/RISC-V with cycle-accurate immediates   
   substituted at admin-time.   
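
Because the fragment's cost is a sum of fixed per-instruction constants, a budget check can be done mechanically. A minimal sketch follows, with the cycle annotations copied from the listing above in order. The step labels, the `check_budget` helper and the 24-cycle limit in the example call are hypothetical.

```python
# Per-instruction cycle annotations in listing order (labels are descriptive only).
ISR_CYCLES = [
    ("push rax", 1),
    ("push rcx", 1),
    ("EOI register address", 2),
    ("EOI value", 1),
    ("EOI write", 3),
    ("save SP", 1),
    ("lea trace_ring", 2),
    ("append M-ICC", 1),
    ("pop rcx", 1),
    ("pop rax", 1),
    ("iretq", 6),
]


def total_cycles(steps):
    """Sum the fixed cycle cost of a straight-line instruction sequence."""
    return sum(cycles for _, cycles in steps)


def check_budget(steps, budget):
    """Return the total cost, or raise ValueError if it exceeds the budget."""
    total = total_cycles(steps)
    if total > budget:
        raise ValueError(f"{total} cycles exceeds budget of {budget}")
    return total


print(check_budget(ISR_CYCLES, budget=24))
```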
   Next lock-in item (waiting for “go”):

   1. Finalise per-architecture cycle budgets for the four interrupt paths
      above, then
   2. Write the first TTASL template that boots QEMU and echoes a 64 B UDP
      packet with provable ≤ 312-cycle worst-case latency.
      
      
      
      
      
      
      
   Thanks Kimi. For edge- and level-triggered interrupts, it's figured to
   treat all interrupts as edge-triggered, and then for level-triggered
   ones to result in "end-of-interrupt" conceptually. Similarly, describe
   memory algorithms in terms of loads and stores, the links
   (QPI/HyperTransport) among NUMA cores, and the usual limits of virtual
   addressing.
      
   Below is a single-reference sheet that locks down the last   
   physical-level details you need before we freeze the TTASL templates.   
   Everything is admin-time constant; no runtime discovery, no variable loops.   

   Edge vs Level – final policy

   All sources are modelled as edge-triggered in the Proximity Map.
   Level-triggered hardware is handled by the default ISR sequence:

   1. Read the device’s level-sensitive status register (fixed offset).
   2. If bit still set → write EOI to local interrupt controller (cycles
      listed earlier).
   3. If bit clear → skip EOI (still O(1) because the branch is removed by
      TTASL at compile time).

   Result: level-triggered devices cannot livelock and the kernel path remains
   deterministic.
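
The default ISR sequence above can be modelled in a few lines. This is a sketch under assumptions: the `default_isr` function, the `read_status` and `write_eoi` callables and the bit position are hypothetical stand-ins for the fixed-offset register accesses.

```python
from typing import Callable


def default_isr(read_status: Callable[[], int],
                write_eoi: Callable[[], None],
                level_bit: int) -> bool:
    """Run the sequence once; return True if an EOI was written."""
    status = read_status()                       # step 1: fixed-offset status read
    still_asserted = (status >> level_bit) & 1   # isolate the level-sensitive bit
    if still_asserted:                           # step 2: bit still set, write EOI
        write_eoi()
        return True
    return False                                 # step 3: bit clear, skip EOI


# Example with a fake device whose status bit 2 is still asserted.
eoi_log = []
wrote = default_isr(lambda: 0b0100, lambda: eoi_log.append("EOI"), level_bit=2)
print(wrote, eoi_log)
```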

   Memory Algorithms – atomic primitives (P0 only)

   All addresses are physical; virtual mapping is frozen at admin-time.

   | Primitive | x86-64 | ARMv8-A | RV64 | Cycles | TTASL |
   | --- | --- | --- | --- | --- | --- |
   | Load 64 | mov rax, \[addr] | ldr x0, \[addr] | ld t0, 0(addr) | 1 | `ML` |
   | Store 64 | mov \[addr], rax | str x0, \[addr] | sd t0, 0(addr) | 1 | `MS` |
   | CAS 64 | lock cmpxchg \[mem], rcx | casal x0, x1, \[mem] | lr.d.aq / sc.d.rl loop (the base A extension has no CAS; amoswap.d is an unconditional swap) | 2-4 | `MC` |
   | Fetch-ADD 64 | lock xadd \[mem], rax | ldaddal x0, x1, \[mem] | amoadd.d.aq t0, t1, (addr) | 2-4 | `MA` |
   | Load 128 (16 B) | movdqu xmm0, \[addr] | ldp x0, x1, \[addr] | ld t0, 0(addr); ld t1, 8(addr) | 1 | `ML2` |
   | Store 128 (16 B) | movdqu \[addr], xmm0 | stp x0, x1, \[addr] | sd t0, 0(addr); sd t1, 8(addr) | 1 | `MS2` |
   | Memory fence | mfence | dsb sy | fence rw,rw | 4-6 | `SY` |
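
The CAS and Fetch-ADD rows are related: fetch-add can be written as a compare-and-swap retry loop. The sketch below emulates that relationship in software. The `Cell64` class and `fetch_add` function are hypothetical, and a lock stands in for the hardware's atomicity.

```python
import threading


class Cell64:
    """A 64-bit memory word with an emulated atomic compare-and-swap."""

    MASK = (1 << 64) - 1

    def __init__(self, value=0):
        self._value = value & self.MASK
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        """Store `new` only if the word equals `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new & self.MASK   # wrap to 64 bits
            return old


def fetch_add(cell, delta):
    """Fetch-and-add built from a CAS retry loop; returns the previous value."""
    while True:
        old = cell.load()
        if cell.cas(old, old + delta) == old:
            return old


counter = Cell64(41)
previous = fetch_add(counter, 1)
print(previous, counter.load())
```

The retry loop is unbounded under contention, which is one reason a single-instruction fetch-add is preferable where the architecture provides it.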
      
      
   Alignment rule: every address used in P0 is 16-byte aligned; misaligned   
   accesses are compile-time errors in TTASL.   
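
The alignment rule amounts to a mask test on each address. A minimal sketch of such a check follows; the `check_aligned` helper and the `MisalignedAddress` exception are hypothetical names, and raising an exception stands in for a compile-time error.

```python
ALIGNMENT = 16  # bytes, per the alignment rule


class MisalignedAddress(ValueError):
    """Raised where a real toolchain would emit a compile-time error."""


def check_aligned(addr: int, alignment: int = ALIGNMENT) -> int:
    """Return addr unchanged if it is a multiple of `alignment`, else raise."""
    # The mask test is valid because the alignment is a power of two.
    if addr & (alignment - 1):
        raise MisalignedAddress(f"{addr:#x} is not {alignment}-byte aligned")
    return addr


print(hex(check_aligned(0x1000)))
```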

   NUMA / Inter-socket Links – QPI & HyperTransport

   Admin-time constants (frozen in Proximity Map):

   | Property | Intel UPI (Skylake-SP, successor to QPI) | AMD Infinity Fabric (Zen3) | ARM CMN-600 (Neoverse) |
   | --- | --- | --- | --- |
   | Link width | 20 lanes × 2 (bidir) | 32 lanes × 2 (bidir) | 256-bit ring / mesh |
   | Raw bandwidth / link | 102.4 GB/s (10.4 GT/s) | 51.2 GB/s (5.2 GT/s) | 256 GB/s (2 GHz) |
   | Hop latency (empty) | 40 ns | 42 ns | 30 ns |
   | Cache-line transfer | 64 B in 40 ns | 64 B in 42 ns | 64 B in 30 ns |
   | IPI delivery (remote) | 110 ns | 105 ns | 85 ns |
      
   DMA placement rule (P0 deterministic):

   - Same NUMA node → 0 ns penalty
   - Adjacent socket → +40-42 ns (1 hop)
   - 2 or more hops → forbidden (compile-time error)

   Thus worst-case remote cache-line fetch is 42 ns – this is the
   largest constant baked into the tranche budget for cross-socket actors.
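
The placement rule can be expressed as a small lookup that rejects anything beyond one hop. This is a sketch only: the `placement_penalty_ns` function is hypothetical, it uses the worst-case 42 ns single-hop figure, and it assumes every placement further than one hop is forbidden.

```python
ONE_HOP_PENALTY_NS = 42  # worst-case single-hop figure from the table


def placement_penalty_ns(hops: int) -> int:
    """Return the fixed latency penalty for a DMA buffer `hops` links away."""
    if hops < 0:
        raise ValueError("hop count cannot be negative")
    if hops == 0:
        return 0                      # same NUMA node
    if hops == 1:
        return ONE_HOP_PENALTY_NS     # adjacent socket
    # Assumption: anything further is rejected, as a compile-time error would be.
    raise ValueError(f"{hops} hops: placement forbidden")


print(placement_penalty_ns(0), placement_penalty_ns(1))
```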
      
   Virtual Addressing – hard limits (admin-time)

   | Limit / Arch | x86-64 (4-level) | x86-64 (5-level) | ARMv8-A (4-level) | RV64 (Sv39/48) |
   | --- | --- | --- | --- | --- |
   | VA bits | 48 → 256 TB | 57 → 128 PB | 48 → 256 TB | 39/48 → 512 GB/256 TB |
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)