
Forums before death by AOL, social media and spammers... "We can't have nice things"

   sci.logic      Logic -- math, philosophy & computational aspects      262,912 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 261,453 of 262,912   
   Ross Finlayson to All   
   Re: Meta: a usenet server just for sci.m   
   29 Nov 25 03:54:06   
   
   [continued from previous message]   
      
   You're right that cpu-core-memory-device relations are "few or some" -   
   realistically we're talking:   
      
   1-4 CPU packages (sockets)   
   8-128 cores per package   
   2-8 NUMA nodes per socket   
   Dozens of PCIe devices   
      
   The explosion is manageable because modern UEFI/ACPI already provides a   
   canonical enumeration. The "Proximity Map" just freezes that into a DAG   
   at admin-time. GPGPU as "opaque SIMT" is the right call - you treat it   
   as a fixed-latency command queue, the vectorization happens inside the   
   black box.   
   Critical clarification needed on TTASL timing:   
   You said: "Grok presumed 'identical instruction time', here it's instead   
   'equivalent functionality'"   
   This is crucial and I need to push back gently here. If TTASL only   
   guarantees "equivalent functionality" but not bounded timing, you've   
   lost the entire O(1) guarantee. Here's the issue:   
   # On x86-64 with AVX-512:
   VA ADD v0, v1, v2        # 1 cycle, 512 bits
   
   # On RISC-V without the vector extension (SWAR over 64-bit GPRs):
   AA ADD r0, r2, r4        # 1 cycle, 64 bits
   AA ADD r1, r3, r5        # 1 cycle, 64 bits
   # ... (8 iterations for 512 bits) => 8 cycles total
      
   If the TTASL template doesn't account for this, you have 8x timing   
   variance, which destroys determinism.   
   Proposed resolution: The TTASL template must specify tranche budget in   
   cycles, not operations. The template for "512-bit vector add" reserves 8   
   cycles on SWAR, 1 cycle on native SIMD, but both consume 8 cycles of   
   tranche time. The SIMD version just yields early. This keeps the   
   scheduler deterministic.   
   Does that align with your intent?   
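   To make the proposal concrete, here is a minimal C sketch of cycle-budget
   accounting. The names (ttasl_op, charge) and the specific numbers are
   illustrative, not from the spec: the point is that the scheduler always
   charges the template's reserved budget, never the measured cycles, so the
   tranche clock advances identically on every architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a TTASL template entry carries a reserved cycle
   budget alongside this architecture's actual cost. */
typedef struct {
    const char *name;
    uint32_t actual_cycles;   /* what this hardware really takes */
    uint32_t budget_cycles;   /* what the template reserves      */
} ttasl_op;

/* Charge the budget, never the measured cost: the fast (native SIMD)
   implementation simply yields for its unused cycles, and the tranche
   clock is deterministic across architectures. */
static uint32_t charge(uint32_t tranche_remaining, const ttasl_op *op)
{
    assert(op->actual_cycles <= op->budget_cycles);  /* template invariant */
    return tranche_remaining - op->budget_cycles;
}
```

   Both the native-SIMD and SWAR entries for "512-bit vector add" would
   carry budget_cycles = 8 and leave the clock at the same value.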
   On "Least Instrumentation" - needs precision:   
   You wrote: "branchless nops" ... "samples flow systolically into   
   fixed-size rotating buffers"   
   Let me restate to confirm understanding:   
      
   // Every O(1) re-routine has embedded instrumentation:
   template tcp_rx_step(core_id, device_id, micc, outcome) {
       // ... actual work here ...
   
       // Instrumentation (branchless):
       m_load  tmp0, [trace_ring_base + (micc & mask)]
       m_store outcome_flags, [tmp0 + offset_flags]
       m_store device_id,     [tmp0 + offset_device]
       m_store timestamp,     [tmp0 + offset_when]
       // ^ These writes either hit L1 cache (live trace)
       //   or a dummy address (suppressed trace) --
       //   same cycle count either way.
   }
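   For concreteness, the "same cycle count either way" trick can be sketched
   in C as a branchless address select (all names here are hypothetical):
   whether the trace is live or suppressed, the identical instruction
   sequence runs; only the target address differs.

```c
#include <stdint.h>

enum { RING_SLOTS = 1024 };         /* power of two so (micc & mask) wraps */

typedef struct { uint64_t flags, device, when; } trace_rec;

static trace_rec ring[RING_SLOTS];
static trace_rec dummy_sink;        /* suppressed writes land here */

/* Branchless store: 'live' is 0 or 1. (0 - live) is all-ones when live,
   all-zeros when suppressed, so the slot selection is pure arithmetic
   and both paths execute the same instructions. */
static void trace_store(uint64_t micc, uint64_t live,
                        uint64_t flags, uint64_t device, uint64_t when)
{
    uintptr_t live_addr = (uintptr_t)&ring[micc & (RING_SLOTS - 1)];
    uintptr_t dead_addr = (uintptr_t)&dummy_sink;
    uintptr_t mask = (uintptr_t)0 - (uintptr_t)(live & 1);
    trace_rec *slot = (trace_rec *)((live_addr & mask) | (dead_addr & ~mask));
    slot->flags = flags;
    slot->device = device;
    slot->when = when;
}
```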
   The "farming" happens in P3: a low-priority daemon reads the ring
   buffers (which are mmap'd into userland), computes autocorrelations,
   feeds a perceptron, and updates policy vectors. The perceptron output
   is a binary classification (usual/unusual) that gets written back to a
   policy table the kernel consults via O(1) lookup.
   Throttling question: When you say "routines throttle by processing their   
   samples" - do you mean:   
      
   (A) The P3 inference daemon skips samples when overloaded (lossy but   
   bounded)   
   (B) The P0 kernel reduces sample rate when buffers fill (changes   
   instrumentation overhead)   
      
   I assume (A), because (B) would violate constant-rate. Confirm?   
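   Assuming (A), the consumer side might look like the following sketch
   (sizes and names are illustrative): the P0 producer advances head at a
   constant rate and never blocks; when the P3 daemon discovers it has been
   lapped, it discards the gap and resumes at the oldest still-valid sample,
   so loss is accounted but P0's instrumentation overhead never changes.

```c
#include <stdint.h>

enum { RING = 4096 };   /* power-of-two sample ring, size illustrative */

typedef struct {
    uint64_t samples[RING];
    uint64_t head;      /* written by P0 at a constant rate, never blocks */
} trace_ring;

typedef struct {
    uint64_t tail;      /* P3 daemon's read cursor */
    uint64_t dropped;   /* samples skipped while overloaded */
} consumer;

/* Option (A): lossy catch-up. If the producer has lapped us, skip the
   overwritten gap; P0's write rate is never touched. Returns the next
   sample, or 0 when nothing new is available. */
static uint64_t drain_one(const trace_ring *r, consumer *c)
{
    uint64_t backlog = r->head - c->tail;
    if (backlog > RING) {                  /* lapped: discard the gap */
        c->dropped += backlog - RING;
        c->tail = r->head - RING;
    }
    if (r->head == c->tail)
        return 0;                          /* drained */
    return r->samples[c->tail++ & (RING - 1)];
}
```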
   On "Generic Universal Drivers":   
   You're relying on UEFI/ACPI's self-description (BAR addresses,   
   capability registers, MSI-X vectors). This works for:   
      
   PCIe devices with standard class codes (NVMe, AHCI, NIC with known DID/VID)   
   USB with standard descriptors (HID, mass storage, CDC)   
      
   Undefined edge case: What happens when a device lies? E.g., claims to be   
   class 0x0200 (Ethernet) but actually requires vendor-specific init sequence?   
   Resolution options:   
      
   1. Strict whitelist: only boot if every device has verified-good
      firmware (a catalog of known SHA-256 hashes)
   2. Fail-safe degradation: an unknown device gets a minimal "dump to
      slique" driver, and P3 handles the rest
   3. Refuse to boot: any unrecognized device triggers a re-run of the
      Invariant Builder
      
   I assume #2 aligns with your "nominally adversarial" model?   
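   Option 2 could be sketched as an admin-time catalog lookup with a
   degrade-by-default fallback. The entries, names, and example IDs below
   are all hypothetical:

```c
#include <stdint.h>
#include <string.h>

typedef enum { DRV_FULL, DRV_DUMP_TO_SLIQUE } driver_kind;

typedef struct {
    uint16_t vid, did;            /* PCI vendor/device pair */
    uint8_t  fw_sha256[32];       /* verified-good firmware digest */
} catalog_entry;

/* Admin-time catalog; the single entry is a fake example. */
static const catalog_entry catalog[] = {
    { 0x8086, 0x10d3, { 0xaa } },
};

/* A device gets the full driver only if its identity AND firmware digest
   match the catalog; anything else (including a device that lies about
   its class code) is bound to the minimal dump-to-slique driver. */
static driver_kind bind_driver(uint16_t vid, uint16_t did,
                               const uint8_t fw_sha256[32])
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++) {
        if (catalog[i].vid == vid && catalog[i].did == did &&
            memcmp(catalog[i].fw_sha256, fw_sha256, 32) == 0)
            return DRV_FULL;
    }
    return DRV_DUMP_TO_SLIQUE;    /* unknown or lying device: degrade */
}
```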
   On the Scheduler as Executive:   
   You wrote: "what results getting surfaced to processes as usually   
   interacting in user-land, is in effect what could result a monolithic unit"   
   Unpacking: The P0 kernel appears as a deterministic oracle to P3   
   processes. From userland's view:   
      
   Submit M-ICC-tagged request to slique → deterministic completion within   
   bounded time   
   The internal scheduler (Event Mask + Rendezvous + Tranches) is invisible   
   Multiple P3 processes appear to execute concurrently, but actually it's   
   pyramidal parallelism: many actors, one systolic pump   
      
   The "monolithic" aspect: unlike microkernel message-passing, there's no
   IPC overhead - just DMA to/from sliques. The kernel is monolithic in
   the sense of "one pre-compiled blob," but it's architecturally clean
   because the P0/P3 boundary is absolute.
   Critical terms needing definition:   
      
   Tranche quantum: You've said "fixed-size time slices" but never   
   specified: 1000 cycles? 10μs? This must be baked into the invariant.   
   Slique capacity bounds: Fixed at admin-time per device class? E.g., NIC   
   RX ring = 2048 slots × 4KB = 8MB?   
   M-ICC overflow policy: a 128-bit counter incremented every cycle at
   4GHz takes on the order of 10^21 years to wrap. But if you hibernate
   for a century and resume, do you reset or continue? (Assuming
   continue, but it needs a statement)
   RESEAT-MEMORY triggering threshold: You mentioned "37% fragmentation" -   
   is this a tunable policy or a hard invariant?   
   Inference Engine P3 priority: Is it a fixed-priority daemon above or   
   below normal workloads?   
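   One way to freeze these parameters into the invariant is as build-time
   constants with static checks. Every value below is a placeholder for
   discussion, not a settled number:

```c
#include <stdint.h>

/* Placeholder values, frozen at admin-time into the invariant. */
#define TRANCHE_QUANTUM_CYCLES  10000u    /* e.g. 10k cycles per tranche */
#define NIC_RX_SLOTS            2048u
#define NIC_RX_SLOT_BYTES       4096u     /* 2048 x 4KB = 8 MB ring */
#define RESEAT_FRAG_PERMILLE    370u      /* the "37%" threshold, if tunable */

_Static_assert(NIC_RX_SLOTS * NIC_RX_SLOT_BYTES == 8u * 1024u * 1024u,
               "slique sizing arithmetic");
_Static_assert((NIC_RX_SLOTS & (NIC_RX_SLOTS - 1)) == 0,
               "ring sizes must be powers of two for O(1) wrap");

/* M-ICC as a 128-bit counter that continues across hibernation: it only
   ever increments, carrying from the low word into the high word. */
typedef struct { uint64_t lo, hi; } micc_t;

static inline micc_t micc_next(micc_t c)
{
    c.lo++;
    c.hi += (c.lo == 0);   /* carry on wrap of the low word */
    return c;
}
```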
      
   On Arithmetization/Geometrization:   
   You're describing the inference engine's internals. Let me confirm:   
      
   Arithmetization: Fixed-point Q15.16 computations for autocorrelation   
   (Fourier-like frequency analysis of trace timestamps)   
   Algebraization: Linear models (GLM) mapping resource states to predicted   
   outcomes   
   Geometrization: PCA to reduce high-dimensional trace space   
   (core×device×process) into 2D/3D "affinity vectors"   
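   As a sanity check on the arithmetization stage: lag-k autocorrelation in
   Q15.16 fixed point is just integer multiply-accumulate with a
   renormalizing shift - no floating point, and with the window and lag
   frozen at admin-time the cost is a constant number of MACs. A hedged
   sketch (names mine):

```c
#include <stdint.h>

typedef int32_t q15_16;                        /* 1 sign, 15 int, 16 frac bits */
#define Q_ONE  (1 << 16)

/* Fixed-point multiply: widen to 64 bits, multiply, renormalize. */
static inline q15_16 q_mul(q15_16 a, q15_16 b)
{
    return (q15_16)(((int64_t)a * b) >> 16);
}

/* Autocorrelation at lag k over a window of n samples, normalized by the
   pair count; n and k would be frozen at admin-time, so the loop bound
   is a compile-time constant in the real thing. */
static q15_16 autocorr(const q15_16 *x, int n, int k)
{
    int64_t acc = 0;
    for (int i = 0; i + k < n; i++)
        acc += ((int64_t)x[i] * x[i + k]) >> 16;
    return (q15_16)(acc / (n - k));
}
```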
      
   The output is a policy vector (e.g., 256-bit bitmask) that the O(1)   
   scheduler consults: "Should process P migrate to core C?" → O(1) bit test.   
   This is elegant but needs bounds: max dimension of the PCA space, max   
   number of perceptrons, max reward/release update rate. Otherwise P3   
   inference could consume unbounded memory/time.   
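   The O(1) consult itself is easy to pin down. Here is a sketch with an
   assumed 8-process-class x 32-core layout for the 256-bit policy vector
   (the layout is my assumption, not the spec's): the lookup is one shift
   and one AND regardless of system size.

```c
#include <stdint.h>

/* 256-bit policy vector as four 64-bit words. */
typedef struct { uint64_t w[4]; } policy256;

/* Assumed layout: bit index = pclass * 32 + core, for 8 classes x 32 cores. */
static inline int policy_allows(const policy256 *p,
                                unsigned pclass, unsigned core)
{
    unsigned bit = pclass * 32u + core;               /* 0..255 */
    return (int)((p->w[bit >> 6] >> (bit & 63)) & 1u); /* O(1) bit test */
}

static inline void policy_set(policy256 *p, unsigned pclass, unsigned core)
{
    unsigned bit = pclass * 32u + core;
    p->w[bit >> 6] |= 1ull << (bit & 63);
}
```

   The inference engine rewrites the four words atomically; the scheduler
   only ever reads them with the bit test above.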
   Proposed hard limits (for discussion):   
      
   PCA: max 64 principal components   
   Perceptrons: max 1024 binary classifiers   
   Policy vector update: max 1Hz (amortized over tranches)   
   Autocorrelation window: fixed 1M samples (ring buffer)   
      
   Final question on "nominally adversarial":   
   You've described actors as "co-operating if nominally adversarial." In   
   practice:   
      
   Bus actors (devices): Adversarial - can send spurious interrupts,   
   violate timing (hence idempotency in re-routine)   
   P3 processes: Co-operating - assume they follow quota/tranche rules   
   (enforced by MMU/tranche timer)   
   P3 inference engine: Trusted - part of the base system image, signed at   
   admin-time   
      
   Is this the correct trust model? Or do you want P3 processes to be   
   adversarial too (requiring proof-of-work for slique access)?   
      
   Summary of what needs tightening:   
      
   TTASL timing: equivalent functionality OR bounded cycles (I strongly   
   advocate bounded cycles)   
   Instrumentation throttling: lossy sampling or backpressure?   
   Unknown device policy: whitelist, degrade, or refuse?   
   Concrete tranche quantum (cycles or time)   
   Slique sizing strategy   
   Inference engine resource bounds   
   Trust model for P3 workloads   
      
   These aren't criticisms - they're the natural questions that arise when   
   moving from architecture to implementation. The core design is sound;   
   these are just the parameters that must be frozen before writing the   
   first TTASL template.   
   Shall we nail down these specifics?   
      
   Thanks Claude. About the differences between architectures in timings,   
   that what "is what it is". So, the interrupt service routine involved   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca