[continued from previous message]   
      
   volatile uint32_t* tx_desc = (void*)I225_TX_DESC_BASE;   
   tx_desc[0] = p.dma_addr;   
   tx_desc[1] = p.len | TX_DESC_CMD_EOP;   
      
   // Ring doorbell   
   *(volatile uint32_t*)I225_TX_TAIL = (tx_tail + 1) % TX_RING_SIZE;   
   }   
   ```   
      
   The **abstract simulator** uses `MockNICOps`; the **real kernel** uses   
   `i225_ops`. Same re-routine code runs on both.   
      
   ---   
      
   ## Buffers, Back-Pressure, and the Zero-Size Ideal   
      
   ### The Systolic Buffer Model   
      
   You've stated the ideal: **every connection has a buffer of size zero**   
   (direct wire, no storage). This is the pure **dataflow** model from   
   electrical engineering:   
```
Producer → [0-size buffer] → Consumer

If consumer isn't ready, producer stalls (back-pressure).
No buffering = no possibility of overflow.
```
      
   Reality: Zero-size buffers require perfect synchronization (producer and   
   consumer must execute in lockstep). This is impossible in an   
   asynchronous system with:   
      
   Variable network latency   
   Disk seek times   
   Interrupt coalescing   
   Multi-core cache coherence delays   
      
   The compromise: Introduce bounded buffers but make bounds explicit and   
   finite:   
      
/*
 * Bounded ring buffer ("slique") — the explicit, finite buffer between a
 * producer and a consumer stage.
 *
 * head and tail are free-running 64-bit counters: occupied depth is
 * tail - head, and slots are addressed modulo MAX_SLIQUE_DEPTH.
 *
 * Fix: bare `atomic` is not a C type. C11 <stdatomic.h> spells it
 * `_Atomic uint64_t` (the counter arithmetic above requires a 64-bit
 * unsigned counter so that tail - head stays correct across wraparound).
 */
struct Slique {
    Node* nodes[MAX_SLIQUE_DEPTH]; /* Fixed at admin-time */
    _Atomic uint64_t head;         /* Advanced by the consumer */
    _Atomic uint64_t tail;         /* Advanced by the producer */
};
      
/*
 * Non-blocking producer-side push.
 * Returns true when n was enqueued; false when the ring is full
 * (back-pressure signal to the caller). Never blocks.
 *
 * NOTE(review): safe only for a single producer — two concurrent
 * producers could read the same t and overwrite one slot. The default
 * seq_cst atomic_store on tail publishes the slot write to the consumer.
 */
bool slique_try_push(Slique* sq, Node* n) {
// Snapshot both counters; depth (t - h) can only shrink concurrently,
// since only the consumer advances head.
uint64_t t = atomic_load(&sq->tail);
uint64_t h = atomic_load(&sq->head);

if (t - h >= MAX_SLIQUE_DEPTH) {
return false; // Buffer full - back-pressure
}

// Fill the slot before advancing tail so the consumer never observes
// an index that points at an unwritten slot.
sq->nodes[t % MAX_SLIQUE_DEPTH] = n;
atomic_store(&sq->tail, t + 1);
return true;
}
      
   When buffer fills: The producer blocks or drops:   
      
// DMA completion handler (producer):
// Runs when the NIC has finished DMA-ing a received packet; tries to hand
// the packet to the per-flow slique.
//
// NOTE(review): the three "options" below are alternatives a real handler
// would choose between — as written, all three execute in sequence
// (drop + throttle + IPI) on every failed push. Illustrative only.
void dma_done_handler(Packet p) {
if (!slique_try_push(rx_slique, packet_to_node(p))) {
// Back-pressure: Can't push to slique

// Option 1: Drop packet (TCP will retransmit)
trace_emit(PACKET_DROPPED, p.micc);
free_packet(p);

// Option 2: Throttle DMA (slow down producer)
nic_reduce_rx_rate(0.5); // Cut rate by 50%

// Option 3: Signal consumer to speed up
send_ipi_to_consumer_core();
}
}
      
      
   Burst Buffers and Token Buckets   
   For bursty traffic (e.g., HTTP request flood), the slique needs burst   
   capacity beyond steady-state average:   
      
/*
 * Slique with burst capacity. Admission below steady_state_depth is
 * always free; between steady_state_depth and burst_depth, admission
 * draws from a token bucket refilled at refill_rate tokens/second.
 * burst_depth is the hard ceiling — occupancy is never unbounded.
 *
 * Fix: bare `atomic` is not a C type; C11 <stdatomic.h> spells it
 * `_Atomic uint64_t`.
 */
struct Slique {
    size_t steady_state_depth; /* e.g., 64 entries */
    size_t burst_depth;        /* e.g., 256 entries */

    _Atomic uint64_t tokens;   /* Token bucket for admission control */
    uint64_t refill_rate;      /* Tokens added per second */
};
      
/*
 * Admission control for one enqueue attempt.
 * Returns true when the slique may accept another entry:
 *   - always, while occupancy is below steady_state_depth;
 *   - during a burst (up to burst_depth), only if a token is consumed.
 *
 * Fix: the original did atomic_fetch_sub(&tokens, 1) unconditionally.
 * With tokens == 0 that underflows the unsigned bucket to a huge value,
 * after which every future burst is admitted — the exact opposite of a
 * budget. Decrement via compare-exchange only when a token exists.
 */
bool slique_can_accept(Slique* sq) {
uint64_t depth = sq->tail - sq->head;

if (depth < sq->steady_state_depth) {
return true; // Always accept in steady state
}

if (depth < sq->burst_depth) {
// Consume one token only if one is actually available (no underflow).
uint64_t tok = atomic_load(&sq->tokens);
while (tok > 0) {
if (atomic_compare_exchange_weak(&sq->tokens, &tok, tok - 1)) {
return true; // Accept burst - token consumed
}
// CAS failure reloaded tok with the current value; retry.
}
}

return false; // Burst budget exhausted - drop
}
      
      
   Key properties:   
      
   Steady-state load always succeeds   
   Bursts allowed up to token budget   
   Tokens refill over time (leaky bucket)   
   Hard limit at burst_depth (never unbounded)   
      
      
   Networking: Mux/Demux, Scatter-Gather, Zero-Copy   
   The Slique as Mux/Demux Primitive   
   Each slique is associated with a flow (TCP connection, UDP port, etc.):   
      
   // Per-connection slique:   
   Slique* tcp_conn_sliques[MAX_TCP_CONNS];   
      
   // Demux incoming packets:   
// Demux incoming packets to the per-connection slique.
//
// Fixes vs. original:
//  - cid comes from an untrusted packet header: bounds-check it before
//    indexing tcp_conn_sliques (out-of-bounds read otherwise).
//  - slique_try_push() takes a Node*, so convert with packet_to_node()
//    (consistent with dma_done_handler).
//  - A full buffer is transient back-pressure: drop and let TCP
//    retransmit. Sending RST there would tear down a healthy connection.
//    RST is reserved for genuinely unknown connections.
//  - Dropped/rejected packets are freed (the original leaked them).
void nic_rx_handler(Packet p) {
ConnID cid = parse_tcp_header(p); // Extract src/dst IP+port

// Untrusted input: never index the table with an unchecked ConnID.
if (cid >= MAX_TCP_CONNS || tcp_conn_sliques[cid] == NULL) {
// Unknown conn - reset and release the packet
send_tcp_rst(p);
free_packet(p);
return;
}

Slique* sq = tcp_conn_sliques[cid];
if (!slique_try_push(sq, packet_to_node(p))) {
// Buffer full - back-pressure: drop, TCP will retransmit
trace_emit(PACKET_DROPPED, p.micc);
free_packet(p);
}
// else: success - packet queued to correct flow
}
      
      
   This is hardware-offloaded mux/demux (like Intel Flow Director, Mellanox   
   RSS) but done in kernel P0 re-routine.   
   Zero-Copy via Scatter-Gather DMA   
   Traditional (bad):   
   // Two copies:   
   nic_dma_to_buffer(rx_buf); // DMA: NIC → kernel buffer   
   copy_to_user(user_buf, rx_buf); // CPU: kernel → user   
   Zero-copy (good):   
   // One DMA directly to user buffer:   
   user_buf = mmap_slique_to_userspace(slique);   
   nic_dma_to_buffer(user_buf); // DMA: NIC → user buffer (bypasses kernel)   
   The slique becomes the shared memory region:   
   // P3 userland maps slique read-only:   
   Slique* sq = mmap(NULL, slique_size, PROT_READ, MAP_SHARED, fd, 0);   
      
   // P0 kernel writes via DMA:   
   nic_program_scatter_gather(sq->nodes, MAX_SLIQUE_DEPTH);   
      
   // P3 polls:   
   while (true) {   
   if (sq->tail > sq->head) {   
   Node* n = &sq->nodes[sq->head % MAX_SLIQUE_DEPTH];   
   process_packet(n->payload); // Direct access, no copy   
   atomic_fetch_add(&sq->head, 1);   
   }   
   }   
      
   sendfile() equivalent:   
      
   // Traditional (two copies):   
   read(disk_fd, buf, len); // Disk → kernel buffer   
   write(sock_fd, buf, len); // Kernel buffer → NIC   
      
   // Zero-copy (DMA chaining):   
   Slique* disk_slique = open_disk("/data/file.bin");   
   Slique* net_slique = open_socket("192.168.1.1:80");   
      
   // Kernel chains DMAs:   
   while (!eof(disk_slique)) {   
   Node* n = slique_pop(disk_slique); // Disk DMA completes   
   slique_push(net_slique, n); // Queue for NIC DMA   
   // Same buffer, no CPU copy   
   }   
      
   Re-Ordering and Retry Holding (Sideline Queue)   
   When packets arrive out-of-order:   
      
// Slique extended with a "sideline" holding area for out-of-order
// arrivals. In-order packets go to the main queue; future-sequence
// packets wait on the sideline until the gap fills.
// NOTE(review): only the fields relevant to re-ordering are shown here —
// the head/tail counters from the earlier definition are elided.
struct Slique {
Node* in_order_nodes[MAX_DEPTH]; // Main queue (ordered)
Node* sideline_nodes[MAX_SIDELINE]; // Out-of-order holding area
uint64_t expected_seq; // Next expected sequence number
};
      
// Route one arriving packet by sequence number: append if in order,
// sideline if it is from the future, drop if already seen.
//
// NOTE(review): `p.seq > sq->expected_seq` is a plain compare — if seq is
// a 32-bit TCP sequence number this misclassifies packets across
// wraparound (serial-number arithmetic, RFC 1982 / (int32_t)(a-b) > 0,
// would be needed). Confirm the width/semantics of seq.
// NOTE(review): slique_push() elsewhere takes a Node*, but receives a
// Packet here; presumably a packet_to_node() conversion is elided.
void slique_insert_packet(Slique* sq, Packet p) {
if (p.seq == sq->expected_seq) {
// In order - append to main queue
slique_push(sq, p);
sq->expected_seq++;

// Check sideline for now-orderable packets
// (packets that became contiguous once this gap was filled)
slique_drain_sideline(sq);
} else if (p.seq > sq->expected_seq) {
// Future packet - sideline it
sideline_insert(sq, p);
} else {
// Duplicate - drop
trace_emit(PACKET_DUPLICATE, p.micc);
}
}
      
   Sideline limits: If sideline fills (too many gaps), drop oldest   
   sidelined packets and rely on TCP retransmit. This bounds memory usage.   
      
   Load Distribution Across Cores   
   The Re-Routine Affinity Model   
   Each slique is homed to a core:   
      
   Slique* rx_sliques[NUM_CORES]; // One per core   
      
   // NIC distributes packets via RSS (Receive Side Scaling):   
// NIC distributes packets via RSS (Receive Side Scaling):
// Hash the 4-tuple so every packet of one flow lands on the same core,
// keeping per-flow processing single-threaded and NUMA-local.
// NOTE(review): the push result is ignored here — unlike nic_rx_handler,
// there is no back-pressure/drop path if the per-core slique is full.
void nic_rx_distribute(Packet p) {
uint32_t hash = hash_tcp_tuple(p.src_ip, p.dst_ip, p.src_port, p.dst_port);
CoreID core = hash % NUM_CORES;

slique_push(rx_sliques[core], p);
send_ipi_to_core(core); // Wake consumer
}
      
   Why this works:   
      
   Each core processes disjoint flows (no inter-core synchronization)   
   Sliques stay NUMA-local (fast cache access)   
   Load naturally balances (hash distributes evenly)   
      
   When load imbalances: The inference engine detects (via autocorrelation   
   of slique depths) and rebalances flows:   
      
   // Inference engine (P3 daemon):   
   if (slique_depth[CORE_0] > 2 * avg_depth) {   
   // Core 0 overloaded - migrate some flows to Core 1   
   migrate_flow(flow_id, CORE_0, CORE_1);   
      
   // Update NIC RSS table:   
   nic_update_rss_mapping(flow_id, CORE_1);   
   }   
      
   POSIX Error Codes and Limits   
   The standard POSIX errors map naturally:   
| POSIX Error | Re-Routine Cause | Slique Behavior |
|---|---|---|
| `EAGAIN` | Slique full, try again | `slique_try_push()` returns false |
| `ETIMEDOUT` | Re-routine timeout sweeper expired M-ICC | Callback invoked with error |
| `ECONNRESET` | TCP RST received | Slique entry contains error flag |
| `ENOMEM` | Memo freelist exhausted | Re-routine submission rejected |
   User-facing API:   
// Re-routine-aware read:
// Resumable read(): first pass submits the DMA and parks (MEMO_INCOMPLETE);
// on re-entry after completion the memoized step is skipped and the result
// is fetched from the memo.
// NOTE(review): RR_MAKER_BEGIN/MEMO_CHECK/MEMO_INCOMPLETE/RR_MAKER_END are
// project macros defined elsewhere — control flow (how MEMO_INCOMPLETE
// suspends and resumes) depends on their expansion; not visible here.
ssize_t read(int fd, void* buf, size_t count) {
RR_MAKER_BEGIN(read, FSM_READ);

// Runs only on the first pass; skipped once the step is memoized.
MEMO_CHECK(STEP_DMA) {
if (!submit_dma_read(fd, buf, count, rr_ctx.current_micc)) {
errno = EAGAIN; // Slique full
return -1;
}
MEMO_INCOMPLETE();
}

// Re-entry path: the DMA step completed; read its memoized result.
Result r = memo_get(rr_ctx.current_memo, STEP_DMA);
if (r.error) {
errno = r.error;
return -1;
}

RR_MAKER_END;
return r.bytes;
}
      
   Summary: Path to Working Prototype   
   Phase 1 (Weeks 1-4): Abstract simulator   
      
   MIX-like instruction interpreter   
   Re-routine executor with memo management   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|