

   comp.protocols.tcp-ip      TCP and IP network protocols.      14,669 messages   


   Message 13,791 of 14,669   
   Rick Jones to Christoph Kukulies   
   Re: time between two packets (1/2)   
   01 Aug 11 16:37:06   
   
   From: rick.jones2@hp.com   
      
   Christoph Kukulies  wrote:   
   > A question about packet processing speed:   
      
   > I have a server and a client which communicate via a socket (ipv4,   
   > AF_INET).  The client sends a connect. The server does an accept and   
   > waits for further messages from the client on that socket (using   
   > select()).   
      
   > The design of the higher level protocol is the following way:   
      
      
   > client                                        server   
      
      
   > send(REQUEST) --------------------------->    recv(REQUEST)   
   > recv(DATA)    <--------------------------     send(DATA or ERROR CODE)   
   > recv(END)     <--------------------------     send(END)   
      
   > Now it turns out that sending two packets makes the whole
   > communication very slow, and I have no idea why at the moment.
   > Each transaction in the above way takes 0.2 seconds, and calculating
   > from the wire speed (100Mbit/s) alone for 256 bytes, I would expect
   > smaller transmission times. Lowering the packet size down to 70
   > bytes doesn't have any impact on the time I'm achieving.
   > But when I remove the sending of the END packet, I get a
   > drastic performance increase: the rate goes up to 600 transactions
   > per second, a 120-fold increase over 5 per second.
      
   Some boilerplate likely related to what you are experiencing:   
      
   > I'm not familiar with this issue, and I'm mostly ignorant about what   
   > tcp does below the sockets interface. Can anybody briefly explain what   
   > "nagle" is, and how and when to turn it off? Or point me to the   
   > appropriate manual.   
      
   In broad terms, whenever an application does a send() call, the logic   
   of the Nagle algorithm is supposed to go something like this:   
      
   1) Is the quantity of data in this send, plus any queued, unsent data,   
      greater than the MSS (Maximum Segment Size) for this connection? If   
      yes, send the data in the user's send now (modulo any other   
      constraints such as receiver's advertised window and the TCP   
      congestion window). If no, go to 2.   
      
   2) Is the connection to the remote otherwise idle? That is, is there
      no unACKed data outstanding on the network? If yes, send the data
      in the user's send now. If no, queue the data and wait. Either the   
      application will continue to call send() with enough data to get to   
      a full MSS-worth of data, or the remote will ACK all the currently   
      sent, unACKed data, or our retransmission timer will expire.   
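The two rules above can be sketched as a small decision function. This is a simplified model, not any particular stack's code: the function name and its parameters (`queued_bytes`, `new_bytes`, `mss`, `unacked_bytes`) are hypothetical, "a full MSS-worth" is taken as the threshold, and the receiver-window and congestion-window limits mentioned in rule 1 are deliberately left out.

```python
def nagle_decision(queued_bytes: int, new_bytes: int, mss: int, unacked_bytes: int) -> str:
    """Simplified model of the Nagle check made on each send() call.

    Returns "send" or "queue". Receiver window and congestion window
    limits are ignored in this sketch.
    """
    # Rule 1: does this send plus queued, unsent data make at least a full MSS?
    if queued_bytes + new_bytes >= mss:
        return "send"
    # Rule 2: is the connection otherwise idle (no unACKed data in flight)?
    if unacked_bytes == 0:
        return "send"
    # Otherwise hold the small segment until the application supplies more
    # data, the outstanding data is ACKed, or the retransmission timer fires.
    return "queue"
```

For a "write, write, read" application, the first small write on an idle connection gets `"send"`, while the second small write, issued while the first is still unACKed, gets `"queue"`.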
      
   Now, where applications run into trouble is when they have what might   
   be described as "write, write, read" behaviour, where they present   
   logically associated data to the transport in separate 'send' calls   
   and those sends are typically less than the MSS for the connection.   
   It isn't so much that they run afoul of Nagle as they run into issues   
   with the interaction of Nagle and the other heuristics operating on   
   the remote. In particular, the delayed ACK heuristics.   
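A minimal sketch of the "write, write, read" shape described above, using Python's `socket` module. The function name and parameters are hypothetical; the point is only the sequence of calls: one logical request handed to the transport as two small sends, followed by a blocking read for the reply.

```python
import socket

def write_write_read(sock: socket.socket, header: bytes, payload: bytes, reply_len: int) -> bytes:
    """Send one logical request as two small writes, then block for the reply."""
    sock.sendall(header)    # first small write: goes out at once on an idle connection
    sock.sendall(payload)   # second small write: with Nagle on, may be held until
                            # the first segment has been ACKed by the peer
    reply = bytearray()
    while len(reply) < reply_len:
        chunk = sock.recv(reply_len - len(reply))
        if not chunk:
            raise ConnectionError("peer closed before the full reply arrived")
        reply += chunk
    return bytes(reply)
```

Against a TCP peer that only replies once it has both pieces, the second `sendall()` is where the stall appears.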
      
   When a receiving TCP is deciding whether or not to send an ACK back to   
   the sender, in broad handwaving terms it goes through logic similar to   
   this:   
      
   a) Is there data being sent back to the sender? If yes, piggy-back the
      ACK on the data segment.
   
   b) Is there a window update being sent back to the sender? If yes,
      piggy-back the ACK on the window update.
   
   c) Has the standalone ACK timer expired? If yes, send a standalone ACK.
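The receiver-side ACK logic above reduces to a short check. This is a hand-waving model in the same spirit as the text; the function and argument names are hypothetical, and whether a window update is pending is taken as an input (the window-update heuristics are described separately below).

```python
def ack_now(reply_data_pending: bool, window_update_pending: bool,
            ack_timer_expired: bool) -> bool:
    """Simplified model: is an ACK sent now, or is it still being delayed?"""
    # a) piggy-back the ACK on outgoing reply data
    if reply_data_pending:
        return True
    # b) piggy-back the ACK on a window update
    if window_update_pending:
        return True
    # c) otherwise send a standalone ACK only once the delayed-ACK timer fires
    return ack_timer_expired
```

In the problem case all three inputs are false until the timer expires, so the ACK is held.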
      
   Window updates are generally triggered by the following heuristics:   
      
   i) Would the window update be for a non-trivial fraction of the window,
      typically at or above 1/4 of it? That is, has the application
      "consumed" at least that much data? If yes, send a window update.
      If no, check ii.
   
   ii) Would the window update be for at least 2*MSS worth of data, that
       is, has the application "consumed" at least 2*MSS? If yes, send a
       window update; if no, wait.
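The two window-update heuristics can be sketched the same way. The 1/4 fraction is an assumption matching the "typically" above; real thresholds vary between stacks, and the function name is hypothetical.

```python
def window_update_due(consumed_bytes: int, window_bytes: int, mss: int,
                      fraction: float = 0.25) -> bool:
    """Simplified model of when a receiver sends a window update."""
    # i) has the application consumed a non-trivial fraction of the window?
    if consumed_bytes >= fraction * window_bytes:
        return True
    # ii) has the application consumed at least 2*MSS worth of data?
    return consumed_bytes >= 2 * mss
```

With a 64KB window and a 1460-byte MSS, a 100-byte application message triggers neither rule, which is exactly the case where the ACK falls through to the standalone timer.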
      
   Now, going back to that write, write, read application, on the sending   
   side, the first write will be transmitted by TCP via Nagle rule 2:
   the connection is otherwise idle. However, the second small send will
   be delayed as there is at that point unACKnowledged data outstanding   
   on the connection.   
      
   At the receiver, that small TCP segment will arrive and will be passed   
   to the application. The application does not have the entire app-level   
   message, so it will not send a reply (data to TCP) back. The typical   
   TCP window is much much larger than the MSS, so no window update would   
   be triggered by heuristic i. The data that just arrived and was consumed
   by the application is less than 2*MSS, so no window update from
   heuristic ii. Since there is no window update, no ACK is sent by
   heuristic b.
      
   So, that leaves heuristic c - the standalone ACK timer. That ranges   
   anywhere between 50 and 200 milliseconds depending on the TCP stack in   
   use.   
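Some rough arithmetic shows why this stall dominates on a LAN. The sketch below ignores serialization and server processing time, assumes the delayed-ACK timer fires a full `delayed_ack_ms` after the first segment arrives (some stacks use a coarser periodic timer, so the real wait can be shorter), and uses a hypothetical 0.2 ms round-trip time. The function name is likewise hypothetical.

```python
def write_write_read_latency_ms(rtt_ms: float, delayed_ack_ms: float, nodelay: bool) -> float:
    """Rough latency of one two-write request and its reply."""
    if nodelay:
        # Both small segments leave immediately; the reply is back one RTT later.
        return rtt_ms
    # Nagle on: segment 1 leaves at once (idle connection). The receiver holds
    # its ACK for the delayed-ACK timer, so the ACK is back at rtt + delay.
    # Only then does segment 2 leave, and the reply arrives one more RTT later.
    return 2 * rtt_ms + delayed_ack_ms


for nodelay in (False, True):
    latency = write_write_read_latency_ms(rtt_ms=0.2, delayed_ack_ms=200.0, nodelay=nodelay)
    print(f"nodelay={nodelay}: {latency:.1f} ms, about {1000 / latency:.0f} transactions/s")
```

With a 200 ms timer this gives about 200.4 ms per transaction, roughly 5 per second, in line with the 0.2 seconds reported in the question. There the two small writes are on the server side (DATA, then END), but the same arithmetic applies with the roles swapped.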
      
   If you've read this far :) now we can take a look at the effect of   
   various things touted as "fixes" to applications experiencing this   
   interaction.  We take as our example a client-server application where   
   both the client and the server are implemented with a write of a small   
   application header, followed by application data.  First, the   
   "default" case which is with Nagle enabled (TCP_NODELAY _NOT_ set) and   
   with standard ACK behaviour:   
      
                 Client                     Server   
                Req Header        ->   
                                  <-        Standalone ACK after Nms   
                Req Data          ->   
                                  <-        Possible standalone ACK   
                                  <-        Rsp Header   
                Standalone ACK    ->   
                                  <-        Rsp Data   
       Possible standalone ACK    ->   
      
      
   For two "messages" we end up with at least six segments on the wire.
   The possible standalone ACKs will depend on whether the server's   
   response time, or client's think time is longer than the standalone   
   ACK interval on their respective sides. Now, if TCP_NODELAY is set we   
   see:   
      
      
                 Client                     Server   
                Req Header        ->   
                Req Data          ->   
                                  <-        Possible Standalone ACK after Nms   
                                  <-        Rsp Header   
                                  <-        Rsp Data   
        Possible Standalone ACK   ->   
      
   In theory, we are down to four segments on the wire, which seems good,
   but frankly we can do better. First, though, consider what happens
   when someone disables delayed ACKs:
      
                 Client                     Server   
                Req Header        ->   
                                  <-        Immediate Standalone ACK   
                Req Data          ->   
                                  <-        Immediate Standalone ACK   
                                  <-        Rsp Header   
      Immediate Standalone ACK    ->   
                                  <-        Rsp Data   
      Immediate Standalone ACK    ->   
      
   Now we definitely see eight segments on the wire. It will also be that
   way if both TCP_NODELAY is set and delayed ACKs are disabled.
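As for "how to turn it off": disabling Nagle is a single socket option, `TCP_NODELAY` at level `IPPROTO_TCP`, set with `setsockopt()`. It is shown here through Python's `socket` wrapper; the helper name is hypothetical. Setting it only removes the sender-side hold on small sends; it does not change the peer's delayed-ACK behaviour.

```python
import socket

def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1: small sends are not held back waiting for outstanding
    # data to be ACKed. Delayed ACKs on the remote side are unaffected.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


s = make_nodelay_socket()
print("TCP_NODELAY enabled:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
s.close()
```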
      
      
   [continued in next message]   
      



(c) 1994,  bbs@darkrealms.ca