From: barmar@alum.mit.edu   
      
   In article ,   
    Fernando Nunes wrote:   
      
   > Hello,   
   >   
   > I have a very weird (at least to me) situation that I think comes from
   > some TCP-layer issue that's well beyond my knowledge.
   > I have a database system that uses a storage manager to do its backups.
   > There is a standard API for this. The storage manager functions that
   > implement that API send the data to another machine, which they call the
   > media server. The communication uses TCP. The storage manager process
   > associated with the database is send()ing the data to the media server.
   > Over a single socket connection we send at most something like 60GB. The
   > database side is HP-UX 11.31i and the media server is Linux (I can't
   > specify the exact version at the moment).
   >   
   > After sending a lot of data, some of the processes (not all of them, and
   > not every time) get "stuck". The SM process on the database side is
   > blocked in send(). The process on the media server is in recv(), but no
   > data is being sent. I can't think of any "normal" explanation for this...
   > I searched around and found that the sending end may think that the other
   > side has no receive window, but the TCP stack was built to avoid that...
   > The storage manager supplier doesn't seem able to debug it, and I found
   > nothing that points clearly to a problem... only vague possibilities that
   > are usually hard to test.
   >   
   > My questions:   
   >   
   > - Does this ring a bell to anyone?   
   > - What issues may cause a similar situation?   
   > - What more should I do to debug it? A tcpdump could be helpful, but the
   > output would be massive and hard to handle...
      
   You only care about the part of the dump around the time that it gets   
   stuck. You should be able to tell if there's a zero-window problem, as   
   well as see whether the storage manager is sending zero-window probes as   
   it should.   
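
   As a rough illustration of what to look for in that part of the dump,
   here is a minimal Python sketch that decodes the fixed 20-byte TCP
   header and flags segments advertising a zero receive window. The
   function names parse_tcp_header and is_zero_window are made up for this
   example, and it assumes you have already extracted the raw TCP segment
   bytes from the capture by some other means.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("need at least 20 bytes of TCP header")
    (src_port, dst_port, seq, ack,
     offset_flags, window, _checksum, _urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "window": window,                    # raw value, before any window scaling
        "rst": bool(offset_flags & 0x04),    # RST flag bit
    }


def is_zero_window(segment: bytes) -> bool:
    """True if the segment advertises a receive window of 0.

    RST segments are skipped because they commonly carry a window of 0
    without meaning "my buffer is full".
    """
    hdr = parse_tcp_header(segment)
    return hdr["window"] == 0 and not hdr["rst"]


# Hypothetical ACK segment from the receiver with window = 0.
example = struct.pack("!HHIIHHHH", 5000, 6000, 1, 1, (5 << 12) | 0x10, 0, 0, 0)
print(is_zero_window(example))
```

   A raw window of 0 means zero regardless of the window-scale option, so
   no scaling factor is needed for this check. Assuming libpcap filter
   syntax, the same condition can be expressed at capture time as
   tcp[14:2] = 0, which keeps the dump small.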
      
   > - After a while the media server end (the one receiving the data) notices
   > that it hasn't received any data for a configurable period of time, and
   > closes the process (and its sockets). On the sending end the socket
   > changes to CLOSE_WAIT, and the process neither aborts nor seems to receive
   > the usual SIGPIPE. Is there any possible explanation for this (like socket
   > options)? It just stays in the send() function...
      
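   It will only get a SIGPIPE if the server sends a RST. The FIN that put
   the socket into CLOSE_WAIT only says the server is done sending; it
   doesn't by itself make a blocked send() fail.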
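
   To make the FIN-versus-RST distinction concrete, here is a small
   loopback sketch in Python. It is not the storage manager's code, and
   the function name send_after_peer_close is made up for this example.
   Python ignores SIGPIPE by default, so the condition shows up as an
   exception rather than a signal, and the exact error may differ between
   operating systems.

```python
import socket
import time


def send_after_peer_close():
    """Return (first_send_ok, name_of_error_on_second_send) over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.close()                       # peer closes: client receives a FIN
    listener.close()
    time.sleep(0.2)                      # client socket is now in CLOSE_WAIT

    first_ok = True
    try:
        client.send(b"x")                # accepted locally despite the FIN
    except OSError:
        first_ok = False
    time.sleep(0.2)                      # the closed peer answers the data with a RST

    second_error = None
    try:
        client.send(b"y")                # fails only now, after the RST
    except OSError as exc:
        second_error = type(exc).__name__
    client.close()
    return first_ok, second_error


if __name__ == "__main__":
    print(send_after_peer_close())
```

   The first send() succeeds even though the peer has already closed;
   only the send() issued after the RST has come back fails, typically
   with BrokenPipeError (EPIPE) or ConnectionResetError.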
      
   >   
   > Thanks for any pointers or suggestions. I can gather more details if needed,   
   > but the code is closed source, so my only option is to truss/strace it...   
   >   
   > Regards!   
      
   --   
   Barry Margolin, barmar@alum.mit.edu   
   Arlington, MA   
   *** PLEASE don't copy me on replies, I'll read them in the group ***   
      