comp.lang.c++.moderated

Moderated discussion of C++ superhackery

33,346 messages

Message 33,044 of 33,346

James K. Lowden to Bart van Ingen Schenau

Re: compilers, endianness and padding

16 May 13 05:47:52

   From: jklowden@speakeasy.net   

   On Tue, 14 May 2013 15:08:22 CST   
   Bart van Ingen Schenau  wrote:   

   > Trees are not that difficult to serialize. How about a slightly more   
   > complex structure:   
   >   
   > class X {   
   >   struct t {   
   >      size_t a;   
   >      char* b;   
   >   };   

   As I mentioned elsewhere, it's necessary in the general case for the   
   compiler to provide the extent as well as the value of a pointer.  IOW   

   	sizeof(X::t::b) == sizeof(char*)
   	char s[10];
   	X x;
   	x.f.e.b = s;
   	extentof(x.f.e.b) == 10;

   Every pointer -- static, free store, or automatic -- always has some   
   number of bytes allocated to it.  (That number might be zero.)  The   
   language deficiency is that it does not make that information   
   available to the programmer.  Instead, it requires the programmer to   
   track it independently and duplicatively.  And often, it might be   
   noted, incorrectly.   

   Someone will object that keeping track of the size of memory allocated   
   to a pointer will add 8 bytes to every pointer.  Not true!  Remember,   
   every time you say   

   	const char *s = "hello";

   the compiler sets aside those 6 bytes and places the next variable
   *after* them.  Change it just a little   

   	char s[] = "hello";   

   and suddenly sizeof(s) works.  Yet the pointer is the same size.  Move   
   to the heap   

   	char *s = static_cast<char*>(malloc(6));

   and the heap must do as the compiler does, setting aside 6 bytes.  I'm   
   simply pointing out that the language could expose that fact with   

   	extentof(s);   

   at *no* cost.  Not just a little: none.  The information is already   
   there, in the executable image, or on the stack, or in the free store.   
   What's missing is a bit of syntax.   

   >   size_t c;   
   >   union {   
   >     char d[sizeof(t)];   
   >     t e;   
   >   } f;   
   > };   

   At first glance, this seems no problem at all, insofar as sizeof(f) is   
   known at compile time.  The problem I think you're alluding to is that   
   two different compilers might arrange f differently, and nothing about   
   the bit pattern of the union tells us what to do.   

   My answer is simple, once again, although it comes at a trivial
   cost.  It must be possible to know which member of f was last
   written.  Why?  Because if f.e (the member of type t) was written,
   serialization demands its endianism be honored.

   One might hope, though, that this sort of malarkey might fade into
   history if endianism were dealt with in the language proper.   

   --jkl   

   --   
         [ See http://www.gotw.ca/resources/clcm.htm for info about ]   
         [ comp.lang.c++.moderated.    First time posters: Do this! ]   

