comp.lang.c++.moderated
Moderated discussion of C++ superhackery
33,346 messages
Message 33,066 of 33,346
James K. Lowden to Bart van Ingen Schenau
Re: compilers, endianness and padding
21 May 13 01:21:06
   From: jklowden@speakeasy.net   
      
   On Fri, 17 May 2013 02:49:47 -0700 (PDT)   
   Bart van Ingen Schenau  wrote:   
      
   > How would the compiler decide when to chase the value.large.data   
   > pointer and when to just dump the bytes from value.short?   
   >   
   > > My answer is simple, once again, although at a trivial cost.  It   
   > > must be possible to know which member of f was last written.  Why?   
   > > Because if it was written, serialization demands its endianism be   
   > > honored.   
   >   
   > Are you really proposing to add a hidden member to all unions to track   
   > which member was last written to?   
      
   Yes, approximately.   
      
   I am not saying that unions should acquire a hidden member contiguous
   with the union's memory. The compiled program need only track which
   member was last written to.
      
   We agree, yes, that data members can be represented as a table, much as   
   virtual functions are represented as a vtable?  Let's pretend there is   
   such a thing, and call it a "dtable". Like the vtable, the dtable need   
   not be *in* the union/struct/class, need not perturb the memory   
   layout.   
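To make the dtable idea concrete, here is a minimal hand-written sketch, assuming a dtable row would hold at least a member's name, byte offset, and size. MemberInfo, Point, and point_dtable are hypothetical names; in the proposal the compiler, not the programmer, would emit such a table, once per type and outside the objects themselves.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int32_t

// Hypothetical row of a "dtable", the data-member analogue of a vtable entry.
// Standard C++ has no such table; here it is written out by hand.
struct MemberInfo {
    const char* name;    // member name as spelled in the source
    std::size_t offset;  // byte offset from the start of the object
    std::size_t size;    // sizeof the member
};

struct Point {
    std::int32_t x;
    std::int32_t y;
};

// One table per type, stored outside any Point object, so sizeof(Point)
// and the object layout are unchanged. point_dtable is an illustrative name.
constexpr MemberInfo point_dtable[] = {
    {"x", offsetof(Point, x), sizeof(Point::x)},
    {"y", offsetof(Point, y), sizeof(Point::y)},
};
```

Because the table is a single static array, a thousand Point objects still cost nothing extra per instance.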
      
   A dtable would typically have only a handful of rows, because most
   structures (I bet) have fewer than a dozen or so members, and a union
   with more than 255 members is surely rare. So one byte per instance
   would usually suffice to track which member was last written to.
      
   Such a byte could be used to throw an exception automatically in the
   event the union is used for "type punning", i.e. a member is read
   other than the one last written. I happen to like type punning and am
   somewhat baffled as to why the compiler writers were allowed to
   prohibit it, but if that's to be the rule, then this would be a
   feature.
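A minimal sketch of the one-byte tag, written by hand rather than maintained by a compiler. For simplicity the tag is stored next to the union here, whereas the post allows it to live elsewhere; TrackedUnion and its member functions are illustrative names. Since C++17, std::variant offers the same kind of check in the library: it records the active alternative (index()), and std::get throws std::bad_variant_access on a mismatch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hand-written sketch of the post's idea: a union plus one byte recording
// which member was last written. Reading any other member throws instead
// of type-punning. In the proposal the compiler would maintain the tag;
// here the class does it. TrackedUnion is an illustrative name.
class TrackedUnion {
public:
    enum class Member : std::uint8_t { None, AsInt, AsFloat };

    void set_int(std::int32_t v) { storage_.i = v; last_ = Member::AsInt; }
    void set_float(float v)      { storage_.f = v; last_ = Member::AsFloat; }

    std::int32_t get_int() const   { require(Member::AsInt);   return storage_.i; }
    float        get_float() const { require(Member::AsFloat); return storage_.f; }

    Member last_written() const { return last_; }

private:
    union Storage {
        std::int32_t i;
        float        f;
    };

    // Throws if `wanted` is not the member most recently written.
    void require(Member wanted) const {
        if (last_ != wanted)
            throw std::logic_error("read of a union member other than the last one written");
    }

    Storage storage_{};            // the union itself
    Member  last_ = Member::None;  // the one-byte tag, one per instance
};
```

For example, calling set_int(42) and then get_float() throws std::logic_error rather than reinterpreting the int's bits as a float.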
      
   > Just in case it might need serialization and the endianness might   
   > matter?   
      
   It's not obvious to me that the compiler would generate metadata "just
   in case" rather than "in the event" they're needed. Metadata aren't
   needed unless referenced; perhaps, like templates, they need not be
   generated unless and until they're referenced.
      
   The simplest way I can see to address the binary-bloat concern is to
   make metadata generation optional, à la RTTI, and let link-time errors
   inform the user of the need to turn it on. To make those errors
   clearer, the compiler might define a marker, say __SERIALIZATION__,
   to enable compile-time detection of metadata availability. That would
   permit a linker to report that object A in a.o needs to be compiled
   with the -serializable option because it did not provide the symbol
   required by libserialization.so.
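A sketch of the compile-time side, assuming the hypothetical __SERIALIZATION__ marker and -serializable option from the paragraph above (neither exists in any real compiler). A library header can turn missing metadata into a readable compile-time message rather than a link error; kHasDataMetadata, metadata_available_for, and serialize are illustrative names. GCC and Clang set a comparable precedent by predefining __GXX_RTTI when RTTI is enabled. Requires C++14 (variable templates).

```cpp
// __SERIALIZATION__ is the post's hypothetical marker; no real compiler
// predefines it, so today this always takes the #else branch.
#if defined(__SERIALIZATION__)
constexpr bool kHasDataMetadata = true;
#else
constexpr bool kHasDataMetadata = false;
#endif

// Dependent on T, so the check below fires only when serialize<T> is
// actually instantiated, not merely when this header is included.
template <typename T>
constexpr bool metadata_available_for = kHasDataMetadata;

template <typename T>
void serialize(const T& value) {
    static_assert(metadata_available_for<T>,
                  "recompile with -serializable to generate data-member metadata");
    (void)value;  // a real library would walk T's dtable here
}
```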
      
   Actually, there is a simpler way: keep the concern in proportion. The
   dtable exists once per struct/class. Let's guess that each name in a
   struct requires N bytes of metadata, plus the characters of the name
   itself, where 4 <= N <= 16 (unless someone can show otherwise). For a
   typical int->string map, whose value_type is
   std::pair<const int, std::string>, the dtable might be something like
   64 bytes if the row size is on the high side and the names are
   longish. By one measure, that might be seen as a 2x cost: the metadata
   might be as big as the structure itself. OTOH, the metadata is per
   type, not per instance, and type definitions are typically dwarfed by
   a program's code and data. It would be interesting to see this
   implemented and measure the effect on something like, say, Qt. I'd
   put my chips on an overhead under 1%.
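The estimate can be written down as a small constexpr calculation, assuming, as the post does, N bytes of fixed metadata per member plus the name's characters, and ignoring string terminators or any per-type header. estimate_dtable_bytes is a hypothetical helper; C++14 is needed for the loop in a constexpr function.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Per-type dtable size under the post's assumptions: N bytes of fixed
// metadata per member plus the characters of its name. Terminators and any
// per-type header are ignored. estimate_dtable_bytes is a hypothetical helper.
template <std::size_t Count>
constexpr std::size_t estimate_dtable_bytes(std::size_t n_per_row,
                                            const char* const (&names)[Count]) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < Count; ++i) {
        total += n_per_row;                                       // fixed part of the row
        for (const char* p = names[i]; *p != '\0'; ++p) ++total;  // name characters
    }
    return total;
}

// std::map<int, std::string>::value_type is std::pair<const int, std::string>,
// whose data members are named first and second (5 + 6 = 11 characters).
constexpr const char* pair_members[] = {"first", "second"};

constexpr std::size_t low  = estimate_dtable_bytes(4, pair_members);   // 2*4  + 11
constexpr std::size_t high = estimate_dtable_bytes(16, pair_members);  // 2*16 + 11

// Per-instance size for comparison; implementation-dependent.
constexpr std::size_t per_instance = sizeof(std::map<int, std::string>::value_type);
```

For the pair's two members this gives 19 to 43 bytes, below the post's 64-byte guess, which allowed for longish names; the per-instance figure depends on the standard library in use.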
      
   Bear in mind that the dtable is not pure cost and could well prove to
   be a net savings. After all, it would enable the creation of
   libraries to replace what today is bespoke code. If you measure
   efficiency as functionality per line of code, that's a clear win. Can
   individual programmers write more efficient operator<< functions than
   would be implemented in a library? Some, perhaps, but not on
   average. And they would still be able to do that; they just wouldn't
   have to.
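A sketch of the kind of library code meant here: one generic printer driven by a member table, in place of a hand-written operator<< per type. The table is written by hand as (name, pointer-to-member) pairs, standing in for compiler-generated metadata; Employee, fields, and print_fields are hypothetical names, and the code assumes C++17 (std::apply and a fold expression).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

struct Employee {
    int         id;
    std::string name;
};

// Hypothetical stand-in for compiler-generated metadata: a table of
// (member name, pointer-to-member) pairs for Employee, written by hand.
inline auto fields(const Employee*) {
    return std::make_tuple(std::make_pair("id", &Employee::id),
                           std::make_pair("name", &Employee::name));
}

// Generic printer: works for any T that has a fields() table.
template <typename T>
std::ostream& print_fields(std::ostream& os, const T& obj) {
    const auto table = fields(static_cast<const T*>(nullptr));
    const char* sep = "";
    os << '{';
    std::apply([&](const auto&... field) {
        // Comma fold: print each "name=value" in table order.
        ((os << sep << field.first << '=' << obj.*(field.second), sep = ", "), ...);
    }, table);
    return os << '}';
}

// With the table in place, the per-type operator<< is a one-liner.
std::ostream& operator<<(std::ostream& os, const Employee& e) {
    return print_fields(os, e);
}
```

Streaming Employee{7, "Ada"} prints {id=7, name=Ada}; adding a member means adding one table row rather than editing bespoke output code.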
      
   Another way to look at the question is that, as everyone reading this   
   list knows, efficiency and correctness are improved any time logic is   
   reduced to a table.   
      
   > And have you considered that your proposed serialization feature   
   > might be standardized to use an endian-neutral serialization format?   
      
   If wishes were horses then beggars would ride.  ;-)   
      
   That indeed is where I would like to end up.  If structure metadata   
   were exposed in the language, I'm sure the denizens of Boost would have   
   a field day. We might see JNI compatibility and JSON/YAML/fotm (flavor
   of the month) serializers. I personally would like to see library
   support for scatter/gather I/O because it's essential to efficient   
   DBMS interfaces.   
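On the endian-neutral format raised in the quoted question: for integers, the usual building block is to fix the byte order in the format and produce it with shifts, which act on values rather than on memory and so yield the same bytes on any host. A minimal sketch, assuming a big-endian (network-order) 32-bit encoding; encode_be32 and decode_be32 are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Writes a 32-bit value most-significant byte first ("network order").
// Shifts operate on values, not on memory, so the result is the same on
// big- and little-endian hosts.
inline std::array<std::uint8_t, 4> encode_be32(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v >> 24),
            static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v)};
}

// Reassembles the value from the wire bytes, again independent of the host.
inline std::uint32_t decode_be32(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8)  |  std::uint32_t{b[3]};
}
```

A standardized, metadata-driven serializer would apply the same rule member by member, which is where a dtable would come in.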
      
   --jkl   
      
      
   --   
         [ See http://www.gotw.ca/resources/clcm.htm for info about ]   
         [ comp.lang.c++.moderated.    First time posters: Do this! ]   
      