comp.lang.c++.moderated
Moderated discussion of C++ superhackery
James K. Lowden
Re: compilers, endianness and padding (1
13 May 13 23:11:46
   From: jklowden@speakeasy.net   
      
   On Mon, 13 May 2013 16:19:48 CST   
   Seungbeom Kim  wrote:   
      
   Seungbeom, I want to acknowledge the care you took to pose a reasonably   
   hard example problem.  I probably missed something, but I hope I've   
   shown it is readily solved.   
      
   > On 2013-05-12 23:12, James K. Lowden wrote:   
   >   
   > > I find it odd that   
   > >   
   > > 	char *s = "hello";   
   > > 	cout << s;   
   > >   
   > > works,   
   >   
   > Again, char* is a special case. Mainly because C used char* values   
   > to represent string values.   
   >   
   > > but   
   > >   
   > > 	struct {  char *s; } s = { "hello" };   
   > > 	cout << s;   
   > >   
   > > does not. I do not understand why we accept serialization of   
   > > built-in types, and resolutely refuse to standardize -- or even   
   > > support the standardization of -- serialization of user-defined   
   > > types.   
   >   
   > How do you define the serialization format for an arbitrary UDT?   
      
   By iteration over the members.   
      
   There is no such thing as an "arbitrary UDT".  Every UDT is built up   
   from primitive types, and every memberwise I/O operation eventually   
   boils down to (de-)serialization of those primitive types.   
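      
   A minimal sketch of what "iteration over the members" can look like   
   today, assuming C++17 and assuming each type volunteers its member   
   list by hand.  The members() hook, the put() overloads and the   
   text format are all hypothetical, not anything in the standard; the   
   hook stands in for information the compiler would have to supply.   

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Primitive cases: every memberwise write eventually lands on one of these.
inline void put(std::ostream& os, int v)                { os << "int:" << v << ';'; }
inline void put(std::ostream& os, double v)             { os << "double:" << v << ';'; }
inline void put(std::ostream& os, const std::string& v) { os << "str:" << v << ';'; }

// Generic case: any type exposing the (hypothetical) members() hook.
// Types without the hook are rejected by SFINAE, so primitives never land here.
template <class T>
auto put(std::ostream& os, const T& obj) -> decltype(obj.members(), void()) {
    os << '{';
    std::apply([&os](const auto&... m) { (put(os, m), ...); }, obj.members());
    os << '}';
}

// Two user-defined types; the second nests the first.
struct Point {
    int x, y;
    auto members() const { return std::tie(x, y); }
};

struct Label {
    std::string text;
    Point at;
    auto members() const { return std::tie(text, at); }
};
```

   Calling put(os, Label{"hi", {1, 2}}) on a std::ostringstream walks   
   Label, recurses into Point, and bottoms out in the int and string   
   overloads.  The only hand-written part per type is the members()   
   line.   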
      
   > Why should the language standard define one?   
      
   The language should define one so we can stop reinventing I/O for   
   every possible combination.   
      
   I do not mean C++ should suddenly see I/O added to the language   
   definition.  I do mean that the language needs some small but   
   important extensions before iostreams can be extended to support   
   generic types.   
      
   > For example, given the node type mentioned above, what's THE ONE   
   > correct way to serialize a binary tree?   
      
   Tell me this: what's the one correct way to serialize a double?   
      
   We don't need *the* one correct way.  We would benefit, though, from a   
   correct, reversible way.  There is no reason it can't be done   
   mechanically.   
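      
   As an illustration of "a correct, reversible way" for a double, here   
   is one sketch among many.  It assumes a 64-bit IEEE 754 double and   
   picks big-endian byte order arbitrarily; encode_double and   
   decode_double are made-up names.   

```cpp
#include <array>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Copy the bit pattern out of the double and emit it most significant
// byte first, so the stream does not depend on the host's byte order.
inline std::array<unsigned char, 8> encode_double(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    std::array<unsigned char, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
    return out;
}

// Exact inverse: rebuild the 64-bit pattern, then copy it into a double.
inline double decode_double(const std::array<unsigned char, 8>& in) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

   Because only the bit pattern is moved, decode_double(encode_double(x))   
   returns x exactly for every finite x; a text format would need enough   
   digits to make the same promise.   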
      
   For the reader's reference, the struct in question is   
      
   	struct node { int value; node* left; node* right; } n;   
      
   How, you ask?  I'd do something I bet very like what you would do.   
   What I do not see is why the standard library couldn't do it for me,   
   with a little information from the compiler.   
      
   I hope someone better versed in graph theory will come to my rescue,   
   but here's a plausible Monday night hack:   
      
   byte    type    size    value   
        0   node      20       -   
        0   int        4       x   
        4   node*      8      20   
       12   node*      8      40   
       20   node      20       -    // n.left   
       20   int        4       y   
       24   node*      8      60   
       32   node*      8      80   
       40   node      20       -    // n.right   
       40   int        4       z   
       44   node*      8     100   
       52   node*      8     120   
       etc.   
      
   I wrote that in ASCII of course, because we're two humans   
   communicating.  For communication between C++ programs, the above   
   information would better be tokenized.   
      
   The serialization system would recognize "node" as a UDT, taken from   
   the list of types provided by the compiler, and would therefore have   
   access to the metadata array describing the members.  Pointers are   
   denoted as offsets into the stream.  In reality, the stream reflects   
   what the compiler itself must do to maintain the graph in memory.   
   (Because, after all, pointers are just offsets from zero into the   
   linear address space we call "memory".)   
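      
   The table above can be reproduced with a breadth-first walk.  The   
   following is a sketch only: it assumes a 20-byte record (a 4-byte   
   int plus two 8-byte offsets), assumes the graph really is a tree,   
   and invents a sentinel (kNull) for null pointers, since offset 0 is   
   already taken by the root.  The record type and flatten() are   
   hypothetical names.   

```cpp
#include <cstdint>
#include <queue>
#include <vector>

struct node { int value; node* left; node* right; };

constexpr std::uint64_t kRecordSize = 20;            // 4-byte int + two 8-byte offsets
constexpr std::uint64_t kNull = ~std::uint64_t{0};   // assumed "no child" marker

// One row group of the table: the value plus stream offsets of the children.
struct record {
    std::int32_t  value;
    std::uint64_t left;
    std::uint64_t right;
};

// Breadth-first walk.  Record i is meant to sit at byte i * kRecordSize,
// so each child is handed the next free offset as it is queued.
inline std::vector<record> flatten(const node* root) {
    std::vector<record> out;
    if (!root) return out;
    std::queue<const node*> pending;
    pending.push(root);
    std::uint64_t next = kRecordSize;  // the root occupies offset 0
    while (!pending.empty()) {
        const node* n = pending.front();
        pending.pop();
        record r{static_cast<std::int32_t>(n->value), kNull, kNull};
        if (n->left)  { r.left  = next; next += kRecordSize; pending.push(n->left); }
        if (n->right) { r.right = next; next += kRecordSize; pending.push(n->right); }
        out.push_back(r);
    }
    return out;
}
```

   For a root with two children this yields offsets 20 and 40 in the   
   root's record, as in the table.  Turning each record into bytes, and   
   reading the stream back into freshly allocated nodes, is left out.   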
      
   Of course, nothing prevents a graph built from such a structure from   
   having cycles.  OTOH nothing prevents the serializer from detecting   
   cycles.   
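      
   One way a serializer might notice cycles, sketched under the   
   assumption that remembering every node already visited is   
   affordable.  graph_index and visit() are hypothetical names.   

```cpp
#include <cstddef>
#include <unordered_map>

struct node { int value; node* left; node* right; };

struct graph_index {
    std::unordered_map<const node*, std::size_t> slot;  // node -> position in the stream
    std::size_t revisits = 0;                           // edges that reached a known node
};

// Depth-first walk that gives each node a slot the first time it is seen.
// Reaching a node that already has a slot ends the recursion on that path,
// so a cycle cannot loop forever; a writer would emit the existing slot.
inline void visit(const node* n, graph_index& g) {
    if (!n) return;
    auto inserted = g.slot.emplace(n, g.slot.size());
    if (!inserted.second) {
        ++g.revisits;
        return;
    }
    visit(n->left, g);
    visit(n->right, g);
}
```

   Note that this treats a node shared by two parents the same way as a   
   true cycle.  Both are written once and referred to by slot, which is   
   all the stream needs; telling them apart would take a second set   
   holding only the nodes on the current path.   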
      
   > > The minimum I would like to see is the ability to iterate over the   
   > > members of a structure.  Suppose they were described as an array   
   > > of tuples of {type, size, constness}.  Then we could serialize   
   > > abstractly along the lines of   
   > >   
   > > 	struct { ... } foo;   
   > > 	for_each(members_of(foo).begin(), ... );   
   >   
   > That would be very cool, but even before being able to iterate over   
   > struct members, the most fundamental problem to be solved is how to   
   > represent types as data, I believe.   
      
   I simply don't see the problem.  As I said, every struct or class   
   eventually is composed of built-in types.  The compiler is able to   
   manage the structures in memory.  The debugger is able to represent   
   them on the screen.  What do you think is so different about a stream   
   that it requires a sad and lonely human being to write the I/O   
   routines?   
      
   > But again, I guess lots of UDTs need more than just what the   
   > template expansion can do for serialization (as imposed by the   
   > external format).   
      
   ISTM it's not as hard as you think.  You'll agree that inheritance is   
   a tree, and that trees can be unambiguously represented and traversed.   
   Structures you'll agree can be described as an array of types.  If I   
   gave you a tree of arrays arbitrarily and recursively defined, but   
   with each element defined in advance -- because I'm a compiler, and   
   all my types are known by ODR -- then surely you would be able to   
   iterate over the whole steaming mass and write it to a file.   
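      
   A sketch of that "tree of arrays", with the descriptors typed in by   
   hand to stand in for what a compiler would emit.  member_desc,   
   type_desc and pack() are hypothetical; the fields mirror the type and   
   size from the tuple idea quoted earlier, plus an offset.  Only   
   standard-layout structs are assumed, so offsetof is usable.   

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct type_desc;

// One array element: a member's type name, where it lives, how big it is,
// and (for a nested struct) a pointer to that struct's own description.
struct member_desc {
    const char*      type;
    std::size_t      offset;
    std::size_t      size;
    const type_desc* nested;   // nullptr for built-in types
};

struct type_desc {
    const char*        name;
    const member_desc* members;
    std::size_t        count;
};

struct inner { short a; double b; };
struct outer { int id; inner in; char tag; };

// Hand-written stand-ins for compiler-provided metadata.
const member_desc inner_members[] = {
    {"short",  offsetof(inner, a), sizeof(short),  nullptr},
    {"double", offsetof(inner, b), sizeof(double), nullptr},
};
const type_desc inner_type = {"inner", inner_members, 2};

const member_desc outer_members[] = {
    {"int",   offsetof(outer, id),  sizeof(int),   nullptr},
    {"inner", offsetof(outer, in),  sizeof(inner), &inner_type},
    {"char",  offsetof(outer, tag), sizeof(char),  nullptr},
};
const type_desc outer_type = {"outer", outer_members, 3};

// Recursive walk: descend into nested structs, copy the bytes of built-in
// members.  Padding between members is never copied.
inline void pack(const void* obj, const type_desc& t, std::vector<unsigned char>& out) {
    const unsigned char* base = static_cast<const unsigned char*>(obj);
    for (std::size_t i = 0; i < t.count; ++i) {
        const member_desc& m = t.members[i];
        if (m.nested)
            pack(base + m.offset, *m.nested, out);
        else
            out.insert(out.end(), base + m.offset, base + m.offset + m.size);
    }
}
```

   pack() itself knows nothing about inner or outer; everything   
   type-specific lives in the descriptor arrays.  Byte order is left in   
   host form here, so the output is not yet portable between machines.   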
      
   The problem as I see it is that the type system is unavailable at   
   runtime.  The information I'm describing -- class hierarchy, member   
   structure -- is discarded by the compiler (except insofar as it's made   
   available to the debugger).   
      
   Although the vogue term is "reflection", the idea is older than   
   ancient.  Classes in Smalltalk could be interrogated at runtime.   
   (Heck, IIRC classes could be *modified* at runtime.  But we won't go   
   there!)   
      
   > > Stroustrup & friends restricted themselves to a single, well   
   > > understood problem: std::string.  To answer my own question,   
   > > std::string is special because its need was recognized in 1985.   
   >   
   > What makes you think std::string is special in the current context?   
   > It's just a class type, which happens to be included in the standard   
   > library and thus be supported better by other components in the same   
   > library. The core language doesn't give it any special treatment.   
      
   Exactly.  Because the core language discards information the standard   
   library could otherwise use to handle UDTs generically, std::string   
   had to be explicitly and painstakingly integrated into the standard   
   library.  Before the advent of the Internet, std::string was the   
   answer to the one well known I/O problem, namely char*.  In that day   
   and age, it was deemed worthwhile to craft a single-purpose type,   
   rather than expose the type system for the library's use.   
      
   I cannot reliably take std::string from one library and pass it to   
   operator<< in another.  There are all sorts of little gewgaws in   
   std::string because the compiler does not provide the requisite   
   information: the library must "know" the name of the char* pointer,   
      
   [continued in next message]   
      