From: jklowden@speakeasy.net   
      
   On Wed, 22 May 2013 00:15:25 CST   
   Edward Diener wrote:   
      
   > >> The compiler knows no such thing. It only knows that 'void* p' is a   
   > >> pointer.   
   > >   
   > > Of course it does, as you well know. The information isn't   
   > > recorded in the pointer, but any allocated memory -- no matter how   
   > > allocated -- has size. That size is tracked by the   
   > > compiler/stack/heap. It's C++'s way of keeping everything from   
   > > using the same space.   
   >   
   > The compiler/stack/heap are all different things.   
      
   That is part of the challenge. Pointers can be assigned values in a   
   variety of ways. If you make them "bigger" in the sense of adding an   
   extent attribute, you have to touch each member of the menagerie.   
      
   > From your 'struct' above the compiler only knows that p is a pointer   
   > to void   
      
   That's true when defined. Once assigned,   
      
    string s;   
    void *p = reinterpret_cast<void*>(&s);
      
   it's a pointer to something. That information is discarded   
   today, but need not be. Indeed, it would be valuable to keep; only a   
   few days ago there was a discussion on this list about the danger of   
   casting to void* and casting the result to any type other than the   
   original. By retaining the cast-from information, the running system   
   could throw an exception if that were done.   
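
   A minimal library-level sketch of that idea, assuming a hypothetical
   wrapper named tagged_ptr (not an existing facility): it stores the
   std::type_index of the type it was built from and throws
   std::bad_cast when asked for any other type. The proposal above puts
   this in the language itself; the wrapper only illustrates the
   bookkeeping involved.

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical wrapper: a void* that remembers the type it was made from.
class tagged_ptr {
    void *ptr_;
    std::type_index type_;
public:
    template <typename T>
    explicit tagged_ptr(T *p) : ptr_(p), type_(typeid(T)) {}

    // Cast back; throws std::bad_cast unless T is the original type.
    template <typename T>
    T *as() const {
        if (std::type_index(typeid(T)) != type_)
            throw std::bad_cast();
        return static_cast<T *>(ptr_);
    }
};

// Returns true if casting back to the wrong type was caught.
inline bool wrong_cast_is_caught() {
    std::string s("Galileo");
    tagged_ptr p(&s);
    assert(p.as<std::string>() == &s);  // original type: allowed
    try {
        p.as<int>();                    // any other type: rejected
    } catch (const std::bad_cast &) {
        return true;
    }
    return false;
}
```

   The cost here is one std::type_index per pointer plus a comparison
   per cast, and only code that uses the wrapper pays it.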
      
   Absent better information, void* points to a sequence of bytes.   
      
   > That pointer can be to anywhere in memory and can point to   
   > anything. It does not have to point to dynamic memory or be on the   
   > stack. I tend to doubt that today the information ( length in bytes )   
   > about that 'void * p' is kept anywhere while the program is running   
   > by the run-time system.   
      
   I'm sure you're right. As I said, for a file-scope variable:
      
    static const char *name = "Galileo";   
      
   the compiler reserves space in the object code for the string   
   "Galileo", but the size of that reservation is discarded.   
      
   > I know what you mean by "heap". I was only questioning the idea of   
   > what the "heap" knows.   
      
   Ah. I've always thought it peculiar that   
      
    char *p = malloc(10);   
    free(p);   
      
   works but   
      
    char *p = new char[10];   
    delete p;   
      
   is undefined behavior (and may leak).
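
   For comparison, a small sketch of the forms that do release an array
   correctly in today's C++: delete[] and std::unique_ptr<T[]>. The
   Counted type and both function names are made up for illustration;
   the counter only makes the destructor calls visible. That delete[]
   runs all ten destructors shows the runtime already has the element
   count available.

```cpp
#include <memory>

// Hypothetical type that counts how many instances are alive.
struct Counted {
    static int live;
    Counted() { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Array form of delete: destroys every element, then frees the block.
inline int live_after_array_delete() {
    Counted *p = new Counted[10];
    delete[] p;
    return Counted::live;
}

// unique_ptr<T[]> calls delete[] automatically at end of scope.
inline int live_after_unique_ptr() {
    {
        std::unique_ptr<Counted[]> p(new Counted[10]);
    }
    return Counted::live;
}
```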
      
   I know it's been justified time and again, but I don't see how it can   
   be seen as anything other than a step backwards. I wonder how many   
   cycles have been saved versus hours wasted.   
      
   > I am not in principle against a run-time system that can track what
   > you want it to track but I think you may be underestimating the
   > speed/size costs as well as the effort involved.
      
   You may be right. I've been misunderestimated myself on occasion.   
      
   > If there were overhead I would want an end-user to be able to opt out   
   > of it. Not everyone will agree that the ability to automatically   
   > serialize data should be paid for in terms of either slower code or   
   > bigger code.   
      
   Acknowledged. OTOH, it's easy to overestimate the costs, because they   
   are so often currently borne by the programmer.   
      
   Let's separate two areas of concern: static metadata and pointer   
   extents.   
      
   Incorporating static metadata -- basically, a name-type-size tuple for   
   every structure member -- in the object code will make the object code   
   bigger. That's undeniable. If I were writing a compiler, I'm sure I'd   
   hear from users complaining about that, and be under pressure to offer   
   an option to remove it. (There will even be proprietary-software   
   concerns. Not every closed-source vendor will want his data structures   
   clearly disclosed.) I think it would be interesting to measure, though,   
   especially if we restrict ourselves to being able to iterate over the   
   members of a struct/class.   
      
   I don't see how extending the language to provide static metadata   
   imposes any runtime cost. I don't believe it's terribly difficult   
   given that every compiler already provides the information to   
   debuggers. (ISTM debuggers would then be easier to write.)   
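
   As a rough illustration of what such metadata could look like, here
   is a hand-written table of name-type-offset-size entries for one
   struct. Everything in it (member_info, Point, point_members,
   find_member) is hypothetical; the table stands in for what a compiler
   would emit, and no existing language feature produces it.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical metadata record for one struct member.
struct member_info {
    const char *name;
    const char *type;
    std::size_t offset;
    std::size_t size;
};

struct Point {
    int x;
    int y;
    double weight;
};

// Written by hand here; the proposal is that the compiler would emit it.
static const member_info point_members[] = {
    { "x",      "int",    offsetof(Point, x),      sizeof(Point::x) },
    { "y",      "int",    offsetof(Point, y),      sizeof(Point::y) },
    { "weight", "double", offsetof(Point, weight), sizeof(Point::weight) },
};

// Iterate over the members of Point, looking one up by name.
inline const member_info *find_member(const char *name) {
    for (const member_info &m : point_members) {
        if (std::strcmp(m.name, name) == 0)
            return &m;
    }
    return nullptr;
}
```

   The table is constant data fixed at compile time, so a program that
   never reads it pays only in object-code size.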
      
   Just metadata and only metadata would be a boon to anyone doing C++   
   I/O, especially to library writers. If you want to see more people   
   using C++, that's surely one way to get there.   
      
   In the general case, serialization requires inheritance metadata, and   
   pointer extents, too. I'm personally not all that exercised about   
   inheritance, but neither does the inheritance graph strike me as   
   particularly difficult to represent. It would support some interesting   
   use cases. For instance, it would be possible to explain a Koenig   
   lookup without parsing the code.   
      
   To provide pointer extents requires runtime support. There is some   
   complexity for that reason among others. But it's not at all clear the   
   cost is nearly so great as it seems at first blush.   
      
   I believe the compiled program should track the extent of every   
   pointer. When people object about efficiency, I'm actually puzzled,   
   because I find that whenever I'm dealing with a pointer, any pointer,   
   I'm always tracking and testing against the extent   
      
    A *
    foo( A *a, size_t len ) {
        assert(a);
        for( A *p=a; p < a + len; ++p ) {
            if( bar(p) )
                return p;
        }
        return NULL;
    }
      
   Who hasn't written that 1000 times in one form or another? How does   
   one deal with pointers without tracking the bounds of what they point   
   to? Who iterates over a string today relying on the NUL terminator   
   without reference to the allocated extent of the buffer?   
      
   Once we accept that every pointer has an extent, and that to use that   
   pointer we track its extent, why not ask the compiler to do the work   
   for us?   
      
   It may at first seem an unbearable cost, because it may seem that   
   a pointer's extent must be updated whenever it's incremented, and may   
   seem that   
      
    *p   
      
   becomes analogous to vector::at() instead of vector::operator[]. But   
   neither of those suppositions is accurate.   
      
   In the first place, it's not necessary to update the pointer's extent   
   or to check every dereference. In my A *a example above, the system   
   can compute the extent of p at will   
      
    extentof(p) == extentof(a) - (p - a) * sizeof(*a)
      
   In the second place, that computation need not take place unless   
   demanded. If extentof is never invoked, the information to produce it   
   need not exist in the executable.   
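
   A minimal sketch of that on-demand computation, measuring extents in
   bytes. The allocation record and this extentof function are
   assumptions made for illustration; in the proposal the runtime would
   hold the record and extentof would be a language facility.

```cpp
#include <cstddef>

// Hypothetical record made once, when the storage is obtained.
struct allocation {
    const char *base;   // first byte of the storage
    std::size_t bytes;  // total size of the storage
};

// Bytes remaining from p to the end of the allocation.
// Computed only when asked for; nothing is updated as p moves.
inline std::size_t extentof(const allocation &a, const void *p) {
    const char *c = static_cast<const char *>(p);
    return a.bytes - static_cast<std::size_t>(c - a.base);
}

inline std::size_t remaining_after_three() {
    int a[8] = {};
    allocation rec = { reinterpret_cast<const char *>(a), sizeof a };
    int *p = a;
    p += 3;                   // plain pointer arithmetic, no bookkeeping
    return extentof(rec, p);  // five ints remain
}
```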
      
   Most important, though: we already bear the cost. We're tracking   
   the length of allocated objects, passing lengths with pointers, testing   
   against boundaries. We have been since 1975. Would 2017 be too soon   
   to move that information into the language, where it would be more   
   convenient? Not to mention unerringly correct?   
      
    A *
    foo( A a[] ) {
        A *p = a;
        for( ; p < a + extentof(a)/sizeof(a[0]); ++p ) {
            if( bar(p) )
                return p;
        }
        return NULL;
    }
      
      
   [continued in next message]   
      