XPost: comp.lang.c++   
   From: lynnmcguire5@gmail.com   
      
   On 2/7/2026 3:23 AM, Thomas Koenig wrote:   
   > Lynn McGuire schrieb:   
   >> On 2/4/2026 9:44 AM, Scott Lurndal wrote:   
   >>> Lawrence D’Oliveiro writes:   
   >>>> On Tue, 3 Feb 2026 17:28:35 -0600, Lynn McGuire wrote:   
   >>>>   
   >>>>> I am swinging huge datasets for simulation models from 1 MB to 1,000 MB.   
   >>>>> Nothing besides C++ has the oomph and speed to make this happen.   
   >>>>   
   >>>> Lots of Pythoneers are doing data science at this sort of scale.   
   >>>   
   >>> Even "R" can handle datasets larger than that.   
   >>   
   >> We are not 64 bit yet. 1,000 MB is about the largest dataset we can   
   >> swing in Win32 due to our inefficiency in managing memory. We do   
   >> compress all strings above 1,000 bytes which means that our datasets are   
   >> actually 2X to 3X bigger than the binary version of the dataset.   
   >   
   > Wow, this must hurt, both for speed and complexity...   
   >   
   > Depending on what your database is like, it might make sense to   
   > convert into a suitable binary format, map it into your address   
   > space (I am fairly sure that Windows can do memory-mapped I/O,   
   > although I am *not* a Windows person, at least not as far as   
   > programming is concerned) and then immediately do things with it.   
   > You could then reduce your startup time by only "recompiling"   
   > from the text representation when needed.   
   >   
   > If this makes sense to your application or not, I don't know (of course)   
   > because I do not know what exactly your database does and does not do.   
      
   We do not uncompress strings until they are used. And our datasets are   
   stored in "mostly" binary.   
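   A rough sketch of the decompress-on-first-use idea, for anyone
   following along. This is not our actual code: LazyString, rle_compress
   and rle_decompress are hypothetical names, the run-length codec is only
   a stand-in for a real compressor, and keeping the expanded text around
   after the first access is an assumption. The 1,000-byte threshold is
   the one mentioned earlier in the thread.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Stand-in codec (assumption): run-length encoding as (count, byte) pairs.
inline std::string rle_compress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255)
            ++run;
        out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

inline std::string rle_decompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        std::size_t count = static_cast<unsigned char>(in[i]);
        out.append(count, in[i + 1]);
    }
    return out;
}

// Hypothetical wrapper: long strings are held compressed until first use.
class LazyString {
public:
    static constexpr std::size_t kThreshold = 1000;  // bytes

    explicit LazyString(std::string text) {
        if (text.size() > kThreshold) {
            packed_ = rle_compress(text);
            compressed_ = true;
        } else {
            plain_ = std::move(text);
        }
    }

    bool is_compressed() const { return compressed_; }

    std::size_t stored_bytes() const {
        return compressed_ ? packed_.size() : plain_.size();
    }

    // Expands on the first call, then returns the cached text.
    const std::string& get() {
        if (compressed_) {
            plain_ = rle_decompress(packed_);
            packed_.clear();
            packed_.shrink_to_fit();
            compressed_ = false;
        }
        return plain_;
    }

private:
    std::string plain_;
    std::string packed_;
    bool compressed_ = false;
};
```

   The point is that a string nobody reads never pays the decompression
   cost and never takes up its full size in the 32-bit address space.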
      
   Lynn   
      