

   comp.lang.forth      Forth programmers eat a lot of Bratwurst      117,927 messages   


   Message 117,735 of 117,927   
   Hans Bezemer to Anton Ertl   
   Re: Back & Forth - CSV is dead, long liv   
   18 Nov 25 16:18:56   
   
   [continued from previous message]   
      
   And here you got the result. It's perfectly fine. But of course, the   
   question arises - could we do better? With less code - and an even easier   
   to parse format? And the answer is, yes, we can! The IANA registered   
   such a format in September 2000. Its de facto specification is the   
   text/tab-separated-values media type. But note that fields that contain   
   tabs or linebreaks are not allowed in this format. That's a bummer,   
   don't you think?   
      
   But the Library of Congress expanded the format, adding a requirement   
   for escaped tabs, carriage returns, newlines and backslashes. Which made   
   it much more universal. The now defunct website “dataprotocols.org” also   
   lists a specification, dated May, 2014. According to that same page,   
   Jason Dusek is the original author. The specification itself is   
   virtually identical to the one by the Library of Congress. And although   
   the prominence of these distinguished institutions is not in dispute -   
   it's not the same as an RFC.   
      
   I wanted it. I wanted it badly. Fortunately, I'd already written a CSV   
   writer for 4-t-H. And since TSV is not that different from CSV, all I   
   had to do is to write a routine to escape those pesky control characters   
   - and I was in business. Okay, I escaped a few more, since unescaping   
   had to be done by another, already available 4-t-H library that also   
   unescapes those extra characters.   
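
   To make the escaping step concrete, here is a minimal sketch in Python
   (not 4-t-H, and not the author's routine). It covers only the four
   characters named above: tab, carriage return, newline and backslash.
   The name escape_field and the exact two-character escape sequences are
   assumptions for illustration; the real library escapes a few more
   characters, which are not listed here.

```python
# Assumed escape map: each TSV-unsafe character becomes a backslash sequence.
# Only the four characters named in the text are handled in this sketch.
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def escape_field(text: str) -> str:
    """Return text with tabs, line breaks and backslashes escaped."""
    # Character-by-character lookup, so a backslash is never escaped twice.
    return "".join(ESCAPES.get(ch, ch) for ch in text)
```

   Mapping each character exactly once avoids the classic bug of chained
   replace calls, where the backslashes produced by one replacement get
   escaped again by the next.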
      
   And when I was done, what would it take to turn this simple CSV file   
   reader into a TSV file converter? Just a handful of lines:   
      
   • Ok, we got to include the library. That much is clear;   
      
   • We take an extra argument - which is the output file;   
      
   • Now, here things get a bit different - the TSV writer requires its   
   own OPEN-FILE word, called TSVopen. Note - it knows it's an output file   
   - no need to specify that twice. Yeah - and when we're done, we have to   
   close it with .. TSVclose! What a surprise! (and no, contrary to OPEN it   
   leaves nothing on the stack);   
      
   • Let's fix the parsing routines. This time we won't write anything to   
   the console, but straight to the TSV file. We write a field using   
   TSVtype - and we terminate a record using TSVcr. I don't think that one   
   is rocket science. And we're done!   
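
   The steps above can be sketched roughly as follows. This is Python,
   not 4-t-H: tsv_type and tsv_cr are hypothetical stand-ins that only
   mirror the roles of TSVtype and TSVcr, and their signatures are
   assumptions, since the real words are not shown here. Python's own csv
   module does the CSV parsing.

```python
import csv

# Assumed escape map, limited to the four characters named in the text.
ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def tsv_type(out, field, first):
    """Write one escaped field, preceded by a tab unless it opens the record."""
    if not first:
        out.write("\t")
    out.write("".join(ESCAPES.get(ch, ch) for ch in field))


def tsv_cr(out):
    """Terminate the current record."""
    out.write("\n")


def csv_to_tsv(src, dst):
    """Copy every CSV record from src to dst as a TSV record."""
    for record in csv.reader(src):
        for index, field in enumerate(record):
            tsv_type(dst, field, first=(index == 0))
        tsv_cr(dst)
```

   With real files, both would be opened with newline="" (as the csv
   module documentation recommends) and passed to csv_to_tsv; closing the
   output file plays the part of TSVclose.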
      
   Now let's try it out and convert our little spreadsheet to TSV. And here   
   we are! It looks perfectly fine. Let's see if we can read it. Let's take   
   our original CSV reader and see how much it takes to turn it into a TSV   
   reader. Well, not much. Just change the delimiter, and we're good to go.   
   Now, since this one does not contain any control characters, we won't   
   need to expand any escape codes. If we did, we'd need another library.   
   One we already have. But won't need here.   
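
   As an illustration of why reading TSV is so cheap, here is a small
   Python sketch (again, not the 4-t-H code). Splitting on the tab is the
   whole parser; expanding escape codes is a separate, optional pass. The
   names read_tsv and unescape_field, the expand flag and the escape
   sequences are all assumptions made for this example.

```python
# Assumed escape codes: the character after a backslash selects the result.
UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def unescape_field(text):
    """Expand backslash escapes; unknown sequences are kept unchanged."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # empty if the backslash ends the field
            out.append(UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def read_tsv(lines, expand=False):
    """Yield each record as a list of fields, split on the tab character."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if expand:
            fields = [unescape_field(field) for field in fields]
        yield fields
```

   Because an escaped file never contains a raw tab or line break inside
   a field, the split cannot go wrong, and no quote handling is needed.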
      
   Anyway, it works just fine. There you go.   
      
   BTW, let's assume you need another format. That's not a problem. We got   
   more libraries. You can convert your file to Wikicode. Or an HTML table.   
   Or a 4-t-H table. Or JSON. Or LibreOffice. Or MS-Office. Or KSpread. And   
   yes, even CSV. And they all got roughly the same API - except that   
   TSVtype is now called XLStype. Yeah, it's all very confusing. I'm so, so   
   sorry.   
      
   But anyways - ending on that positive note, I'm Hans Bezemer and this   
   was “Back and Forth”."   
      
   Hans Bezemer   
      


