Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.forth    |    Forth programmers eat a lot of Bratwurst    |    117,927 messages    |
|    Message 117,735 of 117,927    |
|    Hans Bezemer to Anton Ertl    |
|    Re: Back & Forth - CSV is dead, long liv    |
|    18 Nov 25 16:18:56    |
[continued from previous message]

And here you've got the result. It's perfectly fine. But of course, the question arises - could we do better? With less code - and an even easier to parse format? And the answer is, yes, we can! The IANA registered such a format in September 2000. Its de facto specification is the text/tab-separated-values media type. But note that fields that contain tabs or linebreaks are not allowed in this format. That's a bummer, don't you think?

But the Library of Congress expanded the format, adding a requirement for escaped tabs, carriage returns, newlines and backslashes, which made it much more universal. The now defunct website “dataprotocols.org” also lists a specification, dated May 2014. According to that same page, Jason Dusek is the original author. The specification itself is virtually identical to the one by the Library of Congress. And although the prominence of these distinguished institutions is not in dispute - it's not the same as an RFC.

I wanted it. I wanted it badly. Fortunately, I'd already written a CSV writer for 4-t-H. And since TSV is not that different from CSV, all I had to do was write a routine to escape those pesky control characters - and I was in business. Okay, I escaped a few more, since the unescaping is done by another, already available 4-t-H library that unescapes those characters as well.

And when I was done, what would it take to turn this simple CSV file reader into a TSV file converter? Just a handful of lines:

• Ok, we've got to include the library. That much is clear;

• We take an extra argument - which is the output file;

• Now, here things get a bit different - the TSV writer requires its own OPEN-FILE word, called TSVopen.
Note - it knows it's an output file - no need to specify that twice. Yeah - and when we're done, we have to close it with... TSVclose! What a surprise! (And no, unlike OPEN, it leaves nothing on the stack);

• Let's fix the parsing routines. This time we won't write anything to the console, but straight to the TSV file. We write a field using TSVtype - and we terminate a record using TSVcr. I don't think that one is rocket science. And we're done!

Now let's try it out and convert our little spreadsheet to TSV. And here we are! It looks perfectly fine. Let's see if we can read it. Let's take our original CSV reader and see how much it takes to turn it into a TSV reader. Well, not much. Just change the delimiter, and we're good to go. Now, since this file does not contain any control characters, we won't need to expand any escape codes. If we did, we'd need another library. One we already have, but won't need here.

Anyway, it works just fine. There you go.

BTW, let's assume you need another format. That's not a problem. We've got more libraries. You can convert your file to Wikicode. Or an HTML table. Or a 4-t-H table. Or JSON. Or LibreOffice. Or MS-Office. Or KSpread. And yes, even CSV. And they all have roughly the same API - except that TSVtype is now called XLStype. Yeah, it's all very confusing. I'm so, so sorry.

But anyways - ending on that positive note, I'm Hans Bezemer and this was “Back and Forth”."

Hans Bezemer

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
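The escaping scheme described in the message (backslash, tab, newline and carriage return become backslash escapes, as in the Library of Congress and dataprotocols.org specifications) can be sketched outside 4-t-H. What follows is a minimal Python sketch under that assumption, not the 4-t-H library: `tsv_escape`, `tsv_unescape`, `csv_to_tsv` and `read_tsv` are hypothetical names standing in for words like TSVtype and TSVcr, and only those four characters are escaped, whereas the author's library escapes a few more.

```python
import csv
import io

# Backslash escapes for the four characters the extended TSV format requires.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
# Reverse mapping: character after the backslash -> original character.
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def tsv_escape(field: str) -> str:
    """Replace backslash, tab, LF and CR with their backslash escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in field)


def tsv_unescape(field: str) -> str:
    """Undo tsv_escape; an unknown escape such as '\\x' is kept verbatim."""
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[field[i + 1]])
            i += 2  # consume the backslash and the escape letter
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def csv_to_tsv(csv_text: str) -> str:
    """Convert CSV text to TSV: escaped fields joined by tabs, one record per line."""
    lines = []
    # newline="" lets the csv module handle quoted fields with embedded newlines.
    for record in csv.reader(io.StringIO(csv_text, newline="")):
        lines.append("\t".join(tsv_escape(f) for f in record))
    return ("\n".join(lines) + "\n") if lines else ""


def read_tsv(tsv_text: str) -> list:
    """Split TSV into records; plain splitting is safe since fields hold no raw tabs/newlines."""
    if not tsv_text:
        return []
    body = tsv_text[:-1] if tsv_text.endswith("\n") else tsv_text
    return [[tsv_unescape(f) for f in line.split("\t")] for line in body.split("\n")]
```

Because every tab and newline inside a field is escaped, the reader needs no quoting logic at all: it only splits on the delimiter and the line end, then expands escapes if any are present, which mirrors the "just change the delimiter" step above.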
(c) 1994, bbs@darkrealms.ca