... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.lang.asm.x86
Ahh, the lost art of x86 assembly
4,675 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 3,667 of 4,675
Rod Pemberton to John"
Re: DJB2
11 Nov 18 08:14:13
   From: invalid@nospicedham.lkntrgzxc.com   
      
   On Sun, 11 Nov 2018 11:07:19 -0000 (UTC)   
   "Kerr-Mudd,John"  wrote:   
      
   > On Sun, 11 Nov 2018 05:37:00 GMT, Rod Pemberton   
   >  wrote:   
      
   > > I tested about 50 hash functions with brute-force text input about a   
   > > decade ago (comp.lang.misc):   
   > >   
   >   
   > Thanks for that; I'm not looking to have uniqueness over the entire   
   > ASCII (or even all Cap letters combinations);   
      
   It was intended to test to determine which hashes were good at   
   producing low collisions for unknown text strings of random length.   
      
   > what I'm looking for is a keyword compression algorithm; I'm thinking   
   > of replacing FORTH's existing counted-Word followed by code with a   
   > lookup table of hashed keyword-to- code-entry.   
      
   There were some conversations on this on comp.lang.forth a while ago.   
   (I'm currently way behind on reading and replying to posts there ...)   
      
   A hash should improve the speed of compiling the dictionary, app code,   
   and improve lookup's via FIND etc (or whatever Forth word ANS uses   
   instead of FIND ... PARSE?).  However, this shouldn't have much effect   
   on code speed, as the dictionary lookups are minimal or non-existent at   
   runtime.  In other words, lookups should only occur for use of FIND   
   entered interactively by the user, or when defining a new word, i.e.   
   compiling a new colon-def.   
      
   > No need to store the actual Word literal.   
      
   Are you replacing just the count or the count and string?   
      
   If the latter, how do you distinguish one word from another when a hash   
   collision occurs without having the original name to compare?   
      
   If the former, that's a violation of ANS which requires a single char   
   for the name length of a Forth word in the dictionary.  See COUNT.   
   COUNT is used to find a word's name after being given the count   
   location.  It's also claimed on c.l.f., that adjusting COUNT to fit a   
   different string format will break existing Forth code.  Apparently,   
   some Forth code uses a few historical aspects of counted strings, i.e.,   
   such as treating the count char as an 8-bit byte, e.g., not a 16-bit or   
   32-bit word/hash etc, or skipping a byte for the count when accessing   
   the name whether a count value exists there or not in said Forth, or   
   using COUNT to add one to an address, i.e., COUNT can't return 0 or 2   
   or 4 etc for the difference between c-addr2 and c-addr1.  A COUNT   
   difference of zero 0 would be for a nul-terminated string without a   
   length.  A COUNT difference 2 or 4 would be for a count of longer   
   strings or for a hash ...  Forth has lots of undocumented historical   
   behavior that is not specified in ANS, which only comes about from   
   working your way up from earlier standards.   
      
   How do I know this?  Well, I did exactly that in my Forth by using C   
   strings, which had a COUNT difference of zero 0.  This is because I have   
   a C back end (not assembly) and used nul-terminated strings, instead of   
   using counted Forth strings.   
      
   Also, any non-standard format for the count and name also won't pass   
   Hayes Core tests, as the test explicitly constructs counted strings   
   using a single character (8-bit byte) as the length, just prior to the   
   name.  I.e., the count must exist, must be only one character in size,   
   and must be immediately followed by the name, and can't use a pointer to   
   a table of names etc, for Hayes Core to work correctly.   
      
   Also, the word list isn't fixed in Forth.  So, anyone can define a new   
   word any time Forth is in interactive mode.  This could collide with any   
   prior word, although it sounds like you're not really having an issue   
   with collisions yet.   
      
   I'm also guessing that you're not going to allocate 128KB (or more) for   
   a lookup table, e.g., 64KB for hash, 64KB for paired code address.   So,   
   your "compacted" lookup table will still need to traverse the hash   
   values until you find the matching hash, either by using comparisons   
   and branches or by using a linked-list.   
      
   Using comparisons and branches in assembly to traverse the "compacted"   
   lookup table of hash values also means you'd need to be able to compile   
   assembly into the lookup table at runtime when adding additional   
   hash values to the table to skip unused values.  I.e., you'll probably   
   need Forth's ;CODE word etc.   
      
   Forth's dictionary is usually set up as a list-linked list.  If you   
   continue using a linked-list, then you've just reduced the time to   
   compare strings, which can take time, but which also seems really   
   insignificant for my interpreted Forth.  I.e., unless you're loading   
   huge amounts of code, I don't think using hashes will be much of an   
   improvement.   
      
      
   Rod Pemberton   
   --   
   "The most ironic outcome is the most probable." Elon Musk   
   Could someone tell Elon that means Tesla implodes? ...   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]