... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.lang.pascal.borland
Borland Pascal was actually pretty neat
2,978 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 1,922 of 2,978
Bart to All
Re: Decoding RNA codons: fastest method
12 Oct 05 23:06:43
   From: broersma.juda_ANTISPAM_@tiscali.nl   
      
   Op Sun, 9 Oct 2005 22:49:21 +0100 schreef Dr John Stockton   
   :   
      
   >Or, since the codons are made of ACGU only, encode each letter as 2 bits   
   >and store in a byte; you can then look the translation up in a small   
   >array of enumerated or string (fill unused entries with 0 or '').   
      
      
   type   
     tHashTable = array[0..$3F] of Word;   
      
   var HashTable: tHashTable;   
      
      
   function CodonHash3(const ACodon: tCodon): Word;   
   // bit 6 and bit 7 are enough to identify the letter !!   
   // 'A' = 01000001 -> 00   
   // 'C' = 01000011 -> 01   
   // 'G' = 01000111 -> 11   
   // 'U' = 01010101 -> 10   
   begin   
     Result := (((Ord(ACodon[1]) and 7) shr 1) shl 4) +   
               (((Ord(ACodon[2]) and 7) shr 1) shl 2) +   
               ((Ord(ACodon[3]) and 7) shr 1);   
   end;   
      
      
   procedure BuildHashTable;   
   //HashTable voor CodonHash3   
   begin   
     FillChar(HashTable, SizeOf(HashTable), $FF);   
     {UUU}  HashTable[$2A] := Phe;   
     {UUC}  HashTable[$29] := Phe;   
     ...   
     ...   
     {GGA}  HashTable[$3C] := Gly;   
     {GGG}  HashTable[$3F] := Gly;   
   end;   
      
   In the testing method:   
   ...   
     hashvalue := CodonHash3(CodonTable[j]);   
     index := HashTable[hashvalue];   
   ...   
      
   This method is 1.24 times slower than the original "nested case"   
   method.   
      
   Since all hashingvalues are bytes, it is possible to use a HashTable   
   defined as an array of byte.   
   I tested this and it actually is slower.   
      
   >Since a letter needs only 5 bits, you could try encoding three in a byte   
   >as (X1*2+X2)*2+X3; there *may* be no duplicates.  In fact, only 3 bits   
   >vary in ACGU : ...X.XX. : so (X1*32+X2)*32+X3 looks safe, using word   
   >arithmetic.   
      
   Well since only the last 3 bits are relevant I did (X1*8 X2)*8 +X3   
   using shifts   
      
   type   
     tHashTable2 = array[$1289..$183D] of Word;   
      
   var HashTable2: tHashTable2;   
      
      
   function CodonHash4(const ACodon: tCodon): Word;   
   begin   
     Result := (Ord(ACodon[1]) shl 6) + (Ord(ACodon[2]) shl 3) +   
   Ord(ACodon[3]);   
   end;   
      
      
   procedure TForm1.BuildHashTable2;   
   //HashTable voor CodonHash4   
   begin   
     FillChar(HashTable2, SizeOf(HashTable2), $FFFF);   
     {UUU}  HashTable2[$183D] := Phe;   
     {UUC}  HashTable2[$182B] := Phe;   
     ...   
     ...   
     {GGA}  HashTable2[$1439] := Gly;   
     {GGG}  HashTable2[$143F] := Gly;   
   end;   
      
      
   In the testing method:   
      
   ...   
     hashvalue := CodonHash4(CodonTable[j]);   
     index := HashTable2[hashvalue];   
   ...   
      
   This method finally is 1.008 times faster !!   
      
   I then modified this Hashing method (using only the last 9 bits) in   
   order to make a smaller hashtable.   
   This is slower. The hasing takes more time (1 "and" added). It seesm   
   that the size of the HashTable does not matter for the speed.   
   I think that for better speed a faster hashig mehod is needed.   
   Since i use only shifts in my (at this time) fastest hashing, I don't   
   see much room for that...   
      
      
   As I said in a previous respons, I redid the program in Delphi (3).   
      
   Now I finally got my old TP 6.0 up and running again.   
      
   All methods in TP (tested in real mode DOS) are 50 to 70 times slower   
   than in Delphi!   
   Well, there's 32-bit computing for you.   
   Interestingly the "Nested Case" method is in TP 1.2 times faster than   
   the fastest Hashing method ...   
      
   Bart   
      
   --   
   Bart Broersma   
   broersma.juda_ANTISPAM_@tiscali.nl   
   (ff _ANTISPAM_ wegpoetsen uit dit adres natuurlijk)   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]