home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   linux.debian.bugs.dist      Ohh some weird Debian bug report thing      28,835 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 27,766 of 28,835   
   Mikhail Potemkin to All   
   Bug#1128093: ITP: golang-github-bobusumi   
   15 Feb 26 20:30:01   
   
   XPost: linux.debian.devel   
   From: potemkinmr@gmail.com   
      
   Package: wnpp   
   Severity: wishlist   
   Owner: Mikhail Potemkin    
      
   * Package name    : golang-github-bobusumisu-aho-corasick   
     Version         : 1.0.3-1   
     Upstream Author : Øyvind Ingvaldsen   
   * URL             : https://github.com/BobuSumisu/aho-corasick   
   * License         : MIT   
     Programming Lang: Go   
     Description     : Aho-Corasick string-searching algorithm in Go   
      
    Aho-Corasick   
    .   
    Build Status (https://travis-ci.com/BobuSumisu/aho-corasick) [Image: Go   
    Version] (https://img.shields.io/github/go-mod/go-version/BobuSumisu/aho-   
    corasick) [Image: Latest Tag]   
    (https://img.shields.io/github/v/tag/BobuSumisu/aho-corasick)   
    .   
    Implementation of the Aho-Corasick string-search algorithm in Go.   
    .   
    Licensed under MIT License.   
    .   
    Details   
    .   
    This implementation does not use a Double-Array Trie   
    (https://linux.thai.net/~thep/datrie/datrie.html) as in my   
    implementation (https://github.com/BobuSumisu/go-ahocorasick) from a   
    couple of years back.   
    .   
    This reduces the build time drastically, but at the cost of higher   
    memory consumption.   
    .   
    The search time is still fast, and comparable to other Go   
    implementations I have found on github that claims to be fast (see   
    performance).   
    .   
    Documentation   
    .   
    Can be found at godoc.org (https://godoc.org/github.com/BobuSumisu/aho-   
    corasick).   
    .   
    Example Usage   
    .   
    Use a TrieBuilder to build a Trie:   
    .   
      trie := NewTrieBuilder().   
          AddStrings([]string{"or", "amet"}).   
          Build()   
    .   
    Then go and match something interesting:   
    .   
      matches := trie.MatchString("Lorem ipsum dolor sit amet, consectetur   
    adipiscing elit.")   
      fmt.Printf("Got %d matches.\n", len(matches))   
    .   
      // => Got 3 matches.   
    .   
    What did we match?   
    .   
      for _, match := range matches {   
          fmt.Printf("Matched pattern %d %q at position %d.\n",   
    match.Match(),   
              match.Pattern(), match.Pos())   
      }   
    .   
      // => Matched pattern 0 "or" at position 1.   
      // => Matched pattern 0 "or" at position 15.   
      // => Matched patterh 1 "amet" at position 22.   
    .   
    Building   
    .   
    You can easily load patterns from file:   
    .   
      builder := NewTrieBuilder()   
      builder.LoadPatterns("patterns.txt")   
      builder.LoadStrings("strings.txt")   
    .   
    Both functions expects a text file with one pattern per line.   
    LoadPatterns expects the pattern to be in hexadecimal form.   
    .   
    Storing   
    .   
    Use Encode to store a Trie in gzip compressed binary format:   
    .   
      f, err := os.Create("trie.gz")   
      err := Encode(f, trie)   
    .   
    And Decode to load it from binary format:   
    .   
      f, err := os.Open("trie.gz")   
      trie, err := Decode(f)   
    .   
    Performance   
    .   
    Some simple benchmarking on my machine (Intel(R) Core(TM) i7-8665U CPU @   
    1.90GHz, 32 GiB RAM).   
    .   
    Build and search time grows quite linearly with regards to number of   
    patterns and input text length.   
    .   
    Building   
    .   
      BenchmarkTrieBuild/100-4                    7886            154786   
    ns/op   
      BenchmarkTrieBuild/1000-4                    739           1647419   
    ns/op   
      BenchmarkTrieBuild/10000-4                    91          13331713   
    ns/op   
      BenchmarkTrieBuild/100000-4                    9         123886615   
    ns/op   
    .   
    Searching   
    .   
      BenchmarkMatchIbsen/100-4                1471089               819   
    ns/op   
      BenchmarkMatchIbsen/1000-4                202288              5667   
    ns/op   
      BenchmarkMatchIbsen/10000-4                19957             59680   
    ns/op   
      BenchmarkMatchIbsen/100000-4                2012            595086   
    ns/op   
    .   
    Compared to Other Implementation   
    .   
    See aho-corasick-benchmark (https://github.com/Bobusumisu/aho-corasick-   
    benchmark).   
    .   
    Memory Usage   
    .   
    As mentioned, the memory consumption will be quite high compared to a   
    double-array trie implementation. Especially during the build phase   
    (which currently contains a lot of object allocations).   
      
      
   Needed by trufflehog   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca