From: anthk@openbsd.home   
      
   On 2025-03-18, Toaster wrote:   
   > On Tue, 18 Mar 2025 12:00:07 -0500   
   > D Finnigan wrote:   
   >   
   >> On 3/18/25 10:17 AM, Ben Collver wrote:   
   >> > Please stop externalizing your costs directly into my face   
   >> > ==========================================================   
   >> > March 17, 2025 on Drew DeVault's blog   
   >> >   
   >> > Over the past few months, instead of working on our priorities at   
   >> > SourceHut, I have spent anywhere from 20-100% of my time in any   
   >> > given week mitigating hyper-aggressive LLM crawlers at scale.   
   >>   
   >> This is happening at my little web site, and if you have a web site,   
   >> it's happening to you too. Don't be a victim.   
   >>   
   >> Actually, I've been wondering where they're storing all this data;   
   >> and how much duplicate data is stored from separate parties all   
   >> scraping the web simultaneously, but independently.   
   >   
   > But what can be done to mitigate this issue? Crawlers and bots ruin the   
   > internet.   
   >   
      
   GZip bombs + fake links = profit. Remember that gzip'ed web pages
   are standard HTTP (Content-Encoding: gzip); even lynx can decode
   gzipped responses natively.
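   
   A rough sketch of the gzip-bomb half, in Python (the 10 MiB payload,
   the port, and answering every GET are my assumptions for
   illustration, not a hardened setup): pre-compress a buffer of zeros
   and ship it with Content-Encoding: gzip, so a crawler that honors
   the header inflates it on its own dime.
   
   #!/usr/bin/env python3
   import gzip
   from http.server import BaseHTTPRequestHandler, HTTPServer
   
   # 10 MiB of zeros squeezes down to roughly 10 KiB on the wire.
   BOMB = gzip.compress(b"\0" * (10 * 1024 * 1024), compresslevel=9)
   
   class TrapHandler(BaseHTTPRequestHandler):
       def do_GET(self):
           # Claim it is an ordinary gzip'ed HTML page; a compliant
           # client decompresses it and eats the full 10 MiB itself.
           self.send_response(200)
           self.send_header("Content-Type", "text/html")
           self.send_header("Content-Encoding", "gzip")
           self.send_header("Content-Length", str(len(BOMB)))
           self.end_headers()
           self.wfile.write(BOMB)
   
   if __name__ == "__main__":
       HTTPServer(("127.0.0.1", 8080), TrapHandler).serve_forever()
   
   Zeros compress at roughly 1000:1, so the bandwidth cost stays
   almost entirely on the crawler's side.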
      
   Also, MegaHAL/Hailo under Perl. Feed it nonsense, then create some
   invisible content under a robots.txt-disallowed directory, full of
   Markov-chain-generated nonsense and gzip bombs.
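   
   MegaHAL and Hailo are the actual Perl tools; as a stand-in to show
   the Markov idea, here is a toy bigram babbler in Python (the seed
   text and the /trap/ path in the comments are made up for
   illustration):
   
   #!/usr/bin/env python3
   # Pair the generated pages with a robots.txt entry like:
   #   User-agent: *
   #   Disallow: /trap/
   # so only crawlers that ignore robots.txt ever wander into them.
   import random
   from collections import defaultdict
   
   # Placeholder corpus; any throwaway text will do.
   SEED = ("the crawler fetched the page and the page linked the "
           "crawler to another page full of nonsense about the crawler")
   
   def train(text):
       # Map each word to the list of words seen after it.
       chain = defaultdict(list)
       words = text.split()
       for a, b in zip(words, words[1:]):
           chain[a].append(b)
       return chain
   
   def babble(chain, length=60):
       word = random.choice(list(chain))
       out = [word]
       for _ in range(length - 1):
           followers = chain.get(word)
           # Dead end: jump to a fresh random word and keep going.
           word = random.choice(followers or list(chain))
           out.append(word)
       return " ".join(out)
   
   if __name__ == "__main__":
       print(babble(train(SEED)))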
      
   --- SoupGate-DOS v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   