
Technology Tarpits - A Class of Malware Designed to Deter Scraping

4DQSAR · Bluelighter · Joined Feb 3, 2025 · Messages: 793

I found the above article to be quite interesting. It's been decades since I've done any programming but I think I understand the basic concept.

But I do have a couple of technical questions and a more philosophical question.

I'm guessing that tarpits contain HTML files that continually reference each other, so that a crawler bot never reaches the end of a file but is instead trapped in a cycle where, on the one hand, huge amounts of data are scraped, but that data is of no value. Would that be a correct analysis?

In the article 'Aaron' mentions that his tarpit (Nepenthes) only uses the resources of quite a modest virtual machine - a Raspberry Pi being the example. So does the virtualization simply keep on generating virtual files, possibly fiddling with the attributes of said files? E.g. the header of a file might state that the file is, say, 16K in size but in actuality it is practically endless?
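For what it's worth, the trick doesn't even need virtual "files" - a tarpit can invent every page at the moment it is requested, which is why a Raspberry Pi is enough. Below is a minimal sketch of the idea using only Python's standard library; the handler name and the links-per-page count are my own choices, not anything from Nepenthes itself. Every page links only to more invented pages under the same handler, so nothing is ever stored on disk:

```python
# Minimal tarpit sketch (assumptions: stdlib only, not how Nepenthes actually works).
# Every request returns a freshly invented page whose links lead straight back
# into the same handler, so a naive crawler never runs out of URLs to follow.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_page(path: str, n_links: int = 10) -> str:
    """Invent a page of links nested under `path`; none of the targets exist anywhere."""
    stem = path.rstrip("/")
    links = "".join(
        f'<a href="{stem}/{random.randrange(10**6)}">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body>{links}</body></html>"

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = make_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To actually run it:
#   HTTPServer(("localhost", 8080), TarpitHandler).serve_forever()
```

Note that the advertised Content-Length here is honest per page; the "endless" part comes from the link graph itself, which deepens forever.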

The issue I sense with using Markov babble is that AI crawlers could potentially choose to reject data that is at odds with previously 'learnt' patterns.
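For anyone unfamiliar with the term, "Markov babble" just means text generated by sampling word-to-word transition statistics from some seed text - locally plausible, globally meaningless. A toy sketch of the idea (word bigrams only; real tarpits presumably use something richer):

```python
# Toy Markov-babble sketch: learn word-bigram transitions from seed text,
# then emit statistically plausible but meaningless prose.
import random
from collections import defaultdict

def train(text: str) -> dict:
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain: dict, start: str, length: int = 20) -> str:
    """Walk the chain from `start`, picking each next word at random."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: the last word was never followed by anything
        out.append(random.choice(followers))
    return " ".join(out)
```

Because every word pair really did occur in the seed text, simple statistical filters pass it - which is exactly why the rejection concern raised above is about higher-level consistency checks, not word statistics.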

On a much more ominous note, could someone adapt a tarpit to specifically inject untrue 'facts'? Could tarpits be used to rewrite history?

I really know nothing about this topic, but I was shocked by the potential implications. AI has reached the point where much of the technology employed to ensure the user isn't a robot is now redundant. Are there any emerging technologies that can potentially defeat AI?

I will conclude with a slightly OT point (but possibly of value). I've read that AI can learn to play many games, but one criterion I have read is that 'the board must be of a finite size'. With that in mind, is it therefore impossible for AI to master games such as infinite chess?
 
I really need a "mini scraper". Something that can download a list of URLs in a format similar to "Webpage, Complete". Grok has attempted to make me two using wget and Python's requests library, but he just can't get it right, and I don't understand why since it's such a simple function and he's almost got it.
 
I wouldn't necessarily consider the string formatting an easy thing to work into a prompt for an LLM like Grok. This sounds a lot like a school assignment I've tutored before; it's almost like deja-vu or some shit. Either way, this can be done with a single for loop and a print statement combining a couple of variables and a string literal.
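To make that concrete, here's a minimal sketch of the kind of loop I mean, using only the standard library. Caveat: this saves each URL's raw HTML only - it is not a true "Webpage, Complete" (which also pulls images, CSS and JS). The function names and the filename scheme are my own invention; `fetch` is injectable so you can swap in requests or anything else:

```python
# Minimal "mini scraper" sketch (stdlib only; saves raw HTML, not full pages).
import re
from pathlib import Path
from urllib.request import urlopen

def default_fetch(url: str) -> str:
    """Download one URL and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def save_pages(urls, outdir="pages", fetch=default_fetch):
    """One loop: fetch each URL, write it to a file named after the page."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    saved = []
    for url in urls:
        # Turn the URL into a safe filename:
        # https://example.com/a -> example.com_a.html
        name = re.sub(r"[^A-Za-z0-9.-]+", "_", url.split("://", 1)[-1]).strip("_")
        path = out / f"{name}.html"
        path.write_text(fetch(url), encoding="utf-8")
        saved.append(path)
    return saved
```

The filename sanitising is exactly the string formatting I'd expect an LLM to fumble - it's one regex plus one f-string, but it has to be stated precisely in the prompt.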

A program as simple as the one you're asking for is also highly vulnerable to the tarpits this thread is about. Detecting anomalies like that requires an entire system to keep track of short-term memory, which is a significant pain in the ass.
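The cheapest version of that "short-term memory" is just a visited set plus a hard page budget - a sketch, not a full defence, since a tarpit that mints a fresh URL for every link sails past the visited set and only the budget stops it (`get_links` here is a hypothetical stand-in for real fetching and parsing):

```python
# Sketch of minimal crawler memory: a visited set breaks simple cycles,
# and a hard page budget is the backstop against infinite fresh URLs.
def crawl(start, get_links, max_pages=100):
    visited, frontier = set(), [start]
    while frontier and len(visited) < max_pages:
        url = frontier.pop()
        if url in visited:
            continue  # already seen: skipping it breaks A -> B -> A loops
        visited.add(url)
        frontier.extend(get_links(url))
    return visited
```

Real anomaly detection (spotting that a site's pages are statistically all alike, or that the link depth never bottoms out) is the genuinely hard part the post above is referring to.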
 