About

What is Checkflare?

Since the dawn of the commercial Internet, an endless shadow war between hackers and security researchers has raged across the net. Out of these digital conflicts, some powerful players have arisen. One of them is Cloudflare, a company that specializes in CDNs, DDoS mitigation, and other cloud-based services. Among these is a service that blocks automated traffic to protected websites while allowing humans through.

Sometimes, however, developers need their bots to be able to retrieve data from websites for legitimate purposes such as archival, third-party data visualization and aggregation, and forum linking. Such tools are blocked simply because they are bots. Projects like this one aim to return power to users over how they choose to interact with information on the web. It should be noted that such solutions do not allow people to DDoS the target website; Checkflare is a utility, not a weapon. All of the solutions discussed below work only at moderate volume, as high-volume traffic will still trigger Cloudflare’s DDoS mitigation and anti-bot measures.

Numerous solutions have been proposed and implemented to tackle this challenge. Utilities designed to defeat anti-bot measures fall into two general categories: client-based and server-based.

Client-based solutions implement the bypass module on the client side. Web scraping code calls a local function that handles the retrieval of the data. Related works in this category include Cloudscraper and Hooman. Unfortunately, personal experience shows that these tools often don’t work. Cloudscraper, for example, returns a “Cloudflare v2” error for heavily defended websites (and sometimes at random), meaning those sites cannot be accessed by non-humans. Other solutions, like Hooman, have become abandonware because their authors can no longer maintain them, which can spell death for dependent projects. Simply put, there aren’t many (if any) good client-based solutions available today.

Server-based solutions expose an API that the client calls; the service retrieves the page and forwards it to the calling client. Examples include services like Apify and ScraperAPI. While these are more likely to be kept up to date and working, they are usually not free. Depending on the context, this may not be acceptable. There may also be concerns about data privacy, as the data is not collected on a device owned by the developer. These solutions are a good fit for projects where cost and data privacy are non-issues and the effort of maintaining a self-hosted solution is unacceptable.

Checkflare’s approach to the problem is a self-hosted, server-based solution: users call an API, and the server does the rest. This provides the best of both worlds. The solution is low-maintenance (most client solutions require regular updates to keep functioning, owing to their specialized bypass measures), free, privacy-preserving, and transparent (calling applications do not have to manage much of the bypass themselves). As an option, it can also be run on client machines as a separate process, which can help to avoid issues like IP blacklisting. The ability to run on both server and client makes the solution portable and easier to debug, as it is far easier to see what problems are being encountered in a graphical environment like a desktop.
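To give a rough idea of what "call an API, and the server will do the rest" could look like from the calling application's side, here is a minimal Python sketch. It is illustrative only: the base address `http://localhost:8080`, the `/fetch` path, and the `url` query parameter are assumptions made for this example and are not taken from Checkflare's actual interface.

```python
import urllib.parse
import urllib.request

# Assumption: address of a locally running, self-hosted instance (hypothetical).
CHECKFLARE_BASE = "http://localhost:8080"


def build_request_url(target_url: str, base: str = CHECKFLARE_BASE) -> str:
    """Build the request URL for a hypothetical /fetch endpoint.

    The target page is passed as a percent-encoded ?url=... query parameter.
    Both the path and the parameter name are assumptions for illustration.
    """
    query = urllib.parse.urlencode({"url": target_url})
    return f"{base.rstrip('/')}/fetch?{query}"


def fetch_via_server(target_url: str, timeout: float = 60.0) -> str:
    """Ask the self-hosted server to retrieve target_url; return the body as text."""
    request_url = build_request_url(target_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With a design along these lines, the calling application only builds one URL and reads one response. Pointing `base` at a remote server or at a process on the same machine would be the only change needed to switch between the server and client modes described above.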

The project deliverable is a working implementation that allows a bot, in this case through a REST interface, to access data protected by Cloudflare.

Checkflare is being developed by a single person, Steven Chan.