crawler free download

Showing 335 open source projects for "crawler"

1

crawler

Collection of JS reverse engineering examples for web scraping study

...Many examples illustrate techniques such as debugging scripts, intercepting requests, analyzing encrypted parameters, and understanding authentication flows. crawler also explores common anti-scraping defenses and demonstrates how developers can examine them through debugging tools and reverse engineering techniques.

Downloads: 3 This Week

Last Update: 6 days ago
See Project
2

AI-Crawler

Crawl a website starting from a URL, find relevant pages

AI Crawler is an experimental AI-powered web crawling and data extraction tool that uses natural language prompts to guide the discovery and retrieval of relevant information across websites. Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure.

Downloads: 11 This Week

Last Update: 2026-04-02
See Project
3

SiteOne Crawler

SiteOne Crawler is a website analyzer and exporter

...Watch a detailed video with a sample report for the astro.build website. This crawler can be used as a command-line tool (see releases and video), or you can use a multi-platform desktop application with a graphical interface (see a video about the app).

1 Review

Downloads: 4 This Week

Last Update: 2026-03-30
See Project
4

Weibo Crawler

Python crawler for collecting and downloading Sina Weibo user data

weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata.

Downloads: 3 This Week

Last Update: 1 day ago
See Project
5

tumblr-crawler

Python crawler to download photos and videos from Tumblr blogs

tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line.

Downloads: 2 This Week

Last Update: 7 hours ago
See Project
6

Crawler Detect

CrawlerDetect is a PHP class for detecting bots/crawlers/spiders

Crawler Detect is a PHP library that detects bots, crawlers, and spiders by analyzing user-agent headers and comparing them against a constantly updated list of known crawlers. It's useful for analytics, rate-limiting, or displaying alternative content for automated tools. It is fast, lightweight, and easy to integrate into any PHP application.

Downloads: 1 This Week

Last Update: 2026-03-26
See Project
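The user-agent matching described for Crawler Detect can be sketched in a few lines. This is only an illustration in Python, not the library's PHP API: the pattern list and the `is_crawler` function are hypothetical names, and the real project ships a much larger, regularly updated list.

```python
import re

# Hypothetical, tiny sample of crawler signatures (assumption for this sketch).
CRAWLER_PATTERNS = [
    "Googlebot",
    "bingbot",
    r"curl/",
    r"python-requests",
    "spider",
    "crawler",
]

# Combine all signatures into one case-insensitive regular expression.
_CRAWLER_RE = re.compile("|".join(CRAWLER_PATTERNS), re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string matches any known signature."""
    return bool(user_agent) and _CRAWLER_RE.search(user_agent) is not None
```

A single combined regex keeps the check to one pass over the header; an empty or missing user-agent is treated as "not a known crawler" here, which is a design choice of this sketch rather than documented library behavior.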
7

Spatie Crawler

An easy to use, powerful crawler implemented in PHP

Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.

Downloads: 1 This Week

Last Update: 2026-03-20
See Project
8

GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT

GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers from scratch. ...

Downloads: 0 This Week

Last Update: 2026-03-02
See Project
9

EasySpider

A visual no-code/code-free web crawler/spider

A visual code-free/no-code web crawler/spider, supporting both Chinese and English.

Downloads: 11 This Week

Last Update: 2025-01-01
See Project
10

WebMagic

A scalable web crawler framework for Java

WebMagic is a simple but scalable crawler framework for Java. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It has a simple core with high flexibility and a simple API for HTML extraction, so developing a specific crawler on top of it is easy.

Downloads: 1 This Week

Last Update: 2025-02-10
See Project
11

dxy-covid-19-crawler

Realtime crawler for COVID-19 outbreak statistics from DXY data

DXY-COVID-19-Crawler is a Python-based project designed to collect real-time COVID-19 infection data from the public dataset provided by Ding Xiang Yuan (DXY). The crawler periodically retrieves pandemic statistics and stores them in a database so that historical changes in the outbreak can be preserved and analyzed later. It was created to make up-to-date infection data more accessible for developers, researchers, and analysts who wanted to build visualizations or conduct data analysis during the early stages of the pandemic. ...

Downloads: 6 This Week

Last Update: 2026-04-02
See Project
12

Nebula libp2p DHT

A libp2p DHT crawler, monitor, and measurement tool

A libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. It supports the IPFS, Filecoin, Polkadot, Kusama, Rococo, and Westend networks, among others. Results can be stored as JSON documents or in a Postgres database; the --dry-run flag prevents the crawler from doing either.

Downloads: 1 This Week

Last Update: 2025-03-25
See Project
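The traversal described for Nebula (start at bootstrap peers, follow every k-bucket entry until all peers have been visited) is essentially a graph search. A minimal sketch, assuming a hypothetical `get_k_buckets` callable in place of real libp2p networking and using an explicit queue instead of recursion; the toy network below is invented for illustration:

```python
from collections import deque


def crawl_dht(bootstrap_peers, get_k_buckets):
    """Visit every peer reachable from the bootstrap peers.

    get_k_buckets(peer) is a hypothetical callable that returns the peers
    listed in that peer's k-buckets, or raises ConnectionError if the peer
    cannot be reached.
    """
    visited = {}  # peer id -> True if reachable, False if offline
    queue = deque(bootstrap_peers)
    while queue:
        peer = queue.popleft()
        if peer in visited:
            continue  # already crawled via another k-bucket entry
        try:
            neighbours = get_k_buckets(peer)
        except ConnectionError:
            visited[peer] = False  # record liveness, nothing to follow
            continue
        visited[peer] = True
        queue.extend(n for n in neighbours if n not in visited)
    return visited


# Toy in-memory network (assumption): peer "E" is listed by "B" but offline.
TOY_NETWORK = {"A": ["B", "C"], "B": ["C", "D", "E"], "C": ["A"], "D": []}


def toy_lookup(peer):
    if peer not in TOY_NETWORK:
        raise ConnectionError(peer)
    return TOY_NETWORK[peer]
```

Calling `crawl_dht(["A"], toy_lookup)` walks the toy network and records which peers answered, which mirrors the liveness tracking mentioned in the description.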
13

crwlr

Library for Rapid (Web) Crawler and Scraper Development

...A depth of 3 means 3 levels deep. Links found on the initial URLs provided to the crawler are level 1 and so on.

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
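The depth rule quoted for crwlr can be illustrated with a level-by-level crawl. This sketch is in Python and is not crwlr's PHP API; `crawl_to_depth` and `get_links` are hypothetical names, and treating the initial URLs as level 0 is an assumption inferred from "links found on the initial URLs are level 1".

```python
def crawl_to_depth(start_urls, get_links, max_depth):
    """Return {url: level} for every URL discovered up to max_depth.

    get_links(url) is a hypothetical callable returning the links found
    on that page. Start URLs are level 0; their links are level 1, etc.
    """
    levels = {url: 0 for url in start_urls}
    frontier = list(start_urls)
    for level in range(1, max_depth + 1):
        next_frontier = []
        for url in frontier:
            for link in get_links(url):
                if link not in levels:  # keep the shallowest level seen
                    levels[link] = level
                    next_frontier.append(link)
        frontier = next_frontier
    return levels
```

With a chain of pages a → b → c → d → e and a depth of 3, the crawl records a, b, c, and d but never reaches e, because e would be level 4.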
14

Heritrix

Internet Archive's open-source, web-scale, web crawler project

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives and META nofollow tags. ...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
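Respecting robots.txt, as the Heritrix description mentions, comes down to checking each URL against the site's exclusion rules before fetching it. The sketch below uses Python's standard `urllib.robotparser` rather than any Heritrix (Java) code; the rules, the `ExampleBot` agent name, and the `allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example rules (assumption): everything under /private/ is off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse rules from text so the example needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A polite crawler would call a check like `allowed(ROBOTS_TXT, "ExampleBot", url)` for every candidate URL and skip those that are disallowed. META nofollow handling is a separate, per-page check and is not covered here.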
15

FEAPDER

Powerful Python crawler framework for scalable web scraping tasks

...It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. It also integrates monitoring and alerting capabilities to help developers track crawler performance and detect issues during execution. feapder includes browser rendering support for handling dynamic web pages and provides mechanisms for large-scale data deduplication during crawling.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
16

Python API for JMComic

Python crawler and API for downloading JMComic albums and images

JMComic-Crawler-Python is a Python library and crawler framework designed to programmatically access and download comic content from the JMComic platform. It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes.

Downloads: 4 This Week

Last Update: 2 days ago
See Project
17

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.

Downloads: 3 This Week

Last Update: 2026-03-31
See Project
18

Pholcus

Distributed high-concurrency crawler software written in pure golang

...This software is intended for academic research only. Users must abide by the relevant laws and regulations of their location and must not use it for illegal purposes. It gives users with some Go or JS programming background a heavyweight, full-featured crawler tool in which they only need to focus on rule customization.

Downloads: 1 This Week

Last Update: 2026-03-03
See Project
19

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
20

katana

Fast CLI web crawler for discovering endpoints in modern web apps

Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...

Downloads: 8 This Week

Last Update: 2026-03-10
See Project
21

Colly

Elegant Scraper and Crawler Framework for Golang

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Features include a clean API, high speed (>1k requests/sec on a single core), management of request delays and maximum concurrency per domain, and automatic cookie and session handling.

Downloads: 0 This Week

Last Update: 2025-03-27
See Project
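The per-domain request delay that Colly's description mentions can be illustrated with a small scheduler that tracks when each domain may next be contacted. This is a Python sketch of the general idea, not Colly's Go API: `DomainThrottle` and `wait_time` are hypothetical names, and concurrency limits are left out.

```python
from urllib.parse import urlparse


class DomainThrottle:
    """Space out requests to the same domain by a fixed delay."""

    def __init__(self, delay_seconds: float):
        self.delay = delay_seconds
        self._next_allowed = {}  # domain -> earliest time of next request

    def wait_time(self, url: str, now: float) -> float:
        """Return how long to wait before requesting url, and reserve the slot."""
        domain = urlparse(url).netloc
        start = max(self._next_allowed.get(domain, now), now)
        self._next_allowed[domain] = start + self.delay
        return start - now
```

A caller would pass the current clock value (for example `time.monotonic()`) as `now` and sleep for the returned number of seconds before sending the request. Requests to different domains do not delay each other.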
22

autocrawler

Multiprocess Selenium crawler for downloading images by keywords

...AutoCrawler supports multiprocess and multithreaded downloading, which allows it to retrieve images faster by running several tasks simultaneously. Users provide search terms through a simple keyword file, and the crawler organizes downloaded images into directories for each keyword. It can download either thumbnails or full resolution images and supports multiple image formats such as JPG, GIF, and PNG. It also includes configuration options such as headless mode, download limits, proxy usage, and thread count to customize crawling behavior.

Downloads: 1 This Week

Last Update: 7 hours ago
See Project
23

PaSa

An advanced paper search agent powered by large language models

..., PaSa decomposes the task: the Crawler generates search queries, retrieves candidate papers (via search tools and citation expansion), then adds them to a “paper queue.” The Selector then reads abstracts or full text (depending on what’s available) and decides which papers are relevant.

Downloads: 0 This Week

Last Update: 2025-12-02
See Project
24

spider_collection

Collection of Python web scraping scripts for data extraction tasks

spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...

Downloads: 1 This Week

Last Update: 7 hours ago
See Project
25

douyin

Open source Douyin crawler for collecting and downloading public data

DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. ...

Downloads: 2 This Week

Last Update: 2026-03-13
See Project