Popularity

6.4

Stable

Activity

3.0

Declining

Stars 2,404

Watchers 87

Forks 276

Last Commit over 1 year ago

Description

Grab is a Python web scraping framework. It provides many helpful methods for scraping web sites and processing the scraped content.

Code Quality Rank: L3

Programming language: Python

License: MIT License

Tags: HTTP Web Crawling Application Frameworks Internet WWW

Latest version: v0.6.41

Grab alternatives and similar packages

Based on the "Web Crawling" category.

Scrapy

9.9 9.6 L4 Grab VS Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
pyspider

9.5 0.0 L3 Grab VS pyspider

DISCONTINUED. A Powerful Spider(Web Crawler) System in Python.

requests-html

9.1 0.0 Grab VS requests-html

Pythonic HTML Parsing for Humans™
portia

8.9 0.0 L2 Grab VS portia

Visual scraping for Scrapy
MechanicalSoup

7.7 4.8 L4 Grab VS MechanicalSoup

A Python library for automating interaction with websites.
RoboBrowser

7.2 0.0 L4 Grab VS RoboBrowser

A simple, Pythonic library for browsing the web without a standalone web browser.
PSpider

6.4 0.0 Grab VS PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
feedparser

6.3 7.5 L3 Grab VS feedparser

Parse feeds in Python
cola

6.3 0.0 L3 Grab VS cola

A high-level distributed crawling framework.
Scrapely

6.1 0.0 Grab VS Scrapely

A pure-python HTML screen-scraping library
gain

6.0 0.0 Grab VS gain

Web crawling framework based on asyncio.
Sukhoi

4.3 0.0 Grab VS Sukhoi

Minimalist and powerful Web Crawler.
Google Search Results in Python

4.1 5.0 Grab VS Google Search Results in Python

Google Search Results via SERP API pip Python Package
MSpider

4.0 0.0 Grab VS MSpider

Spider
reader

3.5 8.7 Grab VS reader

A Python feed reader library.
spidy Web Crawler

3.3 0.0 Grab VS spidy Web Crawler

The simple, easy to use command line web crawler.
Crawley

2.7 0.0 Grab VS Crawley

Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
brownant

2.6 0.0 Grab VS brownant

Brownant is a web data extracting framework.
Demiurge

2.1 0.0 L5 Grab VS Demiurge

PyQuery-based scraping micro-framework.
Pomp

1.6 0.0 L5 Grab VS Pomp

Screen scraping and web crawling framework
FastImage

1.1 0.0 L4 Grab VS FastImage

Python library that finds the size / type of an image given its URI by fetching as little as needed
Mariner

0.4 0.0 Grab VS Mariner

This is a mirror of the GitLab repository. Open your issues and pull requests there.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

🇷🇺 Grab Framework Project

Project Status

Important notice: the pycurl backend has been dropped. The only network transport now is urllib3.

The project is in a slow refactoring stage. There may be no new features.

Things that are going to happen (no time estimate):

Refactoring the source code while keeping most of the external API unchanged
Fixing bugs
Annotating source code with type hints
Improving quality of source code to comply with pylint and other linters
Moving some features into external packages or moving external dependencies inside Grab
Fixing memory leaks
Improving test coverage
Adding more platforms and Python versions to the test matrix
Releasing new versions on PyPI

Installation

$ pip install -U grab

For details about installing Grab on different platforms, see http://docs.grablib.org/en/latest/usage/installation.html

Documentation

Get it here: grab.readthedocs.io

Telegram chat groups

Russian: t.me/grablab_ru
English: t.me/grablab

About Grab (very old description)

Grab is a python web scraping framework. Grab provides a number of helpful methods to perform network requests, scrape web sites and process the scraped content:

Automatic cookies (session) support
HTTPS/SOCKS proxy support with/without authentication
Keep-Alive support
IDN support
Tools to work with web forms
Easy multipart file uploading
Flexible customization of HTTP requests
Automatic charset detection
Powerful API to extract data from the DOM tree of HTML documents with XPath queries
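
To make the last item concrete, here is a minimal sketch of XPath-style extraction from a document tree. It deliberately uses only the Python standard library (`xml.etree.ElementTree`), not Grab's own API, and ElementTree supports only a subset of XPath. The markup, the `PAGE` constant and the `extract_links` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <h3 class="repo-list-name"><a href="/user/alpha">alpha</a></h3>
    <h3 class="repo-list-name"><a href="/user/beta">beta</a></h3>
    <h3 class="other"><a href="/ignored">ignored</a></h3>
  </body>
</html>
"""


def extract_links(markup):
    """Return (text, href) pairs for links inside matching <h3> elements."""
    root = ET.fromstring(markup)
    # ElementTree implements a limited XPath dialect; attribute-equality
    # predicates such as [@class='...'] are part of it.
    links = root.findall(".//h3[@class='repo-list-name']/a")
    return [(a.text, a.get("href")) for a in links]


print(extract_links(PAGE))
```

Real-world HTML is rarely well-formed XML, which is one reason a scraping library would rely on a tolerant HTML parser with full XPath support rather than on this approach.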

Grab provides an interface called Spider for developing multithreaded web site scrapers:

Rules and conventions to organize crawling logic
Multiple parallel network requests
Automatic processing of network errors (failed tasks go back to task queue)
You can create network requests and parse responses with Grab API (see above)
Different backends for task queue (in-memory, redis, mongodb)
Tools to debug and collect statistics
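
The ideas in this list (named tasks, parallel workers, failed tasks going back to the queue) can be sketched with the standard library alone. The code below is not Grab's implementation: `MiniSpider`, `DemoSpider`, `add_task` and the `max_tries` limit are hypothetical names chosen for illustration, and the "network error" is simulated.

```python
import queue
import threading


class MiniSpider:
    """Hypothetical stand-in for the pattern described above; not Grab's code."""

    max_tries = 3  # assumed retry limit per task

    def __init__(self, thread_number=2):
        self.thread_number = thread_number
        self.tasks = queue.Queue()  # in-memory task queue

    def add_task(self, name, tries=0, **data):
        self.tasks.put((name, tries, data))

    def worker(self):
        while True:
            item = self.tasks.get()
            if item is None:  # sentinel: no more work for this thread
                self.tasks.task_done()
                return
            name, tries, data = item
            try:
                # Convention: a task named "x" is handled by method task_x.
                getattr(self, "task_" + name)(**data)
            except Exception:
                # A failed task goes back to the queue until it runs out of tries.
                if tries + 1 < self.max_tries:
                    self.add_task(name, tries + 1, **data)
            finally:
                self.tasks.task_done()

    def run(self):
        threads = [threading.Thread(target=self.worker)
                   for _ in range(self.thread_number)]
        for thread in threads:
            thread.start()
        self.tasks.join()  # wait until every task, including retries, is done
        for _ in threads:
            self.tasks.put(None)
        for thread in threads:
            thread.join()


class DemoSpider(MiniSpider):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.lock = threading.Lock()
        self.seen = []
        self.failed_once = False

    def task_search(self, lang):
        with self.lock:
            # Simulate one transient failure to exercise the retry path.
            if lang == "ruby" and not self.failed_once:
                self.failed_once = True
                raise IOError("simulated network error")
            self.seen.append(lang)


bot = DemoSpider(thread_number=2)
for lang in ("python", "ruby", "perl"):
    bot.add_task("search", lang=lang)
bot.run()
print(sorted(bot.seen))
```

The failed task is re-queued before the worker marks the original as done, so the queue never looks empty while a retry is still pending.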

Grab Example

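    import logging

    from grab import Grab

    # Enable debug-level logging
    logging.basicConfig(level=logging.DEBUG)

    g = Grab()

    # Open the login page, fill in the form fields and submit the form
    g.go('https://github.com/login')
    g.doc.set_input('login', '****')
    g.doc.set_input('password', '****')
    g.doc.submit()

    # Save the response to a file for inspection
    g.doc.save('/tmp/x.html')

    # Fail early if the sign-out button is missing from the page
    g.doc('//ul[@id="user-links"]//button[contains(@class, "signout")]').assert_exists()

    # Build the URL of the repositories tab from the profile link
    home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
    repo_url = home_url + '?tab=repositories'

    g.go(repo_url)

    # Print each repository name together with its absolute URL
    for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
        print('%s: %s' % (elem.text(),
                          g.make_url_absolute(elem.attr('href'))))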
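
The last step of the example turns relative `href` values into full URLs with `g.make_url_absolute()`. Assuming that this method resolves a link against the URL of the current document (an assumption, since the description above does not say how it works), the same resolution step can be sketched with the standard library's `urllib.parse.urljoin`. The base URL and the hrefs below are made up.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page the links were found on.
base_url = "https://github.com/someuser?tab=repositories"

# Hypothetical href values: one site-relative, one already absolute.
hrefs = ["/someuser/project", "https://example.com/other"]

# urljoin resolves relative links against the base and leaves
# absolute links untouched.
absolute = [urljoin(base_url, href) for href in hrefs]

for url in absolute:
    print(url)
```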

Grab::Spider Example

    import logging

    from grab.spider import Spider, Task

    # Enable debug-level logging
    logging.basicConfig(level=logging.DEBUG)


    class ExampleSpider(Spider):
        # Yield the initial tasks: one search request per language
        def task_generator(self):
            for lang in 'python', 'ruby', 'perl':
                url = 'https://www.google.com/search?q=%s' % lang
                yield Task('search', url=url, lang=lang)

        # Handler for tasks named 'search'
        def task_search(self, grab, task):
            print('%s: %s' % (task.lang,
                              grab.doc('//div[@class="s"]//cite').text()))


    # Run the spider with two worker threads
    bot = ExampleSpider(thread_number=2)
    bot.run()