pyspider alternatives and similar packages
- FastImage: Python library that finds the size / type of an image given its URI by fetching as little as needed
README
pyspider
A Powerful Spider (Web Crawler) System in Python.
- Write scripts in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- MySQL, MongoDB, Redis, SQLite, Elasticsearch, and PostgreSQL (with SQLAlchemy) as database backends
- RabbitMQ, Redis and Kombu as message queues
- Task priority, retries, periodic crawling, recrawl by age, and more
- Distributed architecture, crawling of JavaScript pages, support for Python 2.{6,7} and 3.{3,4,5,6}, and more
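The scheduling features listed above (task priority, recrawl by age) can be pictured with a small stand-alone sketch. It does not import pyspider and is not how pyspider's scheduler is implemented; `is_expired` and `ToyQueue` are hypothetical names chosen for illustration. It assumes that an age is a validity period in seconds and that a larger priority number is served first.

```python
import heapq
import time


def is_expired(last_crawl_time, age, now=None):
    """Return True when a page crawled at last_crawl_time is at least `age` seconds old."""
    now = time.time() if now is None else now
    return (now - last_crawl_time) >= age


class ToyQueue:
    """Hypothetical in-memory queue: higher priority pops first, FIFO among equals."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # insertion order, used to break priority ties

    def put(self, url, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = ToyQueue()
queue.put("http://example.com/a", priority=1)
queue.put("http://example.com/b", priority=5)
queue.put("http://example.com/c", priority=1)
order = [queue.get() for _ in range(len(queue))]
print(order)
```

In this toy model a page is only queued again once `is_expired` reports that its validity period has passed, which is the idea behind recrawling by age.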
Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases
Sample Code
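from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Default parameters applied to every self.crawl call in this project.
    crawl_config = {
    }

    @every(minutes=24 * 60)  # run on_start once a day
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)  # treat fetched pages as valid for 10 days
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is saved as the result for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }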
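To make the selector `a[href^="http"]` in the sample concrete, here is a minimal sketch that reproduces its prefix match using only the standard library's `html.parser` instead of the `response.doc` object. `HttpLinkParser` and `extract_http_links` are hypothetical names, and the sketch tests the raw attribute values; pyspider may normalize links before the selector runs, so treat this purely as an illustration of the matching rule.

```python
from html.parser import HTMLParser


class HttpLinkParser(HTMLParser):
    """Collect href values of <a> tags whose href starts with "http"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and hrefs that do not start with "http".
        if href and href.startswith("http"):
            self.links.append(href)


def extract_http_links(html):
    parser = HttpLinkParser()
    parser.feed(html)
    return parser.links


sample = """
<a href="http://example.com/one">one</a>
<a href="/relative">relative</a>
<a href="https://example.com/two">two</a>
<a name="anchor">no href</a>
"""
links = extract_http_links(sample)
print(links)
```

Both `http://` and `https://` links match because the test is a plain string prefix, while the relative link and the anchor without an `href` are skipped.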
Installation
- pip install pyspider
- Run the command pyspider, then visit http://localhost:5000/
WARNING: The WebUI is open to the public by default and can be used to execute arbitrary commands that may harm your system. Run it on an internal network or enable need-auth for the WebUI.
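As a rough sketch of enabling need-auth, the snippet below builds a JSON configuration with a `webui` section. The key names (`webui`, `username`, `password`, `need-auth`) are written from memory of the pyspider deployment documentation and should be treated as assumptions to verify there; `build_webui_config` and `render_config` are hypothetical helpers, and the credentials are placeholders.

```python
import json


def build_webui_config(username, password):
    """Return a config dict that asks the WebUI to require a login.

    The key names are assumptions; confirm them in the pyspider docs.
    """
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


def render_config(config):
    """Serialize the config dict as indented JSON text."""
    return json.dumps(config, indent=2)


config = build_webui_config("change-me", "change-me-too")
print(render_config(config))
```

The printed JSON would typically be saved to a file such as config.json and passed to pyspider with its config-file option (believed to be `pyspider -c config.json`; check the Quickstart and deployment docs for the exact invocation).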
Quickstart: http://docs.pyspider.org/en/latest/Quickstart/
Contribute
- Use It
- Open Issue, send PR
- User Group
- Chinese Q&A (中文问答)
TODO
v0.4.0
- [ ] a visual scraping interface like portia
License
Licensed under the Apache License, Version 2.0