Popularity

4.0

Declining

Activity

0.0

Stable

Stars 345

Watchers 55

Forks 194

Last Commit almost 2 years ago

Programming language: Python

License: GNU General Public License v3.0 only

Tags: HTTP Web Crawling Internet

MSpider alternatives and similar packages

Based on the "Web Crawling" category.

Scrapy

9.9 9.6 L4 MSpider VS Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
pyspider

9.5 0.0 L3 MSpider VS pyspider

A Powerful Spider(Web Crawler) System in Python.


requests-html

9.2 0.0 MSpider VS requests-html

Pythonic HTML Parsing for Humans™
portia

9.0 0.0 L2 MSpider VS portia

Visual scraping for Scrapy
MechanicalSoup

7.7 5.9 L4 MSpider VS MechanicalSoup

A Python library for automating interaction with websites.
RoboBrowser

7.4 0.0 L4 MSpider VS RoboBrowser

A simple, Pythonic library for browsing the web without a standalone web browser.
PSpider

6.5 0.0 MSpider VS PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Grab

6.5 3.0 L3 MSpider VS Grab

Web Scraping Framework
cola

6.4 0.0 L3 MSpider VS cola

A high-level distributed crawling framework.
Scrapely

6.3 0.0 MSpider VS Scrapely

A pure-python HTML screen-scraping library
feedparser

6.1 7.7 L3 MSpider VS feedparser

Parse feeds in Python
gain

6.1 0.0 MSpider VS gain

Web crawling framework based on asyncio.
Sukhoi

4.2 0.0 MSpider VS Sukhoi

Minimalist and powerful Web Crawler.
Google Search Results in Python

3.7 4.5 MSpider VS Google Search Results in Python

Google Search Results via SERP API pip Python Package
spidy Web Crawler

3.2 0.0 MSpider VS spidy Web Crawler

The simple, easy to use command line web crawler.
reader

3.1 9.1 MSpider VS reader

A Python feed reader library.
Crawley

2.6 0.0 MSpider VS Crawley

Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
brownant

2.5 0.0 MSpider VS brownant

Brownant is a web data extracting framework.
Demiurge

2.0 0.0 L5 MSpider VS Demiurge

PyQuery-based scraping micro-framework.
Pomp

1.6 0.0 L5 MSpider VS Pomp

Screen scraping and web crawling framework
FastImage

1.0 0.0 L4 MSpider VS FastImage

Python library that finds the size / type of an image given its URI by fetching as little as needed
Mariner

0.4 0.0 MSpider VS Mariner

This is a mirror of the GitLab repository. Open your issues and pull requests there.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.


README

MSpider

Talk

The information security department of 360 is recruiting on an ongoing basis. If you are interested, contact zhangxin1[at]360.cn.

Installation

On Ubuntu, you need to install the following libraries before running MSpider.

You can install them with pip, easy_install, or apt-get.

lxml
chardet
splinter
gevent
phantomjs
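
To see which of the dependencies listed above are still missing, a small check like the following could be run first. This is only a sketch and not part of MSpider: the function name `missing_dependencies` is hypothetical, it assumes the four Python packages import under the names shown in the list and that phantomjs is an executable on the PATH, and it is written for Python 3.4 or later (the interpreter version MSpider itself needs is not stated here).

```python
import importlib.util
import shutil

# Names taken from the dependency list above.
PYTHON_PACKAGES = ["lxml", "chardet", "splinter", "gevent"]
EXECUTABLES = ["phantomjs"]


def missing_dependencies(packages=PYTHON_PACKAGES, executables=EXECUTABLES):
    """Return the names that cannot be imported or found on the PATH."""
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in packages
               if importlib.util.find_spec(name) is None]
    # shutil.which returns None when no matching executable is on the PATH.
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```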

Example

Use MSpider to collect vulnerability information from wooyun.org.

python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true

Use MSpider to collect news from news.sina.com.cn.

python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn" -t 15 --random-agent true
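
The focus and filter options in the commands above restrict which discovered URLs get crawled. How MSpider applies them internally is not shown here, so the following is only a minimal sketch under stated assumptions: keywords are matched as plain substrings of the URL, domains are matched against the hostname and its subdomains, and the function name `should_crawl` is hypothetical.

```python
from urllib.parse import urlparse


def _in_domain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def should_crawl(url, focus_domain=None, filter_domain=None,
                 focus_keyword=None, filter_keyword=None):
    """Decide whether a URL passes the focus/filter rules (assumed semantics)."""
    host = urlparse(url).hostname or ""
    # Focus rules: the URL must match to be kept.
    if focus_domain and not _in_domain(host, focus_domain):
        return False
    if focus_keyword and focus_keyword not in url:
        return False
    # Filter rules: the URL is dropped if it matches.
    if filter_domain and _in_domain(host, filter_domain):
        return False
    if filter_keyword and filter_keyword in url:
        return False
    return True


if __name__ == "__main__":
    rules = dict(focus_domain="wooyun.org", focus_keyword="bugs",
                 filter_keyword="xxx")
    for candidate in ("http://www.wooyun.org/bugs/page/2",
                      "http://www.wooyun.org/xxx/bugs/",
                      "http://example.com/bugs/"):
        print(candidate, should_crawl(candidate, **rules))
```

Under these assumptions only the first candidate is kept: the second contains the filter keyword and the third is outside the focus domain.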

ToDo

Crawling and storing information.
Distributed crawling.

MSpider's help

Usage:
  __  __  _____       _     _
 |  \/  |/ ____|     (_)   | |
 | \  / | (___  _ __  _  __| | ___ _ __
 | |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
 | |  | |____) | |_) | | (_| |  __/ |
 |_|  |_|_____/| .__/|_|\__,_|\___|_|
               | |
               |_|
                        Author: Manning23


Options:
  -h, --help            show this help message and exit
  -u MSPIDER_URL, --url=MSPIDER_URL
                        Target URL (e.g. "http://www.site.com/")
  -t MSPIDER_THREADS_NUM, --threads=MSPIDER_THREADS_NUM
                        Max number of concurrent HTTP(s) requests (default 10)
  --depth=MSPIDER_DEPTH
                        Crawling depth
  --count=MSPIDER_COUNT
                        Crawling number
  --time=MSPIDER_TIME   Crawl time
  --referer=MSPIDER_REFERER
                        HTTP Referer header value
  --cookies=MSPIDER_COOKIES
                        HTTP Cookie header value
  --spider-model=MSPIDER_MODEL
                        Crawling mode: Static_Spider: 0  Dynamic_Spider: 1
                        Mixed_Spider: 2
  --spider-policy=MSPIDER_POLICY
                        Crawling strategy: Breadth-first 0  Depth-first 1
                        Random-first 2
  --focus-keyword=MSPIDER_FOCUS_KEYWORD
                        Focus keyword in URL
  --filter-keyword=MSPIDER_FILTER_KEYWORD
                        Filter keyword in URL
  --filter-domain=MSPIDER_FILTER_DOMAIN
                        Filter domain
  --focus-domain=MSPIDER_FOCUS_DOMAIN
                        Focus domain
  --random-agent=MSPIDER_AGENT
                        Use randomly selected HTTP User-Agent header value
  --print-all=MSPIDER_PRINT_ALL
                        Will show more information
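
The `--spider-policy` values listed in the help output (breadth-first 0, depth-first 1, random-first 2) differ only in which pending URL is taken next. The sketch below illustrates that idea with a single URL frontier. It is not MSpider's code: the class name `Frontier`, its methods, and the `seed` parameter are hypothetical, and only the three numeric policy values come from the help text.

```python
import random
from collections import deque

# Numeric values as listed for --spider-policy in the help output.
BREADTH_FIRST, DEPTH_FIRST, RANDOM_FIRST = 0, 1, 2


class Frontier:
    """Queue of pending URLs whose pop order depends on the crawl policy."""

    def __init__(self, policy=BREADTH_FIRST, seed=None):
        self.policy = policy
        self._urls = deque()
        self._seen = set()          # URLs ever queued, to avoid repeats
        self._rng = random.Random(seed)

    def push(self, url):
        """Queue a URL unless it has been queued before."""
        if url not in self._seen:
            self._seen.add(url)
            self._urls.append(url)

    def pop(self):
        """Remove and return the next URL according to the policy."""
        if self.policy == BREADTH_FIRST:
            return self._urls.popleft()   # oldest first (FIFO)
        if self.policy == DEPTH_FIRST:
            return self._urls.pop()       # newest first (LIFO)
        index = self._rng.randrange(len(self._urls))
        url = self._urls[index]
        del self._urls[index]             # random pick
        return url

    def __len__(self):
        return len(self._urls)


if __name__ == "__main__":
    for policy in (BREADTH_FIRST, DEPTH_FIRST, RANDOM_FIRST):
        frontier = Frontier(policy)
        for url in ("/a", "/b", "/c"):
            frontier.push(url)
        order = [frontier.pop() for _ in range(len(frontier))]
        print(policy, order)
```

Keeping the policy inside the frontier means the fetch loop stays the same for all three strategies; only the order in which `pop` hands out URLs changes.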