MSpider alternatives and similar packages
Based on the "Web Crawling" category.
- Scrapy: a fast, high-level web crawling & scraping framework for Python.
- MechanicalSoup: a Python library for automating interaction with websites.
- RoboBrowser: a simple, Pythonic library for browsing the web without a standalone web browser.
- Google Search Results in Python: Google Search Results via the SERP API pip Python package.
- spidy Web Crawler: a simple, easy-to-use command-line web crawler.
- Crawley: a Pythonic crawling/scraping framework based on non-blocking I/O operations.
- FastImage: a Python library that finds the size/type of an image given its URI by fetching as little as needed.
- Mariner: a mirror of the GitLab repository; open your issues and pull requests there.
README
MSpider
Talk
The information security department of Qihoo 360 is hiring on an ongoing basis; if you are interested, contact zhangxin1[at]360.cn.
Installation
On Ubuntu, you need to install the following libraries first. You can use pip, easy_install, or apt-get to do this:
- lxml
- chardet
- splinter
- gevent
- phantomjs
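For example, a minimal setup on Ubuntu might look like the following (a sketch assuming pip is available; phantomjs is usually easiest to install from the system package manager rather than pip):

pip install lxml chardet splinter gevent
sudo apt-get install phantomjs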
Example
Use MSpider to collect vulnerability information from wooyun.org:
python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true
Use MSpider to collect news information from news.sina.com.cn:
python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn" -t 15 --random-agent true
ToDo
- Crawling and storage of information.
- Distributed crawling.
MSpider's help
Usage:
 __  __  _____       _     _
|  \/  |/ ____|     (_)   | |
| \  / | (___  _ __  _  __| | ___ _ __
| |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
| |  | |____) | |_) | | (_| |  __/ |
|_|  |_|_____/| .__/|_|\__,_|\___|_|
              | |
              |_|
Author: Manning23
Options:
  -h, --help            show this help message and exit
  -u MSPIDER_URL, --url=MSPIDER_URL
                        Target URL (e.g. "http://www.site.com/")
  -t MSPIDER_THREADS_NUM, --threads=MSPIDER_THREADS_NUM
                        Max number of concurrent HTTP(S) requests (default 10)
  --depth=MSPIDER_DEPTH
                        Crawling depth
  --count=MSPIDER_COUNT
                        Maximum number of URLs to crawl
  --time=MSPIDER_TIME   Maximum crawl time
  --referer=MSPIDER_REFERER
                        HTTP Referer header value
  --cookies=MSPIDER_COOKIES
                        HTTP Cookie header value
  --spider-model=MSPIDER_MODEL
                        Crawling mode: Static_Spider: 0, Dynamic_Spider: 1,
                        Mixed_Spider: 2
  --spider-policy=MSPIDER_POLICY
                        Crawling strategy: Breadth-first: 0, Depth-first: 1,
                        Random-first: 2
  --focus-keyword=MSPIDER_FOCUS_KEYWORD
                        Only crawl URLs that contain this keyword
  --filter-keyword=MSPIDER_FILTER_KEYWORD
                        Skip URLs that contain this keyword
  --filter-domain=MSPIDER_FILTER_DOMAIN
                        Domain(s) to exclude from the crawl
  --focus-domain=MSPIDER_FOCUS_DOMAIN
                        Domain(s) to restrict the crawl to
  --random-agent=MSPIDER_AGENT
                        Use a randomly selected HTTP User-Agent header value
  --print-all=MSPIDER_PRINT_ALL
                        Show more detailed output
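These options can be combined. As a sketch, a hypothetical invocation (example.com is a placeholder target, not from the original README) that crawls in mixed mode with a breadth-first strategy, a depth limit of 3, and 20 threads might look like this:

python mspider.py -u "http://www.example.com/" --spider-model 2 --spider-policy 0 --depth 3 -t 20 --random-agent true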