gain alternatives and similar packages
Based on the "Web Crawling" category.
- FastImage: Python library that finds the size / type of an image given its URI by fetching as little as needed
README
Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp.
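To give a feel for the asyncio style this kind of framework builds on, here is a minimal sketch of bounded-concurrency fetching. It is not gain's implementation: `fake_fetch`, `crawl`, and `run_demo` are hypothetical names, the example.com URLs are placeholders, and the "request" is simulated with `asyncio.sleep` so the sketch runs without network access. It uses `asyncio.run`, which requires Python 3.7 or later.

```python
import asyncio


async def fake_fetch(url, semaphore, stats):
    # Hypothetical stand-in for an HTTP request; no network is used.
    async with semaphore:  # at most `concurrency` coroutines get past this point
        stats['active'] += 1
        stats['peak'] = max(stats['peak'], stats['active'])
        await asyncio.sleep(0.01)  # simulate I/O latency
        stats['active'] -= 1
        return url, 200


async def crawl(urls, concurrency=5):
    semaphore = asyncio.Semaphore(concurrency)
    stats = {'active': 0, 'peak': 0}
    # gather() schedules every fetch and returns results in input order.
    results = await asyncio.gather(
        *(fake_fetch(url, semaphore, stats) for url in urls)
    )
    return results, stats['peak']


def run_demo():
    urls = ['https://example.com/page/%d/' % i for i in range(20)]
    return asyncio.run(crawl(urls))


if __name__ == '__main__':
    results, peak = run_demo()
    print(len(results), 'pages fetched; peak concurrency:', peak)
```

The semaphore caps how many simulated requests are in flight at once, which is the general idea behind a per-spider concurrency limit.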
[](img/architecture.png)
Requirements
- Python 3.5+
Installation
pip install gain
pip install uvloop (Linux only)
Usage
- Write spider.py:
    from gain import Css, Item, Parser, Spider
    import aiofiles


    class Post(Item):
        # Fields extracted from each matched page via CSS selectors.
        title = Css('.entry-title')
        content = Css('.entry-content')

        async def save(self):
            # Append the extracted title to a local file.
            async with aiofiles.open('scrapinghub.txt', 'a+') as f:
                await f.write(self.results['title'])


    class MySpider(Spider):
        concurrency = 5
        headers = {'User-Agent': 'Google Spider'}
        start_url = 'https://blog.scrapinghub.com/'
        # Raw strings keep the regex escapes (\d, \-) intact.
        parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
                   Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]


    MySpider.run()
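The two `Parser` rules above are regular expressions over URLs: one for pagination pages and one for dated post pages, the latter paired with `Post`. The sketch below uses only the standard `re` module to show which URLs each pattern accepts. It assumes the rules are matched from the start of the URL; how gain applies them internally is not shown here. `classify` and the sample URLs are hypothetical.

```python
import re

# The same patterns as in the spider above, written as raw strings.
FOLLOW_RULE = re.compile(r'https://blog.scrapinghub.com/page/\d+/')
ITEM_RULE = re.compile(
    r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/'
)


def classify(url):
    """Return 'item', 'follow', or 'skip' for a URL (illustrative helper)."""
    if ITEM_RULE.match(url):
        return 'item'      # would be parsed into a Post
    if FOLLOW_RULE.match(url):
        return 'follow'    # pagination page, only followed for more links
    return 'skip'


if __name__ == '__main__':
    for url in (
        'https://blog.scrapinghub.com/page/2/',
        'https://blog.scrapinghub.com/2016/10/27/an-introduction-to-xpath/',
        'https://blog.scrapinghub.com/about/',
    ):
        print(classify(url), url)
```

Testing patterns this way before a crawl is a cheap check that a rule is neither too broad nor too narrow.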
Or use XPathParser:
    from gain import Css, Item, Parser, XPathParser, Spider


    class Post(Item):
        title = Css('.breadcrumb_last')

        async def save(self):
            print(self.title)


    class MySpider(Spider):
        start_url = 'https://mydramatime.com/europe-and-us-drama/'
        concurrency = 5
        headers = {'User-Agent': 'Google Spider'}
        # Each XPath expression selects the href values of links to follow;
        # the last one is paired with Post.
        parsers = [
            XPathParser('//span[@class="category-name"]/a/@href'),
            XPathParser('//div[contains(@class, "pagination")]/ul/li/a[contains(@href, "page")]/@href'),
            XPathParser('//div[@class="mini-left"]//div[contains(@class, "mini-title")]/a/@href', Post)
        ]
        proxy = 'https://localhost:1234'


    MySpider.run()
You can add a proxy setting to the spider, as shown above.
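The XPath rules in this example select `href` attribute values from anchor elements. As a rough, offline illustration of that idea, the sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath: it cannot select `/@href` directly or evaluate `contains()`, so the attribute is read in Python instead. This is a swapped-in technique, not the engine behind `XPathParser`. The `SAMPLE` markup and `extract_hrefs` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs valid XML).
SAMPLE = """
<div>
  <span class="category-name"><a href="/europe-and-us-drama/">Drama</a></span>
  <span class="other"><a href="/ignored/">Other</a></span>
</div>
"""


def extract_hrefs(markup, path):
    """Return href values of the elements matched by a limited XPath."""
    root = ET.fromstring(markup)
    return [a.get('href') for a in root.findall(path) if a.get('href')]


if __name__ == '__main__':
    # Roughly equivalent to //span[@class="category-name"]/a/@href
    print(extract_hrefs(SAMPLE, ".//span[@class='category-name']/a"))
```

Real-world HTML is rarely well-formed XML, so a tolerant HTML parser with full XPath support is the better fit for actual crawling.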
- Run python spider.py
- Result:
[](img/sample.png)
Example
The examples are in the /example/ directory.
Contribution
- Open a pull request.
- Open an issue.