toapi alternatives and similar packages
Based on the "Web Content Extracting" category.
- TWINT: An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
- newspaper: News, full-text, and article metadata extraction in Python 3.
- python-goose: Html Content / Article Extractor, web scrapping lib in Python
- python-readability: fast python port of arc90's readability tool, updated to match latest readability.js!
- trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
- Goose3: A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
- inscriptis: A python based HTML to text conversion library, command line client and Web service.
- htmldate: Fast and robust date extraction from web pages, with Python or on the command-line
- Data Extractor: Combine XPath, CSS Selectors and JSONPath for Web data extracting.
README
Toapi
Overview
Toapi gives you the ability to make any website provide APIs.
Version v2.0.0 is a complete rewrite: more elegant and more Pythonic.
- v1.0.0 Documentation: http://www.toapi.org
- Awesome: https://github.com/toapi/awesome-toapi
- Organization: https://github.com/toapi
Features
- Automatically converts an HTML website into an API service.
- Automatically caches every page of the source site.
- Automatically caches every request.
- Supports merging multiple websites into one API service.
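This README does not describe how Toapi implements its caching, so the following is only a minimal sketch of the general idea behind caching pages of a source site: keep each fetched page in memory for a limited time and reuse it for repeated requests. The `TTLCache` class, `get_or_fetch` method, and `fake_fetch` function are hypothetical names made up for this illustration and are not part of Toapi.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds.

    Hypothetical illustration only; not Toapi's actual cache.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # Fresh hit: skip the fetch entirely.
        value = fetch(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


calls = []


def fake_fetch(path):
    # Stand-in for downloading a page from the source site.
    calls.append(path)
    return '<html>content of {}</html>'.format(path)


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_fetch('/news?p=1', fake_fetch)
second = cache.get_or_fetch('/news?p=1', fake_fetch)
print(len(calls))  # The second call is served from the cache.
```

Passing the clock in as a parameter keeps the expiry logic easy to test without waiting for real time to pass.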
Get Started
Installation
$ pip install toapi
$ toapi -v
toapi, version 2.0.0
Usage
Create app.py and copy in the following code:
from flask import request
from htmlparsing import Attr, Text
from toapi import Api, Item

api = Api()


@api.site('https://news.ycombinator.com')
@api.list('.athing')
@api.route('/posts?page={page}', '/news?p={page}')
@api.route('/posts', '/news?p=1')
class Post(Item):
    url = Attr('.storylink', 'href')
    title = Text('.storylink')


@api.site('https://news.ycombinator.com')
@api.route('/posts?page={page}', '/news?p={page}')
@api.route('/posts', '/news?p=1')
class Page(Item):
    next_page = Attr('.morelink', 'href')

    def clean_next_page(self, value):
        return api.convert_string('/' + value, '/news?p={page}', request.host_url.strip('/') + '/posts?page={page}')


api.run(debug=True, host='0.0.0.0', port=5000)
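In the code above, clean_next_page rewrites a link on the source site (such as /news?p=2) into the matching link on the local API (such as /posts?page=2). The sketch below shows one way such a template-to-template rewrite could work, using only the standard library. The function name `convert_template` and its behavior on non-matching input are assumptions made for this illustration; this is not the implementation of `api.convert_string`.

```python
import re


def convert_template(value, source_template, target_template):
    """Rewrite value from one '{name}' template to another.

    Hypothetical helper for illustration; not Toapi's implementation.
    """
    # Split the source template into literal text (even indexes) and
    # placeholder names (odd indexes), then build a regex from them.
    parts = re.split(r'\{(\w+)\}', source_template)
    pattern = ''
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += '(?P<{}>[^/?&#]+)'.format(part)
    match = re.fullmatch(pattern, value)
    if match is None:
        # Assumption: values that do not fit the template are left untouched.
        return value
    return target_template.format(**match.groupdict())


converted = convert_template(
    '/news?p=2',
    '/news?p={page}',
    'http://127.0.0.1:5000/posts?page={page}',
)
print(converted)
```

Named groups carry each captured placeholder straight into `str.format`, so the two templates only need to agree on placeholder names, not on their order.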
Run python app.py, then open your browser and visit http://127.0.0.1:5000/posts?page=1
You will get a result like:
{
  "Page": {
    "next_page": "http://127.0.0.1:5000/posts?page=2"
  },
  "Post": [
    {
      "title": "Mathematicians Crack the Cursed Curve",
      "url": "https://www.quantamagazine.org/mathematicians-crack-the-cursed-curve-20171207/"
    },
    {
      "title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord",
      "url": "https://jalopnik.com/this-glorious-madman-stuffed-a-p85-tesla-drivetrain-int-1823461909"
    }
  ]
}
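A client consuming this response might look like the sketch below. To stay runnable without a server, it parses a copy of the sample result above from a string instead of making an HTTP request; in practice the text would come from a request to the /posts endpoint. The variable names are illustrative only.

```python
import json
from urllib.parse import parse_qs, urlparse

# Copy of the sample response shown above.
response_text = '''
{
  "Page": {"next_page": "http://127.0.0.1:5000/posts?page=2"},
  "Post": [
    {"title": "Mathematicians Crack the Cursed Curve",
     "url": "https://www.quantamagazine.org/mathematicians-crack-the-cursed-curve-20171207/"},
    {"title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord",
     "url": "https://jalopnik.com/this-glorious-madman-stuffed-a-p85-tesla-drivetrain-int-1823461909"}
  ]
}
'''

data = json.loads(response_text)

# "Post" is a list because the Post item is declared with @api.list.
titles = [post['title'] for post in data['Post']]

# "Page" is a single object; pull the page number out of its next_page URL.
next_url = data['Page']['next_page']
next_page_number = int(parse_qs(urlparse(next_url).query)['page'][0])

for post in data['Post']:
    print('{title} -> {url}'.format(**post))
print('next page:', next_page_number)
```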
Todo
- Visualization: create a Toapi project in a web page by drag and drop.
Contributing
Write code, add tests, and open a pull request.