Popularity: 7.1 (Stable)

Activity: 0.0 (Stable)

Stars: 3,462

Watchers: 78

Forks: 234

Last Commit: almost 2 years ago

Programming language: Python

License: MIT License

Tags: Web Content Extracting

Latest version: v2.1.1

toapi alternatives and similar packages

Based on the "Web Content Extracting" category.

TWINT

9.4 0.0 toapi VS TWINT

DISCONTINUED. An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
newspaper

9.3 0.0 L3 toapi VS newspaper

newspaper3k performs news, full-text, and article metadata extraction in Python 3.

python-goose

7.9 0.0 toapi VS python-goose

HTML content / article extractor, web scraping library in Python
textract

7.6 3.7 toapi VS textract

extract text from any document. no muss. no fuss.
sumy

7.4 6.7 L5 toapi VS sumy

Module for automatic summarization of text documents and HTML pages.
python-readability

6.7 3.4 toapi VS python-readability

fast python port of arc90's readability tool, updated to match latest readability.js!
trafilatura

6.5 8.4 toapi VS trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
html2text

5.7 6.3 L1 toapi VS html2text

Convert HTML to Markdown-formatted text.
Goose3

4.2 6.6 toapi VS Goose3

A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
micawber

3.9 4.8 L5 toapi VS micawber

a small library for extracting rich content from urls
lassie

3.7 0.0 L4 toapi VS lassie

Web Content Retrieval for Humans™
opengraph

3.0 0.0 L5 toapi VS opengraph

A python module to parse the Open Graph Protocol
inscriptis -- HTML to text conversion library, command line client and Web service

2.6 8.5 toapi VS inscriptis -- HTML to text conversion library, command line client and Web service

A python based HTML to text conversion library, command line client and Web service.
Haul

2.5 0.0 L5 toapi VS Haul

An Extensible Image Crawler
htmldate

2.0 7.6 toapi VS htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
sanitize

1.5 0.0 L4 toapi VS sanitize

Bringing sanity to world of messed-up data
JSONPATH

1.0 5.7 toapi VS JSONPATH

A query expression for extracting data from JSON.
Data Extractor

0.9 6.0 toapi VS Data Extractor

Combine XPath, CSS Selectors and JSONPath for Web data extracting.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

Toapi

Overview

Toapi gives you the ability to make every website provide an API.

Version v2.0.0 is a complete rewrite.

More elegant. More Pythonic.

v1.0.0 Documentation: http://www.toapi.org
Awesome: https://github.com/toapi/awesome-toapi
Organization: https://github.com/toapi

Features

Automatically converts an HTML website into an API service.
Automatically caches every page of the source site.
Automatically caches every request.
Supports merging multiple websites into one API service.
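
To make the caching features above more concrete, here is a minimal standard-library sketch of a time-limited page cache. It is only an illustration of the general idea, not Toapi's actual caching code; the names TTLCache and fetch_with_cache are hypothetical, and the fetch callable stands in for whatever downloads the source page.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it so the caller fetches again.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def fetch_with_cache(url: str, cache: TTLCache,
                     fetch: Callable[[str], str]) -> str:
    """Return the cached page for `url`, calling `fetch` only on a miss."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    page = fetch(url)
    cache.set(url, page)
    return page
```

With a cache like this in front of the source site, repeated requests for the same page within the time limit are answered without downloading it again.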

Get Started

Installation

$ pip install toapi
$ toapi -v
toapi, version 2.0.0

Usage

Create app.py and copy in the following code:

from flask import request
from htmlparsing import Attr, Text
from toapi import Api, Item

api = Api()


# Each element matching '.athing' on the source page becomes one Post.
# Each route maps an API path (first argument) to a source-site path (second).
@api.site('https://news.ycombinator.com')
@api.list('.athing')
@api.route('/posts?page={page}', '/news?p={page}')
@api.route('/posts', '/news?p=1')
class Post(Item):
    url = Attr('.storylink', 'href')
    title = Text('.storylink')


# No @api.list here, so Page is parsed once per page rather than per element.
@api.site('https://news.ycombinator.com')
@api.route('/posts?page={page}', '/news?p={page}')
@api.route('/posts', '/news?p=1')
class Page(Item):
    next_page = Attr('.morelink', 'href')

    def clean_next_page(self, value):
        # Rewrite the source site's "more" link into the matching API URL.
        return api.convert_string('/' + value, '/news?p={page}', request.host_url.strip('/') + '/posts?page={page}')


api.run(debug=True, host='0.0.0.0', port=5000)
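
The clean_next_page hook above turns a source-site link such as /news?p=2 into the API's own URL scheme. The following is a minimal standard-library sketch of that kind of template-to-template rewrite. It is not Toapi's convert_string implementation; the helper name convert_path and its matching rules are assumptions made for illustration.

```python
import re


def convert_path(path: str, source_template: str, target_template: str) -> str:
    """Rewrite `path` from one '{name}' template into another.

    Hypothetical helper for illustration; not toapi's convert_string.
    """
    # Build a regex from the source template: literal text is escaped and
    # each {name} placeholder becomes a named capture group.
    pattern = ""
    for part in re.split(r"(\{\w+\})", source_template):
        if re.fullmatch(r"\{\w+\}", part):
            pattern += "(?P<%s>[^/&?#]+)" % part[1:-1]
        else:
            pattern += re.escape(part)
    match = re.fullmatch(pattern, path)
    if match is None:
        raise ValueError("%r does not match %r" % (path, source_template))
    # Fill the captured values into the target template.
    return target_template.format(**match.groupdict())


print(convert_path('/news?p=2', '/news?p={page}',
                   'http://127.0.0.1:5000/posts?page={page}'))
# prints http://127.0.0.1:5000/posts?page=2
```

Because the placeholder names are shared between the two templates, the same pair of strings describes both the route and the link rewrite.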

Run python app.py.

Then open your browser and visit http://127.0.0.1:5000/posts?page=1

You will get a result like:

{
  "Page": {
    "next_page": "http://127.0.0.1:5000/posts?page=2"
  }, 
  "Post": [
    {
      "title": "Mathematicians Crack the Cursed Curve", 
      "url": "https://www.quantamagazine.org/mathematicians-crack-the-cursed-curve-20171207/"
    }, 
    {
      "title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord", 
      "url": "https://jalopnik.com/this-glorious-madman-stuffed-a-p85-tesla-drivetrain-int-1823461909"
    }
  ]
}
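
A client can consume a response shaped like the one above with nothing more than the standard library. The sketch below assumes the top-level keys "Post" and "Page" shown in the sample; the function name parse_posts is hypothetical, and the sample string is a shortened copy of the response above.

```python
import json
from typing import Dict, List, Optional, Tuple


def parse_posts(body: str) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Split a JSON response into (posts, next_page_url)."""
    data = json.loads(body)
    posts = data.get("Post", [])
    # next_page is None when the response carries no Page item.
    next_page = data.get("Page", {}).get("next_page")
    return posts, next_page


sample = """
{
  "Page": {"next_page": "http://127.0.0.1:5000/posts?page=2"},
  "Post": [
    {
      "title": "Mathematicians Crack the Cursed Curve",
      "url": "https://www.quantamagazine.org/mathematicians-crack-the-cursed-curve-20171207/"
    }
  ]
}
"""

posts, next_page = parse_posts(sample)
for post in posts:
    print(post["title"], "->", post["url"])
print("next:", next_page)
```

To walk through several pages, fetch next_page with any HTTP client and call parse_posts again until it returns None for the next page.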

Todo

Visualization: create a toapi project in a web page by drag and drop.

Contributing

Write code, add tests, and open a pull request.
