Description
Goose was originally an article extractor written in Java that was most
recently (August 2011) converted to a Scala project.
This is a complete rewrite in Python. The aim of the software is to
take any news article or article-type web page and extract not only
the main body of the article but also all metadata and the most probable
image candidate.
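To make the kinds of fields concrete, here is a minimal sketch using only the Python standard library. It is not Goose's API or its extraction algorithm: the names `MetadataSketch` and `extract_sketch` are hypothetical, joining every `<p>` element is a naive stand-in for real body detection, and reading `og:image` is only one assumed way to pick an image candidate.

```python
from html.parser import HTMLParser


class MetadataSketch(HTMLParser):
    """Hypothetical helper: collects title, meta tags, and paragraph text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # meta name/property -> content
        self.paragraphs = []  # text of each <p> element
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content"):
                self.meta[key.lower()] = attrs["content"]
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def extract_sketch(html):
    """Return a dict with title, meta description, image candidate, and body."""
    parser = MetadataSketch()
    parser.feed(html)
    parser.close()
    paragraphs = [p.strip() for p in parser.paragraphs if p.strip()]
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta.get("description", ""),
        # Assumption: treat the Open Graph image as the image candidate.
        "top_image": parser.meta.get("og:image", ""),
        # Naive stand-in for body detection: join all paragraph text.
        "body": "\n\n".join(paragraphs),
    }


SAMPLE = """<html><head><title>Example headline</title>
<meta name="description" content="A short summary.">
<meta property="og:image" content="http://example.com/lead.jpg">
</head><body><p>First paragraph.</p><p>Second <b>paragraph</b>.</p></body></html>"""

result = extract_sketch(SAMPLE)
print(result["title"])
print(result["top_image"])
```

A real extractor has to separate the article body from navigation, comments, and ads, which this sketch makes no attempt to do.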
Goose will try to extract the following information:
python-goose alternatives and similar packages
Based on the "Web Content Extracting" category.
TWINT
DISCONTINUED. An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
newspaper
newspaper3k provides news, full-text, and article metadata extraction in Python 3.
python-readability
Fast Python port of arc90's readability tool, updated to match the latest readability.js.
trafilatura
Python and command-line tool to gather text on the Web: web crawling/scraping and extraction of text, metadata, and comments.
inscriptis
A Python-based HTML to text conversion library, command-line client, and Web service.