Popularity: 2.0 (Growing)

Activity: 7.6

Stars 106

Watchers 5

Forks 27

Last Commit 10 days ago

Description

Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.
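The difference between an "original" and an "updated" date can be illustrated with a small standard-library sketch. It is not htmldate's implementation or API: the class `MetaDateParser`, the helpers `normalize` and `find_dates`, and the two sets of meta-tag keys are hypothetical names chosen for this example. It also assumes the page exposes ISO-formatted dates in `<meta>` tags, which many pages do not.

```python
from datetime import datetime
from html.parser import HTMLParser

# Assumed (hypothetical) mapping of meta-tag keys to the kind of date they signal.
ORIGINAL_KEYS = {"article:published_time", "datepublished", "date"}
UPDATED_KEYS = {"article:modified_time", "datemodified", "og:updated_time"}


def normalize(value):
    """Return the date as YYYY-MM-DD, or None if value does not start with an ISO date."""
    try:
        return datetime.strptime(value[:10], "%Y-%m-%d").strftime("%Y-%m-%d")
    except ValueError:
        return None


class MetaDateParser(HTMLParser):
    """Collect the first original and first updated date found in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.original = None
        self.updated = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # The key may live in any of these attributes depending on the markup flavour.
        key = (attrs.get("property") or attrs.get("name") or attrs.get("itemprop") or "").lower()
        value = attrs.get("content")
        if not value:
            return
        date = normalize(value)
        if date is None:
            return
        if key in ORIGINAL_KEYS and self.original is None:
            self.original = date
        elif key in UPDATED_KEYS and self.updated is None:
            self.updated = date


def find_dates(html):
    """Return an (original, updated) tuple; either element may be None."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.original, parser.updated


sample = (
    '<html><head>'
    '<meta property="article:published_time" content="2016-07-12T08:00:00Z">'
    '<meta property="article:modified_time" content="2017-09-01"/>'
    '</head><body></body></html>'
)
print(find_dates(sample))  # ('2016-07-12', '2017-09-01')
```

A real extractor has to go well beyond this, for example by reading dates from the URL, from visible text, and from non-ISO formats. The sketch only shows why the two dates are tracked separately.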

Programming language: Python

License: Apache License 2.0

Tags: Date And Time Text Processing HTTP Web Content Extracting HTML Scientific Engineering Information Analysis Internet WWW Markup Linguistic Web Scraping Scraping Content Extraction Metadata

Latest version: v1.3.2

htmldate alternatives and similar packages

Based on the "Web Content Extracting" category.

TWINT

9.4 0.0 htmldate VS TWINT

DISCONTINUED. An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
newspaper

9.3 0.0 L3 htmldate VS newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.

python-goose

7.9 0.0 htmldate VS python-goose

HTML content / article extractor, web scraping library in Python
textract

7.6 3.7 htmldate VS textract

extract text from any document. no muss. no fuss.
sumy

7.4 6.7 L5 htmldate VS sumy

Module for automatic summarization of text documents and HTML pages.
toapi

7.1 0.0 htmldate VS toapi

Every web site provides APIs.
python-readability

6.7 3.4 htmldate VS python-readability

fast python port of arc90's readability tool, updated to match latest readability.js!
trafilatura

6.5 8.7 htmldate VS trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
html2text

5.7 6.1 L1 htmldate VS html2text

Convert HTML to Markdown-formatted text.
Goose3

4.2 6.4 htmldate VS Goose3

A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
micawber

3.9 4.8 L5 htmldate VS micawber

a small library for extracting rich content from urls
lassie

3.7 0.0 L4 htmldate VS lassie

Web Content Retrieval for Humans™
opengraph

3.0 0.0 L5 htmldate VS opengraph

A python module to parse the Open Graph Protocol
inscriptis -- HTML to text conversion library, command line client and Web service

2.6 8.5 htmldate VS inscriptis -- HTML to text conversion library, command line client and Web service

A python based HTML to text conversion library, command line client and Web service.
Haul

2.5 0.0 L5 htmldate VS Haul

An Extensible Image Crawler
sanitize

1.5 0.0 L4 htmldate VS sanitize

Bringing sanity to world of messed-up data
JSONPATH

1.0 5.7 htmldate VS JSONPATH

A query expression for extracting data from JSON.
Data Extractor

0.9 6.0 htmldate VS Data Extractor

Combine XPath, CSS Selectors and JSONPath for Web data extracting.