htmldate finds the original and updated publication dates of any web page using heuristics on HTML code and linguistic patterns. It covers every step from web page download to HTML parsing, scraping, and text analysis: URLs, HTML files, or HTML trees are given as input, and the library outputs a date string in the desired format.

In a nutshell, with Python:

>>> from htmldate import find_date
>>> find_date('')
'2016-12-23'
>>> find_date('', original_date=True)
'2016-06-23'

On the command line:

$ htmldate -u
'2016-12-23'
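
To give a rough idea of what "heuristics on HTML code" can mean, here is a minimal sketch that uses only the Python standard library. It is not htmldate's actual implementation: the function `sketch_find_date`, the class `MetaDateCollector`, the sets of meta-tag keys, and the sample HTML are all assumptions made up for illustration. The sketch only reads `<meta>` tags and picks either the latest modification date or the earliest publication date, loosely mirroring the `original_date` switch shown above.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Illustrative key sets (assumptions, not htmldate's real lists).
PUBLISHED_KEYS = {"article:published_time", "datepublished", "date"}
MODIFIED_KEYS = {"article:modified_time", "datemodified", "last-modified"}

# Matches the leading YYYY-MM-DD part of an ISO-like timestamp.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


class MetaDateCollector(HTMLParser):
    """Collect raw date strings found in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.published = []
        self.modified = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        key = (
            attributes.get("property")
            or attributes.get("name")
            or attributes.get("itemprop")
            or ""
        ).lower()
        content = attributes.get("content", "")
        if key in PUBLISHED_KEYS:
            self.published.append(content)
        elif key in MODIFIED_KEYS:
            self.modified.append(content)


def parse_date(value):
    """Return a datetime for the first valid YYYY-MM-DD in value, else None."""
    match = DATE_RE.search(value)
    if not match:
        return None
    try:
        return datetime(*map(int, match.groups()))
    except ValueError:  # e.g. month 13 or day 32
        return None


def sketch_find_date(html, original_date=False, outputformat="%Y-%m-%d"):
    """Hypothetical, simplified stand-in for a date-finding heuristic."""
    collector = MetaDateCollector()
    collector.feed(html)
    published = [d for d in map(parse_date, collector.published) if d]
    modified = [d for d in map(parse_date, collector.modified) if d]
    if original_date:
        # Prefer publication dates; take the earliest candidate.
        candidates = published or modified
        chosen = min(candidates) if candidates else None
    else:
        # Prefer modification dates; take the latest candidate.
        candidates = modified or published
        chosen = max(candidates) if candidates else None
    return chosen.strftime(outputformat) if chosen else None


# Made-up sample page for demonstration purposes.
SAMPLE = """<html><head>
<meta property="article:published_time" content="2016-06-23T10:00:00Z">
<meta property="article:modified_time" content="2016-12-23T08:30:00Z">
</head><body><p>Hello</p></body></html>"""

print(sketch_find_date(SAMPLE))                      # 2016-12-23
print(sketch_find_date(SAMPLE, original_date=True))  # 2016-06-23
```

A real extractor has to go well beyond this: many pages carry no usable meta tags, so dates must also be looked for in the URL, in markup elements, and in the text itself, which is where the linguistic patterns mentioned above come in.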

