Description

htmldate finds the original and updated publication dates of web pages using heuristics on HTML code and linguistic patterns. It covers every step from web page download to HTML parsing, scraping and text analysis: URLs, HTML files or HTML trees are given as input, and the library outputs a date string in the desired format.

In a nutshell, with Python:

>>> from htmldate import find_date
>>> find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
'2016-12-23'
>>> find_date('https://netzpolitik.org/2016/die-cider-connection-abmahnungen-gegen-nutzer-von-creative-commons-bildern/', original_date=True)
'2016-06-23'

On the command line:

$ htmldate -u http://blog.python.org/2016/12/python-360-is-now-available.html
'2016-12-23'
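To give a feel for what "heuristics on HTML code" can mean, here is a minimal standard-library sketch. It is not htmldate's actual implementation: the function `guess_date`, the class `MetaDateParser`, the shortlist of meta keys and the `example.org` URL are all hypothetical names chosen for illustration. It assumes two simple signals, a date-bearing `<meta>` tag and a full `/YYYY/MM/DD/` segment in the URL, and returns `None` when neither is present.

```python
import re
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional

# Hypothetical shortlist of <meta> keys that often carry a publication date.
DATE_META_KEYS = {"article:published_time", "date", "dc.date"}

# Only a complete /YYYY/MM/DD/ path segment is trusted; year/month alone is not.
URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


class MetaDateParser(HTMLParser):
    """Remember the content of the first <meta> tag whose key looks date-like."""

    def __init__(self) -> None:
        super().__init__()
        self.candidate: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.candidate is not None:
            return
        attributes = dict(attrs)
        key = (attributes.get("property") or attributes.get("name") or "").lower()
        if key in DATE_META_KEYS and attributes.get("content"):
            self.candidate = attributes["content"]


def guess_date(html: str, url: Optional[str] = None,
               outputformat: str = "%Y-%m-%d") -> Optional[str]:
    """Return a formatted date string, or None if no signal is found."""
    parser = MetaDateParser()
    parser.feed(html)
    if parser.candidate:
        # Keep only the leading YYYY-MM-DD part of an ISO 8601 timestamp.
        match = re.match(r"\d{4}-\d{2}-\d{2}", parser.candidate)
        if match:
            try:
                parsed = datetime.strptime(match.group(0), "%Y-%m-%d")
                return parsed.strftime(outputformat)
            except ValueError:
                pass  # e.g. month 13: fall through to the URL heuristic
    if url:
        match = URL_DATE.search(url)
        if match:
            year, month, day = (int(part) for part in match.groups())
            try:
                return datetime(year, month, day).strftime(outputformat)
            except ValueError:
                pass  # digits that do not form a valid calendar date
    return None


if __name__ == "__main__":
    page = ('<html><head><meta property="article:published_time" '
            'content="2016-12-23T10:00:00+00:00"/></head></html>')
    print(guess_date(page))
    print(guess_date("<html></html>", url="https://example.org/2016/06/23/post/"))
```

A real extractor needs many more signals (markup near the text, date expressions in the content, plausibility bounds), which is the work the library takes on; the sketch only shows the general shape of trying one signal after another.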


