All Versions: 15
Avg Release Cycle: 37 days

Changelog History

  • v3.1.10 Changes

    • Fix for float-based timezones; see issue #128. Thanks @Vasniktel!
    • Add langdetect dependency to help resolve edge cases where missing language information causes text not to be pulled; see issue #106.
  • v3.1.9 Changes

    • Fix for removing the site name from the title when it is part of the title; see issue #123.
    • Fix parsing of the encoding string when the encoding information is capitalized; see issue #109.
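
The capitalized-encoding fix above can be illustrated with a minimal standard-library sketch. It is not the library's actual code: the helper name `extract_charset`, the regular expression, and the `utf-8` fallback are assumptions made for illustration. It shows one way to read a charset declaration without depending on letter case.

```python
import codecs
import re

# Hypothetical pattern: matches "charset=..." in any letter case.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def extract_charset(content_type, default="utf-8"):
    """Return a normalized codec name from a Content-Type style string."""
    match = _CHARSET_RE.search(content_type)
    if not match:
        return default
    try:
        # codecs.lookup ignores case and returns the canonical codec name.
        return codecs.lookup(match.group(1)).name
    except LookupError:
        # Unknown encoding label: fall back instead of raising.
        return default
```

With this sketch, `extract_charset("text/html; CHARSET=UTF-8")` and `extract_charset("text/html; charset=utf-8")` resolve to the same codec name.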
  • v3.1.8 Changes

    • Fixed title being an empty string when the title is the same as the site name; see PR #117. Thanks @Pradhvan!
    • Add optional removal of footnotes; see issue #105.
  • v3.1.7 Changes

    • Fixed author configuration; see PR #96.
    • Improve parent node scoring to get more of the correct data; see PR #102. Thanks @skruse!
  • v3.1.6 Changes

    October 20, 2018
    • Improved handling of page encoding; see PR #92.
    • Improved author and published date extraction; see PR #93. Thanks @timoilya!
    • Added additional schema extractors for the schema.org parser; see PR #89.
    • Allow pulling more than the first og:type data for Opengraph; see PR #90.
  • v3.1.5 Changes

    September 11, 2018
  • v3.1.4 Changes

    August 19, 2018
    • Fix IndexError when the title has only a title splitter or is the site name; see issue #59. Thanks @dlrobertson!
    • Retry the calculate_top_node function with the root node if the first pass fails to find an article, which may occur when one or more known article patterns are found but none contain content; see PR #66. Thanks @dlrobertson!
    • Add parsing of schema.org's ReportageNewsArticle tags; see PR #67. Thanks @dlrobertson!
    • Add additional parsing of opengraph tags; see PR #64. Thanks @dlrobertson!
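
The title edge cases in this and later releases (a title that is only a splitter, a title equal to the site name, a site name that is part of the real title) can be sketched as follows. This is a hypothetical illustration, not the library's implementation: `clean_title`, `TITLE_SPLITTERS`, and the chosen splitter characters are assumed names and values.

```python
# Assumed splitter characters; the real set may differ.
TITLE_SPLITTERS = ("|", "»")


def clean_title(title, site_name=None):
    """Strip a trailing or leading site name from a page title, defensively."""
    title = title.strip()
    site = (site_name or "").strip().lower()
    for splitter in TITLE_SPLITTERS:
        if splitter not in title:
            continue
        # Drop empty pieces so a title such as "|" yields no pieces at all,
        # which avoids indexing into an empty list.
        pieces = [p.strip() for p in title.split(splitter) if p.strip()]
        if not pieces:
            return ""
        # Remove only pieces that are exactly the site name; if that would
        # remove everything, keep the original pieces instead.
        kept = [p for p in pieces if p.lower() != site] or pieces
        return f" {splitter} ".join(kept)
    # No splitter present: leave the title alone, even if it is the site name.
    return title
```

Only whole segments that match the site name are removed, so a headline that merely contains the site name is left intact, and a title identical to the site name is returned unchanged rather than as an empty string.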
  • v3.1.3 Changes

    July 07, 2018
    • Parse headers and include them in cleaned_text
    • Additional configuration options:
      • Parse Headers: parse_headers
      • Parse Lists: parse_lists
      • Pretty Lists: pretty_lists
    • Catch mismatch between the encoding meta tag and the document encoding; see pull request #53. Thanks @jeffquach!
  • v3.1.2 Changes

    June 02, 2018
    • Parse lists out if present in the main article
    • Added configuration option pretty_lists to specify if a list should be represented as text or made to read like a list; default is True
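
To make the effect of a pretty_lists style switch concrete, here is a small standard-library sketch. It does not reproduce the library's output format: the function name `list_to_text`, the `* ` bullet prefix, and the space-joined plain form are assumptions for illustration, and nested lists are not handled.

```python
from html.parser import HTMLParser


class _ListItemCollector(HTMLParser):
    """Collect the text content of each <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def list_to_text(html, pretty_lists=True):
    """Render list items either one per line with a bullet or as flat text."""
    collector = _ListItemCollector()
    collector.feed(html)
    collector.close()
    items = [item.strip() for item in collector.items if item.strip()]
    if pretty_lists:
        # Assumed "pretty" form: one bulleted item per line.
        return "\n".join(f"* {item}" for item in items)
    # Assumed plain form: items joined into a single run of text.
    return " ".join(items)
```

Calling `list_to_text("<ul><li>one</li><li>two</li></ul>")` gives one bulleted item per line, while passing `pretty_lists=False` joins the items into a single line of text.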
  • v3.1.1 Changes

    May 29, 2018
    • Catch additional PIL exceptions when attempting to read images; see #42
    • Better meta processing of opengraph tags for use as keys in returned data; see #45