Changelog History

  • v6.1.1 Changes

    February 09, 2022
    • Updated the heuristic to fix the letter ß in UTF-8/MacRoman mojibake, which had regressed since version 5.6.

    • Packaging fixes to pyproject.toml.
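
    As a rough illustration of the kind of mojibake this release targets, the sketch below uses only Python's standard codecs (not ftfy itself) to show what happens when the UTF-8 bytes for ß are misread as MacRoman, and that re-encoding reverses the mix-up. The sample word is an arbitrary choice.

```python
# Simulate a UTF-8/MacRoman mix-up with the standard library only.
original = "Straße"

# The text is stored as UTF-8 but wrongly decoded as MacRoman.
mojibake = original.encode("utf-8").decode("mac_roman")
print(mojibake)

# Reversing the mistake: re-encode as MacRoman, then decode as UTF-8.
restored = mojibake.encode("mac_roman").decode("utf-8")
print(restored)
```

    The hard part, which a heuristic has to handle, is deciding when text is mojibake in the first place; the round trip above only works once that is known.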

  • v6.1 Changes

    February 09, 2022
    • Updated the heuristic to fix the letter Ñ with more confidence.

    • Fixed type annotations and added py.typed.

    • ftfy is packaged using Poetry now, and wheels are created and uploaded to PyPI.

  • v6.0.3 Changes

    May 14, 2021
    • Allow the keyword argument fix_entities as a deprecated alias for unescape_html, raising a warning.

    • ftfy.formatting functions now disregard ANSI terminal escapes when calculating text width.
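
    To show why terminal escapes matter for width calculations, here is a minimal standard-library sketch. The helper name visible_width and the regular expression are assumptions made for illustration and are not ftfy's implementation; a real width calculation also has to account for wide and zero-width characters.

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (set color) or "\x1b[0m" (reset).
ANSI_CSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def visible_width(text: str) -> int:
    """Hypothetical helper: count characters after dropping ANSI CSI escapes."""
    return len(ANSI_CSI_RE.sub("", text))


colored = "\x1b[31mred\x1b[0m"
raw_len = len(colored)          # counts the escape characters too
width = visible_width(colored)  # counts only the visible text
print(raw_len, width)
```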

  • v6.0.2 Changes

    May 04, 2021

    This version is purely a cosmetic change, updating the maintainer's e-mail address and the project's canonical location on GitHub.

  • v6.0.1 Changes

    April 12, 2021
    • The remove_terminal_escapes step was accidentally not being used. This version restores it.

    • Specified in the package metadata that ftfy 6 requires Python 3.6 or later.

    • Use a lighter link color when the docs are viewed in dark mode.

  • v6.0 Changes

    April 02, 2021
    • New function: ftfy.fix_and_explain() can describe all the transformations that happen when fixing a string. This is similar to what ftfy.fixes.fix_encoding_and_explain() did in previous versions, but it can fix more than the encoding.

    • fix_and_explain() and fix_encoding_and_explain() are now in the top-level ftfy module.

    • Changed the heuristic entirely. ftfy no longer needs to categorize every Unicode character, but only characters that are expected to appear in mojibake.

    • Because of the new heuristic, ftfy will no longer have to release a new version for every new version of Unicode. It should also run faster and use less RAM when imported.

    • The heuristic ftfy.badness.is_bad(text) can be used to determine whether there appears to be mojibake in a string. Some users were already using the old function sequence_weirdness() for that, but this one is actually designed for that purpose.

    • Instead of a pile of named keyword arguments, ftfy functions now take in a TextFixerConfig object. The keyword arguments still work, and become settings that override the defaults in TextFixerConfig.

    • Added support for UTF-8 mixups with Windows-1253 and Windows-1254.

    • Overhauled the documentation.
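
    The "config object plus keyword overrides" design described above can be sketched in plain Python. DemoConfig and resolve_config are hypothetical stand-ins written for this example, and the default values are illustrative; this is not ftfy's TextFixerConfig or its real defaults. Only the option names unescape_html and remove_terminal_escapes are taken from this changelog.

```python
from typing import NamedTuple, Optional


class DemoConfig(NamedTuple):
    """Hypothetical stand-in for a settings object; not ftfy's actual class."""
    unescape_html: bool = True
    remove_terminal_escapes: bool = True


def resolve_config(config: Optional[DemoConfig] = None, **kwargs: bool) -> DemoConfig:
    """Start from the given config (or the defaults) and apply keyword overrides."""
    base = config if config is not None else DemoConfig()
    # _replace returns a new tuple; unknown option names raise ValueError.
    return base._replace(**kwargs)


defaults = resolve_config()
overridden = resolve_config(unescape_html=False)
custom = resolve_config(DemoConfig(remove_terminal_escapes=False), unescape_html=False)
print(defaults, overridden, custom, sep="\n")
```

    One appeal of this pattern is that old call sites passing keyword arguments keep working, while new code can build a settings object once and reuse it.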

  • v5.9 Changes

    February 10, 2021

    This version is brought to you by the letter à and the number 0xC3.

    • Tweaked the heuristic to decode, for example, "Ã " (Ã followed by a non-breaking space) as the letter "à" more often.

    • This combines with the non-breaking-space fixer to decode "Ã " (Ã followed by an ordinary space) as "à" as well. However, in many cases, the text " Ã " was intended to be " à ", preserving the space -- the underlying mojibake had two spaces after it, but the Web coalesced them into one. We detect this case based on common French and Portuguese words, and preserve the space when it appears intended.

    Thanks to @zehavoc for bringing to my attention how common this case is.

    • Updated the data file of Unicode character categories to Unicode 13, as used in Python 3.9. (No matter what version of Python you're on, ftfy uses the same data.)
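
    The connection between à, 0xC3, and the non-breaking space can be checked with the standard library alone. The sketch below does not call ftfy; the sample word is an arbitrary choice, and the final step only demonstrates why a plain re-encoding stops working once the non-breaking space has been turned into an ordinary space.

```python
original = "voilà"

# "à" is encoded in UTF-8 as the two bytes 0xC3 0xA0.
utf8_bytes = original.encode("utf-8")

# Misread as Latin-1, 0xC3 becomes "Ã" and 0xA0 becomes a non-breaking space.
mojibake = utf8_bytes.decode("latin-1")

# With the non-breaking space intact, the mix-up is simply reversible.
restored = mojibake.encode("latin-1").decode("utf-8")

# If the non-breaking space was converted to a plain space along the way,
# the bytes are no longer valid UTF-8 and the simple reversal fails.
damaged = mojibake.replace("\u00a0", " ")
try:
    damaged.encode("latin-1").decode("utf-8")
    naive_fix_works = True
except UnicodeDecodeError:
    naive_fix_works = False

print(repr(mojibake), restored, naive_fix_works)
```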

  • v5.8 Changes

    July 17, 2020
    • Improved detection of UTF-8 mojibake of Greek, Cyrillic, Hebrew, and Arabic scripts.

    • Fixed the undeclared dependency on setuptools by removing the use of pkg_resources.

  • v5.7 Changes

    February 18, 2020
    • Updated the data file of Unicode character categories to Unicode 12.1, as used in Python 3.8. (No matter what version of Python you're on, ftfy uses the same data.)

    • Corrected an omission where short sequences involving the ACUTE ACCENT character were not being fixed.

  • v5.6 Changes

    August 07, 2019
    • The unescape_html function now supports all the HTML5 entities that appear in html.entities.html5, including those with long names such as &DiacriticalDoubleAcute;.

    • Unescaping of numeric HTML entities now uses the standard library's html.unescape, making edge cases consistent.

    (The reason we don't run html.unescape on all text is that it's not always appropriate to apply, and can lead to false positive fixes. The text "This&NotThat" should not have "&Not" replaced by a symbol, as html.unescape would do.)

    • On top of Python's support for HTML5 entities, ftfy will also convert HTML escapes of common Latin capital letters that are (nonstandardly) written in all caps, such as &NTILDE; for Ñ.
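
    The standard-library pieces mentioned in this release can be explored directly. This sketch uses only html and html.entities, not ftfy, and the false-positive example is a lowercase variant chosen for illustration because "&not" is a legacy entity that is recognized even without a semicolon.

```python
import html
from html.entities import html5

# Long-named HTML5 entities are present in the standard table.
double_acute = html5["DiacriticalDoubleAcute;"]

# Numeric entities are handled by html.unescape.
numeric = html.unescape("&#241;")

# A false positive: "&not" is replaced even though no entity was intended.
false_positive = html.unescape("This&notThat")

# The all-caps spelling is nonstandard, so the standard table lacks it,
# while the standard capitalization is present.
has_all_caps = "NTILDE;" in html5
standard_form = html5["Ntilde;"]

print(double_acute, numeric, false_positive, has_all_caps, standard_form)
```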