ftfy v4.3.0 Release Notes

Release Date: 2016-12-29
  • ftfy has gotten by for four years without dependencies on other Python libraries, but now we can spare ourselves some code and some maintenance burden by delegating certain tasks to other libraries that already solve them well. This version now depends on the html5lib and wcwidth libraries.

    Feature changes:

    • The remove_control_chars fixer will now remove some non-ASCII control characters as well, such as deprecated Arabic control characters and byte-order marks. Bidirectional controls are still left as is.

    This should have no impact on well-formed text, while cleaning up many characters that the Unicode Consortium deems "not suitable for markup" (see Unicode Technical Report #20).

    • The unescape_html fixer uses a more thorough list of HTML entities, which it imports from html5lib.

    • ftfy.formatting now uses wcwidth to compute the width that a string will occupy in a text console.
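The broader behavior of remove_control_chars described above can be illustrated with a small standard-library sketch. This is not ftfy's implementation: the function name strip_control_chars and the exact set of code points are assumptions chosen for illustration, and ftfy's real table may cover more or fewer characters. The sketch deletes ASCII control characters (other than whitespace), the deprecated formatting controls U+206A to U+206F, and the byte-order mark, while leaving bidirectional controls such as U+200F untouched.

```python
# Illustrative only: a stdlib approximation of the idea, not ftfy's own code.

def _build_removal_table():
    """Map each code point we want to delete to None, for str.translate."""
    codepoints = (
        list(range(0x00, 0x09))         # U+0000..U+0008 (before tab)
        + [0x0B]                        # vertical tab
        + list(range(0x0E, 0x20))       # U+000E..U+001F
        + [0x7F]                        # DEL
        + list(range(0x206A, 0x2070))   # deprecated formatting controls
        + [0xFEFF]                      # byte-order mark
    )
    return dict.fromkeys(codepoints)    # every value is None, meaning "delete"


REMOVAL_TABLE = _build_removal_table()


def strip_control_chars(text):
    """Hypothetical helper: drop the code points listed in REMOVAL_TABLE."""
    return text.translate(REMOVAL_TABLE)


# A BOM, a BEL character, and U+206F are removed; the right-to-left mark
# (U+200F), the accented letter, and the trailing newline are kept.
sample = "\ufeffcaf\u00e9\x07 \u200fabc\u206f\n"
cleaned = strip_control_chars(sample)
print(repr(cleaned))
```

Using str.translate with a precomputed table keeps the removal to a single pass over the string, and well-formed text that contains none of these code points comes back unchanged.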

    Heuristic changes:

    • Updated the data file of Unicode character categories to Unicode 9, as used in Python 3.6.0. (No matter what version of Python you're on, ftfy uses the same data.)

    Pending deprecations:

    • The remove_bom option will become deprecated in 5.0, because it has been superseded by remove_control_chars.

    • ftfy 5.0 will remove the previously deprecated name fix_text_encoding. It was renamed to fix_encoding in 4.0.

    • ftfy 5.0 will require Python 3.2 or later, as planned. Python 2 users, please specify ftfy < 5 in your dependencies if you haven't already.
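For projects that still support Python 2, the version pin requested above might be written along these lines in a setup.py. This is only a sketch: the install_requires list belongs to a hypothetical project, and the environment-marker syntax assumes packaging tools recent enough to understand it.

```python
# Dependency list for a hypothetical project that still supports Python 2.
# The environment markers keep Python 2 installs on the ftfy 4.x series,
# while Python 3 installs remain free to pick up ftfy 5.0 and later.
install_requires = [
    'ftfy<5; python_version < "3"',
    'ftfy; python_version >= "3"',
]
```

A project that supports only Python 2 could simply list 'ftfy<5' with no marker.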