ftfy v6.0 Release Notes

Release Date: 2021-04-02
    • New function: ftfy.fix_and_explain() can describe all the transformations that happen when fixing a string. This is similar to what ftfy.fixes.fix_encoding_and_explain() did in previous versions, but it can fix more than the encoding.

    • fix_and_explain() and fix_encoding_and_explain() are now in the top-level ftfy module.

    • Changed the mojibake-detection heuristic entirely. ftfy no longer needs to categorize every Unicode character, but only characters that are expected to appear in mojibake.

    • Because of the new heuristic, ftfy will no longer have to release a new version for every new version of Unicode. It should also run faster and use less RAM when imported.

    • The heuristic ftfy.badness.is_bad(text) can be used to determine whether a string appears to contain mojibake. Some users were already using the old function sequence_weirdness() for that, but is_bad() is actually designed for that purpose.

    • Instead of a pile of named keyword arguments, ftfy functions now take a TextFixerConfig object. The keyword arguments still work; they become settings that override the defaults in TextFixerConfig.

    • Added support for UTF-8 mixups with Windows-1253 and Windows-1254.

    • Overhauled the documentation: https://ftfy.readthedocs.org
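
To show what a "UTF-8 mixup" with Windows-1253 or Windows-1254 looks like, the following sketch reproduces one using only Python's standard codecs. It is not ftfy's implementation; the helper names garble() and ungarble() are hypothetical, and the sample characters were chosen only because their UTF-8 bytes are all defined in the respective code pages.

```python
def garble(text: str, wrong_encoding: str) -> str:
    """Encode as UTF-8, then misread the bytes in a single-byte code page."""
    return text.encode("utf-8").decode(wrong_encoding)


def ungarble(text: str, wrong_encoding: str) -> str:
    """Reverse the mixup: recover the original bytes and decode them as UTF-8."""
    return text.encode(wrong_encoding).decode("utf-8")


# UTF-8 misread as Windows-1253 (Greek): the mojibake contains Greek letters.
greek_mixup = garble("\u00e9", "cp1253")    # "é" becomes "Γ©"

# UTF-8 misread as Windows-1254 (Turkish): the mojibake contains Turkish letters.
turkish_mixup = garble("\u0434", "cp1254")  # "д" becomes "Ğ´"

print(greek_mixup, "->", ungarble(greek_mixup, "cp1253"))
print(turkish_mixup, "->", ungarble(turkish_mixup, "cp1254"))
```

The hard part, which this sketch skips, is deciding automatically which code page was involved and whether the text is mojibake at all.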