ftfy v4.1.0 Release Notes

Release Date: 2016-02-25 // about 8 years ago
  • Heuristic changes:

    • ftfy can now deal with "lossy" mojibake. If your text has been run through a strict Windows-1252 decoder, such as the one in Python, it may contain the replacement character � (U+FFFD) where there were bytes that are unassigned in Windows-1252.

    Although ftfy won't recover the lost information, it can now detect this situation, replace the entire lossy character with �, and decode the rest of the characters. Previous versions would be unable to fix any string that contained U+FFFD.

    As an example, text in curly quotes that gets corrupted “ like this â€� now gets fixed to be “ like this �.

    • ⚡️ Updated the data file of Unicode character categories to Unicode 8.0, as used in Python 3.5.0. (No matter what version of Python you're on, ftfy uses the same data.)

    • Heuristics now count characters such as ~ and ^ as punctuation instead of wacky math symbols, improving the detection of mojibake in some edge cases.

    🆕 New features:

    • A new module, ftfy.formatting, can be used to justify Unicode text in a monospaced terminal. It takes into account that each character can take up anywhere from 0 to 2 character cells.

    • ⚡️ Internally, the utf-8-variants codec was simplified and optimized.