ftfy v3.3.0 Release Notes

Release Date: 2014-08-16 // over 9 years ago
  • Heuristic changes:

    • Certain symbols are marked as "ending punctuation" that may naturally occur after letters. When they follow an accented capital letter and look like mojibake, they will not be "fixed" without further evidence. An example is that "MARQUÉ…" will become "MARQUÉ...", and not "MARQUɅ".

    🆕 New features:

    • ftfy.explain_unicode is a diagnostic function that shows you what's going on in a Unicode string. It shows you a table with each code point in hexadecimal, its glyph, its name, and its Unicode category.

    • 🛠 ftfy.fixes.decode_escapes adds a feature missing from the standard library: it lets you decode a Unicode string with backslashed escape sequences in it (such as "\u2014") the same way that Python itself would.

    • 🚀 ftfy.streamtester is a release of the code that I use to test ftfy on an endless stream of real-world data from Twitter. With the new heuristics, the false positive rate of ftfy is about 1 per 6 million tweets. (See the "Accuracy" section of the documentation.)

    🗄 Deprecations:

    • 👍 Python 2.6 is no longer supported.

    • remove_unsafe_private_use is no longer needed in any current version of Python. This fixer will disappear in a later version of ftfy.