ftfy v4.0.0 Release Notes
Release Date: 2015-04-10
Breaking changes:
The default normalization form is now NFC, not NFKC. NFKC replaces a large number of characters with 'equivalent' characters, and some of these replacements are useful, but some are not desirable to do by default.
The fix_text function has some new options that perform more targeted operations that are part of NFKC normalization, such as fix_character_width, without requiring hitting all your text with the huge mallet that is NFKC.
- If you were already using NFC normalization, or in general if you want to preserve the spacing of CJK text, you should be sure to set fix_character_width=False.
The remove_unsafe_private_use parameter has been removed entirely, after two versions of deprecation. The function name fix_bad_encoding is also gone.
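
To make the NFC versus NFKC difference described above concrete, here is a minimal sketch that uses only Python's standard unicodedata module, not ftfy itself. The helper name compare_forms and the sample string are illustrative assumptions. The sample mixes fullwidth letters, an ideographic space, and a Latin ligature, which are the kinds of characters NFKC rewrites and NFC leaves alone.

```python
import unicodedata


def compare_forms(text):
    """Return the text normalized under both NFC and NFKC (illustrative helper)."""
    return {form: unicodedata.normalize(form, text) for form in ("NFC", "NFKC")}


# Fullwidth "ABC", an ideographic space (U+3000), and the "fi" ligature (U+FB01).
sample = "\uff21\uff22\uff23\u3000\ufb01"
forms = compare_forms(sample)

# NFC only applies canonical composition, so these characters are untouched.
print(forms["NFC"] == sample)  # True

# NFKC also applies compatibility mappings: fullwidth letters become ASCII,
# the ideographic space becomes a plain space, and the ligature is split.
print(forms["NFKC"])  # ABC fi
```

The ideographic space turning into an ordinary space is one reason the notes suggest fix_character_width=False when the spacing of CJK text matters.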
New features:
Fixers for strange new forms of mojibake, including particularly clear cases of mixed UTF-8 and Windows-1252.
New heuristics, so that ftfy can fix more stuff, while maintaining approximately zero false positives.
The command-line tool trusts you to know what encoding your input is in, and assumes UTF-8 by default. You can still tell it to guess with the -g option.
The command-line tool can be configured with options, and can be used as a pipe.
Recognizes characters that are new in Unicode 7.0, as well as emoji from Unicode 8.0+ that may already be in use on iOS.
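
For readers unfamiliar with the UTF-8 and Windows-1252 mojibake mentioned above, the following sketch shows how such text typically arises and how the simplest case can be reversed with the standard library alone. It is not ftfy's algorithm: the helper names garble and ungarble are hypothetical, and the sketch assumes the whole string was mis-decoded exactly once, whereas ftfy relies on heuristics to decide when a fix is safe.

```python
def garble(text):
    """Simulate mojibake: UTF-8 bytes wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("windows-1252")


def ungarble(text):
    """Reverse the simple case by re-encoding as Windows-1252 and decoding as UTF-8."""
    return text.encode("windows-1252").decode("utf-8")


original = "it\u2019s"        # "it's" with a right single quotation mark
garbled = garble(original)    # the quote's three UTF-8 bytes become three characters
repaired = ungarble(garbled)

print(repr(garbled))
print(repaired == original)  # True
```

A round trip like this fails on text that is only partly garbled, which is where the mixed-encoding fixers listed above come in.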
Deprecations:
fix_text_encoding is being renamed again, for conciseness and consistency. It's now simply called fix_encoding. The name fix_text_encoding is available but emits a warning.
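
The renaming pattern described above, an old name kept as an alias that warns, can be sketched as follows. This is an assumption about the general technique, not ftfy's actual source: the function bodies are placeholders, and both the warning category and the message are guesses. The second half shows how calling code might detect that it still uses the old name.

```python
import warnings


def fix_encoding(text):
    # Placeholder body standing in for the real fixer, which is not shown here.
    return text


def fix_text_encoding(text):
    """Deprecated alias (hypothetical sketch): warn, then delegate to the new name."""
    warnings.warn(
        "fix_text_encoding is now known as fix_encoding",
        DeprecationWarning,
        stacklevel=2,
    )
    return fix_encoding(text)


# Record warnings so a test suite can notice calls to the old name.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fix_text_encoding("example")

print(result)
print([w.category.__name__ for w in caught])
```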
Pending deprecations:
Python 2.6 support is largely coincidental.
Python 2.7 support is on notice. If you use Python 2, be sure to pin a version of ftfy less than 5.0 in your requirements.