Changelog History
-
v3.3.1 Changes
December 12, 2014
- This version restores compatibility with Python 2.6.
-
v3.3.0 Changes
August 16, 2014
Heuristic changes:
- Certain symbols that may naturally occur after letters are now marked as "ending punctuation". When they follow an accented capital letter and look like mojibake, they will not be "fixed" without further evidence. For example, "MARQUÉ…" will become "MARQUÉ...", not "MARQUɅ".
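The ambiguity behind this heuristic can be reproduced with Python's standard codecs. This sketch only demonstrates the misreading that ftfy now avoids; it does not implement the heuristic itself.

```python
# Why "MARQUÉ…" looks like mojibake: its Windows-1252 bytes are valid UTF-8.
text = "MARQUÉ…"

raw = text.encode("windows-1252")  # É -> 0xC9, … -> 0x85
print(raw)  # b'MARQU\xc9\x85'

# 0xC9 0x85 decodes as UTF-8 to U+0245 LATIN CAPITAL LETTER TURNED V,
# so a naive "fix" would replace the last two characters with one.
naive_fix = raw.decode("utf-8")
print(naive_fix)  # MARQUɅ
```

Because the round trip succeeds without errors, the bytes alone cannot tell the two readings apart, which is why the change requires further evidence before fixing.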
New features:
- ftfy.explain_unicode is a diagnostic function that shows you what's going on in a Unicode string. It shows you a table with each code point in hexadecimal, its glyph, its name, and its Unicode category.
- ftfy.fixes.decode_escapes adds a feature missing from the standard library: it lets you decode a Unicode string with backslashed escape sequences in it (such as "\u2014") the same way that Python itself would.
- ftfy.streamtester is a release of the code that I use to test ftfy on an endless stream of real-world data from Twitter. With the new heuristics, the false positive rate of ftfy is about 1 per 6 million tweets. (See the "Accuracy" section of the documentation.)
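The first two helpers can be approximated with the standard library. The sketch below uses hypothetical names (explain_unicode_sketch, decode_escapes_sketch) and is not ftfy's implementation: the first builds one row per code point from unicodedata, and the second decodes only the ASCII escape sequences matched by a regular expression (an assumed pattern covering Python's common escape forms), leaving the rest of the string alone.

```python
import re
import unicodedata


def explain_unicode_sketch(text):
    """Hypothetical stand-in for ftfy.explain_unicode: one row per code point."""
    rows = []
    for char in text:
        glyph = char if char.isprintable() else repr(char)
        rows.append((
            f"U+{ord(char):04X}",                # code point in hexadecimal
            glyph,                                # the character itself
            unicodedata.name(char, "<unnamed>"),  # Unicode name, if it has one
            unicodedata.category(char),           # e.g. "Ll", "Pd"
        ))
    return rows


# Assumed set of escape forms; only these are decoded.
_ESCAPE_RE = re.compile(r"""
    ( \\U[0-9A-Fa-f]{8}        # 8-digit hex escape
    | \\u[0-9A-Fa-f]{4}        # 4-digit hex escape
    | \\x[0-9A-Fa-f]{2}        # 2-digit hex escape
    | \\[0-7]{1,3}             # octal escape
    | \\N\{[A-Za-z0-9 \-]+\}   # character by name
    | \\[\\'"abfnrtv]          # single-character escape
    )""", re.VERBOSE)


def decode_escapes_sketch(text):
    """Decode backslash escapes while leaving other non-ASCII text untouched."""
    def decode_match(match):
        # Every match is pure ASCII, so encoding it for the unicode_escape
        # codec cannot mangle the characters around it.
        return match.group(0).encode("ascii").decode("unicode_escape")
    return _ESCAPE_RE.sub(decode_match, text)


for row in explain_unicode_sketch("\u00e9\u2014"):
    print(*row, sep="  ")
print(decode_escapes_sketch("naïve \\u2014 text"))
```

Encoding the whole string to UTF-8 and decoding it with the unicode_escape codec would instead treat each non-ASCII byte as Latin-1, turning "naïve" into "naÃ¯ve"; matching the escapes one at a time avoids that.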
Deprecations:
- Python 2.6 is no longer supported.
- remove_unsafe_private_use is no longer needed in any current version of Python. This fixer will disappear in a later version of ftfy.
-
v3.2.0 Changes
June 27, 2014
- fix_line_breaks fixes three additional characters that are considered line breaks in some environments, such as JavaScript and Python's "codecs" library. These are all now replaced with \n:
  - U+0085 <control>, with alias "NEXT LINE"
  - U+2028 LINE SEPARATOR
  - U+2029 PARAGRAPH SEPARATOR
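A minimal sketch of this normalization, assuming a regular-expression approach; normalize_line_breaks is a hypothetical name, not ftfy's code. It also folds "\r\n" and "\r" into "\n", which is an assumption about what a line-break fixer should cover, and compares the result with str.splitlines(), which already treats all three characters as line boundaries.

```python
import re

# Hypothetical helper: CRLF must come before CR so it is replaced as a unit.
_LINE_BREAKS = re.compile("\r\n|\r|\u0085|\u2028|\u2029")


def normalize_line_breaks(text):
    """Replace CRLF, CR, NEXT LINE, LINE SEPARATOR and PARAGRAPH SEPARATOR with a newline."""
    return _LINE_BREAKS.sub("\n", text)


sample = "one\u0085two\u2028three\u2029four\r\nfive"
print(normalize_line_breaks(sample).split("\n"))
# Python's own str.splitlines() already treats these characters as breaks:
print(sample.splitlines())
```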
-
v3.1.3 Changes
May 15, 2014
- Fix utf-8-variants so it never outputs surrogate codepoints, even on Python 2, where that would otherwise be possible.
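In CESU-8, which utf-8-variants decodes, a character outside the Basic Multilingual Plane is stored as two three-byte sequences, one for each UTF-16 surrogate. A minimal stdlib sketch (decode_cesu8 is a hypothetical function, not ftfy's codec) decodes those surrogates with Python's surrogatepass error handler and then pairs them up, so no surrogate code points survive in the output. Mapping b"\xc0\x80" to U+0000, as Java's modified UTF-8 does, is included as an assumption about the "Java lookalikes".

```python
def decode_cesu8(data: bytes) -> str:
    # Java's modified UTF-8 writes U+0000 as C0 80; 0xC0 can only ever be a
    # lead byte, so this replacement cannot split another sequence.
    data = data.replace(b"\xc0\x80", b"\x00")
    # "surrogatepass" lets the UTF-8 decoder emit encoded surrogates
    # instead of raising UnicodeDecodeError.
    text = data.decode("utf-8", errors="surrogatepass")
    # Round-tripping through UTF-16 joins each high/low surrogate pair into one
    # code point; unpaired surrogates become U+FFFD instead of leaking out.
    return text.encode("utf-16-le", errors="surrogatepass").decode(
        "utf-16-le", errors="replace"
    )


print(decode_cesu8(b"\xed\xa0\xbd\xed\xb8\x80") == "\U0001F600")
```

Ordinary UTF-8 passes through unchanged, since it contains no encoded surrogates for the second step to join.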
-
v3.1.2 Changes
January 29, 2014
- Fix a bug in 3.1.1 where strings with backslashes in them could never be fixed.
-
v3.1.1 Changes
January 29, 2014
- Add the ftfy.bad_codecs package, which registers new codecs that can decode things that Python may otherwise refuse to decode:
  - utf-8-variants, which decodes CESU-8 and its Java lookalikes
  - sloppy-windows-*, which decodes character-map encodings while treating unmapped characters as Latin-1
- Simplify the code using ftfy.bad_codecs.
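The sloppy-windows-* idea can be sketched for single-byte encodings: build a 256-entry table by decoding each byte strictly and falling back to chr(byte), which is what Latin-1 would produce, for bytes the encoding leaves unmapped. build_sloppy_table and sloppy_decode are hypothetical names; in ftfy itself, importing ftfy.bad_codecs is what makes codec names such as "sloppy-windows-1252" available to bytes.decode.

```python
def build_sloppy_table(encoding: str) -> str:
    """Map each byte 0-255 through `encoding`, falling back to Latin-1."""
    chars = []
    for byte in range(256):
        try:
            chars.append(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            # Unmapped byte: decode it as Latin-1 would, i.e. to U+00XX.
            chars.append(chr(byte))
    return "".join(chars)


def sloppy_decode(data: bytes, encoding: str = "windows-1252") -> str:
    """Decode a single-byte encoding without ever raising on unmapped bytes."""
    table = build_sloppy_table(encoding)
    return "".join(table[byte] for byte in data)


print(repr(sloppy_decode(b"\x80\x81caf\xe9")))
```

Strict Windows-1252 decoding raises UnicodeDecodeError on bytes such as 0x81, while the sketch passes them through as U+0081. Caching the table (for example with functools.lru_cache) would avoid rebuilding it on every call.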
-
v3.0.6 Changes
November 05, 2013
- fix_entities can now be True, False, or 'auto'. The new case is True, which will decode all entities, even in text that already contains angle brackets. This may also be faster, because it doesn't have to check.
- build_data.py will refuse to run on Python < 3.3, to prevent building an inconsistent data file.
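The entry does not spell out what 'auto' checks; judging from the description of True, a sketch can assume that 'auto' skips decoding when the text already contains angle brackets. fix_entities_sketch is a hypothetical function, and the standard library's html.unescape stands in for ftfy's own entity decoder.

```python
import html


def fix_entities_sketch(text: str, fix_entities="auto") -> str:
    """Hypothetical model of the three fix_entities settings."""
    if fix_entities == "auto" and "<" in text and ">" in text:
        # Assumed 'auto' rule: text with angle brackets looks like HTML, so
        # leave its entities alone. True skips this check entirely.
        return text
    if fix_entities:
        # True, or 'auto' on text without angle brackets: decode everything.
        return html.unescape(text)
    return text


print(fix_entities_sketch("<b>AT&amp;T</b>"))
print(fix_entities_sketch("<b>AT&amp;T</b>", True))
```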
-
v3.0.5 Changes
November 01, 2013
- Fix the arguments to fix_file, because they were totally wrong.
-
v3.0.4 Changes
October 01, 2013
- Restore compatibility with Python 2.6.
-
v3.0.3 Changes
September 09, 2013
- Fixed an ugly regular expression bug that prevented ftfy from importing on a narrow build of Python.