Changelog History

  • v3.1.3 Changes

    May 15, 2014
    • Fix utf-8-variants so it never outputs surrogate codepoints, even on Python 2 where that would otherwise be possible.
  • v3.1.2 Changes

    January 29, 2014
    • Fix a bug in 3.1.1 where strings containing backslashes could never be fixed.
  • v3.1.1 Changes

    January 29, 2014
    • Add the ftfy.bad_codecs package, which registers new codecs that can decode things that Python may otherwise refuse to decode:

      • utf-8-variants, which decodes CESU-8 and its Java lookalike
      • sloppy-windows-*, which decodes character-map encodings while treating unmapped characters as Latin-1
    • Simplify the code using ftfy.bad_codecs.
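
To illustrate what these two codecs are for, here is a minimal standard-library sketch of the same ideas. It is not ftfy's implementation: the function names decode_cesu8 and sloppy_cp1252_decode are hypothetical, the CESU-8 part does not handle the Java variant's two-byte encoding of the null character, and only windows-1252 is shown for the sloppy case.

```python
def decode_cesu8(data: bytes) -> str:
    """Decode CESU-8 (or ordinary UTF-8) bytes without leaving lone surrogates.

    Hypothetical helper for illustration; not part of ftfy.
    """
    # CESU-8 stores each astral character as two 3-byte surrogate halves.
    # Strict UTF-8 rejects those, so let them through with 'surrogatepass'.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so each high/low pair merges into one codepoint.
    # An unpaired surrogate still raises UnicodeDecodeError here.
    return with_surrogates.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def sloppy_cp1252_decode(data: bytes) -> str:
    """Decode windows-1252, falling back to Latin-1 for unmapped bytes.

    Hypothetical helper for illustration; not part of ftfy.
    """
    chars = []
    for value in data:
        single = bytes([value])
        try:
            chars.append(single.decode("cp1252"))
        except UnicodeDecodeError:
            # Bytes such as 0x81 have no cp1252 mapping; Latin-1 maps every byte.
            chars.append(single.decode("latin-1"))
    return "".join(chars)


if __name__ == "__main__":
    # U+1F600 encoded as CESU-8: two surrogate halves, three bytes each.
    print(ascii(decode_cesu8(b"\xed\xa0\xbd\xed\xb8\x80")))
    print(ascii(sloppy_cp1252_decode(b"\x93hi\x94\x81")))
```

Python's strict decoders raise an error on both inputs, which is the refusal the changelog entry refers to.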

  • v3.0.6 Changes

    November 05, 2013
    • fix_entities can now be True, False, or 'auto'. The new case is True, which will decode all entities, even in text that already contains angle brackets. This may also be faster, because it doesn't have to check the text for angle brackets first.
    • build_data.py will refuse to run on Python < 3.3, to prevent building an inconsistent data file.
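
The three fix_entities settings can be pictured with a small sketch built on the standard library's html.unescape. This is not ftfy's code: decode_entities is a hypothetical function, and treating 'auto' as "skip decoding when the text contains both '<' and '>'" is an assumption drawn from the description above.

```python
import html


def decode_entities(text: str, fix_entities="auto") -> str:
    """Decode HTML entities according to a three-way switch.

    Hypothetical helper for illustration; not ftfy's implementation.
    """
    if fix_entities == "auto":
        # Assumed heuristic: literal angle brackets suggest real HTML,
        # where entities are probably intentional and should be kept.
        fix_entities = not ("<" in text and ">" in text)
    return html.unescape(text) if fix_entities else text


if __name__ == "__main__":
    print(decode_entities("&lt;3 &amp; more"))          # no literal brackets: decoded
    print(decode_entities("<p>AT&amp;T</p>"))           # looks like HTML: left alone
    print(decode_entities("<p>AT&amp;T</p>", True))     # forced: decoded anyway
```

With True the angle-bracket scan is skipped entirely, which is where the possible speedup mentioned above would come from.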
  • v3.0.5 Changes

    November 01, 2013
    • Fix the arguments to fix_file, because they were totally wrong.
  • v3.0.4 Changes

    October 01, 2013
    • Restore compatibility with Python 2.6.
  • v3.0.3 Changes

    September 09, 2013
    • Fixed an ugly regular expression bug that prevented ftfy from importing on a narrow build of Python.
  • v3.0.2 Changes

    September 04, 2013
    • Fixed some false positives.

      • Basically, 3.0.1 was too eager to treat text as MacRoman or cp437 when three consecutive characters coincidentally decoded as UTF-8. Increased the cost of those encodings so that they have to successfully decode multiple UTF-8 characters.
      • See tests/test_real_tweets.py for the new test cases that were added as a result.
  • v3.0.1 Changes

    August 30, 2013
    • Fix a bug in fix_java_encoding that led to only the first instance of CESU-8 badness per line being fixed.
    • Add a fixer that removes unassigned characters that can break Python 3.3 (http://bugs.python.org/issue18183)
  • v3.0 Changes

    August 26, 2013
    • Generally runs faster
    • Idempotent
    • Simplified decoding logic
    • Understands more encodings and more kinds of mistakes
    • Takes options that enable or disable particular normalization steps
    • Long line handling: now the time-consuming step (fix_text_encoding) will be consistently skipped on long lines, but all other fixes will apply
    • Tested on millions of examples from Twitter, ensuring a near-zero rate of false positives