Changelog History
-
v5.6 Changes
August 07, 2019
- The `unescape_html` function now supports all the HTML5 entities that appear in `html.entities.html5`, including those with long names such as `&DiacriticalDoubleAcute;`.
- Unescaping of numeric HTML entities now uses the standard library's `html.unescape`, making edge cases consistent.
  (The reason we don't run `html.unescape` on all text is that it's not always appropriate to apply, and can lead to false positive fixes. The text "This&NotThat" should not have "&Not" replaced by a symbol, as `html.unescape` would do.)
- On top of Python's support for HTML5 entities, ftfy will also convert HTML escapes of common Latin capital letters that are (nonstandardly) written in all caps, such as `&NTILDE;` for `Ñ`.
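The entity handling described in this release can be explored with the standard library alone. The sketch below does not call ftfy. It only shows the `html.entities.html5` table and the `html.unescape` behavior that the notes refer to. The helper `lookup_entity` is a hypothetical illustration of an all-caps fallback, not ftfy's implementation.

```python
import html
from html.entities import html5

# Keys in the html5 table include the trailing semicolon.
long_name = html5["DiacriticalDoubleAcute;"]  # U+02DD DOUBLE ACUTE ACCENT

# Numeric entities, decimal and hexadecimal, decode to the same character.
numeric = html.unescape("&#225; &#xE1;")

# html.unescape also accepts some legacy names without a semicolon, which is
# one way it can produce false positives on ordinary text.
false_positive = html.unescape("&notit;")


def lookup_entity(name):
    """Hypothetical helper: resolve 'NAME;' by retrying it as 'Name;'.

    Assumption for illustration only. It misses mixed-case names such as
    'AElig;' and is not how ftfy is implemented.
    """
    if name in html5:
        return html5[name]
    if name.isupper():
        return html5.get(name[0] + name[1:].lower())
    return None


print(repr(long_name), numeric, false_positive, lookup_entity("NTILDE;"))
```

The `&notit;` case shows why unescaping every piece of text can be risky: the legacy name `not` is matched as a prefix even though the author may have meant something else.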
-
v5.5.1 Changes
September 14, 2018
- Added Python 3.7 support.
- Updated the data file of Unicode character categories to Unicode 11, as used in Python 3.7.0. (No matter what version of Python you're on, ftfy uses the same data.)
-
v5.5 Changes
September 06, 2018
- Recent versions have emphasized making a reasonable attempt to fix short, common mojibake sequences, such as `Ã»`. In this version, we've expanded the heuristics to recognize these sequences in MacRoman as well as Windows-125x encodings.
- A related rule for fixing isolated Windows-1252/UTF-8 mixups, even when they were inconsistent with the rest of the string, claimed to work on Latin-1/UTF-8 mixups as well, but in practice it didn't. We've made the rule more robust.
- Fixed a failure when testing the CLI on Windows.
- Removed the `pytest-runner` invocation from setup.py, as it created complex dependencies that would stop setup.py from working in some environments. The `pytest` command still works fine. `pytest-runner` is just too clever.
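To see why the same UTF-8 bytes produce different mojibake depending on which single-byte encoding misreads them, here is a minimal sketch using only Python's built-in codecs. The function names `make_mojibake` and `undo_mojibake` are hypothetical and are not part of ftfy. Unlike ftfy, the sketch assumes the misused encoding is already known.

```python
def make_mojibake(text, wrong_encoding):
    """Encode as UTF-8, then decode the bytes with the wrong single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def undo_mojibake(garbled, wrong_encoding):
    """Reverse the mixup, assuming we already know which encoding was misused."""
    return garbled.encode(wrong_encoding).decode("utf-8")


# The same two UTF-8 bytes of "û" show up as different character pairs.
for encoding in ("windows-1252", "latin-1", "mac_roman"):
    garbled = make_mojibake("août", encoding)
    print(encoding, repr(garbled), repr(undo_mojibake(garbled, encoding)))
```

A real fixer has to guess the encoding from the garbled text itself, which is where heuristics like the ones described above come in.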
-
v5.4.1 Changes
June 14, 2018
- Fixed a bug in the `setup.py` metadata.
  This bug was causing ftfy, a package that fixes encoding mismatches, to not install in some environments due to an encoding mismatch. (We were really putting the "meta" in "metadata" here.)
-
v5.4 Changes
June 01, 2018
- ftfy was still too conservative about fixing short mojibake sequences, such as "aoÃ»t" -> "août", when the broken version contained punctuation such as curly or angle quotation marks.
  The new heuristic observes in some cases that, even if quotation marks are expected to appear next to letters, it is strange to have an accented capital A before the quotation mark and more letters after the quotation mark.
- Provides better metadata for the new PyPI.
- Switched from nosetests to pytest.
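The pattern that the quotation-mark heuristic in this release looks at can be inspected with `unicodedata`. This sketch does not reproduce ftfy's heuristic. It only prints the Unicode category of each character in the garbled form of "août", showing an uppercase accented A followed by a closing angle quotation mark in the middle of a word.

```python
import unicodedata

# Produce the garbled form by misreading UTF-8 bytes as Windows-1252.
garbled = "août".encode("utf-8").decode("windows-1252")

# Show each character with its general category and official name.
for ch in garbled:
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))

# Lu = uppercase letter, Pf = final (closing) quotation punctuation.
categories = [unicodedata.category(ch) for ch in garbled]
```

A lowercase word with a capital letter and a closing quote wedged inside it is unlikely to be intentional, which is the kind of evidence the release note describes.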
-
v5.3 Changes
January 25, 2018
- A heuristic has been too conservative since version 4.2, causing a regression compared to previous versions: ftfy would fail to fix mojibake of common characters such as `á` when seen in isolation. A new heuristic now makes it possible to fix more of these common cases with less evidence.
-
v5.2 Changes
November 27, 2017
- The command-line tool will not accept the same filename as its input and output. (Previously, this would write a zero-length file.)
- The `uncurl_quotes` fixer, which replaces curly quotes with straight quotes, now also replaces MODIFIER LETTER APOSTROPHE.
- Codepoints that contain two Latin characters crammed together for legacy encoding reasons are replaced by those two separate characters, even in NFC mode. We formerly did this just with ligatures such as `ﬁ` and `Ĳ`, but now this includes the Afrikaans digraph `ŉ` and Serbian/Croatian digraphs such as `ǆ`.
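The digraph replacement can be approximated with the standard library. In this minimal sketch, the table `TWO_LETTER_CODEPOINTS` and the function `split_digraphs` are hypothetical names, and using each character's NFKC form is an assumption about one way to get the two separate letters. ftfy's actual list and mechanism may differ.

```python
import unicodedata

# Hypothetical table for illustration: fi ligature, IJ ligature,
# Afrikaans n-preceded-by-apostrophe, and the Serbian/Croatian dz-caron digraph.
TWO_LETTER_CODEPOINTS = "\ufb01\u0132\u0149\u01c6"


def split_digraphs(text):
    """Expand the listed codepoints, then apply NFC to the whole string."""
    pieces = []
    for ch in text:
        if ch in TWO_LETTER_CODEPOINTS:
            # NFKC applies the compatibility decomposition to this one character.
            pieces.append(unicodedata.normalize("NFKC", ch))
        else:
            pieces.append(ch)
    return unicodedata.normalize("NFC", "".join(pieces))


# NFC alone leaves the ligature untouched; the targeted expansion splits it.
print(unicodedata.normalize("NFC", "\ufb01nal") == "\ufb01nal")
print(split_digraphs("\ufb01nal \u0132ssel"))
```

The compatibility decomposition of `ŉ` begins with U+02BC MODIFIER LETTER APOSTROPHE, the same character that the `uncurl_quotes` change in this release covers.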
-
v5.1.1 Changes
May 15, 2017
These releases fix two unrelated problems with the tests, one in each version.
- v5.1.1: fixed the CLI tests (which are new in v5) so that they pass on Windows, as long as the Python output encoding is UTF-8.
- v4.4.3: added the `# coding: utf-8` declaration to two files that were missing it, so that tests can run on Python 2.
-
v5.1 Changes
April 07, 2017
- Removed the dependency on `html5lib` by dropping support for Python 3.2.
  We previously used the dictionary `html5lib.constants.entities` to decode HTML entities. In Python 3.3 and later, that exact dictionary is now in the standard library as `html.entities.html5`.
- Moved many test cases about how particular text should be fixed into `test_cases.json`, which may ease porting to other languages.
The functionality of this version remains the same as 5.0.2 and 4.4.2.
-
v5.0.2 Changes
March 21, 2017
- Added a `MANIFEST.in` that puts files such as the license file and this changelog inside the source distribution.