chardet v3.0.0 Release Notes
Release Date: 2017-04-11
This release is long overdue, but still mostly serves as a placeholder for the impending 4.0.0 release, which will have retrained models for better accuracy. For now, this release will get the following improvements up on PyPI:
- Added support for Turkish ISO-8859-9 detection (PR #41, thanks @queeup)
- Commented out large unused sections of Big5 and EUC-KR tables to save memory (8bc4b89)
- Removed Python 3.2 from testing, but added 3.4 - 3.6
- Ensure that stdin is open with mode `'rb'` for `chardetect` CLI. (PR #38, thanks @lpsinger)
- Fixed `chardetect` crash with non-ASCII file names (PR #39, thanks @nkanaev)
- Made naming conventions more Pythonic throughout (no more `mTypicalPositiveRatio`, and instead `typical_positive_ratio`)
- Modernized test scripts and infrastructure so we've got Travis testing and all that stuff
- Rename `filter_without_english_words` to `filter_international_words` and make it match current Mozilla implementation (PR #44, thanks @rsnair2)
- Updated `filter_english_letters` to match C implementation (c665459)
- Temporarily disabled Hungarian ISO-8859-2 and Windows-1250 detection because it is very inaccurate (da6c0a0)
- Allow CLI sub-package to be importable (PR #55)
- Add a `hypothesis`-based test (PR #66, thanks @DRMacIver)
- Strip endianness from UTF with BOM predictions so that the encoding can be passed directly to `bytes.decode()` (PR #73, thanks @snoack)
- Fixed broken links in docs (PR #90, thanks @roskakori)
- Added early exit to `chardetect` when encoding is detected instead of looping through entire file (PR #103, thanks @jpz)
- Use `bytearray` objects internally instead of `wrap_ord` calls, which provides a nice performance boost across the board (PR #106)
- Add `language` property to probers and `UniversalDetector` results (PR #180)
- Mark the 5 known test failures as such so we can have more useful Travis build results in the meantime (d588407)
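
The BOM item above (PR #73) is easier to appreciate with a small standard-library sketch. It does not call chardet at all; the `payload` value is made up for illustration, and the codec names are simply the two kinds of label a detector could report for the same bytes. It shows how Python's generic `utf-16` codec and the endian-specific `utf-16-le` codec treat a leading byte order mark differently:

```python
import codecs

# Illustrative input: UTF-16 little-endian text that starts with a BOM.
payload = codecs.BOM_UTF16_LE + "chardet".encode("utf-16-le")

# The generic codec reads the BOM to choose the byte order, then drops it.
generic = payload.decode("utf-16")

# The endian-specific codec treats the BOM as ordinary data, so U+FEFF
# ends up as the first character of the decoded text.
specific = payload.decode("utf-16-le")

print(repr(generic))
print(repr(specific))
```

With an endianness-free name, the decoded text starts with the real first character. With the endian-specific name, callers would have to strip a stray U+FEFF themselves.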