Changelog History
-
v3.0.1 Changes
November 18, 2022
Fixed
- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)
Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7
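The cutting fix above touches a general hazard: slicing encoded bytes at a fixed size can split a multi-byte character. The following is a minimal sketch of that hazard and one way around it, using only the standard library. `decoded_chunks` is a hypothetical helper written for illustration; it is not the library's cutter and makes no claim about how PR #233 solved the problem.

```python
import codecs
from typing import Iterator


def decoded_chunks(payload: bytes, encoding: str, size: int) -> Iterator[str]:
    """Decode `payload` in `size`-byte steps without breaking a character."""
    # The incremental decoder buffers a trailing partial character and
    # completes it with the bytes of the next step.
    decoder = codecs.getincrementaldecoder(encoding)()
    for offset in range(0, len(payload), size):
        is_last = offset + size >= len(payload)
        text = decoder.decode(payload[offset:offset + size], final=is_last)
        if text:
            yield text


payload = "héllo wörld".encode("utf-8")

# Naive slicing cuts "é" (two bytes in UTF-8) in half.
try:
    payload[:2].decode("utf-8")
    naive_cut_ok = True
except UnicodeDecodeError:
    naive_cut_ok = False

chunks = list(decoded_chunks(payload, "utf-8", 2))
```

Joining `chunks` gives back the original string, whereas the naive two-byte slice cannot be decoded on its own.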
-
v3.0.0 Changes
October 20, 2022
Added
- Extend the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail
- Support for an alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
- normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled whl)
Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: Module md.py can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1
Fixed
- CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it
- Sphinx warnings when generating the documentation
Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods first() and best() from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function normalize
- Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
- Support for the backport unicodedata2
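The UTF-7 entry above can be illustrated with the standard library alone: a payload made only of plain ASCII characters decodes to the same text under both codecs, so without a signature there is nothing to tell them apart. This is a sketch under stated assumptions. `UTF7_MARKS` and `has_utf7_mark` are hypothetical names, and the prefixes listed are the commonly cited UTF-7 encodings of U+FEFF, not a list taken from the library.

```python
# Byte prefixes commonly used as the UTF-7 signature (encoded U+FEFF).
# Assumption: this list is illustrative, not copied from the library.
UTF7_MARKS = (b"+/v8", b"+/v9", b"+/v+", b"+/v/")


def has_utf7_mark(payload: bytes) -> bool:
    """Return True when the payload starts with a UTF-7 signature."""
    return payload.startswith(UTF7_MARKS)


ascii_payload = b"Hello, world"

# Without a mark, the two decodings are indistinguishable.
same_text = ascii_payload.decode("utf_7") == ascii_payload.decode("ascii")
```

A detector following this idea would report UTF-7 only when `has_utf7_mark` is true, and otherwise prefer ASCII.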
-
v3.0.0.rc1 Changes
October 18, 2022
Added
- Extend the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail
- Support for an alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter
Fixed
- CLI with option --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it
Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
-
v3.0.0.b2 Changes
August 21, 2022
Added
- normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled whl)
Removed
- Breaking: Methods first() and best() from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
Fixed
- Sphinx warnings when generating the documentation
-
v3.0.0.b1 Changes
August 15, 2022
Changed
- Optional: Module md.py can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1
Removed
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function normalize
- Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
- Support for the backport unicodedata2
-
v2.1.1 Changes
August 19, 2022
Deprecated
- Function normalize scheduled for removal in 3.0
Changed
- Removed useless call to decode in fn is_unprintable (#206)
Fixed
- Third-party library (i18n xgettext) crashing because it did not recognize utf_8 (PEP 263) with an underscore, from @aleksandernovikov (#204)
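The utf_8 entry above concerns two spellings of the same codec. Python treats the underscore and hyphen forms as aliases, but external tools may only accept one of them. The sketch below shows how the standard codec registry maps an alias to the name it reports; `canonical_codec_name` is a hypothetical helper, and this is not a description of how the library itself resolved #204.

```python
import codecs


def canonical_codec_name(name: str) -> str:
    """Return the name Python's codec registry reports for an alias."""
    return codecs.lookup(name).name


# The underscore spelling resolves to the hyphenated registry name.
hyphenated = canonical_codec_name("utf_8")
```

Passing the registry name rather than the alias is one way to hand an encoding label to a tool that expects the hyphenated spelling.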
-
v2.1.0 Changes
June 19, 2022
Added
- Output the Unicode table version when running the CLI with --version (PR #194)
Changed
- Re-use decoded buffer for single byte character sets, from @nijel (PR #175)
- Fixing some performance bottlenecks, from @deedy5 (PR #183)
Fixed
- Workaround for a potential bug in CPython: Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), is not acknowledged as a space (PR #175)
- CLI default threshold aligned with the API threshold, from @oleksandr-kuzmenko (PR #181)
Removed
- Support for Python 3.5 (PR #192)
Deprecated
- Use of the backport unicodedata from unicodedata2, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)
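Two entries in this release relate to Python's `unicodedata` module and can be checked with the standard library. The sketch below prints the Unicode table version of the running interpreter and shows that `str.isspace()` does not accept U+FEFF. `is_space_like` is a hypothetical helper illustrating one possible workaround; it is an assumption, not the code from PR #175.

```python
import unicodedata

# ZERO WIDTH NO-BREAK SPACE, the last code point of Arabic Presentation Forms-B.
ZWNBSP = "\ufeff"


def is_space_like(character: str) -> bool:
    """Treat U+FEFF as a space in addition to what str.isspace() accepts."""
    return character.isspace() or character == ZWNBSP


# Unicode table version shipped with the running interpreter.
print(unicodedata.unidata_version)
# Official name and general category of U+FEFF.
print(unicodedata.name(ZWNBSP), unicodedata.category(ZWNBSP))
```

The category printed for U+FEFF is a format character rather than a separator, which is why an explicit check is needed.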
-
v2.0.12 Changes
February 12, 2022
Fixed
- ASCII misdetection in rare cases (PR #170)
-
v2.0.11 Changes
January 30, 2022
Added
- Explicit support for Python 3.11 (PR #164)
Changed
- The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
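TRACE is not one of the levels that Python's `logging` module defines, so a project that uses it has to register it. The following is a minimal sketch of a custom level sitting below DEBUG. The numeric value 5 and the logger name are assumptions made for illustration, not details confirmed by this changelog.

```python
import io
import logging

TRACE = 5  # assumed numeric value, chosen below logging.DEBUG (10)
logging.addLevelName(TRACE, "TRACE")

# Capture output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("trace_level_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(TRACE)
logger.propagate = False

logger.log(TRACE, "candidate encoding rejected")
logger.debug("best match selected")

captured = stream.getvalue().splitlines()
```

Raising the logger level to `logging.DEBUG` would keep the second message and drop the first, which is the usual reason for splitting verbose output across two levels.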
-
v2.0.10 Changes
January 04, 2022
Fixed
- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)
Changed
- Skipping the language detection (CD) on ASCII (PR #155)