All Versions: 40
Avg Release Cycle: 26 days
Latest Release: 668 days ago

Changelog History

  • v3.0.1 Changes

    November 18, 2022

    Fixed

    • Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

    Changed

    • Speedup provided by mypy/c 0.990 on Python >= 3.7
  • v3.0.0 Changes

    October 20, 2022

    Added

    • Extended the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the mess-detector results are logged in detail
    • Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
    • Added parameter language_threshold to from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
    • normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled whl)

    Changed

    • Build with static metadata using the 'build' frontend
    • Made the language detection stricter
    • Optional: module md.py can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

    Fixed

    • CLI with option --normalize failed when using a full path for files
    • TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it
    • Sphinx warnings when generating the documentation

    Removed

    • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
    • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
    • Breaking: Methods first() and best() from CharsetMatch
    • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
    • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
    • Breaking: Top-level function normalize
    • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
    • Support for the backport unicodedata2
  • v3.0.0.rc1 Changes

    October 18, 2022

    Added

    • Extended the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the mess-detector results are logged in detail
    • Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
    • Added parameter language_threshold to from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio

    Changed

    • Build with static metadata using the 'build' frontend
    • Made the language detection stricter

    Fixed

    • CLI with option --normalize failed when using a full path for files
    • TooManyAccentuatedPlugin induced false positives in the mess detection when too few alpha characters had been fed to it

    Removed

    • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
    • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
  • v3.0.0.b2 Changes

    August 21, 2022

    Added

    • normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled whl)

    Removed

    • Breaking: Methods first() and best() from CharsetMatch
    • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

    Fixed

    • Sphinx warnings when generating the documentation
  • v3.0.0.b1 Changes

    August 15, 2022

    Changed

    • Optional: module md.py can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

    Removed

    • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
    • Breaking: Top-level function normalize
    • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
    • Support for the backport unicodedata2
  • v2.1.1 Changes

    August 19, 2022

    Deprecated

    • Function normalize scheduled for removal in 3.0

    Changed

    • Removed useless call to decode in fn is_unprintable (#206)

    Fixed

    • Third-party library (i18n xgettext) crashing because it did not recognize utf_8 (PEP 263) with underscore, from @aleksandernovikov (#204)
  • v2.1.0 Changes

    June 19, 2022

    Added

    • Output the Unicode table version when running the CLI with --version (PR #194)

    Changed

    • Re-use decoded buffer for single-byte character sets, from @nijel (PR #175)
    • Fixed some performance bottlenecks, from @deedy5 (PR #183)

    Fixed

    • Workaround for a potential bug in CPython: Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), is not acknowledged as a space (PR #175)
    • CLI default threshold aligned with the API threshold, from @oleksandr-kuzmenko (PR #181)

    Removed

    • Support for Python 3.5 (PR #192)

    Deprecated

    • Use of the backport unicodedata from unicodedata2, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)
  • v2.0.12 Changes

    February 12, 2022

    Fixed

    • ASCII misdetection in rare cases (PR #170)
  • v2.0.11 Changes

    January 30, 2022

    Added

    • Explicit support for Python 3.11 (PR #164)

    Changed

    • The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
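
Because detection messages are now emitted at low severity only, an application has to opt in to see them. The following sketch uses only the standard logging module and assumes the library logs under the logger name "charset_normalizer"; ListHandler is a hypothetical in-memory handler written for this example.

```python
import logging

from charset_normalizer import from_bytes


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps log records in memory for inspection."""

    def __init__(self) -> None:
        super().__init__(level=logging.NOTSET)
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Assumption: the library's logger is named "charset_normalizer".
logger = logging.getLogger("charset_normalizer")
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)  # DEBUG and above reach the handler

# Made-up payload, long enough not to count as a tiny sample.
from_bytes("Exemple de détection, répété pour la longueur. ".encode("utf_8") * 4)

captured = list(handler.records)
for record in captured:
    print(record.levelname, record.getMessage())

# Restore the logger so the sketch has no lasting side effects.
logger.removeHandler(handler)
logger.setLevel(logging.NOTSET)
```

Lowering the level further would also surface TRACE messages, which sit below DEBUG; the numeric value to use for that is left out here rather than guessed.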
  • v2.0.10 Changes

    January 04, 2022

    Fixed

    • Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

    Changed

    • Skipping the language-detection (CD) on ASCII (PR #155)