Charset Normalizer v3.0.0 release notes (2022-10-20)

Release Date: 2022-10-20

➕ Added
- 🌲 Extend the capability of explain=True: when cp_isolation contains one or two entries, the mess detector results are logged in detail
- 👌 Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
- normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)
🔄 Changed
- 📇 Build with static metadata using 'build' frontend
- 👉 Make the language detection stricter
- Optional: module md.py can be compiled with Mypyc for an extra speedup, up to 4x faster than v2.1
🛠 Fixed
- CLI with option --normalize fails when using a full path for files
- 🔌 TooManyAccentuatedPlugin induces false positives in mess detection when too few alphabetic characters have been fed to it
- 📚 Sphinx warnings when generating the documentation
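
The TooManyAccentuatedPlugin fix can be pictured as a minimum-sample guard. The sketch below is not the library's code: the function names, the list of accent marks, and both constants are hypothetical, chosen only to show why a ratio computed over a handful of letters is unreliable.

```python
import unicodedata

# Hypothetical values for illustration; not taken from the library.
MIN_ALPHA_SAMPLE = 8
ACCENT_RATIO_LIMIT = 0.35

ACCENT_MARKS = (
    "WITH GRAVE", "WITH ACUTE", "WITH CIRCUMFLEX",
    "WITH DIAERESIS", "WITH TILDE", "WITH CEDILLA",
)


def is_accentuated(ch: str) -> bool:
    """Return True when the character's Unicode name carries a common accent mark."""
    name = unicodedata.name(ch, "")
    return any(mark in name for mark in ACCENT_MARKS)


def accent_mess_ratio(text: str) -> float:
    """Return the share of accentuated letters, or 0.0 when the signal is too weak."""
    alpha = [ch for ch in text if ch.isalpha()]
    if len(alpha) < MIN_ALPHA_SAMPLE:
        # Too few letters to judge: report no mess instead of a false positive.
        return 0.0
    ratio = sum(is_accentuated(ch) for ch in alpha) / len(alpha)
    return ratio if ratio >= ACCENT_RATIO_LIMIT else 0.0
```

Without the sample-size guard, a two-letter input made only of accented letters would score a ratio of 1.0 and be flagged as messy.
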
✂ Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- 💥 Breaking: Method first() and best() from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (detection is unreliable and conflicts with ASCII)
- 💥 Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- 💥 Breaking: Top-level function normalize
- 💥 Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
- 👌 Support for the backport unicodedata2
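
For code affected by the breaking removals, one possible migration path is sketched below. It assumes charset-normalizer 3.0.0 or later is installed; the pre-3.0 spellings in the comments are shown for contrast only and are assumptions about how older code may have looked.

```python
from charset_normalizer import from_bytes

# Illustrative sample: accented French text, encoded as UTF-8.
text = "Déjà vu : ce contenu accentué a été encodé en UTF-8 pour les besoins de l'exemple."
payload = text.encode("utf_8")

# Pre-3.0 spellings that no longer work (for contrast only):
#   CharsetNormalizerMatches.from_bytes(payload)  # removed class alias
#   match.first() or match.best() on a CharsetMatch  # removed methods
#   normalize(path)                                # removed top-level function

matches = from_bytes(payload)  # module-level entry point
best = matches.best()          # best() on the result collection; None if nothing fits

if best is not None:
    decoded = str(best)         # decoded text
    utf8_bytes = best.output()  # bytes re-encoded as UTF-8 by default
    print(best.encoding)
```

Code that relied on the top-level normalize function can read the file with from_path, take best(), and write the output() bytes to a new file itself.
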
