- ➕ Add support for Kazakh (Cyrillic) language detection (PR #109)
- Further improve inferring the language from a given single-byte code page (PR #112)
- 👍 Attempt to leverage PEP 263 when PEP 3120 is not supported (PR #116)
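A PEP 263 coding declaration is a comment such as `# -*- coding: latin-1 -*-` in one of the first two lines of a source file. A minimal stdlib-only sketch of looking it up (`find_pep263_encoding` is a hypothetical helper, not the library's API):

```python
import re
from typing import Optional

# Pattern from PEP 263 for a coding declaration in a comment.
PEP263_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def find_pep263_encoding(payload: bytes) -> Optional[str]:
    """Look for a PEP 263 coding declaration in the first two lines."""
    for line in payload.splitlines()[:2]:
        match = PEP263_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None

print(find_pep263_encoding(b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\n"))
# latin-1
```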
- 🐎 Refactoring for potential performance improvements in loops from @adbar (PR #113)
- Various detection improvements (MD+CD) (PR #117)
- ✂ Remove redundant logging entry about detected language(s) (PR #115)
- 🛠 Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
- 👀 Fix an unforeseen regression that broke backward compatibility with some older minor versions of Python 3.5.x (PR #100)
- 🛠 Fix CLI crash when using --minimal output in certain cases (PR #103)
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
- 👍 The project now complies with flake8, mypy, isort, and black to ensure better overall quality (PR #81)
- ⏪ Backward compatibility with v1.x was improved; the old static methods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntactic sugar: __bool__ for the CharsetMatches list-container (PR #91)
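The list-container pattern behind this entry can be sketched with a hypothetical stand-in class (stdlib only, not the real CharsetMatches): defining `__bool__` lets callers write `if results:` instead of checking the length explicitly.

```python
class Matches:
    """Hypothetical minimal list-container mimicking the __bool__ sugar."""

    def __init__(self, items=None):
        self._items = list(items or [])

    def __len__(self):
        return len(self._items)

    def __bool__(self):
        # Truthy exactly when at least one match was found.
        return len(self._items) > 0

results = Matches(["utf_8"])
empty = Matches()
print(bool(results), bool(empty))  # True False
```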
- ⚠ The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)
- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
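The bug class described above is easy to reproduce with the stdlib alone: a fixed-size slice of a UTF-8 byte stream can end between the bytes of a single character, making an otherwise valid chunk look like mess.

```python
payload = "héllo wörld".encode("utf-8")

# A naive fixed-size chunk can end between the two bytes of 'é' (0xC3 0xA9).
bad_chunk = payload[:2]   # b'h\xc3' -- truncated multi-byte character
good_chunk = payload[:3]  # b'h\xc3\xa9' -- complete character

print(good_chunk.decode("utf-8"))  # hé
try:
    bad_chunk.decode("utf-8")
except UnicodeDecodeError:
    print("truncated multi-byte character")
```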
- 🔌 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)
- 👻 The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- 🛠 Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger (explain=True) could mislead on detected languages and the impact of a single MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- When multiple files were given to the CLI, all but the first were ignored when publishing results to STDOUT (PR #72)
- 🛠 Fix line endings from CRLF to LF for certain project files (PR #67)
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- 👍 Allow fallback on specified encoding if any (PR #71)
- Part of the detection mechanism has been made less sensitive, resulting in more accurate detection results, especially for ASCII (PR #63)
- In accordance with community wishes, the detection will fall back on ASCII or UTF-8 as a last resort (PR #64)
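A hypothetical sketch of such a last-resort policy (stdlib only, not the library's actual code path): prefer ASCII when the bytes fit, otherwise try UTF-8, otherwise give up.

```python
from typing import Optional

def last_resort(payload: bytes) -> Optional[str]:
    """Fall back on ascii, then utf_8, when nothing else matched."""
    try:
        payload.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        payload.decode("utf_8")
        return "utf_8"
    except UnicodeDecodeError:
        return None

print(last_resort(b"plain text"))    # ascii
print(last_resort("café".encode()))  # utf_8
```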
- 🍱 Make it work where no filesystem is available by dropping the asset frequencies.json. Reported by @sethmlarson. (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- 🔊 One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- 🛠 Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)
- 0️⃣ The public function normalize's default argument values were not aligned with from_bytes (PR #53)
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
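Charset aliases can be resolved to canonical codec names with the stdlib codecs registry; a sketch of the kind of alias resolution that lets cp_isolation and cp_exclusion accept aliases (`normalize_codec_name` is a hypothetical helper, not the library's API):

```python
import codecs

def normalize_codec_name(alias: str) -> str:
    """Resolve a charset alias to its canonical codec name."""
    return codecs.lookup(alias).name

print(normalize_codec_name("latin1"))  # iso8859-1
print(normalize_codec_name("u8"))      # utf-8
```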
- 🚀 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been placed on UTF-8 detection; it should perform nearly instantaneously.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time)
- The program has been rewritten to ease readability and maintainability (now using static typing)
- utf_7 detection has been reinstated.
- 📦 This package no longer requires anything when used with Python 3.5 (dropped cached_property)
- ✂ Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
- 🚚 The exception hook on UnicodeDecodeError has been removed.
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
- The CLI output used the relative path of the file(s); it now uses the absolute path.
- 🔧 Logger configuration/usage no longer conflict with others (PR #44)
- 📦 Using standard logging instead of the loguru package.
- ⬇️ Dropping nose test framework in favor of the maintained pytest.
- 📦 Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint; dropped for every other interpreter version.
- 👍 Stop support for UTF-7 that does not contain a SIG.
- ⬇️ Dropping PrettyTable, replaced with pure JSON output in CLI.
- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even when obviously present, due to the sub-match factoring process.
- The BOM was not searched for properly when trying the utf_32/utf_16 parent codecs.
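BOM probing for the utf_16/utf_32 families can be sketched with the stdlib constants alone (this is an illustrative `sniff_bom` helper, not the library's internal routine). Note that BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE, so the utf_32 variants must be checked first:

```python
import codecs
from typing import Optional

# Order matters: utf_32 BOMs are supersets of utf_16 BOMs.
BOMS = [
    ("utf_32", codecs.BOM_UTF32_BE),
    ("utf_32", codecs.BOM_UTF32_LE),
    ("utf_16", codecs.BOM_UTF16_BE),
    ("utf_16", codecs.BOM_UTF16_LE),
    ("utf_8", codecs.BOM_UTF8),
]

def sniff_bom(payload: bytes) -> Optional[str]:
    """Return the codec family whose BOM prefixes the payload, if any."""
    for name, bom in BOMS:
        if payload.startswith(bom):
            return name
    return None

print(sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf_16_le")))  # utf_16
```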
- 📦 Improving the package final size by compressing frequencies.json.
- 🛰 Huge improvement on the largest payloads.
- CLI now produces JSON consumable output.
- Return ASCII if the given sequence fits, with reasonable confidence.
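The "fits" test behind that entry is cheap to express with the stdlib: a payload whose bytes are all below 0x80 decodes losslessly as ASCII, and `bytes.isascii()` (Python 3.7+) checks exactly that. A minimal sketch (hypothetical helper, not the library's code):

```python
def fits_ascii(payload: bytes) -> bool:
    """True when every byte is below 0x80, i.e. valid ASCII."""
    return payload.isascii()

print(fits_ascii(b"hello, world"))          # True
print(fits_ascii("héllo".encode("utf-8")))  # False
```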