- ➕ Add support for Kazakh (Cyrillic) language detection (PR #109)
- Further improve inferring the language from a given single-byte code page (PR #112)
- 👍 Attempt to leverage PEP 263 when PEP 3120 is not supported (PR #116)
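A PEP 263 coding declaration is a comment such as `# -*- coding: latin-1 -*-` in one of the first two lines of a source file. A minimal stdlib-only sketch of looking it up (`find_pep263_encoding` is a hypothetical helper, not the library's API):

```python
import re
from typing import Optional

# Pattern from PEP 263 for a coding declaration in a comment.
PEP263_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def find_pep263_encoding(payload: bytes) -> Optional[str]:
    """Look for a PEP 263 coding declaration in the first two lines."""
    for line in payload.splitlines()[:2]:
        match = PEP263_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None

print(find_pep263_encoding(b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\n"))
# latin-1
```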
- 🐎 Refactoring for potential performance improvements in loops from @adbar (PR #113)
- Various detection improvements (MD+CD) (PR #117)
- ✂ Remove redundant logging entry about detected language(s) (PR #115)
- 🛠 Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
- 👀 Fix an unforeseen regression that broke backward compatibility with some older minor versions of Python 3.5.x (PR #100)
- 🛠 Fix CLI crash when using --minimal output in certain cases (PR #103)
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
- 👍 The project now complies with flake8, mypy, isort, and black to ensure better overall quality (PR #81)
- ⏪ Backward compatibility with v1.x was improved; the old static methods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntactic sugar: __bool__ for the CharsetMatches list-container (PR #91)
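The list-container pattern behind this entry can be sketched with a hypothetical stand-in class (stdlib only, not the real CharsetMatches): defining `__bool__` lets callers write `if results:` instead of checking the length explicitly.

```python
class Matches:
    """Hypothetical minimal list-container mimicking the __bool__ sugar."""

    def __init__(self, items=None):
        self._items = list(items or [])

    def __len__(self):
        return len(self._items)

    def __bool__(self):
        # Truthy exactly when at least one match was found.
        return len(self._items) > 0

results = Matches(["utf_8"])
empty = Matches()
print(bool(results), bool(empty))  # True False
```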
- ⚠ The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)
- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
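The bug class described above is easy to reproduce with the stdlib alone: a fixed-size slice of a UTF-8 byte stream can end between the bytes of a single character, making an otherwise valid chunk look like mess.

```python
payload = "héllo wörld".encode("utf-8")

# A naive fixed-size chunk can end between the two bytes of 'é' (0xC3 0xA9).
bad_chunk = payload[:2]   # b'h\xc3' -- truncated multi-byte character
good_chunk = payload[:3]  # b'h\xc3\xa9' -- complete character

print(good_chunk.decode("utf-8"))  # hé
try:
    bad_chunk.decode("utf-8")
except UnicodeDecodeError:
    print("truncated multi-byte character")
```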
- 🔌 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)
- 👻 The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- 🛠 Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger (explain=True) could mislead on detected languages and the impact of a single MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- When multiple files were given to the CLI, all but the first were ignored when publishing results to STDOUT (PR #72)
- 🛠 Fix line endings from CRLF to LF for certain project files (PR #67)
- Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- 👍 Allow fallback on specified encoding if any (PR #71)
- Part of the detection mechanism has been made less sensitive, resulting in more accurate detection results, especially for ASCII (PR #63)
- In accordance with community wishes, the detection will fall back on ASCII or UTF-8 as a last resort (PR #64)
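A hypothetical sketch of such a last-resort policy (stdlib only, not the library's actual code path): prefer ASCII when the bytes fit, otherwise try UTF-8, otherwise give up.

```python
from typing import Optional

def last_resort(payload: bytes) -> Optional[str]:
    """Fall back on ascii, then utf_8, when nothing else matched."""
    try:
        payload.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        payload.decode("utf_8")
        return "utf_8"
    except UnicodeDecodeError:
        return None

print(last_resort(b"plain text"))    # ascii
print(last_resort("café".encode()))  # utf_8
```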
- 🍱 Make it work where no filesystem is available by dropping the asset frequencies.json. Reported by @sethmlarson. (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- 🔊 One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- 🛠 Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)
- 0️⃣ The public function normalize's default argument values were not aligned with from_bytes (PR #53)
- You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
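Charset aliases can be resolved to canonical codec names with the stdlib codecs registry; a sketch of the kind of alias resolution that lets cp_isolation and cp_exclusion accept aliases (`normalize_codec_name` is a hypothetical helper, not the library's API):

```python
import codecs

def normalize_codec_name(alias: str) -> str:
    """Resolve a charset alias to its canonical codec name."""
    return codecs.lookup(alias).name

print(normalize_codec_name("latin1"))  # iso8859-1
print(normalize_codec_name("u8"))      # utf-8
```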
- 🚀 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been placed on UTF-8 detection; it should perform nearly instantaneously.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time)
- The program has been rewritten to ease readability and maintainability (now using static typing)
- utf_7 detection has been reinstated.
- 📦 This package no longer requires anything when used with Python 3.5 (dropped cached_property)
- ✂ Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
- 🚚 The exception hook on UnicodeDecodeError has been removed.
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0
- The CLI output used the relative path of the file(s); it now uses the absolute path.
- 🔧 Logger configuration/usage no longer conflict with others (PR #44)
- 📦 Using standard logging instead of the loguru package.
- ⬇️ Dropping nose test framework in favor of the maintained pytest.
- 📦 Choose to not use dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint; dropped for every other interpreter version.
- 👍 Stop support for UTF-7 that does not contain a SIG.
- ⬇️ Dropping PrettyTable, replaced with pure JSON output in CLI.
- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even when obviously present, due to the sub-match factoring process.
- The BOM was not searched for properly when trying the utf_32/utf_16 parent codecs.
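BOM probing for the utf_16/utf_32 families can be sketched with the stdlib constants alone (this is an illustrative `sniff_bom` helper, not the library's internal routine). Note that BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE, so the utf_32 variants must be checked first:

```python
import codecs
from typing import Optional

# Order matters: utf_32 BOMs are supersets of utf_16 BOMs.
BOMS = [
    ("utf_32", codecs.BOM_UTF32_BE),
    ("utf_32", codecs.BOM_UTF32_LE),
    ("utf_16", codecs.BOM_UTF16_BE),
    ("utf_16", codecs.BOM_UTF16_LE),
    ("utf_8", codecs.BOM_UTF8),
]

def sniff_bom(payload: bytes) -> Optional[str]:
    """Return the codec family whose BOM prefixes the payload, if any."""
    for name, bom in BOMS:
        if payload.startswith(bom):
            return name
    return None

print(sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf_16_le")))  # utf_16
```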
- 📦 Improving the package final size by compressing frequencies.json.
- 🛰 Huge improvement on the largest payloads.
- CLI now produces JSON consumable output.
- Return ASCII if the given sequence fits, with reasonable confidence.
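The "fits" test behind that entry is cheap to express with the stdlib: a payload whose bytes are all below 0x80 decodes losslessly as ASCII, and `bytes.isascii()` (Python 3.7+) checks exactly that. A minimal sketch (hypothetical helper, not the library's code):

```python
def fits_ascii(payload: bytes) -> bool:
    """True when every byte is below 0x80, i.e. valid ASCII."""
    return payload.isascii()

print(fits_ascii(b"hello, world"))          # True
print(fits_ascii("héllo".encode("utf-8")))  # False
```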