Changelog History

  • v2.0.7 Changes

    October 11, 2021

    Added

    • Add support for Kazakh (Cyrillic) language detection (PR #109)

    Changed

    • Further improve inferring the language from a given single-byte code page (PR #112)
    • Vainly try to leverage PEP 263 when PEP 3120 is not supported (PR #116)
    • Refactor loops for potential performance improvements, from @adbar (PR #113)
    • Various detection improvements (MD+CD) (PR #117)

    Removed

    • Remove redundant logging entry about detected language(s) (PR #115)

    Fixed

    • Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)
  • v2.0.6 Changes

    September 18, 2021

    Fixed

    • Fix an unforeseen regression that broke backward compatibility with some older minor versions of Python 3.5.x (PR #100)
    • Fix CLI crash when using --minimal output in certain cases (PR #103)

    Changed

    • Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)
  • v2.0.5 Changes

    September 14, 2021

    Changed

    • The project now complies with flake8, mypy, isort and black to ensure better overall quality (PR #81)
    • Backward compatibility with v1.x was improved; the old static methods are restored (PR #82)
    • The Unicode detection is slightly improved (PR #93)
    • Add the syntactic sugar __bool__ to the CharsetMatches results list container (PR #91)

    Removed

    • The project no longer raises a warning when given tiny content for detection; it is simply logged as a warning instead (PR #92)

    Fixed

    • In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
    • Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
    • The MANIFEST.in was not exhaustive (PR #78)
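
The multi-byte chunk issue fixed in v2.0.5 (PR #95) is easier to picture with a small example. The sketch below is not the library's chunk extractor: it assumes UTF-8 input only, and `utf8_safe_chunk` is a hypothetical helper that shrinks a byte window so it never starts or ends inside a character.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which look like 0b10xxxxxx."""
    return byte & 0b1100_0000 == 0b1000_0000


def utf8_safe_chunk(payload: bytes, start: int, size: int) -> bytes:
    """Return at most `size` bytes from `start` without splitting a character.

    Hypothetical helper for illustration; assumes `payload` is valid UTF-8.
    """
    end = min(start + size, len(payload))
    # Skip continuation bytes at the front: they belong to a character
    # whose lead byte sits before the window.
    while start < end and is_continuation(payload[start]):
        start += 1
    # If the byte just past the window is a continuation byte, the window
    # ends mid-character, so back up until the lead byte is excluded.
    while start < end < len(payload) and is_continuation(payload[end]):
        end -= 1
    return payload[start:end]


payload = "héllo wörld".encode("utf-8")

# A naive slice can cut "é" (two bytes) in half and fail to decode.
try:
    payload[0:2].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

safe = utf8_safe_chunk(payload, 0, 2).decode("utf-8")
```

A real extractor has to cope with many multi-byte encodings, not just UTF-8, so the boundary test would differ per codec.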
  • v2.0.4 Changes

    July 30, 2021

    Fixed

    • The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
    • Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
    • With explain=True, the logger could be misleading about detected languages and the impact of one MBCS match (PR #72)
    • Submatch factoring could be wrong in rare edge cases (PR #72)
    • When publishing results to STDOUT, the CLI ignored every file given after the first path (PR #72)
    • Fix line endings from CRLF to LF for certain project files (PR #67)

    Changed

    • Adjust the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
    • Allow fallback on specified encoding if any (PR #71)
  • v2.0.3 Changes

    July 16, 2021

    Changed

    • Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII. (PR #63)
    • Following the community's wishes, the detection will fall back on ASCII or UTF-8 in a last-resort case. (PR #64)
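
The last-resort fallback introduced in v2.0.3 can be pictured roughly as follows. This is a sketch under assumptions: the changelog does not say when the fallback triggers or how it is implemented, and `last_resort_encoding` is a hypothetical name. It only shows the general idea of trying ASCII first, then UTF-8.

```python
from typing import Optional


def last_resort_encoding(payload: bytes) -> Optional[str]:
    """Return "ascii" or "utf_8" if the payload decodes cleanly, else None.

    Hypothetical helper for illustration, not the library's code.
    """
    for name in ("ascii", "utf_8"):
        try:
            payload.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


plain = last_resort_encoding(b"hello")
accented = last_resort_encoding("café".encode("utf-8"))
binary = last_resort_encoding(b"\xff\xfe\xfd")
```

ASCII is tried first because every ASCII payload is also valid UTF-8, so the narrower answer is the more informative one.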
  • v2.0.2 Changes

    July 15, 2021

    Fixed

    • Fix misdetection of empty or too-small JSON payloads. Report from @tseaver (PR #59)

    Changed

    • Don't inject unicodedata2 into sys.modules, from @akx (PR #57)
  • v2.0.1 Changes

    July 13, 2021

    Fixed

    • Make it work where no filesystem is available by dropping the frequencies.json asset. Report from @sethmlarson. (PR #55)
    • Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
    • One log entry (language target preemptive) was not shown in the logs when using explain=True (PR #47)
    • Fix undesired exception (ValueError) on getitem of instance CharsetMatches (PR #52)

    Changed

    • The default argument values of the public function normalize were not aligned with those of from_bytes (PR #53)

    Added

    • You may now use charset aliases in cp_isolation and cp_exclusion arguments (PR #47)
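
Accepting charset aliases (v2.0.1, PR #47) means labels such as "latin1" and "windows-1252" must be resolved to one canonical codec name before they can be compared. The sketch below uses the standard library's `codecs.lookup` as a stand-in; the library's own normalization and naming scheme may differ, and `canonical_names` is a hypothetical helper.

```python
import codecs
from typing import Iterable, List


def canonical_names(labels: Iterable[str]) -> List[str]:
    """Resolve charset labels or aliases to Python's canonical codec names.

    Hypothetical helper for illustration, not the library's code.
    """
    names = []
    for label in labels:
        try:
            names.append(codecs.lookup(label).name)
        except LookupError:
            raise ValueError("unknown charset label: {!r}".format(label)) from None
    return names


resolved = canonical_names(["latin1", "UTF8", "windows-1252"])
```

Rejecting unknown labels early, as this sketch does, avoids silently ignoring a misspelled entry in an isolation or exclusion list.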
  • v2.0.0 Changes

    July 02, 2021

    Changed

    • 4 to 5 times faster than the previous 1.4.0 release. At least 2 times faster than Chardet.
    • Emphasis has been put on UTF-8 detection, which should be nearly instantaneous.
    • The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
    • The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time).
    • The program has been rewritten to ease readability and maintainability (now using static typing).
    • utf_7 detection has been reinstated.

    Removed

    • This package no longer requires anything when used with Python 3.5 (dropped cached_property)
    • Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
    • The exception hook on UnicodeDecodeError has been removed.

    Deprecated

    • Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

    Fixed

    • The CLI output used the relative path of the file(s); it should be absolute.
  • v1.4.1 Changes

    May 28, 2021

    Fixed

    • Logger configuration/usage no longer conflicts with others (PR #44)
  • v1.4.0 Changes

    May 21, 2021

    Removed

    • Use standard logging instead of the loguru package.
    • Drop the nose test framework in favor of the maintained pytest.
    • Stop using the dragonmapper package to help with gibberish Chinese/CJK text.
    • Require cached_property only for Python 3.5, due to a constraint; drop it for every other interpreter version.
    • Stop supporting UTF-7 content that does not contain a SIG.
    • Drop PrettyTable, replaced with pure JSON output in the CLI.

    Fixed

    • The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even if obviously present, due to the sub-match factoring process.
    • The BOM was not searched for properly when trying the utf32/16 parent codec.

    Changed

    • Reduce the final package size by compressing frequencies.json.
    • Huge improvement on the largest payloads.

    Added

    • The CLI now produces JSON consumable output.
    • Return ASCII if the given sequences fit, given reasonable confidence.
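
The BOM fixes in v1.4.0 concern spotting a byte order mark at the start of a payload. As a rough illustration only (not the library's implementation), the sketch below checks the standard library's BOM constants. `find_bom` is a hypothetical helper, and the codec names it returns are an assumption made for this example.

```python
import codecs
from typing import Optional, Tuple

# Longer marks come first: the UTF-32 LE BOM begins with the same two
# bytes as the UTF-16 LE BOM, so the order of the checks matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
]


def find_bom(payload: bytes) -> Tuple[Optional[str], bytes]:
    """Return (codec name, BOM bytes), or (None, b"") when no BOM is found.

    Hypothetical helper for illustration. Known limitation: UTF-16 LE text
    whose first character after the BOM is U+0000 looks like a UTF-32 LE BOM.
    """
    for mark, name in BOMS:
        if payload.startswith(mark):
            return name, mark
    return None, b""


utf32_payload = codecs.BOM_UTF32_LE + "a".encode("utf-32-le")
utf16_payload = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

utf32_result = find_bom(utf32_payload)
utf16_result = find_bom(utf16_payload)
no_bom_result = find_bom(b"plain text")
```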