Charset Normalizer v2.0.0 Release Notes

Release Date: 2021-07-02 // almost 2 years ago
  • πŸ”„ Changed

    • πŸš€ 4x to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
    • Accent has been made on UTF-8 detection, should perform rather instantaneous.
    • The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
    • The detection mechanism has been slightly improved, now Turkish content is detected correctly (most of the time)
    • The program has been rewritten to ease the readability and maintainability. (+Using static typing)+
    • utf_7 detection has been reinstated.

    βœ‚ Removed

    • πŸ“¦ This package no longer require anything when used with Python 3.5 (Dropped cached_property)
    • βœ‚ Removed support for these languages: Catalan, Esperanto, Kazakh, Baque, VolapΓΌk, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
    • 🚚 The exception hook on UnicodeDecodeError has been removed.

    πŸ—„ Deprecated

    • Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

    πŸ›  Fixed

    • The CLI output used the relative path of the file(s). Should be absolute.