Changelog History

  • v0.5.0 Changes

    • faster, more efficient code
    • dropped support for Python 3.5
  • v0.4.0 Changes

    • new languages: Armenian, Greek, Macedonian, Norwegian (Bokmål), and Polish
    • language data reviewed for: Dutch, Finnish, German, Hungarian, Latin, Russian, and Swedish
    • Urdu removed from the language list due to issues with the data
    • added support for Python 3.10 and dropped support for Python 3.4
    • improved decomposition and tokenization algorithms
  • v0.3.0 Changes

    • improved models and disambiguation
    • improved tokenization
    • extended rules for German
  • v0.2.2 Changes

    • Work on decomposition rules
    • Reviewed language data
    • Cleaner code
  • v0.2.1 Changes

    • Better decomposition into subwords by greedy algorithm
    • First benchmarks and data-based corrections: German, French, English, Spanish
  • v0.2.0 Changes

    • Languages added: Danish, Dutch, Finnish, Georgian, Indonesian, Latin, Latvian, Lithuanian, Luxembourgish, Turkish, Urdu
    • Improved word pair coverage
    • Tokenization functions added
    • Limited greediness and range of potential candidates
  • v0.1.0 Changes

    • First release on PyPI