Text Processing packages

Showing projects tagged as Text Processing

  • Jieba

    9.8 0.0 L5 Python
    Jieba Chinese text segmentation
  • MarkItDown

    9.8 9.4 Python
    Python tool for converting files and office documents to Markdown.
  • httpie

    9.7 6.6 L3 Python
    🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more.
  • mem0

    9.6 9.8 Python
    Memory for AI Agents; SOTA in AI Agent Memory, beating OpenAI Memory in accuracy by 26% - https://mem0.ai/research
  • Docling

    9.6 9.7 Python
    Get your documents ready for gen AI
  • pydantic

    9.5 9.8 Python
    Data validation using Python type hints
  • MkDocs

    9.4 6.8 L5 Python
    Project documentation with Markdown.
  • gensim

    9.4 5.7 L3 Python
    Topic Modelling for Humans
  • Jinja2

    9.0 8.2 L3 Python
    A very fast and expressive template engine.
  • Pattern

    8.8 0.0 L2 Python
    Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
  • Sphinx

    8.7 9.9 L2 Python
    The Sphinx documentation generator
  • fuzzywuzzy

    8.7 0.0 L4 Python
    DISCONTINUED. Fuzzy String Matching in Python
  • TextBlob

    8.7 7.8 L3 Python
    Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
  • HTTP Prompt

    8.5 0.0 L4 Python
    An interactive command-line HTTP and API testing client built on top of HTTPie featuring autocomplete, syntax highlighting, and more. https://twitter.com/httpie
  • Stanza

    8.5 9.3 Python
    Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
  • WeasyPrint

    8.4 9.5 L1 Python
    The awesome document factory
  • PDFMiner

    8.3 0.0 L3 Python
    DISCONTINUED. Python PDF Parser (Not actively maintained). Check out pdfminer.six.
  • PyMuPDF

    8.3 9.7 Python
    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
  • xmltodict

    8.0 7.9 L4 Python
    Python module that makes working with XML feel like you are working with JSON
  • Chinese Character to Pinyin Converter (Python version)

    7.9 6.8 Python
    Converts Chinese characters to pinyin (pypinyin)
  • coala

    7.9 0.0 L4 Python
    coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.
  • Lark

    7.8 7.8 Python
    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
  • Python-Markdown

    7.7 7.5 Python
    A Python implementation of John Gruber’s Markdown with Extension support.
  • sqlparse

    7.6 7.6 L4 Python
    A non-validating SQL parser module for Python
  • sumy

    7.4 6.3 L5 Python
    Module for automatic summarization of text documents and HTML pages.
  • trafilatura

    7.3 8.5 Python
    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
  • Pygments

    7.3 -
    A generic syntax highlighter.
  • phonenumbers

    7.2 8.4 L4 Python
    Python port of Google's libphonenumber
  • asciimatics

    7.2 6.4 L2 Python
    A cross-platform package for curses-like operations, plus higher-level APIs and widgets to create text UIs and ASCII art animations
  • ftfy

    7.1 8.5 L4 Python
    Fixes mojibake and other glitches in Unicode text, after the fact.