Utilities packages

Showing projects tagged as Text Processing and Utilities

  • httpie

    9.7 6.6 L3 Python
    🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more.
  • Docling

    9.6 9.7 Python
    Get your documents ready for gen AI
  • pydantic

    9.5 9.8 Python
    Data validation using Python type hints
  • Sphinx

    8.7 9.9 L2 Python
    The Sphinx documentation generator
  • HTTP Prompt

    8.5 0.0 L4 Python
    An interactive command-line HTTP and API testing client built on top of HTTPie featuring autocomplete, syntax highlighting, and more. https://twitter.com/httpie
  • PyMuPDF

    8.3 9.7 Python
    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
  • Chinese Characters to Pinyin Conversion Tool (Python version)

    7.9 6.8 Python
    Convert Chinese characters to pinyin (pypinyin)
  • Lark

    7.8 7.8 Python
    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
  • trafilatura

    7.3 8.5 Python
    Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML
  • xhtml2pdf

    6.8 7.1 L1 Python
    A library for converting HTML into PDFs using ReportLab
  • markdown2

    6.7 8.4 Python
    markdown2: A fast and complete implementation of Markdown in Python
  • python-readability

    6.7 7.2 Python
    Fast Python port of arc90's readability tool, updated to match the latest readability.js.
  • aeneas

    6.5 0.0 L3 Python
    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
  • pdftabextract

    6.4 0.0 L3 Python
    A set of tools for extracting tables from PDF files, to help with data mining on (OCR-processed) scanned documents.
  • quepy

    5.5 0.0 L5 Python
    A Python framework to transform natural-language questions into queries in a database query language.
  • typeguard

    5.3 7.9 Python
    Run-time type checker for Python
  • Data Profiler

    5.2 5.5 Python
    What's in your data? Extract schema, statistics and entities from datasets
  • Goose3

    4.3 4.1 HTML
    A Python 3-compatible version of Goose; documentation at http://goose3.readthedocs.io/en/latest/index.html
  • ijson

    4.0 0.3 Python
    DISCONTINUED. Iterative JSON parser with a Pythonic interface
  • Charset Normalizer

    3.8 9.0 Python
    Truly universal encoding detector in pure Python
  • inscriptis

    2.9 7.1 Python
    A Python-based HTML-to-text conversion library, command-line client and web service.
  • AnyAscii

    2.8 0.0 Kotlin
    Unicode to ASCII transliteration - C Elixir Go Java JS Julia PHP Python Ruby Rust Shell .NET
  • pangu.py

    2.7 1.9 L5 Python
    Paranoid text spacing in Python
  • json-streamer

    2.6 2.3 Python
    A fast streaming JSON parser for Python that generates SAX-like events using yajl
  • uniout

    2.3 1.8 L5 Python
    Never see escaped bytes in output.
  • PatZilla

    2.2 5.4 Python
    PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
  • nider

    2.2 0.0 Python
    Python package to add text to images, textures and different backgrounds
  • Kotori

    2.1 2.0 Python
    A flexible data historian based on InfluxDB, Grafana, MQTT, and more. Free, open, simple.
  • odin-slides

    2.0 7.8 Python
    An advanced Python tool for drafting customizable PowerPoint slides with the Generative Pre-trained Transformer (GPT) of your choice. Leveraging large language models (LLMs), odin-slides turns even the longest Word documents into well-organized presentations.
  • json2xml

    2.0 7.5 Python
    JSON to XML converter in Python 3
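Two of the entries above, pydantic and typeguard, check values against Python type hints at run time. The following is a minimal standard-library sketch of that idea, not the API of either package. The helper `check_types` and the `User` class are hypothetical; the sketch compares each dataclass field with its annotation using `isinstance`, so it handles only plain classes such as `int` and `str`, not generics like `list[int]`.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


def check_types(obj) -> None:
    """Raise TypeError if a dataclass field's value does not match its annotation.

    Hypothetical helper: only plain classes (int, str, float, ...) are supported;
    generic annotations such as list[int] are out of scope for this sketch.
    """
    hints = get_type_hints(type(obj))
    for f in fields(obj):
        expected = hints[f.name]
        value = getattr(obj, f.name)
        if not isinstance(value, expected):
            raise TypeError(
                f"{f.name}: expected {expected.__name__}, got {type(value).__name__}"
            )


@dataclass
class User:
    name: str
    age: int

    def __post_init__(self) -> None:
        # Validate immediately after the generated __init__ has assigned the fields.
        check_types(self)
```

With this sketch, `User("Ada", 36)` succeeds while `User("Ada", "36")` raises `TypeError`. The sketch only checks and never converts values; the real libraries go much further, for example with support for generic types and richer error reporting.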
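AnyAscii, listed above, transliterates Unicode text to ASCII. As a rough illustration of the problem only, not of how AnyAscii works, Python's standard `unicodedata` module can strip accents: NFKD normalization splits an accented letter into its base letter plus a combining mark, and everything still outside ASCII is then discarded. The function name `to_ascii` is hypothetical.

```python
import unicodedata


def to_ascii(text: str) -> str:
    # NFKD splits accented letters into base letter + combining mark
    # (e.g. "é" -> "e" + U+0301) and maps compatibility forms such as "ﬁ" to "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop everything that is still non-ASCII (combining marks, CJK, "ß", ...).
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

This handles Latin accents (`to_ascii("Café déjà vu")` returns `"Cafe deja vu"`), but characters with no ASCII decomposition, such as `ß` or Chinese characters, are silently dropped. Covering those cases is what a dedicated transliteration library is for.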
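json2xml, the last entry above, converts JSON to XML. Below is a minimal standard-library sketch of such a conversion; it is not json2xml's API, and its output format is not meant to match json2xml's. All of the following are hypothetical choices made for illustration: the functions `build` and `json_to_xml`, a root element named `root`, list entries written as `<item>` elements, booleans written as `true`/`false`, and JSON keys assumed to already be valid XML tag names (ElementTree does not check this).

```python
import json
import xml.etree.ElementTree as ET


def build(parent: ET.Element, value) -> None:
    """Recursively append the JSON value `value` under the element `parent`."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Assumes each key is already a valid XML tag name.
            build(ET.SubElement(parent, key), child)
    elif isinstance(value, list):
        for item in value:
            # Hypothetical convention: every list entry becomes an <item> element.
            build(ET.SubElement(parent, "item"), item)
    elif value is None:
        pass  # JSON null: leave the element empty
    elif isinstance(value, bool):
        # Checked before the generic case so True/False become lowercase JSON literals.
        parent.text = "true" if value else "false"
    else:
        parent.text = str(value)


def json_to_xml(json_text: str, root_tag: str = "root") -> str:
    root = ET.Element(root_tag)
    build(root, json.loads(json_text))
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

For example, `json_to_xml('{"name": "Ada", "langs": ["py", "c"], "active": true}')` returns `<root><name>Ada</name><langs><item>py</item><item>c</item></langs><active>true</active></root>`.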