Description
bcolz provides columnar, chunked data containers that can be
compressed either in memory or on disk. Column storage allows tables
to be queried efficiently and makes adding and removing columns
cheap. It is based on NumPy, and uses it
as the standard data container to communicate with bcolz objects, but
it also supports importing from and exporting to
HDF5/PyTables tables and pandas
dataframes.
bcolz objects are compressed by default, not only to reduce
memory/disk storage, but also to improve I/O speed. Compression
is carried out internally by Blosc, a
high-performance, multithreaded meta-compressor that is optimized for
binary data (although it works fine with text data too).
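The chunked, compressed layout described above can be illustrated with a small standard-library sketch. This is not bcolz's `carray` and does not use Blosc: the `CompressedColumn` class, its method names, and the chunk length are hypothetical, and `zlib` stands in for the compressor. It only shows the idea of sealing fixed-size chunks into compressed blocks and decompressing one chunk at a time when reading.

```python
import zlib
from array import array


class CompressedColumn:
    """Toy chunked, compressed column of float64 values (illustrative only)."""

    def __init__(self, chunk_len=4096, level=5):
        self.chunk_len = chunk_len   # number of values per compressed chunk
        self.level = level           # zlib compression level (stand-in for Blosc)
        self.chunks = []             # sealed chunks, stored as compressed bytes
        self.pending = array("d")    # uncompressed tail that is not yet a full chunk
        self.length = 0

    def append(self, values):
        """Append values, sealing a chunk each time the tail fills up."""
        for value in values:
            self.pending.append(value)
            self.length += 1
            if len(self.pending) == self.chunk_len:
                self._seal()

    def _seal(self):
        self.chunks.append(zlib.compress(self.pending.tobytes(), self.level))
        self.pending = array("d")

    def iter_chunks(self):
        """Yield one decompressed chunk at a time, then the uncompressed tail."""
        for blob in self.chunks:
            chunk = array("d")
            chunk.frombytes(zlib.decompress(blob))
            yield chunk
        if self.pending:
            yield self.pending

    @property
    def nbytes(self):
        """Size the data would occupy uncompressed."""
        return self.length * self.pending.itemsize

    @property
    def cbytes(self):
        """Size actually held: compressed chunks plus the uncompressed tail."""
        tail = len(self.pending) * self.pending.itemsize
        return sum(len(blob) for blob in self.chunks) + tail


col = CompressedColumn(chunk_len=1000)
col.append(float(i % 10) for i in range(10_000))

# Reduce over the column without ever holding more than one chunk decompressed.
total = sum(sum(chunk) for chunk in col.iter_chunks())
print(col.nbytes, col.cbytes, total)
```

Because only one chunk is decompressed at a time, the working set stays small regardless of the column length; how much `cbytes` shrinks relative to `nbytes` depends entirely on how compressible the data is.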
bcolz can also use numexpr
internally (it does so by default if it detects that numexpr is installed)
to accelerate many vector and query operations (although it can
fall back to pure NumPy for these too). numexpr optimizes memory
usage and runs computations on multiple threads, which makes it
fast. In combination with the disk-based, compressed carray/ctable
containers, this allows out-of-core computations to be performed
efficiently and, most importantly, transparently.
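As a rough sketch of what querying a disk-based columnar table out of core involves, the following standard-library example stores each column as a directory of compressed chunk files and evaluates a filter chunk by chunk. It is not bcolz's on-disk format or query engine: the helper names (`write_column`, `read_chunks`, `where`), the file naming, and the chunk length are all assumptions, `zlib` again stands in for Blosc, and a plain Python predicate stands in for a numexpr expression.

```python
import os
import tempfile
import zlib
from array import array

CHUNK_LEN = 1000  # rows per chunk file (illustrative value)


def write_column(rootdir, name, values):
    """Store one column as numbered, zlib-compressed chunk files."""
    coldir = os.path.join(rootdir, name)
    os.makedirs(coldir)
    data = array("d", values)
    for n, start in enumerate(range(0, len(data), CHUNK_LEN)):
        blob = zlib.compress(data[start:start + CHUNK_LEN].tobytes())
        with open(os.path.join(coldir, f"{n:08d}.z"), "wb") as f:
            f.write(blob)


def read_chunks(rootdir, name):
    """Yield the chunks of one column in order, decompressing lazily."""
    coldir = os.path.join(rootdir, name)
    for fname in sorted(os.listdir(coldir)):
        with open(os.path.join(coldir, fname), "rb") as f:
            chunk = array("d")
            chunk.frombytes(zlib.decompress(f.read()))
        yield chunk


def where(rootdir, names, predicate):
    """Yield rows that satisfy predicate, reading only the named columns."""
    streams = [read_chunks(rootdir, name) for name in names]
    for chunks in zip(*streams):      # one chunk per column in memory at a time
        for row in zip(*chunks):
            if predicate(*row):
                yield row


with tempfile.TemporaryDirectory() as rootdir:
    write_column(rootdir, "x", (float(i) for i in range(5000)))
    write_column(rootdir, "y", (float(i * i) for i in range(5000)))
    write_column(rootdir, "z", (0.0 for _ in range(5000)))  # never read by the query
    hits = list(where(rootdir, ["x", "y"], lambda x, y: x > 10 and y < 200))

print(hits)
```

The query touches only the `x` and `y` directories, which is the practical benefit of column storage: unused columns cost no I/O, and in this layout adding or removing a column amounts to creating or deleting one directory.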
Just to whet your appetite, here is an example with real data, where
bcolz is already fulfilling the promise of accelerating memory I/O by
using compression:
bcolz alternatives and similar packages
Based on the "Science and Data Analysis" category.
- Pandas: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
- Interactive Parallel Computing with IPython: IPython Parallel, interactive parallel computing in Python.
- Fugue: A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
- bcbio-nextgen: Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis.
- PatZilla: A modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
- ElasticBatch: Elasticsearch tool for easily collecting and batch inserting Python data and pandas DataFrames.
- cclib: A library for parsing and interpreting the results of computational chemistry packages.