Add check_call_in_cache method to check cache without calling function. https://github.com/joblib/joblib/pull/820
dask: avoid redundant scattering of large arguments to make a more efficient use of the network resources and avoid crashing dask with "OSError: [Errno 55] No buffer space available" or "ConnectionResetError: [Errno 104] connection reset by peer". https://github.com/joblib/joblib/pull/1133
joblib.Memory caching system compatible with numpy. Also make it explicit in the documentation that users should now expect to have their joblib.Memory cache invalidated when either joblib or a third party library involved in the cached values definition is upgraded. In particular, users updating joblib to a release that includes this fix will see their previous cache invalidated if it contained references to numpy objects. https://github.com/joblib/joblib/pull/1136
✂ Remove deprecated
🚀 Release 0.17.0
🛠 Fix a spurious invalidation of Memory.cache'd functions called with Parallel under Jupyter or IPython. https://github.com/joblib/joblib/pull/1093
⬆️ Bump vendored loky to 2.9.0 and cloudpickle to 1.6.0. In particular, this adds compatibility with Python 3.9.
🚀 Release 0.16.0
🛠 Fix a problem in the constructors of Parallel backend classes that inherit from the AutoBatchingMixin that prevented the dask backend from properly batching short tasks. https://github.com/joblib/joblib/pull/1062
🛠 Fix a problem in the way the joblib dask backend batches calls that would badly interact with the dask callable pickling cache and lead to wrong results or errors. https://github.com/joblib/joblib/pull/1055
👷 Prevent a dask.distributed bug from surfacing in joblib's dask backend during nested Parallel calls (due to joblib's auto-scattering feature) https://github.com/joblib/joblib/pull/1061
↪ Workaround for a race condition after Parallel calls with the dask backend that would cause low level warnings from asyncio coroutines: https://github.com/joblib/joblib/pull/1078
🚀 Release 0.15.1
👷 Make joblib work on Python 3 installations that do not ship with the lzma package in their standard library.
🚀 Release 0.15.0
⬇️ Drop support for Python 2 and Python 3.5. All objects in joblib.format_stack are now deprecated and will be removed in joblib 0.16. Note that no deprecation warning will be raised for these objects on Python < 3.7. https://github.com/joblib/joblib/pull/1018
🛠 Fix many bugs related to the temporary files and folders generated when automatically memory mapping large numpy arrays for efficient inter-process communication. In particular, these bugs would cause PermissionError exceptions to be raised under Windows and large leaked files in /dev/shm under Linux in case of a crash. https://github.com/joblib/joblib/pull/966
👉 Make the dask backend collect results as soon as they complete leading to a performance improvement: https://github.com/joblib/joblib/pull/1025
Fix the number of jobs reported by n_jobs=None called in a parallel backend context. https://github.com/joblib/joblib/pull/985
⬆️ Upgraded vendored cloudpickle to 1.4.1 and loky to 2.8.0. In particular, this allows for Parallel calls of dynamically defined functions with type annotations.
🚀 Release 0.14.1
🔧 Configure the loky workers' environment to mitigate oversubscription with nested multi-threaded code in the following cases:
- allow for a suitable number of threads for numba (
- enable Interprocess Communication for scheduler coordination when the nested code uses Threading Building Blocks (TBB) (
🛠 Fix a regression where the loky backend was not reusing previously spawned workers. https://github.com/joblib/joblib/pull/968
🚀 Release 0.14.0
👌 Improved the load balancing between workers to avoid stragglers caused by an excessively large batch size when the task duration varies significantly (for instance because of the combined use of joblib.Memory with a partially warmed cache). https://github.com/joblib/joblib/pull/899
➕ Add official support for Python 3.8: fixed protocol number in Hasher and updated tests.
🛠 Fix a deadlock when using the dask backend (when scattering large numpy arrays). https://github.com/joblib/joblib/pull/914
👷 Warn users that they should never use joblib.load with files from untrusted sources. Fix security related API change introduced in numpy 1.16.3 that would prevent using joblib with recent numpy versions. https://github.com/joblib/joblib/pull/879
⬆️ Upgrade to cloudpickle 1.1.1, which adds support for the upcoming Python 3.8 release among other things. https://github.com/joblib/joblib/pull/878
🛠 Fix semaphore availability checker to avoid spawning resource trackers on module import. https://github.com/joblib/joblib/pull/893
🛠 Fix the oversubscription protection to only protect against nested Parallel calls. This allows joblib to be run in background threads. https://github.com/joblib/joblib/pull/934
ValueError (negative dimensions) when pickling large numpy arrays on Windows. https://github.com/joblib/joblib/pull/920
⬆️ Upgrade to loky 2.6.0, which adds support for setting environment variables in child processes before loading any module. https://github.com/joblib/joblib/pull/940
🛠 Fix the oversubscription protection for native libraries using threadpools (OpenBLAS, MKL, Blis and OpenMP runtimes). The maximal number of threads can now be set in children using the parallel_backend. It defaults to cpu_count() // n_jobs. https://github.com/joblib/joblib/pull/940
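As a rough illustration of the default mentioned above, the per-worker thread cap can be read as an integer division of the available CPUs by the number of workers. This is only a sketch: the helper name `default_inner_threads` and the floor of one thread are assumptions made for the example, not joblib's actual code.

```python
import os


def default_inner_threads(n_jobs, n_cpus=None):
    """Hypothetical helper: threads each worker may use in native libraries."""
    if n_cpus is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        n_cpus = os.cpu_count() or 1
    # Integer division shares the CPUs between workers. The floor of one
    # thread per worker is an assumption of this sketch.
    return max(n_cpus // n_jobs, 1)


print(default_inner_threads(n_jobs=2, n_cpus=8))   # 4
print(default_inner_threads(n_jobs=16, n_cpus=8))  # 1
```

With 8 CPUs and 2 workers, each worker would be allowed 4 threads, so the total stays at the number of CPUs instead of multiplying.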
🚀 Release 0.13.2
Upgrade to cloudpickle 0.8.0
Add a non-regression test related to joblib issues #836 and #833, reporting that cloudpickle versions between 0.5.4 and 0.7 introduced a bug where changes to global variables in a parent process between two calls to joblib.Parallel would not be propagated to the workers.
🚀 Release 0.13.1
Memory now accepts pathlib.Path objects as the location parameter. Also, a warning is raised if the returned backend is None while location is not None.
Parallel raises an informative RuntimeError when the active parallel backend has zero workers.
DaskDistributedBackend waits for workers before trying to schedule work. This is useful in particular when the workers are provisioned dynamically but provisioning is not immediate (for instance using Kubernetes, Yarn or an HPC job queue).
🚀 Release 0.13.0
Include loky 2.4.2 with default serialization with cloudpickle. This can be tweaked with the environment variable
Fix nested backend in SequentialBackend to avoid changing the default backend to Sequential. (#792)
Thomas Moreau, Olivier Grisel
Fix nested_backend behavior to avoid setting the default number of workers to -1 when the backend is not dask. (#784)
🚀 Release 0.12.5
Thomas Moreau, Olivier Grisel
Include loky 2.3.1 with better error reporting when a worker is abruptly terminated. Also fixes spurious debug output.
Include cloudpickle 0.5.6. Fix a bug with the handling of global variables by locally defined functions.
🚀 Release 0.12.4
Thomas Moreau, Pierre Glaser, Olivier Grisel
Include loky 2.3.0 with many bugfixes, notably when setting non-default multiprocessing contexts. Also includes improvements to the memory management of long running worker processes and fixes for issues when using the loky backend under PyPy.
Raises a more explicit exception when a corrupted MemorizedResult is loaded.
Loading a corrupted cached file with mmap mode enabled would recompute the results and return them without memory mapping.
🚀 Release 0.12.3
Fix joblib import setting the global start_method for multiprocessing.
Fix MemorizedResult not picklable (#747).
Fix Memory, MemorizedFunc and MemorizedResult round-trip pickling + unpickling (#746).
Fixed a regression in Memory when positional arguments are called as kwargs several times with different values (#751).
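To see why positional and keyword call styles matter for a cache, here is a minimal sketch of argument normalization using only the standard library. The helper `call_key` is hypothetical and is not joblib's implementation; it only shows the general idea that the same logical call should map to the same key however the arguments are spelled.

```python
import inspect


def call_key(func, *args, **kwargs):
    """Hypothetical helper: build a cache key independent of call style."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    # Sort by parameter name so the key does not depend on argument order.
    return tuple(sorted(bound.arguments.items()))


def scale(x, factor=2):
    return x * factor


# The same logical call, spelled three ways, maps to one key.
assert call_key(scale, 3) == call_key(scale, x=3) == call_key(scale, 3, factor=2)
# A different value for the same parameter yields a different key.
assert call_key(scale, x=3) != call_key(scale, x=4)
```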
Thomas Moreau and Olivier Grisel
Integration of loky 2.2.2, which fixes issues with the selection of the default start method and improves the reporting when calling functions with arguments that raise an exception when unpickling.
Prevent MemorizedFunc.call_and_shelve from loading cached results to RAM when not necessary. This results in large performance improvements.
🚀 Release 0.12.2
Integrate loky 2.2.0 to fix regression with unpicklable arguments and functions reported by users (#723, #643).
Loky 2.2.0 also provides a protection against memory leaks in long running applications when psutil is installed (reported as #721).
Joblib now includes the code for the dask backend which has been updated to properly handle nested parallelism and data scattering at the same time (#722).
Alexandre Abadie and Olivier Grisel
Restored some private API attribute and arguments (pickle_cache) for backward compat. (#716, #732).
Joris Van den Bossche
Fix a deprecation warning message (for
🚀 Release 0.12.1
Make sure that any exception triggered when serializing jobs in the queue will be wrapped as a PicklingError as in past versions of joblib.
Fix kwonlydefaults key error in filter_args (#715)
🚀 Release 0.12
Implement the ``'loky'`` backend with @ogrisel. This backend relies on a robust implementation of ``concurrent.futures.ProcessPoolExecutor`` with spawned processes that can be reused across the ``Parallel`` calls. This fixes the bad interaction with third party libraries relying on thread pools, described in https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries
Limit the number of threads used in worker processes by C-libraries that rely on threadpools. This functionality works for MKL, OpenBLAS, OpenMP and Accelerate.
Prevent numpy arrays with the same shape and data from hashing to the same memmap, to prevent jobs with preallocated arrays from writing over each other.
Reduce overhead of automatic memmap by removing the need to hash the array.
Make ``Memory.cache`` robust to ``PermissionError (errno 13)`` under Windows when run in combination with ``Parallel``.
The automatic array memory mapping feature of ``Parallel`` no longer uses ``/dev/shm`` if it is too small (less than 2 GB). In particular, in Docker containers ``/dev/shm`` is only 64 MB by default, which would cause frequent failures when running joblib there.
Make it possible to hint for thread-based parallelism with ``prefer='threads'`` or enforce shared-memory semantics with ``require='sharedmem'``.
Rely on the built-in exception nesting system of Python 3 to preserve traceback information when an exception is raised on a remote worker process. This avoids verbose and redundant exception reports under Python 3.
Preserve exception type information when doing nested Parallel calls instead of mapping the exception to the generic ``JoblibException`` type.
Introduce the concept of 'store' and refactor the ``Memory`` internal storage implementation to make it accept extra store backends for caching results. ``backend`` and ``backend_options`` are the new options added to ``Memory`` to specify and configure a store backend. Add the ``register_store_backend`` function to extend the store backend used by default with Memory. This default store backend is named 'local' and corresponds to the local filesystem. The store backend API is experimental and thus is subject to change in the future without deprecation.
The ``cachedir`` parameter of ``Memory`` is now marked as deprecated, use ``location`` instead.
Add support for LZ4 compression if the ``lz4`` package is installed.
Add the ``register_compressor`` function for extending available compressors.
Allow passing a string to the ``compress`` parameter of the ``dump`` function. This string should correspond to the compressor used (e.g. zlib, gzip, lz4, etc). The default compression level is used in this case.
Allow ``parallel_backend`` to be used globally instead of only as a context manager.
Support lazy registration of external parallel backends.
🚀 Release 0.11
Remove support for Python 2.6.
Remove deprecated `format_signature`, `format_call` and `load_output` functions from Memory API.
Add initial implementation of LRU cache cleaning. You can specify the size limit of a ``Memory`` object via the ``bytes_limit`` parameter and then need to clean the cache explicitly via the ``Memory.reduce_size`` method.
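The idea of size-bounded LRU cleaning can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function `files_to_evict` and the `(name, size, last_access)` tuples are hypothetical and do not reflect how joblib stores or scans its cache.

```python
def files_to_evict(entries, bytes_limit):
    """Pick least recently used entries until the total size fits the limit.

    entries: list of (name, size_in_bytes, last_access_time) tuples.
    """
    total = sum(size for _, size, _ in entries)
    evicted = []
    # Visit the least recently used entries first.
    for name, size, _ in sorted(entries, key=lambda entry: entry[2]):
        if total <= bytes_limit:
            break
        evicted.append(name)
        total -= size
    return evicted


cache = [("a", 400, 10.0), ("b", 300, 30.0), ("c", 500, 20.0)]
print(files_to_evict(cache, bytes_limit=800))  # ['a']
```

Here the cache holds 1200 bytes, so dropping the oldest entry `a` is enough to get back under the 800 byte limit.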
Make the multiprocessing backend work even when the name of the main thread is not the Python default. Thanks to Roman Yurchak for the suggestion.
pytest is used to run the tests instead of nosetests. ``python setup.py test`` or ``python setup.py nosetests`` do not work anymore, run ``pytest joblib`` instead.
An instance of ``joblib.ParallelBackendBase`` can be passed into the ``backend`` argument in ``joblib.Parallel``.
Fix handling of memmap objects with offsets greater than mmap.ALLOCATIONGRANULARITY in ``joblib.Parallel``. See https://github.com/joblib/joblib/issues/451 for more details.
Fix performance regression in ``joblib.Parallel`` with n_jobs=1. See https://github.com/joblib/joblib/issues/483 for more details.
Fix race condition when a function cached with ``joblib.Memory.cache`` was used inside a ``joblib.Parallel``. See https://github.com/joblib/joblib/issues/490 for more details.
🚀 Release 0.10.3
Fix tests when multiprocessing is disabled via the JOBLIB_MULTIPROCESSING environment variable.
Remove warnings in nested Parallel objects when the inner Parallel has n_jobs=1. See https://github.com/joblib/joblib/pull/406 for more details.
🚀 Release 0.10.2
FIX a bug in stack formatting when the error happens in a compiled extension. See https://github.com/joblib/joblib/pull/382 for more details.
FIX a bug in the constructor of BinaryZlibFile that would throw an exception when passing unicode filename (Python 2 only). See https://github.com/joblib/joblib/pull/384 for more details.
Expose :class:`joblib.parallel.ParallelBackendBase` and :class:`joblib.parallel.AutoBatchingMixin` in the public API to make them officially re-usable by backend implementers.
🚀 Release 0.10.0
ENH: joblib.dump/load now accept file-like objects besides filenames. See https://github.com/joblib/joblib/pull/351 for more details.
Niels Zeilemaker and Olivier Grisel
Refactored joblib.Parallel to enable the registration of custom computational backends. See https://github.com/joblib/joblib/pull/306 for more details. Note that the API to register custom backends is considered experimental and subject to change without deprecation.
Joblib pickle format change: joblib.dump always creates a single pickle file and joblib.dump/joblib.load never do any memory copy when writing/reading pickle files. Reading pickle files generated with joblib versions prior to 0.10 will be supported for a limited amount of time; we advise regenerating them from scratch when convenient. joblib.dump and joblib.load also support pickle files compressed using various strategies: zlib, gzip, bz2, lzma and xz. Note that lzma and xz are only available with Python >= 3.3. See https://github.com/joblib/joblib/pull/260 for more details.
ENH: joblib.dump/load now accept pathlib.Path objects as filenames. See https://github.com/joblib/joblib/pull/316 for more details.
Workaround for "WindowsError: [Error 5] Access is denied" when trying to terminate a multiprocessing pool under Windows: https://github.com/joblib/joblib/issues/354
🚀 Release 0.9.4
FIX a race condition that could cause a joblib.Parallel to hang when collecting the result of a job that triggers an exception. https://github.com/joblib/joblib/pull/296
FIX a bug that caused joblib.Parallel to wrongly reuse previously memmapped arrays instead of creating new temporary files. See https://github.com/joblib/joblib/pull/294 for more details.
FIX for raising non inheritable exceptions in a Parallel call. See https://github.com/joblib/joblib/issues/269 for more details.
FIX joblib.hash error with mixed types sets and dicts containing mixed types keys when using Python 3. See https://github.com/joblib/joblib/issues/254 for more details.
FIX joblib.dump/load for big numpy arrays with dtype=object. See https://github.com/joblib/joblib/issues/220 for more details.
FIX joblib.Parallel hanging when used with an exhausted iterator. See https://github.com/joblib/joblib/issues/292 for more details.
🚀 Release 0.9.3
Revert back to the ``fork`` start method (instead of ``forkserver``) as the latter was found to cause crashes in interactive Python sessions.
🚀 Release 0.9.2
Joblib hashing now uses the default pickle protocol (2 for Python 2 and 3 for Python 3). This makes it very unlikely to get the same hash for a given object under Python 2 and Python 3. In particular, for Python 3 users, this means that the output of joblib.hash changes when switching from joblib 0.8.4 to 0.9.2 . We strive to ensure that the output of joblib.hash does not change needlessly in future versions of joblib but this is not officially guaranteed.
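The dependence of a pickle-based hash on the protocol can be checked with the standard library alone. The sketch below is not joblib.hash: the helper `pickle_digest` and the choice of SHA-256 are assumptions made for illustration. It only shows that the same object serialized with two protocols yields two different digests.

```python
import hashlib
import pickle


def pickle_digest(obj, protocol):
    """Hash an object through its pickle byte stream (illustrative only)."""
    payload = pickle.dumps(obj, protocol=protocol)
    return hashlib.sha256(payload).hexdigest()


data = {"alpha": [1, 2, 3]}
# The protocol number is written into the stream header, so the digest
# changes with the protocol even though the object is identical.
assert pickle_digest(data, 2) != pickle_digest(data, 3)
# For a fixed protocol, hashing this object twice gives the same digest.
assert pickle_digest(data, 3) == pickle_digest(data, 3)
```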
Joblib pickles generated with Python 2 can not be loaded with Python 3 and the same applies for joblib pickles generated with Python 3 and loaded with Python 2. During the beta period 0.9.0b2 to 0.9.0b4, we experimented with a joblib serialization that aimed to make pickles serialized with Python 3 loadable under Python 2. Unfortunately this serialization strategy proved to be too fragile as far as the long-term maintenance was concerned (For example see https://github.com/joblib/joblib/pull/243). That means that joblib pickles generated with joblib 0.9.0bN can not be loaded under joblib 0.9.2. Joblib beta testers, who are the only ones likely to be affected by this, are advised to delete their joblib cache when they upgrade from 0.9.0bN to 0.9.2.
Fixed a bug with ``joblib.hash`` that used to return unstable values for strings and numpy.dtype instances depending on interning states.
Make joblib use the 'forkserver' start method by default under Python 3.4+ to avoid causing crashes with 3rd party libraries (such as Apple vecLib / Accelerate or the GCC OpenMP runtime) that use an internal thread pool that is not reinitialized when a ``fork`` system call happens.
New context manager based API (``with`` block) to re-use the same pool of workers across consecutive parallel calls.
Vlad Niculae and Olivier Grisel
Automated batching of fast tasks into longer running jobs to hide multiprocessing dispatching overhead when possible.
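The benefit of batching can be illustrated with a small standard-library sketch: many short tasks are grouped into chunks so that each dispatch to a worker carries several of them. The names `batches`, `run_batch` and `batched_map` are hypothetical, the batch size is fixed by hand here rather than tuned automatically, and threads are used only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(iterable, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def run_batch(func, chunk):
    """Run every task of a batch inside a single dispatched job."""
    return [func(item) for item in chunk]


def batched_map(func, items, batch_size, n_workers=2):
    """Apply func to items, dispatching one job per batch of tasks."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(run_batch, func, chunk)
            for chunk in batches(items, batch_size)
        ]
        # Futures are consumed in submission order, so results stay ordered.
        return [result for future in futures for result in future.result()]


print(batched_map(lambda x: x * x, range(10), batch_size=4))
```

Ten tasks with a batch size of 4 result in three dispatches instead of ten, which is where the dispatching overhead is saved.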
FIX make it possible to call ``joblib.load(filename, mmap_mode='r')`` on pickled objects that include a mix of arrays of both memmappable dtypes and object dtype.
🚀 Release 0.8.4
2014-11-20 Olivier Grisel
OPTIM use the C-optimized pickler under Python 3. This makes it possible to efficiently process parallel jobs that deal with numerous Python objects such as large dictionaries.
🚀 Release 0.8.3
2014-08-19 Olivier Grisel
FIX disable memmapping for object arrays
2014-08-07 Lars Buitinck
MAINT NumPy 1.10-safe version comparisons
2014-07-11 Olivier Grisel
FIX #146: Heisen test failure caused by thread-unsafe Python lists. This fix uses a queue.Queue data structure in the failing test. This data structure is thread-safe thanks to an internal Lock. This Lock instance is not picklable and hence causes the picklability check of delayed to fail. When using the threading backend, picklability is no longer required, hence this PR gives the user the ability to disable the check on a case by case basis.
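The picklability problem described above can be reproduced with the standard library. In this small sketch the helper `is_picklable` is hypothetical and unrelated to the check joblib performs; it simply shows that a queue.Queue, which holds an internal lock, cannot be pickled while a plain list can.

```python
import pickle
import queue


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


assert is_picklable([1, 2, 3])
# queue.Queue owns a threading lock, which pickle refuses to serialize.
assert not is_picklable(queue.Queue())
```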
🚀 Release 0.8.2
2014-06-30 Olivier Grisel
BUG: use mmap_mode='r' by default in Parallel and MemmappingPool. The former default of mmap_mode='c' (copy-on-write) caused problematic use of the paging file under Windows.
2014-06-27 Olivier Grisel
BUG: fix usage of the /dev/shm folder under Linux
🚀 Release 0.8.1
2014-05-29 Gael Varoquaux
BUG: fix crash with high verbosity
🚀 Release 0.8.0
2014-05-14 Olivier Grisel
Fix a bug in exception reporting under Python 3
2014-05-10 Olivier Grisel
Fixed a potential segfault when passing non-contiguous memmap instances.
2014-04-22 Gael Varoquaux
ENH: Make memory robust to modification of source files while the interpreter is running. Should lead to fewer spurious cache flushes and recomputations.
2014-02-24 Philippe Gervais
Memory.call_and_shelve API to handle memoized results by reference instead of by value.
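Handling a result "by reference" means passing around a small handle to data stored on disk rather than the data itself. The sketch below illustrates that idea with the standard library only. `ShelvedResult` and `call_and_store` are hypothetical names invented for this example and are not joblib's classes or functions.

```python
import os
import pickle
import tempfile


class ShelvedResult:
    """Hypothetical lightweight handle pointing at a result stored on disk."""

    def __init__(self, path):
        self.path = path

    def get(self):
        # The value is only read from disk when it is actually needed.
        with open(self.path, "rb") as stream:
            return pickle.load(stream)


def call_and_store(func, *args, directory):
    """Call func, persist its output, and return a handle instead of the value."""
    value = func(*args)
    fd, path = tempfile.mkstemp(suffix=".pkl", dir=directory)
    with os.fdopen(fd, "wb") as stream:
        pickle.dump(value, stream)
    return ShelvedResult(path)


with tempfile.TemporaryDirectory() as tmp:
    ref = call_and_store(sum, range(1000), directory=tmp)
    # Only the path travels with the handle; the value is loaded on demand.
    assert ref.get() == 499500
```

The handle holds nothing but a file path, so it stays cheap to keep in memory however large the stored result is.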
v0.15.1 (May 16, 2020)
v0.15.0 (May 15, 2020)
v0.14.1 (December 10, 2019)
v0.14.0 (October 01, 2019)
v0.13.2 (February 13, 2019)
v0.13.1 (January 11, 2019)
v0.13.0 (November 14, 2018)
v0.12.5 (September 13, 2018)