Joblib v1.0.0 Release Notes

    • 👷 Make joblib.hash and the joblib.Memory caching system compatible with numpy >= 1.20.0. Also make it explicit in the documentation that users should now expect to have their joblib.Memory cache invalidated when either joblib or a third party library involved in the definition of the cached values is upgraded. In particular, users updating joblib to a release that includes this fix will see their previous cache invalidated if it contained references to numpy objects. https://github.com/joblib/joblib/pull/1136

    • ✂ Remove deprecated check_pickle argument in delayed. https://github.com/joblib/joblib/pull/903

    🚀 Release 0.17.0

    • 🛠 Fix a spurious invalidation of Memory.cache'd functions called with Parallel under Jupyter or IPython. https://github.com/joblib/joblib/pull/1093

    • ⬆️ Bump vendored loky to 2.9.0 and cloudpickle to 1.6.0. In particular, this fixes a compatibility problem with Python 3.9.

    🚀 Release 0.16.0

    🚀 Release 0.15.1

    • 👷 Make joblib work on Python 3 installations that do not ship with the lzma package in their standard library.

    🚀 Release 0.15.0

    • ⬇️ Drop support for Python 2 and Python 3.5. All objects in joblib.my_exceptions and joblib.format_stack are now deprecated and will be removed in joblib 0.16. Note that no deprecation warning will be raised for these objects on Python < 3.7. https://github.com/joblib/joblib/pull/1018

    • 🛠 Fix many bugs related to the temporary files and folder generated when automatically memory mapping large numpy arrays for efficient inter-process communication. In particular, this would cause PermissionError exceptions to be raised under Windows and large leaked files in /dev/shm under Linux in case of crash. https://github.com/joblib/joblib/pull/966

    • 👉 Make the dask backend collect results as soon as they complete leading to a performance improvement: https://github.com/joblib/joblib/pull/1025

    • Fix the number of jobs reported by effective_n_jobs when n_jobs=None called in a parallel backend context. https://github.com/joblib/joblib/pull/985

    • ⬆️ Upgraded vendored cloudpickle to 1.4.1 and loky to 2.8.0. In particular, this allows Parallel calls of dynamically defined functions with type annotations.
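
As a rough illustration of the effective_n_jobs fix listed above, the sketch below queries the job count inside a parallel_backend block. The threading backend and n_jobs=2 are assumptions chosen for the example. On a platform where joblib falls back to serial mode, the reported value may be 1.

```python
from joblib import effective_n_jobs, parallel_backend

# Inside the context, n_jobs=None defers to the value configured
# on the active backend (n_jobs=2 here, an arbitrary choice).
with parallel_backend("threading", n_jobs=2):
    n_inside = effective_n_jobs(n_jobs=None)

print(n_inside)
```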

    🚀 Release 0.14.1

    • 🔧 Configure the loky workers' environment to mitigate oversubscription with nested multi-threaded code in the following cases:

      • allow for a suitable number of threads for numba (NUMBA_NUM_THREADS);
      • enable Interprocess Communication for scheduler coordination when the nested code uses Threading Building Blocks (TBB) (ENABLE_IPC=1)

    https://github.com/joblib/joblib/pull/951

    🚀 Release 0.14.0

    • 👌 Improved the load balancing between workers to avoid stragglers caused by an excessively large batch size when the task duration varies significantly (for instance because of the combined use of joblib.Parallel and joblib.Memory with a partially warmed cache). https://github.com/joblib/joblib/pull/899

    • ➕ Add official support for Python 3.8: fixed protocol number in Hasher and updated tests.

    • 🛠 Fix a deadlock when using the dask backend (when scattering large numpy arrays). https://github.com/joblib/joblib/pull/914

    • 👷 Warn users that they should never use joblib.load with files from untrusted sources. Fix a security related API change introduced in numpy 1.16.3 that would prevent using joblib with recent numpy versions. https://github.com/joblib/joblib/pull/879

    • ⬆️ Upgrade to cloudpickle 1.1.1, which adds support for the upcoming Python 3.8 release among other things. https://github.com/joblib/joblib/pull/878

    • 🛠 Fix semaphore availability checker to avoid spawning resource trackers on module import. https://github.com/joblib/joblib/pull/893

    • 🛠 Fix the oversubscription protection to only protect against nested Parallel calls. This allows joblib to be run in background threads. https://github.com/joblib/joblib/pull/934

    • 🛠 Fix ValueError (negative dimensions) when pickling large numpy arrays on Windows. https://github.com/joblib/joblib/pull/920

    • ⬆️ Upgrade to loky 2.6.0, which adds support for setting environment variables in child processes before loading any module. https://github.com/joblib/joblib/pull/940

    • 🛠 Fix the oversubscription protection for native libraries using threadpools (OpenBLAS, MKL, Blis and OpenMP runtimes). The maximal number of threads can now be set in children using the inner_max_num_threads argument of parallel_backend. It defaults to cpu_count() // n_jobs. https://github.com/joblib/joblib/pull/940
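
A minimal sketch of the inner_max_num_threads option described above, assuming the loky backend is available on the platform. The helper read_thread_limit is hypothetical. Reading OMP_NUM_THREADS assumes that joblib conveys the limit to workers through that environment variable, so the values printed may differ depending on the joblib version and platform.

```python
import os

from joblib import Parallel, delayed, parallel_backend


def read_thread_limit():
    # Hypothetical helper: runs in a worker and reports the OpenMP
    # thread limit visible in that worker's environment (may be None).
    return os.environ.get("OMP_NUM_THREADS")


# Ask for at most 2 threads per worker for native threadpools.
with parallel_backend("loky", inner_max_num_threads=2):
    limits = Parallel(n_jobs=2)(delayed(read_thread_limit)() for _ in range(4))

print(limits)
```

Without an explicit value, the changelog entry states that the limit defaults to cpu_count() // n_jobs.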

    🚀 Release 0.13.2

    Pierre Glaser

    Upgrade to cloudpickle 0.8.0

    Add a non-regression test related to joblib issues #836 and #833, reporting that cloudpickle versions between 0.5.4 and 0.7 introduced a bug where changes to global variables in a parent process between two calls to joblib.Parallel would not be propagated into the workers.

    🚀 Release 0.13.1

    Pierre Glaser

    Memory now accepts pathlib.Path objects as location parameter. Also, a warning is raised if the returned backend is None while location is not None.
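
A small sketch of passing a pathlib.Path as the location parameter. The temporary directory and the square function are assumptions made for the example.

```python
import tempfile
from pathlib import Path

from joblib import Memory

# Any writable directory works; a temporary one keeps the example tidy.
cache_dir = Path(tempfile.mkdtemp()) / "joblib_cache"
memory = Memory(location=cache_dir, verbose=0)


@memory.cache
def square(x):
    return x * x


first = square(3)   # computed and written to the cache
second = square(3)  # read back from the cache
print(first, second)
```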

    Olivier Grisel

    Make Parallel raise an informative RuntimeError when the active parallel backend has zero workers.

    Make the DaskDistributedBackend wait for workers before trying to schedule work. This is useful in particular when the workers are provisioned dynamically but provisioning is not immediate (for instance using Kubernetes, Yarn or an HPC job queue).

    🚀 Release 0.13.0

    Thomas Moreau

    Include loky 2.4.2 with default serialization with cloudpickle. This can be tweaked with the environment variable LOKY_PICKLER.

    Thomas Moreau

    Fix nested backend in SequentialBackend to avoid changing the default backend to Sequential. (#792)

    Thomas Moreau, Olivier Grisel

    Fix nested_backend behavior to avoid setting the default number of
    workers to -1 when the backend is not dask. (#784)
    

    🚀 Release 0.12.5

    Thomas Moreau, Olivier Grisel

    Include loky 2.3.1 with better error reporting when a worker is
    abruptly terminated. Also fixes spurious debug output.
    

    Pierre Glaser

    Include cloudpickle 0.5.6. Fix a bug with the handling of global
    variables by locally defined functions.
    

    🚀 Release 0.12.4

    Thomas Moreau, Pierre Glaser, Olivier Grisel

    Include loky 2.3.0 with many bugfixes, notably when setting
    non-default multiprocessing contexts. Also include improvements to the
    memory management of long running worker processes and fixes for issues
    when using the loky backend under PyPy.
    

    Maxime Weyl

    Raises a more explicit exception when a corrupted MemorizedResult is loaded.
    

    Maxime Weyl

    Loading a corrupted cached file with mmap mode enabled would
    recompute the results and return them without memory mapping.
    

    🚀 Release 0.12.3

    Thomas Moreau

    Fix joblib import setting the global start_method for multiprocessing.
    

    Alexandre Abadie

    Fix MemorizedResult not picklable (#747).
    

    Loïc Estève

    Fix Memory, MemorizedFunc and MemorizedResult round-trip pickling +
    unpickling (#746).
    

    James Collins

    Fixed a regression in Memory when positional arguments are passed as
    kwargs several times with different values (#751).
    

    Thomas Moreau and Olivier Grisel

    Integration of loky 2.2.2 that fixes issues with the selection of the
    default start method and improves the reporting when calling functions
    with arguments that raise an exception when unpickling.
    

    Maxime Weyl

    Prevent MemorizedFunc.call_and_shelve from loading cached results to
    RAM when not necessary. This results in big performance improvements.
    

    🚀 Release 0.12.2

    Olivier Grisel

    Integrate loky 2.2.0 to fix regression with unpicklable arguments and functions reported by users (#723, #643).

    Loky 2.2.0 also provides a protection against memory leaks in long running applications when psutil is installed (reported as #721).

    Joblib now includes the code for the dask backend which has been updated to properly handle nested parallelism and data scattering at the same time (#722).

    Alexandre Abadie and Olivier Grisel

    Restored some private API attributes and arguments (MemorizedResult.argument_hash and BatchedCalls.__init__'s pickle_cache) for backward compatibility. (#716, #732).

    Joris Van den Bossche

    Fix a deprecation warning message (for Memory's cachedir) (#720).

    🚀 Release 0.12.1

    Thomas Moreau

    Make sure that any exception triggered when serializing jobs in the queue
    will be wrapped as a PicklingError as in past versions of joblib.
    

    Noam Hershtig

    Fix kwonlydefaults key error in filter_args (#715)
    

    🚀 Release 0.12

    Thomas Moreau

    Implement the ``'loky'`` backend with @ogrisel. This backend relies on
    a robust implementation of ``concurrent.futures.ProcessPoolExecutor``
    with spawned processes that can be reused across the ``Parallel``
    calls. This fixes the bad interaction with third party libraries relying on
    thread pools, described in https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries
    
    Limit the number of threads used in worker processes by C-libraries that
    rely on threadpools. This functionality works for MKL, OpenBLAS, OpenMP
    and Accelerate.
    

    Elizabeth Sander

    Prevent numpy arrays with the same shape and data from hashing to
    the same memmap, to prevent jobs with preallocated arrays from
    writing over each other.
    

    Olivier Grisel

    Reduce overhead of automatic memmap by removing the need to hash the
    array.
    
    Make ``Memory.cache`` robust to ``PermissionError (errno 13)`` under
    Windows when run in combination with ``Parallel``.
    
    The automatic array memory mapping feature of ``Parallel`` no longer
    uses ``/dev/shm`` if it is too small (less than 2 GB). In particular in
    docker containers ``/dev/shm`` is only 64 MB by default which would cause
    frequent failures when running joblib in Docker containers.
    
    Make it possible to hint for thread-based parallelism with
    ``prefer='threads'`` or enforce shared-memory semantics with
    ``require='sharedmem'``.
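
The two hints can be sketched as follows. The record function and the shared list are hypothetical names used only for illustration. With ``require='sharedmem'`` the workers are threads of the calling process, so they can append to the same list.

```python
from joblib import Parallel, delayed

shared = []


def record(i):
    # Mutates a list owned by the parent process, which only works
    # when workers share memory with the caller (threads).
    shared.append(i * i)


# Hard constraint: only a shared-memory backend is acceptable.
Parallel(n_jobs=2, require="sharedmem")(delayed(record)(i) for i in range(5))

# Soft hint: prefer threads, but let an enclosing backend choice override it.
squares = Parallel(n_jobs=2, prefer="threads")(delayed(pow)(i, 2) for i in range(5))

print(sorted(shared), squares)
```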
    
    Rely on the built-in exception nesting system of Python 3 to preserve
    traceback information when an exception is raised on a remote worker
    process. This avoids verbose and redundant exception reports under
    Python 3.
    
    Preserve exception type information when doing nested Parallel calls
    instead of mapping the exception to the generic ``JoblibException`` type.
    

    Alexandre Abadie

    Introduce the concept of 'store' and refactor the ``Memory`` internal
    storage implementation to make it accept extra store backends for caching
    results. ``backend`` and ``backend_options`` are the new options added to
    ``Memory`` to specify and configure a store backend.
    
    Add the ``register_store_backend`` function to extend the store backend
    used by default with Memory. This default store backend is named 'local'
    and corresponds to the local filesystem.
    
    The store backend API is experimental and thus is subject to change in the
    future without deprecation.
    
    The ``cachedir`` parameter of ``Memory`` is now marked as deprecated, use
    ``location`` instead.
    
    Add support for LZ4 compression if ``lz4`` package is installed.
    
    Add ``register_compressor`` function for extending available compressors.
    
    Allow passing a string to the ``compress`` parameter in the ``dump`` function. This
    string should correspond to the compressor used (e.g. zlib, gzip, lz4,
    etc). The default compression level is used in this case.
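
A brief sketch of the string form of ``compress``, assuming gzip as the compressor. The file name and the sample data are arbitrary.

```python
import os
import tempfile

from joblib import dump, load

data = {"values": list(range(1000))}
path = os.path.join(tempfile.mkdtemp(), "data.pkl.gz")

# A plain string selects the compressor and keeps its default level.
dump(data, path, compress="gzip")
restored = load(path)

print(restored == data)
```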
    

    Matthew Rocklin

    Allow ``parallel_backend`` to be used globally instead of only as a context
    manager.
    Support lazy registration of external parallel backends.
    

    🚀 Release 0.11

    Alexandre Abadie

    Remove support for python 2.6
    

    Alexandre Abadie

    Remove deprecated `format_signature`, `format_call` and `load_output`
    functions from Memory API.
    

    Loïc Estève

    Add initial implementation of LRU cache cleaning. You can specify
    the size limit of a ``Memory`` object via the ``bytes_limit``
    parameter and then need to clean explicitly the cache via the
    ``Memory.reduce_size`` method.
    

    Olivier Grisel

    Make the multiprocessing backend work even when the name of the main
    thread is not the Python default. Thanks to Roman Yurchak for the
    suggestion.
    

    Karan Desai

    pytest is used to run the tests instead of nosetests.
    ``python setup.py test`` or ``python setup.py nosetests`` do not work
    anymore, run ``pytest joblib`` instead.
    

    Loïc Estève

    An instance of ``joblib.ParallelBackendBase`` can be passed into
    the ``backend`` argument in ``joblib.Parallel``.
    

    Loïc Estève

    Fix handling of memmap objects with offsets greater than
    mmap.ALLOCATIONGRANULARITY in ``joblib.Parallel``. See
    https://github.com/joblib/joblib/issues/451 for more details.
    

    Loïc Estève

    Fix performance regression in ``joblib.Parallel`` with
    n_jobs=1. See https://github.com/joblib/joblib/issues/483 for more
    details.
    

    Loïc Estève

    Fix race condition when a function cached with
    ``joblib.Memory.cache`` was used inside a ``joblib.Parallel``. See
    https://github.com/joblib/joblib/issues/490 for more details.
    

    🚀 Release 0.10.3

    Loïc Estève

    Fix tests when multiprocessing is disabled via the
    JOBLIB_MULTIPROCESSING environment variable.
    

    harishmk

    Remove warnings in nested Parallel objects when the inner Parallel
    has n_jobs=1. See https://github.com/joblib/joblib/pull/406 for
    more details.
    

    🚀 Release 0.10.2

    Loïc Estève

    FIX a bug in stack formatting when the error happens in a compiled
    extension. See https://github.com/joblib/joblib/pull/382 for more
    details.
    

    Vincent Latrouite

    FIX a bug in the constructor of BinaryZlibFile that would throw an
    exception when passing unicode filename (Python 2 only).
    See https://github.com/joblib/joblib/pull/384 for more details.
    

    Olivier Grisel

    Expose :class:`joblib.parallel.ParallelBackendBase` and
    :class:`joblib.parallel.AutoBatchingMixin` in the public API to
    make them officially re-usable by backend implementers.
    

    🚀 Release 0.10.0

    Alexandre Abadie

    ENH: joblib.dump/load now accept file-like objects besides filenames.
    See https://github.com/joblib/joblib/pull/351 for more details.
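
For illustration, an in-memory buffer can stand in for the file-like object. The payload below is arbitrary.

```python
import io

from joblib import dump, load

payload = {"name": "example", "values": [1, 2, 3]}

buffer = io.BytesIO()
dump(payload, buffer)   # write the pickle into the open file object
buffer.seek(0)          # rewind before reading it back
restored = load(buffer)

print(restored)
```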
    

    Niels Zeilemaker and Olivier Grisel

    Refactored joblib.Parallel to enable the registration of custom
    computational backends.
    https://github.com/joblib/joblib/pull/306
    Note the API to register custom backends is considered experimental
    and subject to change without deprecation.
    

    Alexandre Abadie

    Joblib pickle format change: joblib.dump always creates a single pickle file
    and joblib.dump/joblib.load never make any memory copy when writing/reading
    pickle files. Reading pickle files generated with joblib versions prior
    to 0.10 will be supported for a limited amount of time; we advise
    regenerating them from scratch when convenient.
    joblib.dump and joblib.load also support pickle files compressed using
    various strategies: zlib, gzip, bz2, lzma and xz. Note that lzma and xz are
    only available with python >= 3.3.
    See https://github.com/joblib/joblib/pull/260 for more details.
    

    Antony Lee

    ENH: joblib.dump/load now accept pathlib.Path objects as filenames.
    See https://github.com/joblib/joblib/pull/316 for more details.
    

    Olivier Grisel

    Workaround for "WindowsError: [Error 5] Access is denied" when trying to
    terminate a multiprocessing pool under Windows:
    https://github.com/joblib/joblib/issues/354
    

    🚀 Release 0.9.4

    Olivier Grisel

    FIX a race condition that could cause a joblib.Parallel to hang
    when collecting the result of a job that triggers an exception.
    https://github.com/joblib/joblib/pull/296
    

    Olivier Grisel

    FIX a bug that caused joblib.Parallel to wrongly reuse previously
    memmapped arrays instead of creating new temporary files.
    See https://github.com/joblib/joblib/pull/294 for more details.
    

    Loïc Estève

    FIX for raising non inheritable exceptions in a Parallel call. See
    https://github.com/joblib/joblib/issues/269 for more details.
    

    Alexandre Abadie

    FIX joblib.hash error with mixed types sets and dicts containing mixed
    types keys when using Python 3.
    See https://github.com/joblib/joblib/issues/254
    

    Loïc Estève

    FIX joblib.dump/load for big numpy arrays with dtype=object. See
    https://github.com/joblib/joblib/issues/220 for more details.
    

    Loïc Estève

    FIX joblib.Parallel hanging when used with an exhausted
    iterator. See https://github.com/joblib/joblib/issues/292 for more
    details.
    

    🚀 Release 0.9.3

    Olivier Grisel

    Revert back to the ``fork`` start method (instead of
    ``forkserver``) as the latter was found to cause crashes in
    interactive Python sessions.
    

    🚀 Release 0.9.2

    Loïc Estève

    Joblib hashing now uses the default pickle protocol (2 for Python
    2 and 3 for Python 3). This makes it very unlikely to get the same
    hash for a given object under Python 2 and Python 3.
    
    In particular, for Python 3 users, this means that the output of
    joblib.hash changes when switching from joblib 0.8.4 to 0.9.2. We
    strive to ensure that the output of joblib.hash does not change
    needlessly in future versions of joblib but this is not officially
    guaranteed.
    

    Loïc Estève

    Joblib pickles generated with Python 2 can not be loaded with
    Python 3 and the same applies for joblib pickles generated with
    Python 3 and loaded with Python 2.
    
    During the beta period 0.9.0b2 to 0.9.0b4, we experimented with
    a joblib serialization that aimed to make pickles serialized with
    Python 3 loadable under Python 2. Unfortunately this serialization
    strategy proved to be too fragile as far as the long-term
    maintenance was concerned (For example see
    https://github.com/joblib/joblib/pull/243). That means that joblib
    pickles generated with joblib 0.9.0bN can not be loaded under
    joblib 0.9.2. Joblib beta testers, who are the only ones likely to
    be affected by this, are advised to delete their joblib cache when
    they upgrade from 0.9.0bN to 0.9.2.
    

    Arthur Mensch

    Fixed a bug with ``joblib.hash`` that used to return unstable values for
    strings and numpy.dtype instances depending on interning states.
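
The stability property can be checked with a short sketch like the one below, which assumes numpy is installed. The sample string and dtype are arbitrary, and the comparison only shows that equal values hash identically within one session.

```python
import numpy as np
from joblib import hash as joblib_hash

# Two equal strings that are built differently (separate objects).
text_literal = "joblib"
text_built = "".join(["job", "lib"])
same_text_hash = joblib_hash(text_literal) == joblib_hash(text_built)

# Two equal dtypes obtained through different constructors.
dtype_from_name = np.dtype("float64")
dtype_from_type = np.dtype(np.float64)
same_dtype_hash = joblib_hash(dtype_from_name) == joblib_hash(dtype_from_type)

print(same_text_hash, same_dtype_hash)
```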
    

    Olivier Grisel

    Make joblib use the 'forkserver' start method by default under Python 3.4+
    to avoid causing crashes with 3rd party libraries (such as Apple vecLib /
    Accelerate or the GCC OpenMP runtime) that use an internal thread pool that
    is not reinitialized when a ``fork`` system call happens.
    

    Olivier Grisel

    New context manager based API (``with`` block) to re-use
    the same pool of workers across consecutive parallel calls.
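
A minimal sketch of the ``with`` block API. The threading backend, the loop bounds and the use of abs as the task are assumptions picked to keep the example small.

```python
from joblib import Parallel, delayed

total = 0
# The same pool of workers serves every call made inside the block.
with Parallel(n_jobs=2, backend="threading") as parallel:
    for start in (0, 10, 20):
        partial = parallel(delayed(abs)(-(start + i)) for i in range(3))
        total += sum(partial)

print(total)
```

Reusing the pool avoids paying the worker start-up cost on each of the three calls.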
    

    Vlad Niculae and Olivier Grisel

    Automated batching of fast tasks into longer running jobs to
    hide multiprocessing dispatching overhead when possible.
    

    Olivier Grisel

    FIX make it possible to call ``joblib.load(filename, mmap_mode='r')``
    on pickled objects that include a mix of arrays of both
    memmapable dtypes and object dtype.
    

    🚀 Release 0.8.4

    2014-11-20 Olivier Grisel

    OPTIM use the C-optimized pickler under Python 3
    
    This makes it possible to efficiently process parallel jobs that deal with
    numerous Python objects such as large dictionaries.
    

    🚀 Release 0.8.3

    2014-08-19 Olivier Grisel

    FIX disable memmapping for object arrays
    

    2014-08-07 Lars Buitinck

    MAINT NumPy 1.10-safe version comparisons
    

    2014-07-11 Olivier Grisel

    FIX #146: Heisen test failure caused by thread-unsafe Python lists
    
    This fix uses a queue.Queue data structure in the failing test. This
    data structure is thread-safe thanks to an internal Lock. This Lock instance
    is not picklable, which causes the picklability check of delayed to fail.
    
    When using the threading backend, picklability is no longer required, hence
    this PR gives the user the ability to disable the check on a case by case basis.
    

    🚀 Release 0.8.2

    2014-06-30 Olivier Grisel

    BUG: use mmap_mode='r' by default in Parallel and MemmappingPool
    
    The former default of mmap_mode='c' (copy-on-write) caused
    problematic use of the paging file under Windows.
    

    2014-06-27 Olivier Grisel

    BUG: fix usage of the /dev/shm folder under Linux
    

    🚀 Release 0.8.1

    2014-05-29 Gael Varoquaux

    BUG: fix crash with high verbosity
    

    🚀 Release 0.8.0

    2014-05-14 Olivier Grisel

    Fix a bug in exception reporting under Python 3

    2014-05-10 Olivier Grisel

    Fixed a potential segfault when passing non-contiguous memmap instances.

    2014-04-22 Gael Varoquaux

    ENH: Make memory robust to modification of source files while the
    interpreter is running. Should lead to less spurious cache flushes
    and recomputations.
    

    2014-02-24 Philippe Gervais

    New Memory.call_and_shelve API to handle memoized results by reference instead of by value.
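
As a sketch of handling a result by reference, the example below caches a hypothetical cube function in a temporary directory and retrieves the value only when asked.

```python
import tempfile

from joblib import Memory

memory = Memory(location=tempfile.mkdtemp(), verbose=0)


@memory.cache
def cube(x):
    return x ** 3


# call_and_shelve returns a lightweight reference to the cached result
# rather than the value itself.
reference = cube.call_and_shelve(2)
value = reference.get()  # load the actual value from the cache

print(value)
```

The reference is small and can be passed around in place of a large result, which is then loaded only where it is needed.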