MXNet v1.0.0 Release Notes

Release Date: 2017-12-04
  • 🌲 MXNet Change Log

    1.0.0

    🐎 Performance

    • ✨ Enhanced the performance of the sparse.dot operator.
    • MXNet now automatically sets OpenMP to use all available CPU cores to maximize CPU utilization when NUM_OMP_THREADS is not set.
    • 🐎 Unary and binary operators now avoid OpenMP on small arrays, where the overhead of multithreading would actually hurt performance.
    • ➕ Significantly improved the performance of broadcast_add, broadcast_mul, etc. on CPU.
    • ➕ Added bulk execution to imperative mode. You can control the segment size with mxnet.engine.bulk (see the sketch after this list). As a result, the speed of Gluon in hybrid mode is improved, especially on small networks and multiple GPUs.
    • 👌 Improved the speed of ctypes invocation from the Python frontend.
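
    A minimal sketch of bulked imperative execution, assuming mxnet.engine.bulk is used as a context manager that takes the segment size (the array shape and loop count here are arbitrary):

    ```python
    import mxnet as mx
    from mxnet import nd

    x = nd.ones((1000, 1000))

    # Group the imperative ops below into bulked segments of up to 16 ops,
    # reducing per-op engine overhead.
    with mx.engine.bulk(16):
        for _ in range(100):
            x = x + 1

    x.wait_to_read()  # synchronize outside the bulked region
    ```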

    🆕 New Features - Gradient Compression [Experimental]

    • Speed up multi-GPU and distributed training by compressing the gradients that are communicated. This is especially effective when training networks with large fully-connected layers. In Gluon, compression can be enabled with the compression_params argument of Trainer, as in the sketch below.
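
    A hedged sketch of enabling 2-bit gradient compression in Gluon; the 'type' and 'threshold' keys follow the documented compression_params format, while the network and optimizer settings are placeholders:

    ```python
    from mxnet import gluon

    net = gluon.nn.Dense(10)
    net.initialize()

    # compression_params enables 2-bit compression of gradients during
    # push/pull; 'threshold' sets the quantization boundary.
    trainer = gluon.Trainer(
        net.collect_params(), 'sgd', {'learning_rate': 0.1},
        compression_params={'type': '2bit', 'threshold': 0.5})
    ```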

    🆕 New Features - Support of NVIDIA Collective Communication Library (NCCL) [Experimental]

    • 👉 Use kvstore='nccl' for (in some cases) faster training on multiple GPUs.
    • It is significantly faster than kvstore='device' when the batch size is small.
    • It is recommended to set the environment variable NCCL_LAUNCH_MODE to PARALLEL when using NCCL version 2.1 or newer.
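
    A short sketch of the NCCL kvstore with Gluon, assuming a machine with two GPUs and an MXNet build compiled with NCCL support (the layer and learning rate are placeholders):

    ```python
    # Recommended for NCCL >= 2.1 (set before launching the process):
    #   export NCCL_LAUNCH_MODE=PARALLEL
    import mxnet as mx
    from mxnet import gluon

    ctx = [mx.gpu(0), mx.gpu(1)]
    net = gluon.nn.Dense(10)
    net.initialize(ctx=ctx)

    # NCCL-based gradient aggregation across the two GPUs.
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.1}, kvstore='nccl')
    ```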

    🆕 New Features - Advanced Indexing [General Availability]
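
    The notes do not spell out the syntax here; the sketch below shows the kind of NumPy-style integer-array and negative indexing now generally available on NDArray (the shapes are chosen only for illustration):

    ```python
    from mxnet import nd

    x = nd.arange(24).reshape((4, 6))

    rows  = x[[0, 2]]      # integer-array indexing: rows 0 and 2 -> shape (2, 6)
    last  = x[-1]          # negative indexing (see the a[-1] bug-fix below)
    block = x[1:3, 2:5]    # slicing on multiple axes -> shape (2, 3)
    ```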

    🆕 New Features - Gluon [General Availability]

    • 🐎 Performance optimizations discussed above.
    • ➕ Added support for loading data in parallel with multiple worker processes to gluon.data.DataLoader. The number of workers can be set with num_workers; Windows is not supported yet. See the sketch after this list.
    • ➕ Added Block.cast to support networks with different data types, e.g. float16.
    • ➕ Added a Lambda block for wrapping a user-defined function as a block.
    • 👍 Generalized gluon.data.ArrayDataset to support an arbitrary number of arrays.

    🆕 New Features - ARM / Raspberry Pi support [Experimental]

    🆕 New Features - NVIDIA Jetson support [Experimental]

    • MXNet now compiles and runs on NVIDIA Jetson TX2 boards with GPU acceleration.
    • 📦 You can install the Python MXNet package on a Jetson board by running $ pip install mxnet-jetson-tx2.

    🆕 New Features - Sparse Tensor Support [General Availability]

    • ➕ Added more sparse operators: contrib.SparseEmbedding, sparse.sum and sparse.mean.
    • ➕ Added asscipy() for easier conversion to SciPy sparse matrices.
    • ➕ Added check_format() for sparse NDArrays to check whether the array format is valid.
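
    A quick sketch of the new sparse helpers on a small CSR NDArray; the matrix values are arbitrary, and asscipy() requires SciPy to be installed:

    ```python
    from mxnet import nd

    # Build a CSR sparse NDArray from a mostly-zero dense array.
    dense = nd.array([[0, 1, 0], [2, 0, 0], [0, 0, 3]])
    csr = dense.tostype('csr')

    csr.check_format()       # verify the internal CSR structure is valid
    sp = csr.asscipy()       # convert to a scipy.sparse.csr_matrix

    total = nd.sparse.sum(csr)    # new sparse reduction operators
    mean  = nd.sparse.mean(csr)
    print(sp.shape, total.asscalar(), mean.asscalar())
    ```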

    πŸ› Bug-fixes

    • 🛠 Fixed a[-1] indexing not working on NDArray.
    • 🛠 Fixed expand_dims when axis < 0.
    • 🛠 Fixed a bug that caused topk to produce incorrect results on large arrays.
    • 👌 Improved the numerical precision of unary and binary operators for float64 data.
    • 🛠 Fixed the derivatives of log2 and log10; they previously returned the same values as the derivative of log.
    • 🛠 Fixed a bug that caused MXNet to hang after a fork. Note that you still cannot use the GPU in child processes after a fork due to limitations of CUDA.
    • 🛠 Fixed a bug that caused CustomOp to fail when using auxiliary states.
    • 🛠 Fixed a security bug that caused MXNet to listen on all available interfaces when running training in distributed mode.

    ⚡️ Doc Updates

    • ➕ Added a security best practices document under the FAQ section.
    • 🛠 Fixed license headers, including restoring copyright attributions.
    • 📚 Documentation updates.
    • 🔗 Added links for viewing source.

    🚀 For more information and examples, see the full release notes.