Ray v0.8.3 Release Notes

Release Date: 2020-03-25
    Highlights

    • ๐Ÿ‘ Autoscaler has added Azure Support. (#7080, #7515, #7558, #7494)
      • The Ray autoscaler helps you launch a distributed Ray cluster with a single command-line call!
      • It works on Azure, AWS, GCP, Kubernetes, YARN, Slurm, and local nodes.
    • 0️⃣ Distributed reference counting is turned on by default. (#7628, #7337)
      • This means all Ray objects are tracked and garbage collected only once all references to them go out of scope. It can be turned off with ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0})).
      • When the object store fills up with objects that are still in scope, you can enable least-recently-used eviction to force-remove objects using ray.init(lru_evict=True). See the sketch at the end of this Highlights section.
    • A new command, ray memory, has been added to help debug memory usage: (#7589)

      ray memory

      Object ID                                 Reference Type        Object Size  Reference Creation Site
      ; worker pid=51230
      ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY             8231  (deserialize task arg) __main__..sum_task
      ; driver pid=51174
      45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK            ?  (task call) memory_demo.py:<module>:13
      ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK         8231  (put object) memory_demo.py:<module>:6
      ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE                 ?  (task call) memory_demo.py:<module>:14
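
    A minimal sketch of the reference-counting and eviction behavior referenced above; the sum_task function, array size, and script layout are hypothetical, while ray.init(lru_evict=True) and the ray memory command are the features described in this section:

      import numpy as np
      import ray

      # Enable LRU eviction so the object store force-evicts objects
      # (even ones still in scope) instead of erroring when it fills up.
      ray.init(lru_evict=True)

      @ray.remote
      def sum_task(arr):
          return arr.sum()

      # With distributed reference counting on by default, this object stays
      # pinned only while `arr_ref` (or a pending task using it) is in scope.
      arr_ref = ray.put(np.zeros(1024))
      print(ray.get(sum_task.remote(arr_ref)))

      # Running `ray memory` from another shell at this point lists the
      # object, its reference type, and its creation site, as shown above.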

    API changes

    • Change actor.__ray_kill__() to ray.kill(actor); see the sketch after this list. (#7360)
    • 🗄 Deprecate use_pickle flag for serialization. (#7474)
    • ✂ Remove experimental.NoReturn. (#7475)
    • ✂ Remove experimental.signal API. (#7477)
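
    A minimal sketch of the new actor-termination call (the Counter actor and its ping method are hypothetical; ray.kill(actor) is the API change above):

      import ray

      ray.init()

      @ray.remote
      class Counter:
          def ping(self):
              return "pong"

      actor = Counter.remote()
      assert ray.get(actor.ping.remote()) == "pong"

      # Previously: actor.__ray_kill__()
      # Now the actor process is terminated with:
      ray.kill(actor)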

    Core

    • ➕ Add Apache 2 license header to C++ files. (#7520)
    • ⬇️ Reduce per-worker memory usage to 50MB. (#7573)
    • Option to fall back to LRU eviction on OutOfMemory. (#7410)
    • Reference counting for actor handles. (#7434)
    • Reference counting for returning object IDs created by a different process. (#7221)
    • Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper. (#7150)
    • Route asyncio plasma through raylet instead of direct plasma connection. (#7234)
    • ✂ Remove static concurrency limit from gRPC server. (#7544)
    • Remove get_global_worker(), RuntimeContext. (#7638)
    • 🛠 Fix known issues from the 0.8.2 release:
      • Fix passing duplicate by-reference arguments. (#7306)
      • Raise the gRPC message size limit to 100MB. (#7269)

    RLlib

    • 🆕 New features:
      • Exploration API improvements. (#7373, #7314, #7380)
      • SAC: add discrete action support. (#7320, #7272)
      • Add high-performance external application connector. (#7641)
    • ๐Ÿ› Bug fix highlights:
      • PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238)
      • Rename sample_batch_size => rollout_fragment_length. (#7503)
      • Fix bugs and speed up SegmentTree.
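
    A sketch of the renamed rollout setting in an RLlib trainer config; the PPO trainer, CartPole environment, and numeric values are illustrative assumptions, not taken from the notes above:

      import ray
      from ray.rllib.agents.ppo import PPOTrainer

      ray.init()

      config = {
          "num_workers": 1,
          # Formerly `sample_batch_size`: the length of the rollout fragment
          # each worker collects before returning samples for training.
          "rollout_fragment_length": 200,
          "train_batch_size": 4000,
      }
      trainer = PPOTrainer(config=config, env="CartPole-v0")
      print(trainer.train()["episode_reward_mean"])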

    Tune

    • ↔ Integrate Dragonfly optimizer. (#5955)
    • 🛠 Fix HyperBand errors. (#7563)
    • Access Trial Name, Trial ID inside trainable. (#7378)
    • ➕ Add a new repeater class for high-variance trials. (#7366)
    • Prevent deletion of checkpoint from user-initiated restoration. (#7501)

    Libraries

    • [Parallel Iterators] Allow for operator chaining after repartition. (#7268)
    • [Parallel Iterators] Repartition functionality. (#7163)
    • [Serve] @serve.route returns a handle, add handle.scale, handle.set_max_batch_size. (#7569)
    • [RaySGD] PyTorchTrainer --> TorchTrainer. (#7425)
    • [RaySGD] Custom training API. (#7211)
    • [RaySGD] Breaking user API changes (see the sketch after this list): (#7384)
      • The data_creator passed to TorchTrainer must now return DataLoader(s) rather than datasets.
      • TorchTrainer automatically sets a DistributedSampler if a DataLoader is returned.
      • data_loader_config and batch_size are no longer parameters for TorchTrainer.
      • TorchTrainer parallelism is now set via num_workers.
      • All TorchTrainer arguments must now be passed as named parameters.
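
    A minimal sketch of the new creator-function style under these changes; the toy linear model, synthetic dataset, and the ray.util.sgd import path are assumptions based on 0.8-era RaySGD usage, not part of the notes above:

      import torch
      import torch.nn as nn
      from torch.utils.data import DataLoader, TensorDataset

      import ray
      from ray.util.sgd import TorchTrainer  # assumed import path

      def model_creator(config):
          return nn.Linear(1, 1)

      def optimizer_creator(model, config):
          return torch.optim.SGD(model.parameters(), lr=config.get("lr", 1e-2))

      def data_creator(config):
          # Must now return DataLoader(s), not raw datasets; TorchTrainer
          # attaches a DistributedSampler to returned DataLoaders automatically.
          x = torch.randn(256, 1)
          return DataLoader(TensorDataset(x, 2 * x),
                            batch_size=config.get("batch_size", 32))

      ray.init()
      trainer = TorchTrainer(
          model_creator=model_creator,      # all arguments are named parameters now
          data_creator=data_creator,
          optimizer_creator=optimizer_creator,
          loss_creator=nn.MSELoss,
          num_workers=2,                    # parallelism is set via num_workers
          config={"lr": 1e-2, "batch_size": 32},
      )
      print(trainer.train())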

    Java

    • 🆕 New Java actor API (#7414)
      • @RayRemote annotation is removed.
      • Instead of Ray.call(ActorClass::method, actor), the new API is actor.call(ActorClass::method).
    • ๐Ÿ‘ Allow passing internal config from raylet to Java worker. (#7532)
    • 0๏ธโƒฃ Enable direct call by default. (#7408)
    • Pass large object by reference. (#7595)

    Others

    Known issues

    • Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.

    Thanks

    🚀 We thank the following contributors for their work on this release:
    @rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @ClarkZinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @TisonKun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz