On-demand Ray

Ray.io is a distributed execution framework that makes it easy to scale your single machine applications, with little or no changes, and to leverage state-of-the-art machine learning libraries.

Ray provides a set of core low-level primitives as well as a family of pre-packaged libraries that take advantage of these primitives to enable solving powerful machine learning problems.

The following libraries come packaged with Ray:

Additionally, Ray has been adopted as a foundational framework by a large number of open source ML frameworks which now have community-maintained Ray integrations.

Orchestrate Ray on Domino

Domino can dynamically provision and orchestrate a Ray cluster directly on the infrastructure backing the Domino instance. This allows Domino users to get quick access to Ray without having to rely on their IT team.

When you start a Domino workspace for interactive work or a Domino job for batch processing, Domino will create, manage, and make available a containerized Ray cluster to your execution.

See Domino’s quick-start-ray project.

Suitable use cases

Domino on-demand Ray clusters are suitable for the following workloads:

Distributed multi-node training: RaySGD provides a lightweight mechanism for taking existing PyTorch and Tensorflow models and scaling them across multiple machines to dramatically reduce training times. Ray is suitable for both distributed CPU and GPU training.
Hyperparameter optimization: Launch a distributed hyperparameter sweep with just a few lines of code and no adaptation of your existing training harness, and take advantage of a large number of advanced parameter search algorithms.
Reinforcement learning: Ray, in combination with the RLlib library, allows you to take advantage of a number of built-in reinforcement learning algorithms, but also provides a general framework for developing your own.

Note	While Ray offers a scalable serving capability, the ephemeral nature of the Domino Ray clusters does not make it a good fit for this use case.