Ray.io is a distributed execution framework that makes it easy to scale your single machine applications, with little or no changes, and to leverage state-of-the-art machine learning libraries.
Ray provides a set of core low-level primitives as well as a family of pre-packaged libraries that take advantage of these primitives to enable solving powerful machine learning problems.
The following libraries come packaged with Ray:
Additionally, Ray has been adopted as a foundational framework by a large number of open source ML frameworks which now have community-maintained Ray integrations.
Domino can dynamically provision and orchestrate a Ray cluster directly on the infrastructure backing the Domino instance. This allows Domino users to get quick access to Ray without having to rely on their IT team.
Domino on-demand Ray clusters are suitable for the following workloads:
- Distributed multi-node training
RaySGD provides a lightweight mechanism for taking existing PyTorch and Tensorflow models and scaling them across multiple machines to dramatically reduce training times. Ray is suitable for both distributed CPU and GPU training.
- Hyperparameter optimization
Launch a distributed hyperparameter sweep with just a few lines of code and no adaptation of your existing training harness, and take advantage of a large number of advanced parameter search algorithms.
- Reinforcement learning
Ray, in combination with the RLlib library, allows you to take advantage of a number of built-in reinforcement learning algorithms, but also provides a general framework for developing your own.
While Ray offers a scalable serving capability, the ephemeral nature of the Domino Ray clusters does not make it a good fit for this use case.