Apache Spark is a fast, general-purpose cluster computing system that provides a unified analytics engine for large-scale data processing and machine learning.
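As a minimal sketch of what that looks like in practice (assuming a Python environment with `pyspark` available), the following creates a Spark session and runs a simple aggregation; the column names and values are purely illustrative.

```python
# Minimal PySpark sketch: create a session and run a simple aggregation.
# Assumes pyspark is installed and a Spark cluster (or local mode) is available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("spark-overview-example")
    .getOrCreate()
)

# A small in-memory DataFrame stands in for a large-scale dataset.
df = spark.createDataFrame(
    [("2024-01-01", 3.2), ("2024-01-01", 1.8), ("2024-01-02", 4.5)],
    ["event_date", "value"],
)

# Aggregate values per day and print the result.
daily_totals = df.groupBy("event_date").agg(F.sum("value").alias("total_value"))
daily_totals.show()

spark.stop()
```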
Domino projects can use a compute environment configured with the appropriate client libraries to work with Hadoop applications.
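For illustration, a common pattern is to read data stored on the Hadoop cluster's HDFS from a Spark session running in the project. The namenode host, port, and path below are placeholders, not values from this document; substitute the details of your own Hadoop cluster.

```python
# Sketch of reading HDFS data with PySpark from a Domino project.
# The namenode address and dataset path are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-read-example").getOrCreate()

# Read a Parquet dataset stored on HDFS (placeholder location).
events = spark.read.parquet("hdfs://namenode.example.com:8020/data/events")

events.printSchema()
print(events.count())

spark.stop()
```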
Spark clusters can use Spot instances to reduce infrastructure costs. We recommend using Spot instances only for worker nodes, because their work can be recovered if an instance is reclaimed. For the master node, always use on-demand nodes.
If AWS interrupts a Spot instance, on-demand or scheduled jobs running on the Spark cluster may execute more slowly. If this happens, the remediation is to change the job's hardware tier to use a non-Spot node pool until AWS Spot instances of the requested type become available again.
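Independently of changing the hardware tier, standard Spark settings can make a job more tolerant of reclaimed worker nodes. The sketch below uses Spark's built-in node decommissioning options (available in Spark 3.1+); these are generic Spark configurations rather than Domino-specific settings, and the values shown are illustrative.

```python
# Sketch: Spark settings that help executors on Spot instances hand off work
# before the node is reclaimed (requires Spark 3.1+; not Domino-specific).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spot-tolerant-job")
    # Let executors migrate cached blocks and shuffle data when a node is decommissioned.
    .config("spark.decommission.enabled", "true")
    .config("spark.storage.decommission.enabled", "true")
    .config("spark.storage.decommission.rddBlocks.enabled", "true")
    .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
    # Tolerate a few extra task failures caused by reclaimed worker nodes.
    .config("spark.task.maxFailures", "8")
    .getOrCreate()
)
```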