You can configure a Domino workspace to launch a Jupyter notebook with a connection to your Spark cluster.
This allows you to operate the cluster interactively from Jupyter with PySpark.
The instructions for configuring a PySpark workspace are below. To use them, you must have a Domino environment that meets the following prerequisites:
- The environment must use one of the Domino Standard Environments as its base image.
- The necessary binaries and configuration for connecting to your Spark cluster must be installed in the environment. See the provider-specific guides for setting up the environment.
Note: PySpark 2 does not support Python 3.8 or later. Build PySpark 2 compute environments from images with Python versions earlier than 3.8, or use PySpark 3.
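The version constraint in the note can be expressed as a small check. The helper below is purely illustrative (its name is not part of Domino or PySpark); it encodes the rule that PySpark 2 needs a pre-3.8 interpreter, while PySpark 3 works on 3.8 and later.

```python
# Hypothetical helper illustrating the note above: which PySpark major
# versions are usable with a given Python version?
import sys

def compatible_pyspark_majors(version_info=sys.version_info):
    """PySpark 2 requires Python < 3.8; PySpark 3 supports 3.8+."""
    if (version_info[0], version_info[1]) < (3, 8):
        return [2, 3]
    return [3]

print(compatible_pyspark_majors((3, 7, 9)))   # [2, 3]
print(compatible_pyspark_majors((3, 10, 0)))  # [3]
```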
- From the Domino main menu, click Environments.
- Click the name of an environment that meets the prerequisites listed previously: it uses a Domino Standard Environment base image and already has the binaries and configuration files needed to connect to your Spark cluster.
- On the environment overview page, click Edit Definition.
- In the Pluggable Workspace Tools field, paste the following YAML configuration.
pyspark:
  title: "PySpark"
  start: [ "/var/opt/workspaces/pyspark/start" ]
  iconUrl: "https://raw.githubusercontent.com/dominodatalab/workspace-configs/develop/workspace-logos/PySpark.png"
  httpProxy:
    port: 8888
    internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
    rewrite: false
    requireSubdomains: false
  supportedFileExtensions: [ ".ipynb" ]
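Before pasting the configuration, it can be worth sanity-checking that it is valid YAML with the expected structure. The snippet below is an optional local check, not part of the Domino setup; it assumes PyYAML is available and embeds its own copy of the configuration.

```python
# Optional sanity check: parse the workspace-tool YAML and confirm the
# fields a pluggable workspace tool needs are present. Assumes PyYAML.
import yaml

WORKSPACE_YAML = """
pyspark:
  title: "PySpark"
  start: [ "/var/opt/workspaces/pyspark/start" ]
  iconUrl: "https://raw.githubusercontent.com/dominodatalab/workspace-configs/develop/workspace-logos/PySpark.png"
  httpProxy:
    port: 8888
    internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
    rewrite: false
    requireSubdomains: false
  supportedFileExtensions: [ ".ipynb" ]
"""

cfg = yaml.safe_load(WORKSPACE_YAML)
tool = cfg["pyspark"]

# A workspace tool definition needs a title, a start command,
# and an HTTP proxy section.
assert {"title", "start", "httpProxy"} <= tool.keys()
print(tool["httpProxy"]["port"])
```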