Learn how to connect to a Databricks Cluster or Databricks SQL Warehouse from Domino.
-
From the navigation pane, click Data > Data Sources.
-
Click Create a Data Source.
-
In the New Data Source window, from Select Data Store, select Databricks.
-
Enter the server Host, Port, and cluster/warehouse HTTP Path. See the Databricks documentation for where to find these values. Do not include the protocol in the host. For example, a valid host is "name.cloud.databricks.com".
-
Optional: Enter the initial Catalog and/or Schema to use for this session. If you specify a schema, you may also need to specify its catalog so that the schema can be identified.
-
Enter the Data Source Name and a Description to explain the purpose of the Data Source to others.
-
Enter your Personal Access Token to connect to Databricks. The Domino secret store, backed by HashiCorp Vault, securely stores the credentials.
Note: If this Personal Access Token expires after you create the Data Source, generate a new Personal Access Token in Databricks and update this field with the new value.
-
Click Test Credentials to validate the authentication.
-
Select who can view and use the Data Source in projects.
-
Click Finish Setup.
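The host rule in the steps above (no protocol) can be checked before you paste a value into the Host field. The following is a minimal sketch using only the Python standard library; the helper name `to_host_field` is hypothetical and is not part of Domino or Databricks.

```python
from urllib.parse import urlsplit


def to_host_field(value: str) -> str:
    """Return only the host name from a workspace URL or bare host string."""
    value = value.strip()
    # urlsplit only recognises the network location when "//" is present,
    # so prepend it for bare hosts such as "name.cloud.databricks.com".
    parts = urlsplit(value if "//" in value else "//" + value)
    # hostname drops the scheme, port, and path, and is lower-cased.
    return parts.hostname or ""


if __name__ == "__main__":
    print(to_host_field("https://name.cloud.databricks.com/"))
```

A value that is already a bare host passes through unchanged, so the helper is safe to apply to either form.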
Users who have Domino permissions to use the Data Source and who enter their credentials can now use the Domino Data API to retrieve data with the connector.
See Retrieve data for more information.
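As a rough illustration of retrieving data through the Data Source, the sketch below wraps a query in a small function. It assumes the `domino_data` package is installed in the workspace environment; the function name `run_query`, the Data Source name, and the SQL text are hypothetical placeholders.

```python
def run_query(data_source_name, sql, client=None):
    """Run a SQL query through a Domino Data Source and return a pandas DataFrame."""
    if client is None:
        # Imported lazily so the function can be defined without Domino installed.
        from domino_data.data_sources import DataSourceClient

        client = DataSourceClient()
    # Look up the Data Source by the name given when it was created.
    data_source = client.get_datasource(data_source_name)
    result = data_source.query(sql)
    return result.to_pandas()


# Example call inside a Domino workspace (names are placeholders):
# df = run_query("my-databricks-source", "SELECT * FROM my_table LIMIT 10")
```

Passing the client as an optional argument keeps the function easy to exercise with a stand-in object outside Domino.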
Note: Domino recommends that you connect to Databricks using Data Sources to take advantage of flexible access controls and the Domino Data API, and to receive technical support. However, you can also connect using the Databricks SDK if your business requires it. This SDK allows for Databricks workspace and compute management, among other data operations.
Prepare your workspace environment
-
Ensure you have a workspace environment that uses a Python-compatible workspace IDE, such as Jupyter Notebook.
-
Add Python 3.8 or above and the Databricks SDK package "databricks-sdk" to the environment. For more information, please see Add Packages to Environments.
Note: You may want to use the Domino Standard Environment base image as a starting environment, which is compatible with Python workspace IDEs and contains a Python version above 3.8.
Databricks authentication
-
Set up a Databricks configuration profile.
-
Add the ".databrickscfg" file that contains the configuration profile to your project.
-
Set the DATABRICKS_CONFIG_PROFILE environment variable to the name of the custom configuration profile you want to use.
-
Launch your workspace.
-
Add the following lines at the top of your code file:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
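For reference, a configuration profile is an INI-style section in the ".databrickscfg" file. The sketch below writes one with Python's standard `configparser`, which may help when generating the file for a project. The helper name `write_profile`, the profile name `DOMINO`, and the host and token values are hypothetical; the host is written with the `https://` scheme, which is the form Databricks configuration profiles conventionally use.

```python
import configparser


def write_profile(path, profile, host, token):
    """Add or update one profile section in a .databrickscfg-style file."""
    config = configparser.ConfigParser()
    # Keep any profiles already in the file; a missing file is simply ignored.
    config.read(path)
    config[profile] = {"host": host, "token": token}
    with open(path, "w") as handle:
        config.write(handle)


# Example (placeholder values; do not commit real tokens to a project):
# write_profile(".databrickscfg", "DOMINO",
#               "https://name.cloud.databricks.com", "<personal-access-token>")
```

With DATABRICKS_CONFIG_PROFILE set to the same profile name, the `WorkspaceClient()` call shown above should pick up that section.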
Alternatively, set environment variables in your workspace for your Databricks workspace host, token, and any other relevant configuration values, then pass them as parameters when you create the WorkspaceClient:
import os
from databricks.sdk import WorkspaceClient
# Read the host and token from the environment variables you set for the workspace.
w = WorkspaceClient(
    host=os.environ["MY_HOST_VARIABLE"],
    token=os.environ["MY_TOKEN_VARIABLE"],
)
The Databricks SDK is ready to be used in your workspace. For more information, please see the Databricks SDK documentation.
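As a quick check that the client works, the sketch below lists the names of the clusters the authenticated user can see. The function name `cluster_names` is hypothetical; it relies only on the SDK's `clusters.list()` call and the `cluster_name` attribute of the returned objects.

```python
def cluster_names(workspace_client):
    """Return the names of the clusters visible to the given client."""
    # clusters.list() yields cluster detail objects with a cluster_name attribute.
    return [cluster.cluster_name for cluster in workspace_client.clusters.list()]


# Example call inside a configured workspace:
# from databricks.sdk import WorkspaceClient
# print(cluster_names(WorkspaceClient()))
```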
-
After connecting to your Data Source, learn how to Use Data Sources.
-
Share this Data Source with your collaborators.