Move Data Over a Network

When you start a run or workspace in Domino, the software and filesystem context for your code is defined by two things:

  • A Domino environment defines the container your run executes in.

  • Your project files are mounted at /mnt in the container.

Both of these are stored within Domino itself. Domino maintains a versioned repository of your project files, and caches the latest image built from your environment.

There are several circumstances where you may want to retrieve data from a source outside of Domino:

  • When executing code stored in your project files, you may want to retrieve fresh data from an external source for analysis.

  • When building a new revision of your environment, you may want to retrieve and install new dependencies or different versions of existing dependencies.

  • When running a Domino workspace, you may want to retrieve either dependencies or fresh data to advance your experimentation.

In this topic, we’ll introduce some standard tools for moving data from one filesystem to another. Note that all of these require that you have network access to the computer you’re trying to get data from. This can mean accessing a machine over your corporate LAN, or the Internet.

Domino executors run on Linux. All of the tools and examples in this topic are presented for use on a Domino-supported Linux operating system like Ubuntu or RHEL. However, these tools also work in other Unix-like shells, including the macOS terminal, although some of them (Wget, for example) may need to be installed first.

These methods are suited to retrieving specific files that are hosted at a URL or stored on a filesystem. If you have a relational database or other data source that doesn’t serve simple files, see the guides about connecting to external data sources.

Wget

Wget is a built-in utility for GNU operating systems that can download files from network locations over HTTP, HTTPS, and FTP. Files that you want to retrieve with Wget must be served over one of those protocols at a URL your machine has access to.

Wget is extremely simple to use. Commands take the form:

wget PROTOCOL://URL

If you need to supply the target web server with a basic username and password for authentication, you can use the --user and --password flags. Here’s a complete example:

wget --user myUsername --password myPassword https://web.server.url/path/to/file.csv
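
In a script, you may want to control where the file is saved and avoid typing credentials into the command itself. The following is a minimal sketch of one way to do that, not a Domino-provided utility: the function name fetch_with_wget and the variables DATA_USER and DATA_PASSWORD are hypothetical names chosen for this example. The --tries, --timeout, and --output-document options are standard Wget options.

```shell
# Hypothetical helper: download the URL in $1 to the local path in $2 with Wget.
# DATA_USER and DATA_PASSWORD are assumed variables for this sketch,
# not variables that Domino sets for you.
fetch_with_wget() {
  url="$1"
  dest="$2"
  # Build the argument list: retry up to 3 times, 30-second timeout,
  # and write the download to the requested path.
  set -- --tries=3 --timeout=30 --output-document="$dest"
  # Only send credentials when both variables are set.
  if [ -n "${DATA_USER:-}" ] && [ -n "${DATA_PASSWORD:-}" ]; then
    set -- "$@" --user="$DATA_USER" --password="$DATA_PASSWORD"
  fi
  wget "$@" "$url"
}
```

With the function defined, a call such as fetch_with_wget https://web.server.url/path/to/file.csv file.csv saves the download as file.csv in the current directory. Note that a password passed as an argument is still visible to other users on the same machine through the process list.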

Many cloud object stores like Amazon S3 and Azure Blob Storage can be configured to serve files at a URL over the Internet. See the first part of the Get Started (Python) tutorial for an example of retrieving data from S3 with Wget. You can also host files on computers in your local network with web servers like Apache or Python's SimpleHTTPServer (http.server in Python 3).

However, Wget is more limited than curl in terms of supported protocols and authentication schemes.

curl

Curl is a tool for making web requests over a wide variety of protocols and with support for many authentication and encryption schemes. Curl can be used to query a web server for a standard HTTP response like you would get from Wget, but it can also be used to construct more complex queries for REST APIs.

Curl requests can become quite complex when passing in many headers or setting many options, but the basic format is similar to wget:

curl "PROTOCOL://URL"
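
Unlike Wget, curl writes the response to standard output by default, so downloading a file usually means adding an output option. The sketch below shows one possible wrapper; the function name fetch_with_curl is hypothetical, while --fail, --location, --silent, --show-error, --retry, and --output are standard curl options.

```shell
# Hypothetical helper: download the URL in $1 to the local path in $2 with curl.
fetch_with_curl() {
  url="$1"
  dest="$2"
  # --fail:       exit with an error on HTTP error responses
  # --location:   follow redirects
  # --silent with --show-error: hide the progress meter but keep error messages
  # --retry 3:    retry transient failures up to 3 times
  # --output:     write the response body to a file instead of standard output
  curl --fail --location --silent --show-error --retry 3 --output "$dest" "$url"
}
```

A call such as fetch_with_curl https://web.server.url/path/to/file.csv file.csv would then save the response as file.csv.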

For example, you can use curl to query the Domino API itself for data about your Domino deployment. Here’s a complete example:

curl --include \
-H "X-Domino-Api-Key: <your-api-key>" \
'https://<your-domino-url>/v4/gateway/runs/getByBatchId'
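
When you script a request like this, it can help to keep the API key out of the command text and to make curl return an error on HTTP failures. The following is a minimal sketch under stated assumptions, not an official Domino client: the function name domino_api_get and the variables MY_DOMINO_URL and MY_DOMINO_API_KEY are hypothetical names that you would set yourself.

```shell
# Hypothetical helper: send a GET request to the Domino API path in $1
# and print the response body.
# MY_DOMINO_URL (for example https://<your-domino-url>) and MY_DOMINO_API_KEY
# are assumed variables for this sketch; set them before calling the function.
domino_api_get() {
  api_path="$1"
  if [ -z "${MY_DOMINO_URL:-}" ] || [ -z "${MY_DOMINO_API_KEY:-}" ]; then
    echo "MY_DOMINO_URL and MY_DOMINO_API_KEY must be set" >&2
    return 1
  fi
  # --fail makes curl exit with an error on HTTP error responses.
  curl --fail --silent --show-error \
    -H "X-Domino-Api-Key: ${MY_DOMINO_API_KEY}" \
    "${MY_DOMINO_URL}${api_path}"
}
```

With both variables set, domino_api_get /v4/gateway/runs/getByBatchId sends the same request as the example above.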

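You can also use curl to download a file from Amazon S3 with the script below. The script reads your credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and assumes that your S3 bucket resides in the us-west-2 region; if your bucket is in a different region, change the region in the Host header and in the URL. Note that the script signs the request with the older AWS Signature Version 2 scheme (the Authorization: AWS header), which some newer regions may not accept.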

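#!/bin/bash
# Download a file from S3 with curl, signing the request with AWS Signature Version 2.
file="<your-file-name>"
bucket="<your-bucket-name>"
resource="/${bucket}/${file}"
contentType="<content-type>"
# The signed date must match the Date header sent with the request.
dateValue="$(date +'%a, %d %b %Y %H:%M:%S %z')"
# Fields: verb, Content-MD5 (empty), Content-Type, date, resource.
stringToSign="GET\n\n${contentType}\n${dateValue}\n${resource}"
s3Key="$AWS_ACCESS_KEY_ID"
s3Secret="$AWS_SECRET_ACCESS_KEY"
# printf '%b' expands the \n escapes; the HMAC-SHA1 digest is then base64-encoded.
signature="$(printf '%b' "${stringToSign}" | openssl sha1 -hmac "${s3Secret}" -binary | base64)"
curl -H "Host: ${bucket}.s3-us-west-2.amazonaws.com" \
  -H "Date: ${dateValue}" \
  -H "Content-Type: ${contentType}" \
  -H "Authorization: AWS ${s3Key}:${signature}" \
  "https://${bucket}.s3-us-west-2.amazonaws.com/${file}"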
Copyright © 2022 Domino Data Lab. All rights reserved.