This topic describes how to connect to Amazon Simple Storage Service (S3) from Domino.
S3 is a cloud object store available as a service from AWS.
The easiest way to connect to a Generic S3 instance from Domino is to use a Domino Data Source.
After a Generic S3 data source is configured, users who have Domino permissions to use it and have specified their credentials can use the Domino Data API to retrieve data through the connector.
For more information, see Retrieving data.
Use one of the following methods to authenticate with S3 from Domino. Both follow the common naming convention of environment variables for AWS packages, so you do not have to explicitly reference credentials in your code.
Use a short-lived credential file obtained via Domino’s AWS Credential Propagation feature.
After your administrator configures this feature, Domino automatically populates any run or job with your AWS credentials file. These credentials are refreshed periodically for the duration of the workspace so that they don’t expire.
Following common AWS conventions, the environment variable AWS_SHARED_CREDENTIALS_FILE contains the location of your credential file, which is stored at /var/lib/domino/home/.aws/credentials.
Learn more about using a credential file with the AWS SDK.
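To see which profiles the propagated credential file offers, you can inspect it from Python. The following is a minimal sketch, not part of Domino's tooling: it assumes the file uses the standard AWS INI layout, and the helper names `parse_profiles` and `list_aws_profiles` are hypothetical.

```python
import configparser
import os


def parse_profiles(text):
    """Return the profile names defined in credentials-file text (INI format)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.sections()


def list_aws_profiles(env=None):
    """List the profiles in the file named by AWS_SHARED_CREDENTIALS_FILE."""
    env = os.environ if env is None else env
    # Assumption: fall back to the usual AWS default location if the variable is unset.
    default_path = os.path.join(os.path.expanduser("~"), ".aws", "credentials")
    path = env.get("AWS_SHARED_CREDENTIALS_FILE", default_path)
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return parse_profiles(handle.read())
```

Calling `list_aws_profiles()` inside a run returns the profile names only, so no secret values are printed or logged.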
Store your AWS access keys securely as environment variables.
To connect to the S3 buckets your AWS account has access to, you’ll need to provide your AWS Access Key and AWS Secret Key to the AWS CLI. By default, AWS utilities will look for these in your environment variables.
Set the following as Domino environment variables on your user account:
Read Environment Variables for Secure Credential Storage to learn more about Domino environment variables.
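Before calling any AWS tooling, it can help to confirm that the credentials are actually visible to your run. The sketch below assumes the variables in question are the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` names that the AWS CLI and SDKs read by default; the helper `missing_aws_variables` is a hypothetical name.

```python
import os

# Assumption: the standard variable names read by the AWS CLI and SDKs.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_variables(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both variables are set. The function reports names only and never returns the secret values themselves.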
If you have files in S3 that are set to allow public read access, you can fetch those files with Wget from the OS shell of a Domino executor, the same way you would for any other resource on the public Internet. The request for those files will look similar to this:
This method is simple, but doesn’t allow for any authentication or authorization. Do not use this method with sensitive data.
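The same unauthenticated fetch can be done from Python with only the standard library. This is a sketch under assumptions: it uses the virtual-hosted-style S3 URL form, the bucket, key, and region values are placeholders you supply, and both helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def public_s3_url(bucket, key, region):
    """Build a virtual-hosted-style URL for a publicly readable S3 object."""
    # Percent-encode the key but keep "/" so nested keys keep their structure.
    quoted_key = urllib.parse.quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quoted_key}"


def fetch_public_object(bucket, key, region, destination):
    """Download a public object to a local path without any credentials."""
    urllib.request.urlretrieve(public_s3_url(bucket, key, region), destination)
    return destination
```

As with Wget, this only works for objects that allow public read access, so the same warning about sensitive data applies.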
A more secure method of reading S3 from the OS shell of a Domino executor is the AWS CLI. To make the AWS CLI work from your executor, install it in your environment and provide it with your credentials.
The AWS CLI is available as a Python package from pip. The following Dockerfile instructions install the CLI and automatically add it to your system PATH. They assume that pip is already installed.
USER root
RUN pip install awscli --upgrade
USER ubuntu
After your Domino environment and credentials are set up correctly, you can fetch the contents of an S3 bucket to your current directory by running:
aws s3 sync s3://<bucket-name> .
If you are using an AWS credential file with multiple profiles, you might need to specify the profile. The "default" profile is assumed if none is specified.
aws s3 sync s3://<bucket-name> . --profile <profile name>
Read the official AWS CLI documentation on S3 for more commands and options.
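If you want to run the sync from a Python script or notebook rather than the shell, one option is to call the CLI through `subprocess`. This is a minimal sketch that assumes the `aws` executable is on your PATH; `build_sync_command` and `sync_bucket` are hypothetical helper names.

```python
import subprocess


def build_sync_command(bucket, destination=".", profile=None):
    """Build the argument list for `aws s3 sync s3://<bucket> <destination>`."""
    command = ["aws", "s3", "sync", f"s3://{bucket}", destination]
    if profile:
        # Only pass --profile when a non-default profile is requested.
        command += ["--profile", profile]
    return command


def sync_bucket(bucket, destination=".", profile=None):
    """Run the sync; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(build_sync_command(bucket, destination, profile), check=True)
```

Passing the arguments as a list avoids shell quoting problems with bucket or profile names.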
The recommended library for interacting with AWS services from Python is boto3, the AWS SDK for Python, which is officially maintained by Amazon.
There are many methods for interacting with S3 from boto3 detailed in the official documentation. Below is a simple example for downloading a file where:
* you have set up your credentials as instructed above
* your account has access to an S3 bucket named my_bucket
* the bucket contains an object named some_data.csv
import boto3
import pandas as pd

# create a new S3 client
client = boto3.client('s3')

# download some_data.csv from my_bucket and write it to ./some_data.csv locally
# (download_file returns None, so there is no result to assign)
client.download_file('my_bucket', 'some_data.csv', './some_data.csv')
Alternatively, if you are using a credential file:
import boto3

# Specify your profile if your credential file contains multiple profiles
session = boto3.Session(profile_name='<profile name>')

# Specify your bucket name
users_bucket = session.resource('s3').Bucket('my_bucket')

# Listing the bucket contents should succeed
for obj in users_bucket.objects.all():
    print(obj.key)

# Download a file
users_bucket.download_file('some_data.csv', './some_data.csv')
Note that this code does not provide credentials as arguments to the client constructor, since it assumes either:
* credentials will be automatically populated at /var/lib/domino/home/.aws/credentials, as specified in the environment variable AWS_SHARED_CREDENTIALS_FILE
* you have already set up credentials in your Domino environment variables, as described above
After running the above code, you would expect a local copy of some_data.csv to now exist in the same directory as your Python script or notebook. You could follow this up by loading the data into a pandas dataframe.
df = pd.read_csv('some_data.csv')
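If you would rather not keep a local copy, the object can also be read straight into memory. The sketch below is one possible approach, not the documented Domino method: it takes any client that exposes boto3's `get_object` call and parses the body with the standard library `csv` module. The helper name `read_csv_rows` is hypothetical.

```python
import csv
import io


def read_csv_rows(client, bucket, key, encoding="utf-8"):
    """Read a CSV object from S3 into a list of dictionaries, entirely in memory."""
    # get_object returns a dict whose "Body" entry is a file-like streaming object.
    response = client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode(encoding)
    # The first row of the CSV is used as the dictionary keys.
    return list(csv.DictReader(io.StringIO(text)))
```

With a real client this would be called as `read_csv_rows(boto3.client('s3'), 'my_bucket', 'some_data.csv')`. To get a dataframe instead, the same bytes can be wrapped in `io.BytesIO` and passed to `pd.read_csv`.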
Check out part 1 of the Get Started (Python) tutorial for a more detailed example of working with CSV data in Python.
To work with S3 from R, use the aws.s3 package. If you’re using one of the Domino standard environments, aws.s3 is already installed. If you want to add aws.s3 to an environment, use the following Dockerfile instructions.
USER root
RUN R -e 'install.packages(c("httr","xml2"), repos="https://cran.r-project.org")'
RUN R -e 'install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))'
USER ubuntu
You can find basic instructions on using aws.s3 from the package README. Below is a simple example for downloading a file where:
* you have set up the correct environment variables with credentials for your AWS account
* your account has access to an S3 bucket named my_bucket
* the bucket contains an object named some_data.csv
# load the package
library("aws.s3")

# If you are using a credential file that has multiple profiles, set the profile.
# Otherwise, this line can be omitted.
Sys.setenv("AWS_PROFILE" = "<AWS profile>")

# download some_data.csv from my_bucket and write it to ./some_data.csv locally
save_object("some_data.csv", file = "./some_data.csv", bucket = "my_bucket")
After running the above code, you would expect a local copy of some_data.csv to now exist in the same directory as your R script or notebook. You can then read from that local file to work with the data it contains.
myData <- read.csv(file="./some_data.csv", header=TRUE, sep=",")
View(myData)