Use cloud storage data sources
Datalore provides an interface for mounting Amazon S3 or Google Cloud Storage buckets, as well as SMB/CIFS folders, directly inside your notebooks. The main benefits of using such data sources are:
You extend your storage, because files that you store in buckets do not have to be uploaded to notebooks.
You can grant read or write access to the files stored in buckets.
To attach a cloud storage data source to a notebook, create a connection for the respective data source type. Once created, the data source is automatically attached to the notebook from which you set up the connection.
Procedure overview
Attach a cloud storage data source to a specific notebook (Amazon S3 and Google Cloud Storage): Explains how to create a cloud storage data source and attach it to a specific notebook. The created data source is added to the workspace resources and can be attached to any other notebook.
Add a cloud storage data source to a workspace: Explains how to add a cloud storage data source to the respective workspace so that you can attach such a data source to any notebook from this workspace.
Manage attached cloud storage data sources on the notebook level: Explains how to use Attached data to manage cloud storage data sources attached to a specific notebook. The changes will affect the data source on the workspace level too.
Manage cloud storage data sources on the workspace level: Explains how to manage cloud storage data sources of a specific workspace from the Home page.
Create, attach, and manage cloud storage data sources
Attach an Amazon S3 bucket
Open the notebook.
Click the Attached data icon on the left-hand sidebar.
Click New connection and select Cloud storage.
In the New connection dialog, select Amazon S3.
In the New Amazon S3 cloud storage connection dialog, fill in the following fields:
Display name: to specify the name for this data source in your system
AWS access key and AWS secret access key: to access your AWS account (details here)
Region: to specify your AWS region
Amazon Bucket name: to specify the name of the bucket you want to mount
Custom options: to specify additional parameters. See the examples below.
Endpoint URL: to specify the URL of the endpoint that serves the bucket you want to mount
(Optional) Click Test connection to make sure the provided parameters are correct.
Click Create data source to finish the procedure.
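The fields in this dialog correspond closely to the parameters of a boto3 S3 client, so the same values can be checked from code before or after you create the data source. The following is a minimal sketch, not Datalore's own connection logic: the helper names s3_client_kwargs and bucket_is_reachable are hypothetical, and the second helper assumes that boto3 is installed in the notebook environment.

```python
from typing import Optional


def s3_client_kwargs(access_key: str, secret_key: str, region: str,
                     endpoint_url: Optional[str] = None) -> dict:
    """Translate the dialog fields into keyword arguments for boto3.client()."""
    kwargs = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }
    # Pass endpoint_url only when a value was actually provided.
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def bucket_is_reachable(bucket: str, **client_kwargs) -> bool:
    """Return True if a HEAD request on the bucket succeeds."""
    # Imported lazily so that s3_client_kwargs works without boto3 installed.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    client = boto3.client("s3", **client_kwargs)
    try:
        client.head_bucket(Bucket=bucket)
        return True
    except (BotoCoreError, ClientError):
        return False
```

For example, bucket_is_reachable("my-bucket", **s3_client_kwargs(key, secret, "eu-west-1")) returns False when the credentials, region, or bucket name do not match, where "my-bucket" and "eu-west-1" are placeholder values.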
Attach a Google Cloud Storage bucket
Open the notebook.
Click the Attached data icon on the left-hand sidebar.
Click New connection and select New cloud storage.
In the New connection dialog, select Google cloud storage.
In the New Google cloud storage connection dialog, fill in the following fields:
Display name: to specify the name for this bucket in your system
GCS Bucket name: to specify the name of the bucket you want to mount (details here).
GCS key file content: to enter the content of the Google service account key file (.json format).
(Optional) Click Test connection to make sure the provided parameters are correct.
Click Create data source to finish the procedure.
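Because the GCS key file content is pasted as raw JSON, a quick sanity check before pasting can catch truncated or malformed content. The sketch below is an illustration only: the function name missing_key_fields is hypothetical, and the list of required fields reflects what a Google service account key file normally contains rather than what Datalore itself validates.

```python
import json

# Fields that a Google service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_key_fields(key_file_content: str) -> list:
    """Return the expected fields that are absent or empty in the key JSON.

    Raises ValueError if the content is not a JSON object.
    """
    try:
        data = json.loads(key_file_content)
    except json.JSONDecodeError as err:
        raise ValueError(f"Key file content is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("Key file content must be a JSON object")
    return [name for name in REQUIRED_FIELDS if not data.get(name)]
```

An empty list means that all expected fields are present. It does not prove that the key is still active or that the service account has access to the bucket, so Test connection remains the authoritative check.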
Attach an SMB/CIFS folder
Open the notebook.
Click the Attached data icon on the left-hand sidebar.
Click New connection and select New cloud storage.
In the New connection dialog, select SMB/CIFS.
In the SMB/CIFS cloud storage connection dialog, fill in the following fields:
Display name: to specify the name for this data source in your system
Host: to specify the SMB server hostname
Port: to specify the SMB port
Username and Password: to specify the user credentials
Domain (Optional): to specify the SMB server domain
Custom options: to specify additional parameters
(Optional) Click Test connection to make sure the provided parameters are correct.
Click Create data source to finish the procedure.
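If Test connection fails for an SMB/CIFS folder, it can help to first rule out a network problem between the notebook machine and the server. The following sketch only checks that a TCP connection to the Host and Port values can be opened; it does not verify the username, password, or domain. The function name smb_port_is_open is hypothetical, and the default of 445 is the standard port for SMB over TCP, which may differ from your server's configuration.

```python
import socket


def smb_port_is_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # The connection is closed again as soon as the with block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```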
Add a cloud storage data source to a workspace
This procedure adds a cloud storage data source to your workspace resources without attaching it to any notebook automatically. Such data sources are later available for all notebooks of the respective workspace.
Select Cloud storages from the menu on the left. This will open the Cloud storages list.
Click New connection in the upper right corner of the list.
In the New cloud storage connection dialog, select the cloud storage type.
Proceed by following the steps described for the respective cloud storage type.
Attach an existing cloud storage data source
Cloud storage data sources added to a workspace or attached to a specific notebook are available across the entire workspace and can be attached to any notebook from it.
Open the notebook.
Click the Attached data icon on the left-hand sidebar.
In the Attached data tool, click Select data to attach and select the required data source from the list.
Manage attached cloud storage data sources on the notebook level
Click the Attached data icon on the left-hand sidebar. You will see all your data sources, including attached buckets.
To change the access type, click the pencil icon next to the bucket type and select your option (Read-only access or Read-write access).
To add a file to the bucket storage, click Upload files, select your option, and complete the procedure accordingly. You can find more details about uploading or creating files in Attached files.
Click the ellipsis to use more options:
Open detail view: to view and manage the stored files
Edit cloud storage: to edit the connection parameters for the storage
Copy directory path: to copy the path to the storage to the clipboard
Connect using boto3: to connect to the cloud storage using boto3 in the notebook code (for Amazon S3 only)
Pass credentials to env: to pass the cloud storage credentials to the environment
Detach cloud storage: to detach the storage from the notebook
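Once a storage is attached, the path obtained with Copy directory path can be used like any other directory in notebook code. The sketch below lists the files under such a path and checks whether the notebook process may write to it. The function names list_files and is_writable are hypothetical, and the path you pass in is whatever Copy directory path returns for your storage.

```python
import os
from pathlib import Path


def list_files(mount_path: str, limit: int = 20) -> list:
    """Return up to `limit` file paths under the directory, relative to it."""
    root = Path(mount_path)
    if not root.is_dir():
        raise FileNotFoundError(f"{mount_path} is not an existing directory")
    # Walk the tree recursively and keep regular files only, in sorted order.
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    return files[:limit]


def is_writable(mount_path: str) -> bool:
    """Report whether the current process has write permission on the directory."""
    return os.access(mount_path, os.W_OK)
```

Note that is_writable reflects file system permissions as seen by the operating system, so its result may not always match the Read-only access or Read-write access setting chosen for the storage.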
Manage cloud storage data sources on the workspace level
Select Cloud storage from the menu on the left. This will open the list of your bucket data sources.
To edit the parameters of a data source, click the respective list item and make your changes in the Edit [data_source_name] connection dialog.
To rename a data source, right-click the list item, select Rename from the menu, and provide a new name.
To clone the item to other workspaces:
Right-click the item.
Select Clone to other workspaces.
In the Clone [data_source_name] to other workspaces dialog, expand the Workspaces dropdown list.
Select the workspaces where you want to clone the data source and click anywhere outside the dropdown.
Click the Clone button. This will close the dialog, followed by a success notification.
To delete a data source, right-click the respective list item and select Delete from the menu.
Examples of using the Custom options
Enable SSE-C for S3 data sources
Custom options is a field for optional parameters that you can set when creating an Amazon S3 data source. Below are two examples of how it can be used.
To enable SSE-C for S3 data sources, specify the following in the Custom options field:
use_sse=c:/path/to/keys/file
where /path/to/keys/file is the file that contains the keys. Make sure the file permissions are 600.
(For Datalore Enterprise only) To provide access based on a role associated with an EC2 instance profile, add public_bucket=0,iam_role to the Custom options field.
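The permission requirement for the SSE-C key file is easy to overlook, so it can be checked in code before the option string is entered. The following is a minimal sketch under stated assumptions: the function name sse_c_option is hypothetical, and the check must run on the machine where the key file actually resides, on a POSIX file system.

```python
import os
import stat


def sse_c_option(key_file: str) -> str:
    """Build the use_sse custom option after verifying the key file mode is 600."""
    # Keep only the permission bits of the file mode.
    mode = stat.S_IMODE(os.stat(key_file).st_mode)
    if mode != 0o600:
        raise PermissionError(
            f"{key_file} has mode {mode:o}; expected 600 (for example, chmod 600)"
        )
    return f"use_sse=c:{key_file}"
```

The returned string is the value to paste into the Custom options field. The function raises PermissionError when the file is readable or writable by anyone other than its owner.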