Datalore 2023.6 Help

Use cloud storage data sources

Datalore provides an interface for mounting Amazon S3 or Google Cloud Storage buckets, as well as SMB/CIFS folders, directly inside your notebooks. The main benefits of using such data sources are:

  • You extend your storage: files that you store in buckets do not have to be uploaded to notebooks.

  • You can grant read or write access to the files stored in buckets.

To attach a cloud storage data source to a notebook, create a connection for the respective data source type. Once created, the data source is automatically attached to the notebook from which you set up the connection.

Procedure overview

Create, attach, and manage cloud storage data sources

Attach an Amazon S3 bucket

  1. Open the notebook.

  2. Go to Main menu | Tools | Attached data or click the Attached data icon on the left-hand sidebar.

  3. Click New connection and select Cloud storage.

  4. In the New connection dialog, select Amazon S3.

  5. In the New Amazon S3 cloud storage connection dialog, fill in the following fields:

    • Display name: to specify the name for this data source in your system

    • AWS access key and AWS secret access key: to access your AWS account

    • Region: to specify your AWS region

    • Amazon Bucket name: to specify the name of the bucket you want to mount

    • Custom options: to specify additional parameters. See the example below

    • Endpoint URL: to specify the website of the bucket you want to mount

  6. (Optional) Click Test connection to make sure the provided parameters are correct.

  7. Click Create data source to finish the procedure.

Attach a Google Cloud Storage bucket

  1. Open the notebook.

  2. Go to Main menu | Tools | Attached data or click the Attached data icon on the left-hand sidebar.

  3. Click New connection and select New cloud storage.

  4. In the New connection dialog, select Google cloud storage.

  5. In the New Google cloud storage connection dialog, fill in the following fields:

    • Display name: to specify the name for this bucket in your system

    • GCS Bucket name: to specify the name of the bucket you want to mount

    • GCS key file content: to enter the content of the Google service account key file (.json format)

  6. (Optional) Click Test connection to make sure the provided parameters are correct.

  7. Click Create data source to finish the procedure.
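
Before pasting the key into the GCS key file content field, it can help to confirm that the text is a well-formed service account key. The following is a minimal sketch and not part of Datalore: the function name check_gcs_key and the list of required fields are assumptions based on the usual layout of a Google service account key file.

```python
import json

# Fields usually present in a Google service account key file (assumed subset).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_gcs_key(content: str) -> list:
    """Return a list of problems found in the key file content (empty if none)."""
    try:
        key = json.loads(content)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    # Report every required field that is absent.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    # A key of another credential type is reported separately.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems
```

An empty list means the content looks structurally plausible. It does not prove that the key is active or that it grants access to the bucket, so Test connection remains the authoritative check.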

Attach an SMB/CIFS folder

  1. Open the notebook.

  2. Go to Main menu | Tools | Attached data or click the Attached data icon on the left-hand sidebar.

  3. Click New connection and select New cloud storage.

  4. In the New connection dialog, select SMB/CIFS.

  5. In the SMB/CIFS cloud storage connection dialog, fill in the following fields:

    • Display name: to specify the name for this data source in your system

    • Host: to specify the SMB server hostname

    • Port: to specify the SMB port

    • Username and Password: to specify the user credentials

    • Domain (Optional): to specify the SMB server domain

    • Custom options: to specify additional parameters

  6. (Optional) Click Test connection to make sure the provided parameters are correct.

  7. Click Create data source to finish the procedure.
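
If Test connection fails, one quick thing to rule out is that the SMB port is unreachable. The sketch below only checks that a TCP connection to the host and port can be opened. The function name smb_port_open is hypothetical, and 445 is assumed as the default because it is the standard port for SMB over TCP. The result reflects the network of the machine that runs the code, which may differ from the network Datalore connects from.

```python
import socket


def smb_port_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # The connection is closed immediately; no SMB traffic is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result says nothing about the username, password, or domain, so it complements Test connection and does not replace it.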

Add a cloud storage data source to a workspace

This procedure adds a cloud storage data source to your workspace resources without attaching it to any notebook automatically. Such data sources are later available for all notebooks of the respective workspace.

  1. Select Cloud storages from the menu on the left. This will open the Cloud storages list.

  2. Click New connection in the upper right corner of the list.

  3. In the New cloud storage connection dialog, select the cloud storage type.

  4. Proceed by following the steps described for the respective cloud storage type.

Attach an existing cloud storage data source

Cloud storage data sources added to a workspace or attached to a specific notebook are available across the entire workspace and can be attached to any notebook from it.

  1. Open the notebook.

  2. Select Main menu | Tools | Attached data.

  3. In the Attached data tool, click Select data to attach and select the required data source from the list.


Manage attached cloud storage data sources on the notebook level

  1. Go to Main menu | Tools | Attached data or click the Attached data icon on the left-hand sidebar. You will see all your data sources including attached buckets.

  2. To change the access type, click the pencil icon next to the bucket type and select your option (Read-only access or Read-write access).

  3. To add a file to the bucket storage, click Upload files, select your option, and complete the procedure accordingly. You can find more details about uploading or creating files in Attached files.

  4. Click the ellipsis to use more options:

    • Open detail view: to view and manage the stored files

    • Edit cloud storage: to edit the connection parameters for the storage

    • Copy directory path: to copy the path to the storage to the clipboard

    • Connect using boto3: to connect to the cloud storage using boto3 in the notebook code (for Amazon S3 only)

    • Pass credentials to env: to pass the cloud storage credentials to the environment

    • Detach cloud storage: to detach the storage from the notebook
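
Once a storage is attached, the path obtained with Copy directory path can be used like any other directory in notebook code. The following is a minimal sketch using only the Python standard library. The function name list_files and the path /data/my-bucket are hypothetical placeholders, so substitute the path you copied.

```python
from pathlib import Path


def list_files(mount_dir: str, pattern: str = "*") -> list:
    """Return sorted relative paths of regular files under mount_dir matching pattern."""
    root = Path(mount_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{mount_dir} is not a directory; is the storage attached?")
    # rglob walks the directory tree recursively; directories are skipped.
    return sorted(str(p.relative_to(root)) for p in root.rglob(pattern) if p.is_file())


# Hypothetical path: replace it with the value from Copy directory path.
# print(list_files("/data/my-bucket", "*.csv"))
```

Listing a large bucket recursively can be slow, so a narrower pattern or a subdirectory is usually a better starting point.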


Manage cloud storage data sources on the workspace level

  1. Select Cloud storages from the menu on the left. This will open the list of your bucket data sources.

  2. To edit the parameters of a data source, click the respective list item and make your changes in the Edit [data_source_name] connection dialog.

  3. To rename a data source, right-click the list item, select Rename from the menu, and provide a new name.

  4. To clone the item to other workspaces:

    1. Right-click the item.

    2. Select Clone to other workspaces.

    3. In the Clone [data_source_name] to other workspaces dialog, expand the Workspaces dropdown list.

    4. Select the workspaces where you want to clone the data source and click anywhere outside the dropdown.

    5. Click the Clone button. The dialog closes and a success notification appears.

  5. To delete a data source, right-click the respective list item and select Delete from the menu.

Examples of using Custom options

Enable SSE-C for S3 data sources

Custom options is a field for optional parameters when creating an Amazon S3 data source. Below are two examples of how it can be used.

  • To enable SSE-C for S3 data sources, specify the following in the Custom options field:

    use_sse=c:/path/to/keys/file

    where:

    /path/to/keys/file is the file that contains the keys. Make sure the file permissions are set to 600.

  • (For Datalore Enterprise only) To provide access based on the role associated with an EC2 instance profile, add public_bucket=0,iam_role to the Custom options field.
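
The SSE-C option above requires the keys file to have permissions 600. A minimal sketch of that check follows, assuming it is run on the machine where the keys file is stored. The function name sse_c_option is hypothetical and not part of Datalore. It returns the option string in the form shown above only when the file mode is exactly 600.

```python
import os
import stat


def sse_c_option(keys_file: str) -> str:
    """Build the use_sse custom option after checking the keys file mode is 600."""
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(os.stat(keys_file).st_mode)
    if mode != 0o600:
        raise PermissionError(f"{keys_file} has mode {mode:o}; expected 600")
    return f"use_sse=c:{keys_file}"
```

If the check raises, running chmod 600 on the keys file and calling the function again should produce a value that can be pasted into the Custom options field.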

Last modified: 19 December 2023