Datalore On-Premises security considerations
tip
We strongly advise applying the principle of least privilege when deploying Datalore and its infrastructure.
For example, if AWS is your infrastructure provider, deploy with a dedicated IAM account rather than the AWS root account.
note
This section covers only Datalore On-Premises-specific aspects. See Security for more information about generic security topics, applicable to both Datalore editions.
warning
While Datalore itself does not require any admin or elevated privileges within its runtime environment, its notebook agents are expected to be spawned as privileged containers.
The Datalore notebook agent relies on two features that require elevated access to the runtime: CRIU and FUSE mounts within the containers. Both require at least the SYS_ADMIN capability to be granted to the runtime; otherwise, Reactive mode and attached files won't work properly.
For the same reason, Datalore's functionality is limited in environments with a restricted permission scope, such as AWS Fargate.
We are looking into ways of reducing the scope of the required permissions. If you plan to run Datalore on shared infrastructure, we advise provisioning a dedicated set of host machines specifically for Datalore compute agents.
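To confirm whether a given container actually received the SYS_ADMIN capability mentioned above, you can inspect the effective capability mask that Linux exposes in /proc/self/status. The following is a minimal sketch, not part of Datalore: the function names are hypothetical, and it assumes a Linux container with awk available. CAP_SYS_ADMIN is capability number 21 in the Linux kernel headers.

```shell
#!/bin/sh
# CAP_SYS_ADMIN is capability number 21 in linux/capability.h.
CAP_SYS_ADMIN_BIT=21

# has_cap_sys_admin HEXMASK  (hypothetical helper)
# Succeeds when bit 21 is set in a hexadecimal capability mask,
# given in the CapEff format (hex digits without a 0x prefix).
has_cap_sys_admin() {
  [ -n "$1" ] || return 1
  [ $(( (0x$1 >> CAP_SYS_ADMIN_BIT) & 1 )) -eq 1 ]
}

# effective_caps  (hypothetical helper)
# Prints the effective capability mask of the current process.
effective_caps() {
  awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# Report on the current process when /proc is available (Linux only).
if [ -r /proc/self/status ]; then
  if has_cap_sys_admin "$(effective_caps)"; then
    echo "CAP_SYS_ADMIN is present"
  else
    echo "CAP_SYS_ADMIN is missing"
  fi
fi
```

Run inside an agent container, the script reports whether the capability is present. A "missing" result would be consistent with the Reactive mode and attached files problems described above, although other causes are possible.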
Make sure the PostgreSQL user used for provisioning Datalore has the CREATE privilege. This ensures proper execution of the ALTER TABLE/COLUMN commands derived from Datalore SQL migrations. The EXECUTE privilege is also required.
Datalore does not provide any TLS-related options to its end users. Instead, it relies on third-party load balancers (or reverse proxies) to perform TLS termination. As a consequence, the Datalore app itself is not normally expected to be directly user-facing without an intermediary proxy deployed next to it.
tip
The same steps are applicable for the Hub deployment, if required.
This procedure describes creating another container with Nginx that will work as a reverse proxy with SSL termination.
note
The code examples are provided for illustration purposes. Pay attention to the comments in the examples.
Edit the docker-compose.yaml file as shown in the example below.
{...}
Edit the nginx ssl.conf file as shown in the example below.
{...}
We advise using either your own certificate and private key (self-generated or obtained from a trusted certificate authority) or Let's Encrypt as an alternative.
Perform the following steps based on the selected method:
Self-acquired certificate and private key
Create a Kubernetes TLS secret, following the official Kubernetes guidance.
Adjust the datalore.values.yaml file as follows, replacing datalore.example.com with the actual FQDN you are going to use with Datalore.
{...}
Let's Encrypt
note
In this guidance, it's assumed that NGINX Ingress Controller is used.
Install cert-manager into your Kubernetes cluster.
Create a letsencrypt.yaml file with the following content, replacing the placeholders as required:
{...}
Apply the manifest:
kubectl apply -f letsencrypt.yaml
Check the issuer status with kubectl get issuer. Eventually, the output should look as follows:
$ kubectl get issuer
NAME               READY   AGE
letsencrypt-prod   True    14d
Adjust the datalore.values.yaml file as follows, replacing datalore.example.com with the actual FQDN you are going to use with Datalore.
{...}
Set the DATALORE_PUBLIC_URL parameter in the same datalore.values.yaml file. Replace datalore.example.com with the same FQDN you used in the step above.
dataloreEnv:
  DATALORE_PUBLIC_URL: "https://datalore.example.com"
Apply the configuration and restart Datalore.
Check whether the ingress controller registered the changes: kubectl get ingress. The expected result is a datalore ingress with port 443 exposed.
Check whether the certificate is issued: kubectl get certificates. The expected output is similar to the one below.
$ kubectl get certificates
NAME           READY   SECRET         AGE
datalore-tls   True    datalore-tls   8m5s
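For the self-acquired certificate option described earlier, the certificate and key can be self-generated with OpenSSL. The sketch below is an illustration under stated assumptions rather than an official procedure: the output directory, the 365-day validity, and the secret name datalore-tls in the commented kubectl command are all assumptions, and the -addext option requires OpenSSL 1.1.1 or later. Clients will not trust a self-signed certificate unless you distribute it to them.

```shell
#!/bin/sh
set -e

# Assumed values: replace with your own FQDN and output directory.
FQDN="${FQDN:-datalore.example.com}"
CERT_DIR="${CERT_DIR:-./datalore-certs}"

mkdir -p "$CERT_DIR"

# Generate an unencrypted 2048-bit RSA key and a self-signed certificate
# valid for 365 days, with the FQDN as both CN and subjectAltName.
openssl req -x509 -newkey rsa:2048 -nodes \
  -days 365 \
  -subj "/CN=${FQDN}" \
  -addext "subjectAltName=DNS:${FQDN}" \
  -keyout "$CERT_DIR/tls.key" \
  -out "$CERT_DIR/tls.crt"

# Print the subject and expiry date to confirm the result.
openssl x509 -in "$CERT_DIR/tls.crt" -noout -subject -enddate

# With cluster access, the pair can then be stored as a TLS secret
# (the secret name datalore-tls is an assumption):
#   kubectl create secret tls datalore-tls \
#     --cert="$CERT_DIR/tls.crt" --key="$CERT_DIR/tls.key"
```

The secret name you choose must match the one referenced in your datalore.values.yaml ingress configuration.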
note
This section is applicable for Helm-based deployment only.
Generate the password and store it in the Kubernetes secret, as described below. The pwgen tool is used here as an example. You can use any other tool or method to generate a password.
$ PASSWORD=$(pwgen -N1 -y 32)
$ kubectl create secret generic datalore-db-password --from-literal=DATALORE_DB_PASSWORD="$PASSWORD"
Modify (or add, if not present yet) the databaseSecret block in your datalore.values.yaml as follows:
databaseSecret:
  create: false
  name: datalore-db-password
  key: DATALORE_DB_PASSWORD
The name value refers to the secret created in the previous step, and the key value refers to the key within that secret that contains the password.
tip
If, for any reason, you do not want to create a secret manually, you may specify the password in the Helm config file. In this case, the secret will be provisioned automatically, but keep in mind that the password will be stored in plain text in your configuration file.
In that scenario, adjust the databaseSecret block in datalore.values.yaml as follows:
databaseSecret:
  create: true
  password: xxxx
(Optional) If you are moving from plain-text password storage to the secret reference, remove the password key and its value from the databaseSecret block.
Proceed based on whether this is a fresh deployment or Datalore is already installed.
Fresh deployment
Proceed with the installation. No further action is required.
Datalore is already installed
Apply the configuration
warning
If you proceed with this step, the Datalore server will restart.
helm upgrade --install -f datalore.values.yaml datalore datalore/datalore --version 0.2.28
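The password-generation step in the procedure above does not depend on pwgen. As one possible alternative, the sketch below uses openssl rand, which is often already installed. It assumes that base64 characters (letters, digits, + and /) are acceptable for your database password policy; unlike pwgen -y, it does not add other symbols.

```shell
#!/bin/sh
# 24 random bytes encode to exactly 32 base64 characters (no padding).
PASSWORD="$(openssl rand -base64 24)"

# Print only the length, never the password itself.
echo "generated a password of ${#PASSWORD} characters"

# The value can then be stored with the same command as in the procedure:
#   kubectl create secret generic datalore-db-password \
#     --from-literal=DATALORE_DB_PASSWORD="$PASSWORD"
```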
Datalore requires a permanent connection to a PostgreSQL database to operate properly. Once Datalore is deployed, the database password is saved within the environment so Datalore can re-use it later once restarted.
However, you might want to change this password later due to various compliance or operational reasons.
tip
Changing the password in PostgreSQL itself is outside the scope of this guide. Below you will find guidance on updating the password in Datalore after it has been changed on the database server.
Locate the values.yaml file being used for the deployment.
Depending on the method used, either replace the password within the databaseSecret block, or update the secret value if a Kubernetes secret is used instead of the plain-text value.
Update the Datalore deployment:
helm upgrade --install -f datalore.values.yaml datalore datalore/datalore --version 0.2.28
Locate the docker-compose.yaml file being used for the deployment.
Update the DB_PASSWORD entry in the environment block.
Perform the following procedures in the Configuration menu of the Admin panel.
Click the avatar in the upper right corner and select Admin panel from the menu.
From the Admin panel, select Configuration.
Select the Force agent SSL checkbox.
note
Backward compatibility with old agents without encryption is supported.
warning
This procedure forces Datalore's root CA certificate to be regenerated. As a consequence, all currently running computations will terminate abnormally once the procedure is completed.
Click the avatar in the upper right corner and select Admin panel from the menu.
From the Admin panel, select Configuration.
Click the Reset secrets button.