Backup, migration & restore
This chapter provides general guidance on backing up and restoring a Datalore On-Premises deployment.
There are two places where important data resides:
PostgreSQL database
You can back up and restore data using native database tools, or cloud-native tools if you use a managed database such as Amazon RDS.
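As a rough illustration of the native-tools route, the sketch below wraps `pg_dump` and `pg_restore` in two shell functions. The host, user, and database name are placeholders chosen for this example, not values taken from a Datalore installation; substitute whatever your deployment uses.

```shell
#!/bin/sh
# Sketch only: the connection values below are hypothetical placeholders.
DB_HOST="${DB_HOST:-localhost}"
DB_USER="${DB_USER:-postgres}"
DB_NAME="${DB_NAME:-datalore}"

# Dump the database into a single file in PostgreSQL's custom format.
datalore_db_backup() {
    # $1: path of the dump file to create
    pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -Fc -f "$1"
}

# Restore a dump created by datalore_db_backup into an existing database.
datalore_db_restore() {
    # $1: path of the dump file to restore
    pg_restore -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" --clean --if-exists "$1"
}

# Usage (not run automatically):
#   datalore_db_backup datalore.dump
#   datalore_db_restore datalore.dump
```

The custom format (`-Fc`) is used here because `pg_restore` can read it directly; a plain SQL dump restored with `psql` would work as well.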
Block storage
In both Kubernetes and Docker installation methods, Datalore provides no built-in backup and restore mechanism. Instead, you use underlying infrastructure provider tools to back up the volume used for Datalore.
tip
For example, if Datalore is deployed on AWS with EBS as the storage provider, you can use an EBS snapshot.
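For the AWS case mentioned in the tip, a snapshot could be taken roughly as follows. This is a sketch that assumes the AWS CLI is installed and configured, and that you already know the ID of the EBS volume backing the Datalore storage; the function name and the description text are made up for this example.

```shell
#!/bin/sh
# Sketch only: assumes the AWS CLI is installed and has valid credentials.

# Create a point-in-time snapshot of one EBS volume.
snapshot_datalore_volume() {
    # $1: EBS volume ID (hypothetical example: vol-0123456789abcdef0)
    aws ec2 create-snapshot \
        --volume-id "$1" \
        --description "Datalore storage backup"
}

# Usage (not run automatically):
#   snapshot_datalore_volume vol-0123456789abcdef0
```

Other infrastructure providers offer comparable volume snapshot commands; only the wrapped call would change.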
Sometimes you need to migrate an existing environment rather than back it up, for example when moving from a PoC environment to production on a different Kubernetes cluster.
If that's the case, proceed as follows:
Export the metadata from the old environment
Deploy pods to a new environment
Import PersistentVolume/PersistentVolumeClaims metadata
Patch the PV with the UID of the newly created PVC
Patch PVs to prevent them from being automatically deleted:
kubectl patch pv ${PV_NAME} -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
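Save metadata (the --export flag was removed in kubectl 1.18, so it is omitted here; before applying the saved files, delete cluster-specific fields such as metadata.resourceVersion and metadata.uid from them):
kubectl get pv/${PV_NAME} -o yaml > ${PV_NAME}.yaml
kubectl get pvc/${PVC_NAME} -o yaml > ${PVC_NAME}.yaml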
Deploy a new Datalore installation to the new cluster; then delete the (new) PVC along with the (new) PV:
kubectl delete pvc/${PVC_NAME}
Kubernetes uses the UID to link a PVC to its PV. Therefore, recreate the PVC and the PV from the metadata you saved earlier, and then patch the PV with the UID of the new PVC:
kubectl apply -f ${PVC_NAME}.yaml
PVC_UID=$(kubectl get pvc/${PVC_NAME} -o jsonpath='{.metadata.uid}')
kubectl apply -f ${PV_NAME}.yaml
kubectl patch pv ${PV_NAME} -p "{\"spec\":{\"claimRef\":{\"uid\":\"${PVC_UID}\"}}}"
Stop/remove the old deployment so that you can use this volume in the new cluster.
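After the procedure, it can be worth confirming that the PV really points at the new PVC. The helper below is a minimal sketch of such a check, not part of the official procedure: it compares the UID stored in the PV's claimRef with the UID of the PVC and checks that the PV reports the Bound phase. The function name is hypothetical, and the lookups run against the namespace of the current kubectl context.

```shell
#!/bin/sh
# Sketch only: succeeds when the PV is bound and references the given PVC.
pv_matches_pvc() {
    # $1: PV name, $2: PVC name
    claim_uid=$(kubectl get pv "$1" -o jsonpath='{.spec.claimRef.uid}')
    pvc_uid=$(kubectl get pvc "$2" -o jsonpath='{.metadata.uid}')
    phase=$(kubectl get pv "$1" -o jsonpath='{.status.phase}')
    [ -n "$pvc_uid" ] && [ "$claim_uid" = "$pvc_uid" ] && [ "$phase" = "Bound" ]
}

# Usage (not run automatically):
#   pv_matches_pvc "$PV_NAME" "$PVC_NAME" && echo "volume is attached to the new claim"
```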
When deployed in Kubernetes, Datalore uses two volume claims (if installed without Hub):
$ kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgresql-data-datalore-0   Bound    pvc-2d0f1d24-0ad0-438d-9066-568be44212ca   2Gi        RWO            gp2            17h
storage-datalore-0           Bound    pvc-1d103578-c395-4d89-9b5f-778864d4dfac   10Gi       RWO            gp2            17h
note
In all migration steps, replace ${PV_NAME} and ${PVC_NAME} with the appropriate PV and PVC names.
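If you know only the PVC name, the matching PV name can be read from the claim itself, because a bound PVC records its volume in spec.volumeName (the VOLUME column of kubectl get pvc). The helper below is a small sketch of that lookup; the function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: print the name of the PV bound to a PVC in the current namespace.
pv_name_for_pvc() {
    # $1: PVC name, for example storage-datalore-0
    kubectl get pvc "$1" -o jsonpath='{.spec.volumeName}'
}

# Usage (not run automatically):
#   PVC_NAME=storage-datalore-0
#   PV_NAME=$(pv_name_for_pvc "$PVC_NAME")
```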