Install with Docker
We recommend running Cafe Variome V4 in production in containers, using Docker or an orchestration tool such as Kubernetes. This guide explains how to install Cafe Variome V4 with Docker Compose.
Prerequisites
Before you begin, ensure you have the following software installed:
- Docker
- Docker Compose
All dependency services required by Cafe Variome V4 must also be reachable from your Docker containers. This may require firewall configuration or network bridges. Some dependencies can themselves be deployed as Docker containers, in which case you can use Docker Compose to manage them together with Cafe Variome V4. This guide does not cover installing these dependencies; check their latest documentation for installation instructions and security best practices.
Cafe Variome V4 (CV4) expects all dependencies to be available and properly configured before the application starts. It waits a limited time for the dependencies to become ready, then gives up and exits. In some edge cases it may consider a dependency ready when it is not, which locks the application into a non-functioning state. If you plan to run all dependencies as Docker containers in the same Compose file, set up proper health checks and waiting strategies for those containers. For production deployments, an orchestration tool with proper support for dependency management and health checks, such as Kubernetes, is highly recommended.
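As an illustration of a waiting strategy with a time limit, here is a minimal Python sketch that polls TCP ports until every dependency accepts connections or a deadline passes. It is not CV4's actual startup logic; the function names are hypothetical, and the host names and ports are the ones used in the Compose file in this guide. Note that an open port only shows that a service is reachable, not that it is ready to serve requests, which is the kind of edge case described above.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def wait_for_dependencies(targets, deadline_seconds=60.0, interval=1.0):
    """Poll (host, port) pairs until all are reachable or the deadline passes.

    Returns the targets that are still unreachable (an empty list on success).
    """
    deadline = time.monotonic() + deadline_seconds
    pending = list(targets)
    while True:
        # Keep only the targets that still refuse connections.
        pending = [t for t in pending if not port_is_open(*t)]
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(interval)


# Service names and ports as used in the Compose file in this guide.
COMPOSE_TARGETS = [
    ("mongodb", 27017),
    ("redis", 6379),
    ("qdrant", 6333),
    ("vault", 8200),
]

# Example call from a container on the same Docker network:
#   missing = wait_for_dependencies(COMPOSE_TARGETS, deadline_seconds=60.0)
#   if missing:
#       raise SystemExit(f"dependencies not reachable: {missing}")
```

A probe like this is only a first gate. Where a dependency offers a real readiness check, such as the mongosh ping or redis-cli ping commands used as health checks in the Compose file, prefer that over a bare port probe.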
Docker Compose File
Create a docker-compose.yml file with the following content. It contains development settings for local dependencies. In production, adjust the settings to your needs, especially the security-critical ones.
services:
  cv4-backend:
    # This image is built for CPU. If you have a GPU and want to use it, you can use the gpu variant instead.
    # The GPU image is built with CUDA 12.6. If you need another version of CUDA, you can build the image yourself by modifying the Dockerfile in our GitHub repository.
    image: ghcr.io/cafevariomeuol/cafe-variome-4:latest-cpu
    container_name: cv4-backend
    restart: unless-stopped
    depends_on:
      mongodb:
        condition: service_healthy
      redis:
        condition: service_healthy
      qdrant:
        condition: service_started
      vault:
        condition: service_healthy
    ports:
      - "5000:5000"
    volumes:
      - ./conf:/app/conf
      - ./backend-data:/data
    environment:
      - CV_APPLICATION_NAME="Cafe Variome V4"
      - CV_APPLICATION_DESCRIPTION="A federated platform for sharing and discovering healthcare data."
      - CV_ACCESS_URL=https://www.your-cafe-variome-instance.com
      - CV_ACCESS_TOKEN_TTL=300
      - CV_REFRESH_TOKEN_TTL=7200
      - CV_ALLOW_REGISTRATION=true
      - CV_SSO_REGISTRATION=true
      - CV_LOGGING_LEVEL=INFO
      - CV_ENABLE_METRICS=false
      - CV_ENABLE_ERROR_REPORTING=false
      - CV_ENABLE_PROFILING=false
      # - CV_SENTRY_DSN=
      - CV_DATA_DIR="data"
      - CV_ALLOW_ORIGINS="[]"
      - CV_SERVICE_ROOT_PATH="/api"
      - CV_DATABASE_DRIVER=mongo
      - CV_MONGODB_HOST=mongodb
      - CV_MONGODB_PORT=27017
      - CV_MONGODB_DB_NAME=cafe-variome
      - CV_CACHE_DRIVER=redis
      - CV_REDIS_HOST=redis
      - CV_REDIS_PORT=6379
      - CV_KMS_DRIVER=vault
      - CV_VAULT_URL=http://vault:8200
      - CV_VAULT_KV_ENGINE=kv
      - CV_VAULT_KV_PATH=cafe-variome
      - CV_VAULT_TRANSIT_ENGINE=transit
      - CV_VAULT_TOTP_ENGINE=totp
      - CV_EMAIL_DRIVER=smtp
      - CV_SMTP_HOST=your.smtp.server
      - CV_SMTP_PORT=25
      - CV_SMTP_USERNAME=
      - CV_SMTP_START_TLS=false
      - CV_SMTP_USE_TLS=false
      - CV_SMTP_FROM_EMAIL=no-reply@cafevariome.org
      - CV_SENDMAIL_BINARY="sendmail" # Our image does not include a pre-configured MTA, so if you use docker for the backend, it is recommended to use an external SMTP server for email sending.
      - CV_MTA_FROM_EMAIL=no-reply@cafevariome.org
      - CV_MTA_ENVELOPE_FROM=
      - CV_VECTOR_DB_DRIVER=qdrant
      - CV_VECTOR_CHUNKING_MAX_TOKENS=200
      - CV_VECTOR_CHUNKING_OVERLAP=0.1
      - CV_VECTOR_TOKENIZER_MODEL_NAME=bert-base-uncased
      - CV_VECTOR_EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
      - CV_QDRANT_LOCATION=http://qdrant:6333
      - CV_CHALLENGE_DIFFICULTY=22
      - CV_SIMILARITY_URL="https://similarity.cafevariome.org"
    secrets:
      - vault_role_id
      - vault_secret_id
      - smtp_password
  cv4-frontend:
    image: ghcr.io/cafevariomeuol/cv4-frontend:latest
    container_name: cv4-frontend
    restart: unless-stopped
    depends_on:
      cv4-backend:
        condition: service_healthy
    ports:
      - "80:80"
  mongodb:
    # It's recommended to use a dedicated MongoDB compatible service if you need to run in production, like MongoDB Atlas or Cosmos DB.
    image: mongo:8.2
    container_name: mongodb
    restart: unless-stopped
    # In production, you should set up authentication for MongoDB
    # environment:
    #   - MONGO_INITDB_ROOT_USERNAME=admin
    #   - MONGO_INITDB_ROOT_PASSWORD=your_secure_password
    # We also do not expose the port here on host, add it if you need to access MongoDB from outside the docker network
    # ports:
    #   - "27017:27017"
    volumes:
      - ./mongodb-data:/data/db
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping').ok", "--quiet"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  redis:
    image: redis:8.6
    container_name: redis
    restart: unless-stopped
    command: redis-server --save "" --appendonly no
    # ports:
    #   - "6379:6379"
    # Persistent storage is not necessary for CV4, as we wipe the cache on every startup. It is, however, recommended to enable if using metrics functions.
    # volumes:
    #   - ./redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  qdrant:
    image: qdrant/qdrant:v1.17
    restart: always
    container_name: qdrant
    # ports:
    #   - "6333:6333"
    configs:
      - source: qdrant_config
        target: /qdrant/config/production.yaml
    volumes:
      - ./qdrant-data:/qdrant/storage
    # Qdrant does not support health checks as the image lacks even the basic tools to perform them.
  vault:
    # This is a development configuration for Vault. Never use this in production, as it is not secure at all and is using file based storage. You should configure the vault credentials and secret engines properly before connecting it to CV4.
    image: hashicorp/vault:1.21.4
    # Vault 2.x releases are not tested. Any compatibility is not guaranteed and is likely coincidental. We will migrate to Vault 2.x once we have confirmed the compatibility and stability of the new version. This will be a breaking change, and we will provide a migration guide when the time comes.
    container_name: vault
    restart: always
    ports:
      # Port is exposed because you would need to connect to it to set up the secret engines and app role. In production, you should set up the vault separately and securely, and only allow the CV4 backend to connect to it, without exposing the port to public.
      - "8200:8200"
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_LOCAL_CONFIG: '{"storage": {"file": {"path": "/vault/file"}}, "listener": [{"tcp": { "address": "0.0.0.0:8200", "tls_disable": true}}], "default_lease_ttl": "168h", "max_lease_ttl": "720h", "ui": true}'
    command: server
    volumes:
      - ./vault-data/file:/vault/file
    healthcheck:
      test: ["CMD", "vault", "status", "-address=http://127.0.0.1:8200"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
configs:
  qdrant_config:
    content: |
      log_level: INFO
secrets:
  vault_role_id:
    file: ./secrets/vault_role_id.txt
  vault_secret_id:
    file: ./secrets/vault_secret_id.txt
  smtp_password:
    file: ./secrets/smtp_password.txt
The backend configuration files are expected in the ./conf directory, and the backend application data is stored in the ./backend-data directory. You can change these paths as needed.
Configuration is done through environment variables in the .env file, which should be placed in the ./conf directory. The variables can also be overridden in the environment section of docker-compose.yml; values set there take precedence over those in the .env file.
For security-critical values, it is recommended to use Docker secrets, as in the examples above. Put the value of each secret directly into the specified file, and the backend will pick it up as it initialises.
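The precedence rule can be pictured with a small Python sketch. It does not reproduce the backend's actual configuration loader, which this guide does not show; parse_dotenv and effective_settings are hypothetical helpers, and the .env values are made up for illustration. The sketch only demonstrates the ordering described here: values from the .env file are loaded first, then the container environment overrides them.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_settings(dotenv_text: str, container_env: dict[str, str]) -> dict[str, str]:
    """Merge settings so that container environment variables win over .env values."""
    merged = parse_dotenv(dotenv_text)
    merged.update(container_env)  # later update overrides earlier .env entries
    return merged


# Illustrative contents of ./conf/.env (values are assumptions, not defaults).
env_file = """
# local overrides
CV_LOGGING_LEVEL=DEBUG
CV_SMTP_PORT=587
"""

# The environment section of docker-compose.yml sets CV_LOGGING_LEVEL=INFO.
compose_env = {"CV_LOGGING_LEVEL": "INFO"}

print(effective_settings(env_file, compose_env))
# {'CV_LOGGING_LEVEL': 'INFO', 'CV_SMTP_PORT': '587'}
```

Here CV_LOGGING_LEVEL resolves to the Compose value, while CV_SMTP_PORT, which is set only in the .env file, is kept as is.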