Install with Docker

For production, we recommend installing Cafe Variome V4 with containers such as Docker, or with an orchestration tool such as Kubernetes. This guide provides instructions for installing Cafe Variome V4 using Docker Compose.

Prerequisites

Before you begin, ensure you have the following software installed:

Docker
Docker Compose

All dependency services required by Cafe Variome V4 must also be reachable from your Docker containers. This may require firewall configuration or network bridges. Some of them may themselves be deployed as Docker containers, in which case you can use Docker Compose to manage them together with Cafe Variome V4. This guide does not cover installing these dependencies; please check their latest documentation for installation instructions and security best practices.

CV4 expects all dependencies to be available and properly configured before the application starts. It waits a limited time for the dependencies to become ready, then gives up and exits. In some edge cases it may consider the dependencies ready when they are not, which locks it into a non-functioning state. If you plan to run all dependencies as Docker containers in the same Compose file, set up proper health checks and waiting strategies for the containers. An orchestration tool with proper support for dependency management and health checks, such as Kubernetes, is highly recommended for production deployment.
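As an illustration of a waiting strategy, the following is a minimal sketch of a retry helper you could run before starting the backend. The function name `wait_for` is hypothetical and not part of CV4. The example probe in the comments assumes that Qdrant's port 6333 is reachable from where the script runs and that its `/readyz` endpoint is available in your Qdrant version.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds, or gives up once TIMEOUT_SECONDS retries have elapsed.
wait_for() {
    timeout_s=$1
    shift
    elapsed=0
    # The probe's own output is discarded; only its exit status matters.
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "timed out after ${timeout_s}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example usage (assumptions: curl is installed, port 6333 is published):
#   wait_for 60 curl -fsS http://localhost:6333/readyz
```

A helper like this only confirms that a probe command succeeded. It does not replace the health checks in the Compose file, and it cannot detect the edge cases in which a dependency reports ready before it is usable.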

Docker Compose File

Create a docker-compose.yml file with the following content, which contains development settings for local dependencies. In production, adjust the settings to your needs, especially the security-critical ones.

docker-compose.yml
services:
  cv4-backend:
    # This image is built for CPU. If you have a GPU and want to use it, use the GPU variant instead.
    # The GPU image is built with CUDA 12.6. If you need another version of CUDA, you can build the image yourself by modifying the Dockerfile in our GitHub repository.
    image: ghcr.io/cafevariomeuol/cafe-variome-4:latest-cpu
    container_name: cv4-backend
    restart: unless-stopped
    depends_on:
      mongodb:
        condition: service_healthy
      redis:
        condition: service_healthy
      qdrant:
        condition: service_started
      vault:
        condition: service_healthy
    ports:
      - "5000:5000"
    volumes:
      - ./conf:/app/conf
      - ./backend-data:/data
    environment:
      - CV_APPLICATION_NAME="Cafe Variome V4"
      - CV_APPLICATION_DESCRIPTION="A federated platform for sharing and discovering healthcare data."
      - CV_ACCESS_URL=https://www.your-cafe-variome-instance.com
      - CV_ACCESS_TOKEN_TTL=300
      - CV_REFRESH_TOKEN_TTL=7200
      - CV_ALLOW_REGISTRATION=true
      - CV_SSO_REGISTRATION=true
      - CV_LOGGING_LEVEL=INFO
      - CV_ENABLE_METRICS=false
      - CV_ENABLE_ERROR_REPORTING=false
      - CV_ENABLE_PROFILING=false
    #   - CV_SENTRY_DSN=
      - CV_DATA_DIR="data"
      - CV_ALLOW_ORIGINS="[]"
      - CV_SERVICE_ROOT_PATH="/api"
      - CV_DATABASE_DRIVER=mongo
      - CV_MONGODB_HOST=mongodb
      - CV_MONGODB_PORT=27017
      - CV_MONGODB_DB_NAME=cafe-variome
      - CV_CACHE_DRIVER=redis
      - CV_REDIS_HOST=redis
      - CV_REDIS_PORT=6379
      - CV_KMS_DRIVER=vault
      - CV_VAULT_URL=http://vault:8200
      - CV_VAULT_KV_ENGINE=kv
      - CV_VAULT_KV_PATH=cafe-variome
      - CV_VAULT_TRANSIT_ENGINE=transit
      - CV_VAULT_TOTP_ENGINE=totp
      - CV_EMAIL_DRIVER=smtp
      - CV_SMTP_HOST=your.smtp.server
      - CV_SMTP_PORT=25
      - CV_SMTP_USERNAME=
      - CV_SMTP_START_TLS=false
      - CV_SMTP_USE_TLS=false
      - CV_SMTP_FROM_EMAIL=no-reply@cafevariome.org
      - CV_SENDMAIL_BINARY="sendmail"   # Our image does not include a pre-configured MTA, so if you use docker for the backend, it is recommended to use an external SMTP server for email sending.
      - CV_MTA_FROM_EMAIL=no-reply@cafevariome.org
      - CV_MTA_ENVELOPE_FROM=
      - CV_VECTOR_DB_DRIVER=qdrant
      - CV_VECTOR_CHUNKING_MAX_TOKENS=200
      - CV_VECTOR_CHUNKING_OVERLAP=0.1
      - CV_VECTOR_TOKENIZER_MODEL_NAME=bert-base-uncased
      - CV_VECTOR_EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
      - CV_QDRANT_LOCATION=http://qdrant:6333
      - CV_CHALLENGE_DIFFICULTY=22
      - CV_SIMILARITY_URL="https://similarity.cafevariome.org"
    secrets:
      - vault_role_id
      - vault_secret_id
      - smtp_password

  cv4-frontend:
    image: ghcr.io/cafevariomeuol/cv4-frontend:latest
    container_name: cv4-frontend
    restart: unless-stopped
    depends_on:
      cv4-backend:
        condition: service_healthy
    ports:
      - "80:80"

  mongodb:
    # For production, a dedicated MongoDB-compatible service such as MongoDB Atlas or Cosmos DB is recommended.
    image: mongo:8.2
    container_name: mongodb
    restart: unless-stopped
    # In production, you should set up authentication for MongoDB
    # environment:
    #   - MONGO_INITDB_ROOT_USERNAME=admin
    #   - MONGO_INITDB_ROOT_PASSWORD=your_secure_password
    # We also do not expose the port here on host, add it if you need to access MongoDB from outside the docker network
    # ports:
    #   - "27017:27017"
    volumes:
      - ./mongodb-data:/data/db
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping').ok", "--quiet"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  redis:
    image: redis:8.6
    container_name: redis
    restart: unless-stopped
    command: redis-server --save "" --appendonly no
    # ports:
    #   - "6379:6379"
    # Persistent storage is not necessary for CV4, as the cache is wiped on every startup. It is, however, recommended to enable it if you use the metrics functions.
    # volumes:
    #   - ./redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  qdrant:
    image: qdrant/qdrant:v1.17
    restart: always
    container_name: qdrant
    # ports:
    #   - "6333:6333"
    configs:
      - source: qdrant_config
        target: /qdrant/config/production.yaml
    volumes:
      - ./qdrant-data:/qdrant/storage
    # No health check is defined for Qdrant because the image lacks even the basic tools needed to run one.

  vault:
    # This is a development configuration for Vault. Never use this in production, as it is not secure at all and is using file based storage. You should configure the vault credentials and secret engines properly before connecting it to CV4.
    image: hashicorp/vault:1.21.4
    # Vault 2.x releases are not tested. Any compatibility is not guaranteed and is likely coincidental. We will migrate to Vault 2.x once we have confirmed the compatibility and stability of the new version. This will be a breaking change, and we will provide a migration guide when the time comes.
    container_name: vault
    restart: always
    ports:
      # The port is exposed because you need to connect to Vault to set up the secret engines and app role. In production, set up Vault separately and securely, and allow only the CV4 backend to connect to it, without exposing the port publicly.
      - "8200:8200"
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_LOCAL_CONFIG: '{"storage": {"file": {"path": "/vault/file"}}, "listener": [{"tcp": { "address": "0.0.0.0:8200", "tls_disable": true}}], "default_lease_ttl": "168h", "max_lease_ttl": "720h", "ui": true}'
    command: server
    volumes:
      - ./vault-data/file:/vault/file
    healthcheck:
      test: ["CMD", "vault", "status", "-address=http://127.0.0.1:8200"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

configs:
  qdrant_config:
    content: |
      log_level: INFO

secrets:
  vault_role_id:
    file: ./secrets/vault_role_id.txt
  vault_secret_id:
    file: ./secrets/vault_secret_id.txt
  smtp_password:
    file: ./secrets/smtp_password.txt

The backend configuration files are expected in the ./conf directory, and the backend application data is stored in the ./backend-data directory. You can change these paths as needed. Configuration is done through environment variables in a .env file placed in the ./conf directory. These can also be overridden in the environment section of docker-compose.yml; environment variables set in Docker take precedence over those in the .env file. For security-critical values, we recommend Docker secrets, as in the examples above. Put each secret's value directly into the specified file, and the backend will pick it up as it initialises.
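The directory layout and secret files referenced by the Compose file can be prepared with a short script. This is a minimal sketch, not an official setup tool: the function name `prepare_cv4_layout` and the input variables `VAULT_ROLE_ID`, `VAULT_SECRET_ID` and `SMTP_PASSWORD` are hypothetical names chosen for illustration. Only the paths are taken from the Compose file above.

```shell
#!/bin/sh
# prepare_cv4_layout BASE_DIR
# Hypothetical helper: creates the bind-mount directories and secret files
# that the example docker-compose.yml refers to, relative to BASE_DIR.
prepare_cv4_layout() {
    base=$1

    # Directories mounted by the backend, MongoDB, Qdrant and Vault services.
    mkdir -p "$base/conf" "$base/backend-data" "$base/mongodb-data" \
             "$base/qdrant-data" "$base/vault-data/file" "$base/secrets"

    # Write each secret value directly into its file. printf '%s' avoids a
    # trailing newline, in case the value is read verbatim (an assumption).
    # The three input variables are illustrative, not defined by CV4.
    printf '%s' "${VAULT_ROLE_ID:-}"   > "$base/secrets/vault_role_id.txt"
    printf '%s' "${VAULT_SECRET_ID:-}" > "$base/secrets/vault_secret_id.txt"
    printf '%s' "${SMTP_PASSWORD:-}"   > "$base/secrets/smtp_password.txt"

    # Create an empty .env in ./conf unless one already exists.
    [ -f "$base/conf/.env" ] || : > "$base/conf/.env"
}

# Example usage, run from the directory that holds docker-compose.yml:
#   VAULT_ROLE_ID=... VAULT_SECRET_ID=... SMTP_PASSWORD=... prepare_cv4_layout .
```

After the files are in place, `docker compose up -d` starts the stack. You may want to tighten file permissions on the secrets directory, provided the user inside the container can still read the files.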
