Architecture

LibreDataHub uses Docker Compose for orchestration, providing a simple deployment model without requiring Kubernetes. The platform integrates multiple open-source tools in a single environment with shared resources and common database access.

The architecture is designed for:

  • Single-server efficiency: Maximum productivity from one powerful machine
  • Project-based isolation: Secure multi-tenant environment with role-based access control
  • Unified database access: Same database connections available across all tools
  • Resource sharing: Intelligent resource management across users and projects

JupyterLab

Jupyter is a web application that supports more than 40 programming languages, including Python, Julia, Ruby, R, and Scala. It lets you create notebooks, i.e. documents that combine Markdown text with executable code in languages such as Julia, Python, or R.

These code notebooks are used in data science to explore and analyze data. LibreDataHub provides multiple Python versions (3.8, 3.11, 3.13) with persistent user libraries and settings.

RStudio

RStudio is an integrated development environment (IDE) for R. LibreDataHub provides persistent R libraries across sessions, with seamless database connectivity to PostgreSQL and DuckDB.

Code-server (VS Code)

Code-server brings the full VS Code experience to your browser. It shares library environments with Jupyter and RStudio, providing a consistent development experience across all tools.

LinkR

At the heart of LibreDataHub lies LinkR, an open-source web application developed by InterHop.

LinkR enables users to access, manipulate, and analyze healthcare data with low-code tools, i.e. without the need for extensive programming skills. LinkR uses the OMOP common data model to facilitate code exchange between multiple centers.

It provides both a graphical interface for clinicians and a programming environment for data scientists, making it ideal for collaborative healthcare projects.

MyST

MyST enables direct notebook-to-dashboard conversion. Project users can deploy Jupyter notebooks as interactive dashboards accessible via web browser, making it easy to share analyses and visualizations.

Ollama

Ollama provides local LLM inference capabilities with GPU acceleration. When enabled, Ollama allows users to run large language models locally for AI-powered analytics and research workflows.
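
As a rough illustration of what a local inference call can look like, the sketch below prepares a request to Ollama's HTTP `/api/generate` endpoint using only the Python standard library. The host and port (`http://localhost:11434`, Ollama's default) and the model name `llama3` are assumptions; the address and available models inside a given LibreDataHub deployment may differ.

```python
import json
from urllib.request import Request, urlopen


def build_generate_request(prompt, model="llama3", host="http://localhost:11434"):
    """Prepare a POST to Ollama's /api/generate endpoint (nothing is sent yet).

    The model name and host are assumptions, not LibreDataHub defaults.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urlopen(build_generate_request(prompt, **kwargs)) as response:
        return json.loads(response.read())["response"]


request = build_generate_request("Summarize the OMOP common data model in one sentence.")
print(request.full_url)
```

Calling `generate(...)` only works when Ollama is enabled and reachable from the tool you are running the code in.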

Apache Airflow

Apache Airflow is a workflow orchestration platform. LibreDataHub provides a shared Airflow instance where users can deploy DAGs (Directed Acyclic Graphs) for scheduling and automating data pipelines.
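
To illustrate what a DAG expresses, independently of Airflow's own API, the sketch below uses Python's standard-library `graphlib` to order three hypothetical pipeline tasks by their dependencies. The task names are invented for illustration; a real Airflow DAG would declare them with Airflow operators instead.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(pipeline)
print(order)
```

Because the graph is acyclic, a valid order always exists; a circular dependency would make `static_order()` raise `graphlib.CycleError`, which is why schedulers insist on acyclic graphs.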

CloudBeaver

CloudBeaver is a lightweight web application for working with different types of databases, all through a single, secure cloud solution accessible via a browser.

PostgreSQL with CitusDB

LibreDataHub uses Citus, a PostgreSQL extension that adds distributed database capabilities. Even on a single node, Citus provides:

  • Sharding capabilities: Horizontal partitioning for large tables
  • Columnar compression: Efficient storage for analytics workloads
  • Performance monitoring: Built-in pg_stat_statements extension

Each project has its own database with three schemas:

  • <username>: personal schema, accessible only to that user
  • private: shared schema, only writable by admins
  • public: shared schema, writable by any project member

The same database connections are available across Jupyter, RStudio, Code-server, and CloudBeaver.
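
The write rules for the three schemas can be summarized in a few lines of code. This is only a model of the rules as listed above, not how LibreDataHub enforces them (in PostgreSQL that would normally be done with roles and grants); the function name and arguments are hypothetical, and the user is assumed to be a member of the project.

```python
def can_write(schema, user, is_admin=False):
    """Apply the schema write rules for one project database.

    schema: "private", "public", or a username (that user's personal schema).
    user: name of a project member (membership is assumed, not checked).
    """
    if schema == "public":
        return True          # shared schema, writable by any project member
    if schema == "private":
        return is_admin      # shared schema, writable by admins only
    return schema == user    # personal schema, reserved for its owner


print(can_write("private", "alice"))                 # False
print(can_write("private", "alice", is_admin=True))  # True
```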

DuckDB

LibreDataHub also includes DuckDB, an open-source database management system designed for data analysis.

Its support for columnar storage formats such as Parquet lets it integrate with other LibreDataHub components, such as Jupyter notebooks or LinkR, and run high-performance queries directly in the research environment.

Grafana

Grafana is an open-source web application that lets users create dynamic, customizable dashboards to visualize data.

LibreDataHub uses Grafana to monitor:

  • Application usage per user
  • Server resources (CPU/RAM/disk/network)
  • Application resource usage (CPU/RAM)
  • PostgreSQL metrics

The monitoring stack includes a Prometheus federation endpoint for external monitoring integration.
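
Prometheus exposes federation through its `/federate` endpoint, which takes one or more `match[]` series selectors. The sketch below builds such a URL; the host name, port, and `job="postgres"` label are hypothetical and would need to match the actual deployment.

```python
from urllib.parse import urlencode


def federation_url(base_url, selectors):
    """Build a Prometheus /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical host and job label, for illustration only.
url = federation_url("http://libredatahub.example:9090", ['{job="postgres"}'])
print(url)
```

An external Prometheus server would typically scrape this endpoint on a schedule rather than fetch it by hand.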