Architecture
LibreDataHub uses Docker Compose for orchestration, which keeps deployment simple and does not require Kubernetes. The platform integrates multiple open-source tools into a single environment with shared resources and common database access.
The architecture is designed for:
- Single-server efficiency: Maximum productivity from one powerful machine
- Project-based isolation: Secure multi-tenant environment with role-based access control
- Unified database access: Same database connections available across all tools
- Resource sharing: Intelligent resource management across users and projects
JupyterLab
Jupyter is a web application for programming in over 40 languages, including Python, Julia, Ruby, R and Scala. It lets you create notebooks, i.e. documents that combine Markdown text with code in Julia, Python, R…
These code notebooks are used in data science to explore and analyze data. LibreDataHub provides multiple Python versions (3.8, 3.11, 3.13) with persistent user libraries and settings.
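Because several Python versions are available, it can help to check which interpreter a notebook kernel is actually using. The following is a minimal sketch using only the standard library; the assumption that persistent user libraries live in the per-user site-packages directory (where `pip install --user` installs packages) is ours, not documented LibreDataHub behavior.

```python
import site
import sys

# Interpreter version of the current kernel, e.g. "3.11".
version = f"{sys.version_info.major}.{sys.version_info.minor}"
print("Python", version)

# Per-user site-packages directory used by `pip install --user`.
# Assumption: this is where persistent user libraries are kept.
user_site = site.getusersitepackages()
print("User site-packages:", user_site)
```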

RStudio
RStudio is an integrated development environment (IDE) for R. LibreDataHub provides persistent R libraries across sessions, with seamless database connectivity to PostgreSQL and DuckDB.

Code-server (VS Code)
Code-server brings the full VS Code experience to your browser. It shares library environments with Jupyter and RStudio, providing a consistent development experience across all tools.

LinkR
At the heart of LibreDataHub lies LinkR, an open-source web application developed by InterHop.
LinkR enables users to access, manipulate and analyze healthcare data with low-code tools, i.e. without the need for extensive programming skills. LinkR relies on the OMOP Common Data Model (CDM), which makes it possible to share code between multiple centers.
It provides both a graphical interface for clinicians and a programming environment for data scientists, making it ideal for collaborative healthcare projects.
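To illustrate why a common data model makes code portable, here is a minimal sketch in plain SQL (run through Python's built-in `sqlite3`, not LinkR's own interface). It uses a tiny subset of the OMOP CDM `person` table with three synthetic rows; concept IDs 8507 and 8532 are the OMOP standard gender concepts for male and female.

```python
import sqlite3

# Minimal subset of the OMOP CDM `person` table (the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER, year_of_birth INTEGER)"
)
# Synthetic rows for illustration only.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, 1950), (2, 8532, 1982), (3, 8532, 1975)],
)

# Every OMOP database uses the same table and column names, so this query
# can be shared between centers and run unchanged.
QUERY = """
SELECT gender_concept_id, COUNT(*) AS n_patients
FROM person
GROUP BY gender_concept_id
ORDER BY gender_concept_id
"""
counts = conn.execute(QUERY).fetchall()
print(counts)  # [(8507, 1), (8532, 2)]
```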

MyST
MyST enables direct notebook-to-dashboard conversion. Project users can deploy Jupyter notebooks as interactive dashboards accessible via web browser, making it easy to share analyses and visualizations.

Ollama
Ollama provides local LLM inference capabilities with GPU acceleration. When enabled, Ollama allows users to run large language models locally for AI-powered analytics and research workflows.
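Ollama is queried over HTTP. As a sketch, the code below builds a non-streaming request for Ollama's `/api/generate` endpoint with the standard library. It assumes Ollama is reachable on its default port 11434 at `localhost`; inside a LibreDataHub deployment the host name may differ, and the model name in the usage note is hypothetical (it must already be pulled on the server).

```python
import json
import urllib.request

# Assumption: default Ollama address; adjust to your deployment.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("llama3", "Summarize this cohort description: ...")` would return the model's answer as a string, provided a model with that (hypothetical) name is installed.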
Apache Airflow
Apache Airflow is a workflow orchestration platform. LibreDataHub provides a shared Airflow instance where users can deploy DAGs (Directed Acyclic Graphs) for scheduling and automating data pipelines.
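The "acyclic" part of a DAG is what guarantees that tasks can always be run in an order where every task follows its dependencies. This is not Airflow code (in a real DAG file, dependencies are declared between Airflow operators, e.g. with `>>`); it is a minimal sketch of the ordering idea using the standard library's `graphlib` (Python 3.9+), with a hypothetical pipeline.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "load_postgres": {"clean"},
    "export_parquet": {"clean"},
    "build_report": {"load_postgres", "export_parquet"},
}

# One valid execution order: every task appears after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)

# A cycle means the graph is not a DAG, so no valid order exists.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError as err:
    print("Not a DAG, cycle:", err.args[1])
```

Independent branches (here `load_postgres` and `export_parquet`) have no fixed relative order, which is what lets a scheduler run them in parallel.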

CloudBeaver
CloudBeaver is a lightweight web application for working with many types of databases from a single, secure interface accessible through a browser.

PostgreSQL with CitusDB
LibreDataHub uses Citus, a PostgreSQL extension that adds distributed database capabilities. Even on a single node, Citus provides:
- Sharding capabilities: Horizontal partitioning for large tables
- Columnar compression: Efficient storage for analytics workloads
- Performance monitoring: Built-in pg_stat_statements extension
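Sharding spreads a table's rows across shards according to a hash of a chosen distribution column; in Citus this is done with `create_distributed_table('table', 'column')`. The following is a simplified sketch of the idea only: Citus uses its own hash function and shard layout, and the shard count and keys here are illustrative.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # illustrative only; Citus manages its own shard count


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard using a stable hash.

    Python's built-in hash() is randomized per process, so a
    cryptographic digest is used to get the same shard every time.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Rows sharing a key (e.g. one patient) always land in the same shard,
# so queries filtered on that key touch a single shard.
rows = [("patient-1", "visit"), ("patient-2", "lab"), ("patient-1", "lab")]
placement = {key: shard_for(key) for key, _ in rows}
print(placement)

# With many keys, rows spread roughly evenly across shards.
print(Counter(shard_for(f"patient-{i}") for i in range(1000)))
```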
Each project has its own database with three schemas:
- <username>: personal schema, accessible only by that user
- private: shared schema, writable only by admins
- public: shared schema, writable by any project member
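The access rules for the three schemas can be summarized as two small predicates. This is a minimal model for reading purposes, not how LibreDataHub enforces them (in PostgreSQL that is done with schema privileges such as `GRANT USAGE` and `GRANT CREATE`). The `Member` type is hypothetical, and we assume that the shared `private` schema is readable by every project member.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical representation of a project member."""
    username: str
    is_admin: bool = False


def can_read(member: Member, schema: str) -> bool:
    # Shared schemas are readable by all members (assumed for "private");
    # a personal schema is readable only by its owner.
    if schema in ("private", "public"):
        return True
    return schema == member.username


def can_write(member: Member, schema: str) -> bool:
    if schema == "public":
        return True  # any project member
    if schema == "private":
        return member.is_admin  # admins only
    return schema == member.username  # personal schema: owner only
```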
The same database connections are available across Jupyter, RStudio, Code-server, and CloudBeaver.
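One way to keep connection settings identical across tools is to derive them from the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`). The sketch below builds a libpq connection URI from them; that LibreDataHub sets these variables in each tool, and the chosen `search_path` order, are assumptions. The password is left to `PGPASSWORD` or `~/.pgpass`, which libpq reads on its own.

```python
import os
from urllib.parse import quote


def postgres_uri(env=None) -> str:
    """Build a libpq connection URI from the standard PG* environment variables."""
    env = os.environ if env is None else env
    user = env.get("PGUSER", "")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    dbname = env.get("PGDATABASE", "")
    # Assumed ordering: personal schema first, then the shared schemas.
    options = quote(f"-c search_path={user},private,public", safe="")
    return (f"postgresql://{quote(user, safe='')}@{host}:{port}/"
            f"{quote(dbname, safe='')}?options={options}")
```

The resulting string can be passed to any client that accepts libpq URIs, for example psycopg's `connect()`.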
DuckDB
LibreDataHub also includes DuckDB, an open-source database management system designed for data analysis.
Its support for columnar file formats such as Parquet lets it work directly with data used by other LibreDataHub components, such as Jupyter notebooks or LinkR, and run high-performance queries directly within the research environment.
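Columnar formats store each column contiguously, so an aggregate over one column reads only that column, and repeated values within a column compress well (Parquet, for instance, supports run-length and dictionary encodings). The following is a conceptual sketch with synthetic data, not DuckDB's API; with DuckDB installed, a Parquet file can be queried directly with something like `duckdb.sql("SELECT avg(heart_rate) FROM 'vitals.parquet'")`, where the file and column names are hypothetical.

```python
from itertools import groupby

# The same three synthetic records, stored row-wise and then column-wise.
rows = [
    {"patient_id": 1, "unit": "ICU", "heart_rate": 88},
    {"patient_id": 2, "unit": "ICU", "heart_rate": 95},
    {"patient_id": 3, "unit": "ER", "heart_rate": 72},
]
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column only has to scan that column's values.
mean_hr = sum(columns["heart_rate"]) / len(columns["heart_rate"])
print(mean_hr)  # 85.0


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values within one column often repeat, which is why columnar data compresses well.
print(run_length_encode(columns["unit"]))  # [('ICU', 2), ('ER', 1)]
```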
Grafana
Grafana is an open-source web application that lets users create dynamic, customizable dashboards to visualize data.
LibreDataHub uses Grafana to monitor:
- Application usage per user
- Server resources (CPU/RAM/disk/network)
- Application resource usage (CPU/RAM)
- PostgreSQL metrics
The monitoring stack includes a Prometheus federation endpoint for external monitoring integration.
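Prometheus federation works by scraping the `/federate` path, with one `match[]` parameter per series selector. The sketch below only builds such a URL; the host name and the `job` label value are placeholders, not LibreDataHub's actual configuration.

```python
from urllib.parse import urlencode

# Placeholder address of the LibreDataHub Prometheus instance.
PROMETHEUS_URL = "http://prometheus.example.org:9090"


def federate_url(selectors, base_url=PROMETHEUS_URL):
    """Build a /federate URL; each selector becomes one match[] parameter."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url}/federate?{query}"


url = federate_url(['{job="node"}', "up"])
print(url)
```

An external Prometheus would typically scrape this endpoint with `metrics_path: /federate`, the same `match[]` selectors under `params`, and `honor_labels: true` so the original labels are preserved.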
