How We Built Our Own Data Center Solution

Introduction

Public cloud services are often the fastest and easiest option, but for us they were not the whole story. We wanted an environment where performance, cost control, security, and operational visibility remain firmly in our own hands. That is why we decided to build our own data center infrastructure, based on open-source technologies and a modern hyperconverged architecture.

In this post, we explain how we built a Proxmox HA cluster backed by a Ceph storage system, and why this combination became the foundation of our platform.

Goals and Design Principles

At the start of the project, we defined a clear set of goals:

High availability (HA) with no single points of failure
Horizontal scalability, allowing capacity to grow node by node
Strong performance for demanding workloads
Full control over our data
Open-source technologies without vendor lock-in

Based on these requirements, Proxmox and Ceph quickly emerged as the strongest candidates.

Our Own Rack in a Data Center

The environment is built in our own 19-inch rack located in a colocation data center. We own and manage all of the hardware ourselves, while still benefiting from professional data center facilities, including:

redundant power delivery
controlled cooling
redundant network connectivity
physical security

This setup combines the flexibility of owning our hardware with the reliability of a professional data center.

Hyperconverged Architecture

We chose a hyperconverged model, where:

each server node provides both compute and storage resources
no separate SAN or NAS system is required
performance and capacity scale together as new nodes are added

This approach simplifies the overall architecture and removes traditional single points of failure.
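To illustrate how compute and storage grow together in a hyperconverged design, here is a minimal sketch in Python. The node specification (32 cores, 256 GB RAM, 8 TB of raw NVMe) is a hypothetical example, not our actual hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One hyperconverged server: compute and storage in the same chassis."""
    cores: int
    ram_gb: int
    raw_storage_tb: float


def cluster_totals(nodes):
    """Sum compute and raw storage resources across all nodes."""
    return {
        "nodes": len(nodes),
        "cores": sum(n.cores for n in nodes),
        "ram_gb": sum(n.ram_gb for n in nodes),
        "raw_storage_tb": sum(n.raw_storage_tb for n in nodes),
    }


# Hypothetical node specification, for illustration only.
example_node = Node(cores=32, ram_gb=256, raw_storage_tb=8.0)

# Every added node increases cores, RAM, and raw storage in the same step.
for count in (3, 4, 5):
    print(cluster_totals([example_node] * count))
```

Note that the storage figure here is raw capacity; the usable capacity is lower once replication is taken into account.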

Proxmox VE – The Virtualization Layer

Virtualization is built on Proxmox Virtual Environment, which uses the KVM hypervisor under the hood. Proxmox provides:

centralized cluster management
built-in high availability for virtual machines
live migration between nodes
native integration with Ceph storage

Virtual machines can be migrated between nodes without downtime, and in case of hardware failure, Proxmox HA automatically restarts workloads on healthy nodes.
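Automatic recovery only works while the surviving nodes still form a majority (quorum) of the cluster. The arithmetic can be sketched as follows, assuming one vote per node and no external tie-breaker device; the function names are our own and are not part of Proxmox.

```python
def quorum_votes(total_nodes: int) -> int:
    """Votes needed for a strict majority, assuming one vote per node."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def tolerated_node_failures(total_nodes: int) -> int:
    """Nodes that can fail while the remaining ones still hold a majority."""
    return total_nodes - quorum_votes(total_nodes)


for n in (2, 3, 4, 5):
    print(f"{n} nodes: quorum={quorum_votes(n)}, "
          f"tolerated failures={tolerated_node_failures(n)}")
```

Under these assumptions a two-node cluster cannot lose either node, and a fourth node adds capacity but no extra failure tolerance over three, which is why odd node counts are a common choice.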

Ceph – Distributed Storage at Scale

Storage is provided by Ceph, a fully distributed storage system running across the Proxmox cluster.

Key benefits of Ceph in our setup include:

data replication across multiple nodes
no dependency on a single disk or storage controller
excellent performance using NVMe drives
seamless integration with Proxmox

If a node or disk fails, data remains available and the cluster continues operating without service interruption.
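The effect of replication on capacity and failure tolerance can be sketched with some simple arithmetic. The example assumes a replicated pool that keeps three copies of each object (size=3) and requires at least two to serve I/O (min_size=2), with each copy on a different node. These values, the 85% fill limit, and the 24 TB raw capacity are illustrative assumptions, not figures from our cluster.

```python
def usable_capacity_tb(raw_tb: float, size: int = 3, fill_limit: float = 0.85) -> float:
    """Approximate usable capacity of a replicated pool.

    Every object is stored `size` times, and `fill_limit` leaves headroom
    so the cluster can re-replicate data after a failure.
    """
    if size < 1 or not 0 < fill_limit <= 1:
        raise ValueError("size must be >= 1 and fill_limit in (0, 1]")
    return raw_tb / size * fill_limit


def pool_state(size: int, min_size: int, failed_copies: int) -> str:
    """Describe a pool's state when `failed_copies` replicas are unavailable."""
    remaining = size - failed_copies
    if remaining <= 0:
        return "data unavailable"
    if remaining < min_size:
        return "I/O paused until copies are restored"
    if remaining < size:
        return "degraded, still serving I/O"
    return "healthy"


print(usable_capacity_tb(24.0))
for failed in range(4):
    print(failed, pool_state(size=3, min_size=2, failed_copies=failed))
```

With these assumed settings, losing one copy leaves the pool degraded but online, while losing two pauses I/O to protect the last remaining copy.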

High Availability in Practice

By combining Proxmox HA with Ceph replication, we achieve an environment where:

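the failure of a single server does not take services offline for long, because affected virtual machines are restarted automatically on healthy nodes
planned maintenance can be performed without downtime by live-migrating virtual machines off a node first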
unplanned outages are minimized

This level of resilience is essential for business-critical services that must be available around the clock.
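Zero-downtime maintenance and failover both depend on the remaining nodes having enough spare capacity to absorb the workloads of the node that goes away. A rough check might look like the sketch below. It only compares RAM in aggregate and ignores how individual virtual machines pack onto nodes; the node names and figures are hypothetical.

```python
def can_drain(capacity_gb: dict, used_gb: dict, node: str) -> bool:
    """Return True if the other nodes have, in aggregate, enough free RAM
    to take over everything currently running on `node`."""
    free_elsewhere = sum(
        capacity_gb[name] - used_gb[name]
        for name in capacity_gb
        if name != node
    )
    return used_gb[node] <= free_elsewhere


# Hypothetical three-node cluster with 256 GB of RAM per node.
capacity = {"pve1": 256, "pve2": 256, "pve3": 256}

moderate_load = {"pve1": 150, "pve2": 160, "pve3": 140}
heavy_load = {"pve1": 200, "pve2": 200, "pve3": 200}

print(can_drain(capacity, moderate_load, "pve1"))
print(can_drain(capacity, heavy_load, "pve1"))
```

The design point is that a cluster must be run with at least one node's worth of headroom; otherwise high availability exists on paper only.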

GPU Acceleration and Local LLM Workloads

As part of the overall platform, the cluster is also equipped with GPU acceleration, which we use to run local large language models (LLMs). This allows us to deliver AI-powered services without relying on external cloud providers.

The GPUs are integrated into the Proxmox environment in a way that:

virtual machines can access GPU resources directly (PCIe passthrough or vGPU)
compute resources can be allocated flexibly across workloads
AI workloads run close to the data, reducing latency
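When allocating GPUs to workloads, a useful first question is whether a given model fits in a card's memory at all. The following back-of-the-envelope sketch estimates the memory needed for model weights from the parameter count and numeric precision. The 20% overhead for caches and activations is a rough assumption, and the model and GPU sizes are examples rather than a description of our hardware.

```python
def estimate_weights_gib(params_billion: float, bits_per_param: int) -> float:
    """Memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


def fits_on_gpu(params_billion: float, bits_per_param: int,
                vram_gib: float, overhead_fraction: float = 0.2) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM."""
    needed = estimate_weights_gib(params_billion, bits_per_param)
    return needed * (1 + overhead_fraction) <= vram_gib


# Example model sizes and precisions, checked against a 24 GiB card.
for params, bits in ((7, 16), (70, 16), (70, 4)):
    size = estimate_weights_gib(params, bits)
    verdict = "fits" if fits_on_gpu(params, bits, vram_gib=24) else "does not fit"
    print(f"{params}B @ {bits}-bit: {size:.1f} GiB of weights, {verdict} in 24 GiB")
```

Real memory use also depends on context length, batch size, and the inference runtime, so an estimate like this is a starting point for planning rather than a guarantee.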

Running LLMs locally provides several advantages:

improved data privacy and compliance
predictable and controlled costs
the ability to fine-tune and optimize models for our specific use cases

This makes our data center not only highly available and scalable, but also AI-ready by design.

Conclusion

By building our own data center platform on a Proxmox HA cluster with Ceph storage and GPU acceleration, we have created a robust foundation for business-critical services.

The environment is:

scalable
fault-tolerant
fully under our control

Most importantly, it is designed to grow with our needs, both today and in the future.

If you would like to learn more about the architecture or our real-world experience with Proxmox, Ceph, and local AI workloads, feel free to get in touch.

Antti Koskela

Youlearn it oy