Architecting a Multi-Node Home Data Center for Local LLM Deployment
A detailed look at a heterogeneous home server cluster designed for high-VRAM workloads, featuring a mix of NVIDIA RTX 30-series and 50-series GPUs across multiple compute nodes to support large-scale model inference and embedding tasks.
Hardware Configuration and Cluster Topology
The deployment consists of four distinct compute systems, each tailored for specific workloads ranging from heavy VRAM-intensive inference to dedicated embedding generation. The architecture leverages a combination of AMD Threadripper, Intel Xeon, and consumer-grade CPUs to balance throughput and efficiency.
Node 1: High-VRAM Inference Engine
The primary workstation is powered by an AMD Threadripper 3960X (24 cores) and 128GB of DDR4 RAM. This system is equipped with four NVIDIA RTX 3090 Ti GPUs. Because power draw reaches nearly 2000W under full load, the system uses dual power supply units (PSUs) to maintain electrical stability.
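To make "high-VRAM" concrete, the following is a rough sketch of how one might estimate whether a model's weights fit on this node. It assumes 24 GB per RTX 3090 Ti (a public spec, not a figure from the source), and the function names and the 20% overhead margin are hypothetical placeholders; the source does not say which models or quantization levels are actually used.

```python
# Rough weight-memory estimate for Node 1 (4x RTX 3090 Ti).
# Assumption: 24 GB per card (public spec, not stated in the source).
GPU_COUNT = 4
VRAM_PER_GPU_GB = 24
TOTAL_VRAM_GB = GPU_COUNT * VRAM_PER_GPU_GB  # 96 GB across the node


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus a hypothetical overhead margin fit in total VRAM.

    The margin stands in for KV cache, activations, and runtime buffers;
    0.2 is an illustrative guess, not a measured value.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= TOTAL_VRAM_GB


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(70, bits)
        print(f"70B params at {bits}-bit: {gb:.0f} GB weights, fits={fits(70, bits)}")
```

This treats the four cards as one pool, which only holds if the inference software splits the model across GPUs; per-card limits and real overhead depend on the software stack.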
Node 2: Mid-Range Scaling Node
The second system pairs an Intel Xeon 8352 (36 cores) with 128GB of DDR4 RAM and four NVIDIA RTX 5070 Ti GPUs, combining a high CPU core count with a current-generation GPU architecture.
Node 3: Dedicated Embedding Server
The third node is built around an Intel Core i7-14700K (24 cores), which in this build is an engineering sample, with 64GB of DDR5 RAM and a single NVIDIA RTX 5090. This node is used primarily for running embedding models, with the 5090 handling vectorization tasks.
Node 4: Auxiliary Compute Node
The final system consists of an AMD Ryzen 9 5950X (16 cores) with 64GB of DDR4 RAM and two NVIDIA RTX 5070 Ti GPUs, serving as additional compute capacity for the home data center.
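The four node descriptions above can be captured as a small inventory, which makes it easy to total GPUs and memory across the cluster. This is an illustrative sketch: the node names and the `Node` class are hypothetical, and the per-card VRAM sizes are public specs rather than figures from the source.

```python
from dataclasses import dataclass

# Per-card VRAM in GB. Public specs, not figures from the source.
VRAM_GB = {"RTX 3090 Ti": 24, "RTX 5070 Ti": 16, "RTX 5090": 32}


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical label, not from the source
    cpu: str
    ram_gb: int
    gpu_model: str
    gpu_count: int

    @property
    def vram_gb(self) -> int:
        """Total VRAM on this node, summed across its GPUs."""
        return VRAM_GB[self.gpu_model] * self.gpu_count


CLUSTER = [
    Node("node1", "AMD Threadripper 3960X", 128, "RTX 3090 Ti", 4),
    Node("node2", "Intel Xeon 8352", 128, "RTX 5070 Ti", 4),
    Node("node3", "Intel Core i7-14700K", 64, "RTX 5090", 1),
    Node("node4", "AMD Ryzen 9 5950X", 64, "RTX 5070 Ti", 2),
]

total_gpus = sum(n.gpu_count for n in CLUSTER)
total_vram = sum(n.vram_gb for n in CLUSTER)
total_ram = sum(n.ram_gb for n in CLUSTER)

if __name__ == "__main__":
    for node in CLUSTER:
        print(f"{node.name}: {node.gpu_count} x {node.gpu_model} = {node.vram_gb} GB VRAM")
    print(f"total: {total_gpus} GPUs, {total_vram} GB VRAM, {total_ram} GB system RAM")
```

Under these assumptions the cluster totals 11 GPUs and 224 GB of VRAM, though that memory is spread across four machines and is not a single addressable pool.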
Technical Considerations and Stability
A critical aspect of this build is power delivery for the first node: dual PSUs were needed to handle the peak demand of the quad 3090 Ti configuration. According to the operator, the system has remained stable for approximately one month of operation in this configuration.
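A back-of-the-envelope power budget shows why a single consumer PSU is unlikely to be enough here. The sketch below uses nominal public spec values (450W per RTX 3090 Ti, 280W for the Threadripper 3960X), not measurements from this build. The 80% load ceiling and the even split between PSUs are assumptions made for illustration, since the source does not give PSU ratings or how the load is divided.

```python
# Nominal component power in watts. Public spec values, not measurements
# from this build; real draw varies with power limits and workload.
GPU_TGP_W = 450          # RTX 3090 Ti reference total graphics power
CPU_TDP_W = 280          # Threadripper 3960X TDP
GPU_COUNT = 4
REPORTED_PEAK_W = 2000   # "nearly 2000W" under full load, per the source

# Sum of nominal GPU and CPU power (ignores drives, fans, RAM, PSU losses).
nominal_w = GPU_COUNT * GPU_TGP_W + CPU_TDP_W


def min_psu_rating_w(load_w: float, psu_count: int,
                     max_load_fraction: float = 0.8) -> float:
    """Smallest per-PSU rating that keeps each unit under max_load_fraction.

    Assumes the load is split evenly across PSUs, which is a simplification;
    0.8 is an illustrative headroom rule of thumb, not a requirement.
    """
    return load_w / psu_count / max_load_fraction


if __name__ == "__main__":
    print(f"nominal GPU + CPU power: {nominal_w} W")
    for count in (1, 2):
        rating = min_psu_rating_w(REPORTED_PEAK_W, count)
        print(f"{count} PSU(s): at least {rating:.0f} W each")
```

With these numbers, one PSU would need a rating of roughly 2500W to keep the same headroom, while two units can each be about half that, which is consistent with the operator's choice of a dual-PSU layout.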
Note: The source material is a hardware specification list; it does not specify the software stack, interconnects, or model quantization methods.