Prerequisites
Please only run one container instance per machine. Running multiple containers results in vastly reduced performance.
- Container engine, such as Docker (can be installed using the official instructions)
- (GPU only) Nvidia Container Toolkit with Nvidia driver version 515 or higher (can be installed using the following installation guide)
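The prerequisites above can be checked before pulling the image. The following is a minimal sketch, not an official Limina tool: the function names are hypothetical, and it assumes `docker` is the container engine and that `nvidia-smi` is installed alongside the Nvidia driver. It only checks the driver version, not whether the Nvidia Container Toolkit itself is installed.

```python
import shutil
import subprocess

MIN_DRIVER_MAJOR = 515  # minimum Nvidia driver version from the list above


def driver_is_supported(version: str, minimum_major: int = MIN_DRIVER_MAJOR) -> bool:
    """Return True if a driver version string such as '535.104.05' meets the minimum."""
    major = int(version.strip().split(".")[0])
    return major >= minimum_major


def check_prerequisites(gpu: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH")
    if gpu:
        if shutil.which("nvidia-smi") is None:
            problems.append("nvidia-smi not found; is the Nvidia driver installed?")
        else:
            # Ask the driver for its version, one line per GPU.
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True, text=True, check=False,
            ).stdout
            versions = [line for line in out.splitlines() if line.strip()]
            if not versions:
                problems.append("could not read the Nvidia driver version")
            elif not driver_is_supported(versions[0]):
                problems.append(
                    f"Nvidia driver {versions[0]} is older than {MIN_DRIVER_MAJOR}"
                )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(gpu=False):
        print("WARNING:", problem)
```

Pass `gpu=True` on hosts intended for the GPU container.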
System Requirements
The image comes in two different build flavours:
- A compact, CPU-only container that runs on any Intel or AMD CPU. The CPU container is highly optimised for the majority of use cases, as it uses hand-coded AMX/AVX2/AVX512/AVX512 VNNI instructions in conjunction with neural network compression techniques to deliver a ~25X speedup over a reference transformer-based system.
- A GPU container is designed for large-scale deployments making billions of API calls or processing terabytes of data per month.
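To see which of the instruction sets named above a Linux host exposes, the CPU flags in `/proc/cpuinfo` can be inspected. This is a rough sketch under stated assumptions: the helper names are hypothetical, the flag names are those the Linux kernel reports, and the source does not say which flags the container itself probes at runtime.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the instruction sets mentioned above.
# Which of these the container actually requires is an assumption.
FLAG_GROUPS = {
    "AVX2": {"avx2"},
    "AVX512": {"avx512f"},
    "AVX512 VNNI": {"avx512_vnni"},
    "AMX": {"amx_tile", "amx_int8"},
}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags from the text of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(flags: set[str]) -> dict[str, bool]:
    """Map each instruction-set name to whether all of its flags are present."""
    return {name: required <= flags for name, required in FLAG_GROUPS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        for name, present in supported_extensions(flags).items():
            print(f"{name}: {'yes' if present else 'no'}")
```

A host without AMX can still run the CPU container; the flags only indicate which optimised code paths are likely to be available.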
Minimum Requirements
The minimum system requirements for the container image are as follows:

| Container | Minimum | Recommended (Text only) | Recommended (All Features) | Recommended Request Concurrency |
|---|---|---|---|---|
| CPU | Any x86 (Intel or AMD) processor with 7.5GB free RAM and 50GB disk volume | Intel Sapphire Rapids or newer CPUs supporting AMX with 16GB RAM and 50GB disk volume | Intel Sapphire Rapids or newer CPUs supporting AMX with 64GB RAM and 100GB disk volume | 1 |
| GPU | Any x86 (Intel or AMD) processor with 28GB free RAM. Nvidia GPU with compute capability 7.0 or higher (Volta or newer) and at least 16GB VRAM. 100GB disk volume | Any x86 (Intel or AMD) processor with 32GB RAM and Nvidia Tesla T4 GPU. 100GB disk volume | Any x86 (Intel or AMD) processor with 64GB RAM and Nvidia Tesla T4 GPU. 100GB disk volume | 16 |
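The "Minimum" column can be turned into a quick host check. The sketch below is illustrative only: the names are hypothetical, it assumes a Linux host, it treats GB as 10^9 bytes and the disk figure as free space on the volume (the table specifies neither), and it does not check VRAM or GPU compute capability.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path

GB = 1000 ** 3  # assumption: the table does not say whether GB means 10^9 or 2^30 bytes


@dataclass(frozen=True)
class Minimum:
    free_ram_gb: float
    disk_gb: float


# Values copied from the "Minimum" column of the table above.
MINIMUMS = {"cpu": Minimum(7.5, 50), "gpu": Minimum(28, 100)}


def available_ram_bytes(meminfo_text: str) -> int:
    """Read MemAvailable (reported in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def shortfalls(flavour: str, ram_bytes: int, disk_bytes: int) -> list[str]:
    """Return the unmet minimums for the given flavour ('cpu' or 'gpu')."""
    minimum = MINIMUMS[flavour]
    problems = []
    if ram_bytes < minimum.free_ram_gb * GB:
        problems.append(f"need {minimum.free_ram_gb} GB free RAM")
    if disk_bytes < minimum.disk_gb * GB:
        problems.append(f"need {minimum.disk_gb} GB disk")
    return problems


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        ram = available_ram_bytes(meminfo.read_text())
        disk = shutil.disk_usage("/").free
        print(shortfalls("cpu", ram, disk) or "minimum requirements met")
```

Point `shutil.disk_usage` at the path of the volume the container will use if it is not the root filesystem.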
Recommended Requirements
CPU
While the Limina CPU-based container will run on any x86-compatible instance, the cloud instance types below give optimal throughput and latency per dollar:

| Platform | Recommended Instance Type (Text only) | Recommended Instance Type (All Features) |
|---|---|---|
| Azure | Standard_E2_v5 (2 vCPUs, 16GB RAM) | Standard_E8_v5 (8 vCPUs, 64GB RAM) |
| AWS | m7i.xlarge (4 vCPUs, 16GB RAM) | m7i.4xlarge (16 vCPUs, 64GB RAM) |
| GCP | n2-standard-4 (4 vCPUs, 16GB RAM) | n2-standard-16 (16 vCPUs, 64GB RAM) |
If lower latency is required, scale up the instance type, e.g. use an m7i.2xlarge in place of an m7i.xlarge. While the Limina container can make use of all available CPU cores, it delivers the best throughput per dollar on a machine with a single CPU core, because adding CPU cores does not increase performance linearly.
GPU
Similarly, for the GPU-based image, Limina recommends the following Nvidia T4 GPU-equipped instance types:

| Platform | Recommended Instance Type (Text only) | Recommended Instance Type (All Features) |
|---|---|---|
| Azure | Standard_NC4as_T4_v3 (4 vCPUs, 28GB RAM) | Standard_NC8as_T4_v3 (8 vCPUs, 56GB RAM) |
| AWS | g4dn.2xlarge (8 vCPUs, 32GB RAM) | g4dn.4xlarge (16 vCPUs, 64GB RAM) |
| GCP | n1-standard-8 + Tesla T4 (8 vCPUs, 30GB RAM) | n1-standard-16 + Tesla T4 (16 vCPUs, 60GB RAM) |