Prerequisites and System Requirements
Prerequisites
The following are required to run the container:
- A container engine, such as Docker (can be installed using the official instructions)
- (GPU only) The Nvidia Container Toolkit with Nvidia driver version 515 or higher (can be installed using the Nvidia installation guide)
All other dependencies, such as CUDA, are included with the container and don't need to be installed separately.
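To confirm these prerequisites before pulling the image, you can check the Docker installation, the driver version, and GPU passthrough through the toolkit. The commands below are a minimal sketch; the `nvidia/cuda` image tag is only an example and any recent tag will do:

```bash
# Verify the container engine is installed and the daemon is reachable
docker --version
docker info > /dev/null && echo "Docker daemon reachable"

# (GPU only) Verify the host driver meets the 515+ requirement
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# (GPU only) Verify the Nvidia Container Toolkit exposes the GPU to containers
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```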
System Requirements
The image comes in two build flavours: a compact, CPU-only container that runs on any Intel or AMD CPU, and a GPU-accelerated container. The CPU container is highly optimised for the majority of use cases: it uses hand-coded AVX2/AVX512/AVX512 VNNI instructions in conjunction with neural network compression techniques to deliver a ~25x speedup over a reference transformer-based system. The GPU container is designed for large-scale deployments making billions of API calls or processing terabytes of data per month.
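Since the CPU container's speedup depends on which of these instruction sets the host exposes, it can be worth checking the CPU flags before choosing an instance. A quick check on Linux:

```bash
# List the SIMD extensions the host CPU reports (Linux);
# avx512_vnni enables the fastest inference path in the CPU container
grep -m1 'flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx2|avx512f|avx512_vnni)$'
```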
Minimum Requirements
The minimum system requirements for the Docker image are as follows:
|  | Minimum | Recommended | Recommended Concurrency |
| --- | --- | --- | --- |
| CPU | Any x86 (Intel or AMD) processor with 6GB RAM | Intel Cascade Lake or newer CPU supporting AVX512 VNNI, with 8GB RAM | 1 |
| GPU | Any x86 (Intel or AMD) processor with 28GB RAM and an Nvidia GPU with compute capability 6.0 or higher (Pascal or newer) | Any x86 (Intel or AMD) processor with 32GB RAM and an Nvidia Tesla T4 GPU | 32 |
The Private AI image can also run on Apple silicon chips, such as the M1. Performance will be degraded, however, because the AVX instructions run under Rosetta 2 emulation. Native ARM CPU builds, as well as builds requiring only 1-2GB RAM, can be delivered upon request.
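On Apple silicon, the x86 image must be run under emulation by passing Docker's platform flag. A minimal sketch, where `privateai/deid` and the port mapping are placeholders for your actual image reference and configuration:

```bash
# Run the x86_64 image under emulation on an Apple silicon host;
# "privateai/deid" is a placeholder -- substitute your image reference
docker run --rm --platform linux/amd64 -p 8080:8080 privateai/deid
```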
Recommended Requirements
While the Private AI CPU-based container will run on any x86-compatible instance, the cloud instance types below deliver the best throughput and latency per dollar:
| Platform | Recommended Instance Type | Description |
| --- | --- | --- |
| Azure | Standard_E2_v5 | 2x Intel Ice Lake vCPUs, 16GB RAM |
| AWS | m5zn.large | 2x Intel Cascade Lake vCPUs, 8GB RAM |
| GCP | n2-standard-2 | 2x Intel Cascade Lake or Ice Lake vCPUs, 8GB RAM |
Note: If lower latency is required, scale up the instance type, e.g. use an m5zn.xlarge in place of an m5zn.large. While the Private AI Docker solution can make use of all available CPU cores, it delivers the best throughput per dollar on single-core machines; adding CPU cores does not yield a linear increase in performance.
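When benchmarking throughput per core on a larger machine, Docker's standard resource limits can hold a container to a fixed slice of the host. The limits and image name below are illustrative only:

```bash
# Restrict the container to two logical CPUs (one physical core with
# hyper-threading) and the recommended 8GB of RAM;
# "privateai/deid" is a placeholder image name
docker run --rm --cpuset-cpus=0,1 --memory=8g -p 8080:8080 privateai/deid
```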
Similarly, for the GPU-based image, Private AI recommends the following instance types:
| Platform | Recommended Instance Type | Description |
| --- | --- | --- |
| Azure | Standard_NC8as_T4_v3 | 8x AMD EPYC 7V12 (Rome) vCPUs, 56GB RAM, Nvidia Tesla T4 GPU |
| AWS | g4dn.2xlarge | 8x Intel Cascade Lake vCPUs, 32GB RAM, Nvidia Tesla T4 GPU |
| GCP | n1-standard-8 + Tesla T4 | 8x Intel Skylake vCPUs, 32GB RAM, Nvidia Tesla T4 GPU |
The aforementioned instance types were selected based on extensive benchmarking performed by Private AI. Please see Benchmarks for some performance numbers.
Note: Please run only one container instance per CPU or GPU. Running multiple containers on the same device results in vastly reduced performance!
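On a multi-GPU host, this rule can be followed by giving each container exactly one device via Docker's device-selection syntax. A sketch with a placeholder image name:

```bash
# One container per GPU: assign a dedicated device to each instance
# instead of sharing GPUs between containers
docker run --rm -d --gpus '"device=0"' -p 8080:8080 privateai/deid
docker run --rm -d --gpus '"device=1"' -p 8081:8080 privateai/deid
```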