Prerequisites and System Requirements

Prerequisites

The following prerequisites are required to run the container:

  • Container engine, such as Docker (can be installed using the official instructions)
  • (GPU only) Nvidia Container Toolkit with Nvidia driver version 515 or higher (can be installed using the official installation guide)

All other dependencies, such as CUDA, are included in the container and don't need to be installed separately.
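A quick way to check both prerequisites is to query the Docker daemon and run Nvidia's base CUDA image; the CUDA image tag below is an assumption, and any recent tag will do:

```shell
# Verify the container engine is installed and the daemon is reachable
docker info > /dev/null && echo "Docker OK"

# (GPU only) Verify the Nvidia Container Toolkit exposes the GPU to
# containers; nvidia-smi should report driver version 515 or higher
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
```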

System Requirements

The image comes in two build flavours: a compact, CPU-only container that runs on any Intel or AMD CPU, and a GPU-accelerated container. The CPU container is highly optimised for the majority of use cases: it uses hand-coded AVX2/AVX512/AVX512 VNNI instructions in conjunction with neural network compression techniques to deliver a ~25X speedup over a reference transformer-based system. The GPU container is designed for large-scale deployments making billions of API calls or processing terabytes of data per month.

Minimum Requirements

The minimum system requirements for the Docker image are as follows:

CPU
  • Minimum: any x86 (Intel or AMD) processor with 6GB RAM
  • Recommended: Intel Cascade Lake or newer CPU supporting AVX512 VNNI, with 8GB RAM
  • Recommended concurrency: 1

GPU
  • Minimum: any x86 (Intel or AMD) processor with 28GB RAM and an Nvidia GPU with compute capability 6.0 or higher (Pascal or newer)
  • Recommended: any x86 (Intel or AMD) processor with 32GB RAM and an Nvidia Tesla T4 GPU
  • Recommended concurrency: 32

The Private AI image can also run on the newer Apple chips, such as the M1. Performance will be degraded, however, due to the Rosetta 2 emulation of the AVX instructions. Native ARM CPU builds, as well as builds requiring only 1-2GB RAM, can be delivered upon request.
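On Apple-silicon hosts, Docker's `--platform` flag can be used to request the x86 image explicitly so it runs under Rosetta 2 emulation. The image name below is a placeholder; substitute the image name you were provided:

```shell
# Run the x86-64 CPU image under Rosetta 2 on an Apple M1/M2 host.
# "private-ai/deid:cpu" is a placeholder image name.
docker run --rm --platform linux/amd64 private-ai/deid:cpu
```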

Recommended Requirements

While the Private AI CPU-based container will run on any x86-compatible instance, the cloud instance types below give optimal throughput and latency per dollar:

Azure
  • Recommended instance type: Standard_E2_v5
  • Description: 2x Intel Ice Lake vCPUs, 16GB RAM

AWS
  • Recommended instance type: M5zn.large (2 vCPU, 8GB RAM)
  • Description: 2x Intel Cascade Lake vCPUs, 8GB RAM

GCP
  • Recommended instance type: N2-Standard-2 (2 vCPU, 8GB RAM)
  • Description: 2x Intel Cascade Lake or Ice Lake vCPUs, 8GB RAM

Note: If lower latency is required, scale up the instance type, e.g. use an M5zn.2xlarge in place of an M5zn.xlarge. While the Private AI Docker solution can make use of all available CPU cores, it delivers the best throughput per dollar on a single-CPU-core machine; scaling up CPU cores does not result in a linear increase in performance.
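Following the single-core guidance above, one sketch of a deployment is to cap the container's CPU and memory with Docker's resource flags. The image name and port are placeholders:

```shell
# Constrain the container to one CPU core and 8GB RAM, matching the
# recommended CPU configuration. Image name and port are placeholders.
docker run --rm --cpus 1 --memory 8g -p 8080:8080 private-ai/deid:cpu
```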

Similarly for the GPU-based image, Private AI recommends the following instance types:

Azure
  • Recommended instance type: Standard_NC8as_T4_v3
  • Description: 8x AMD EPYC 7V12 (Rome) vCPUs, 56GB RAM, Nvidia Tesla T4 GPU

AWS
  • Recommended instance type: G4dn.2xlarge
  • Description: 8x Intel Cascade Lake vCPUs, 32GB RAM, Nvidia Tesla T4 GPU

GCP
  • Recommended instance type: N1-Standard-8 + Tesla T4
  • Description: 8x Intel Skylake vCPUs, 32GB RAM, Nvidia Tesla T4 GPU

The aforementioned instance types were selected based on extensive benchmarking performed by Private AI. Please see Benchmarks for some performance numbers.

Note: Please only run one container instance per CPU or GPU. Running multiple containers results in vastly reduced performance!
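On a multi-GPU host, one way to honour the one-container-per-GPU rule is to pin each instance to a distinct device using Docker's `--gpus` device syntax. The image name is a placeholder:

```shell
# Pin each container instance to its own GPU.
# "private-ai/deid:gpu" is a placeholder image name.
docker run -d --gpus '"device=0"' private-ai/deid:gpu
docker run -d --gpus '"device=1"' private-ai/deid:gpu
```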

© Copyright 2022, Private AI.