Benchmarks
Throughput and latency of the Private AI container vary greatly depending on the hardware provisioned to the container. It is highly recommended to use the hardware specified in the system requirements.
CPU
The table below shows the performance of the CPU container on various AWS instance types:
Instance Type | Throughput (requests/sec) | Latency (ms) |
---|---|---|
c5.large | 1.61 | 620 |
c5a.large | 1.23 | 816 |
m5.large | 1.43 | 698 |
m5n.large | 3.69 | 271 |
m5zn.large | 5.03 | 199 |
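The two columns in this table appear to be closely related: each latency is within about 0.5% of 1000 / throughput, which is what you would expect if the container serves one request at a time. The sketch below checks that relationship against the figures above. The names `CPU_RESULTS` and `implied_latency_ms` are hypothetical and exist only for this illustration; the one-request-at-a-time model is an assumption, not a documented property of the container.

```python
# Hypothetical helper data: figures copied from the CPU instance table above.
CPU_RESULTS = {
    # instance type: (throughput in requests/sec, latency in ms)
    "c5.large": (1.61, 620),
    "c5a.large": (1.23, 816),
    "m5.large": (1.43, 698),
    "m5n.large": (3.69, 271),
    "m5zn.large": (5.03, 199),
}


def implied_latency_ms(throughput_rps: float) -> float:
    """Latency implied if requests are served strictly one at a time (assumption)."""
    return 1000.0 / throughput_rps


for instance, (rps, latency_ms) in CPU_RESULTS.items():
    implied = implied_latency_ms(rps)
    # Relative gap between the measured and the implied latency, in percent.
    deviation = abs(implied - latency_ms) / latency_ms * 100
    print(f"{instance:<11} measured {latency_ms:>4} ms, "
          f"implied {implied:6.1f} ms ({deviation:.1f}% off)")
```

Under this reading, the instance with the highest throughput per core is also the one with the lowest per-request latency, which matches the m5zn.large row.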
For best throughput, it is recommended to use workers with a single logical core. The table below shows the scaling efficiency when running the container on multiple CPU cores:
Instance Type | Logical CPU Cores | Throughput (requests/sec) | Latency (ms) | Scaling Efficiency (%) |
---|---|---|---|---|
m5zn.large | 1 | 5.03 | 198.82 | 100 |
m5zn.3xlarge | 6 | 18.78 | 53.2 | 62 |
m5zn.6xlarge | 12 | 24.57 | 40.64 | 41 |
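The Scaling Efficiency column appears to be the measured throughput divided by the ideal linear throughput, i.e. T_n / (n × T_1) × 100, where T_1 = 5.03 requests/sec is the single-core result. The sketch below reproduces the 100, 62 and 41 figures under that assumed definition. The function name `scaling_efficiency` is hypothetical, and the "ideal" value assumes n independent single-core workers would each reach the single-core throughput.

```python
def scaling_efficiency(throughput_n: float, cores: int, throughput_1: float) -> float:
    """Percentage of ideal linear speedup achieved on `cores` logical cores."""
    return throughput_n / (cores * throughput_1) * 100


BASELINE_RPS = 5.03  # m5zn.large, 1 logical core (from the table above)

for instance, cores, rps in [
    ("m5zn.large", 1, 5.03),
    ("m5zn.3xlarge", 6, 18.78),
    ("m5zn.6xlarge", 12, 24.57),
]:
    eff = scaling_efficiency(rps, cores, BASELINE_RPS)
    # Throughput if scaling were perfectly linear (assumption, see lead-in).
    ideal = cores * BASELINE_RPS
    print(f"{instance:<13} {eff:5.1f}% efficient "
          f"(ideal {ideal:.2f} req/s, measured {rps} req/s)")
```

On the 12-core instance, for example, a single container reaches 24.57 requests/sec against an ideal of 60.36, which is consistent with the recommendation to prefer single-core workers.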
The default `accuracy_mode` value, `high`, offers the best PII detection performance; however, it can be changed to trade PII detection performance for speed:
Accuracy Mode | Throughput (requests/sec) | Latency (ms) |
---|---|---|
standard | 25.83 | 38.67 |
standard high & standard high multilingual | 14.73 | 67.83 |
high & high multilingual | 5.03 | 198.82 |
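To compare the modes directly, it can help to express each mode's throughput as a multiple of the default `high` mode. The minimal sketch below does this for the CPU figures above; `CPU_THROUGHPUT` and `speedup_vs_high` are hypothetical names used only for illustration, and the mode labels are shortened forms of the table rows rather than exact configuration values.

```python
# Hypothetical data: CPU throughput (requests/sec) per accuracy mode, from the table above.
CPU_THROUGHPUT = {
    "standard": 25.83,
    "standard high": 14.73,
    "high": 5.03,
}


def speedup_vs_high(throughput: dict[str, float]) -> dict[str, float]:
    """Throughput of each mode as a multiple of the default `high` mode."""
    baseline = throughput["high"]
    return {mode: rps / baseline for mode, rps in throughput.items()}


for mode, factor in speedup_vs_high(CPU_THROUGHPUT).items():
    print(f"{mode:<14} {factor:.2f}x")
```

With these figures, `standard` delivers roughly 5.1 times and `standard high` roughly 2.9 times the throughput of `high` on the same hardware.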
GPU
The table below shows benchmarks of the GPU container running on a g4dn.2xlarge instance, optimized for throughput with 128 concurrent requests:
Accuracy Mode | Throughput (requests/sec) | Latency (ms) |
---|---|---|
standard | 570 | 198 |
standard high & standard high multilingual | 507 | 229 |
high & high multilingual | 210 | 530 |
The GPU benchmarks above are optimized for throughput. Latency as low as 10 ms can be achieved with fewer concurrent requests.
Notes:
- Unless otherwise stated, tests are run using the default `accuracy_mode` value, `high`.
- All tests are performed with a 100-word / 500-character test input.
- Processing time scales linearly with the length of the input text.
- All benchmarks used concurrency settings optimized for throughput.
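Because processing time scales linearly with input length, the 500-character benchmark figures can be used for a rough estimate on longer inputs. The sketch below assumes the scaling is strictly proportional (no fixed per-request overhead), which is a simplification; the names `estimate_latency_ms` and `BASELINE_LATENCY_MS` are hypothetical, and the baseline is the CPU m5zn.large result in `high` mode.

```python
BASELINE_CHARS = 500          # benchmark input size (100 words / 500 characters)
BASELINE_LATENCY_MS = 198.82  # CPU, m5zn.large, accuracy_mode "high"


def estimate_latency_ms(num_chars: int,
                        baseline_chars: int = BASELINE_CHARS,
                        baseline_ms: float = BASELINE_LATENCY_MS) -> float:
    """Scale the benchmark latency proportionally to input length.

    Assumes no fixed per-request overhead (a simplification).
    """
    return baseline_ms * num_chars / baseline_chars


for chars in (500, 2_000, 10_000):
    print(f"{chars:>6} chars ~ {estimate_latency_ms(chars):8.1f} ms")
```

Under this assumption, a 2,000-character input would take about four times the benchmark latency on the same hardware and accuracy mode.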
For the full benchmark report, please contact us.