Below are answers to some frequently asked questions. Please contact us if your question isn't answered in the list.
Our system can process:
Support for DOCX and XLSX is coming soon. If your desired filetype isn’t in the list, please contact us for our release timeline for that file type.
There is no hardcoded maximum input length. In practice, the maximum length depends on the provisioned hardware and any timeouts set by the user. Internally, we’ve tested inputs of up to 500K characters (approximately 100K English words).
Please see the concurrency section for recommended concurrency levels on your chosen hardware. In general, a concurrency level of 1 per CPU container and 32 per GPU container works well.
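As a rough illustration of the guidance above, the following Python sketch caps the number of in-flight requests at the number of containers multiplied by the per-container concurrency level. The `process_all` helper and the `send_request` callable are hypothetical names used only for this example; `send_request` stands in for whatever client call you use to reach the container, and is not part of Private AI's API.

```python
from concurrent.futures import ThreadPoolExecutor

CPU_CONCURRENCY = 1   # per CPU container, from the general guidance above
GPU_CONCURRENCY = 32  # per GPU container, from the general guidance above


def process_all(texts, send_request, containers, per_container):
    """Run send_request over texts with a bounded number of in-flight calls.

    send_request is a caller-supplied function (hypothetical here) that
    sends one text to the deployment and returns its result.
    """
    max_in_flight = containers * per_container
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map preserves input order and never runs more than
        # max_in_flight calls at the same time.
        return list(pool.map(send_request, texts))
```

For example, with two CPU containers behind a load balancer you might call `process_all(texts, send_request, containers=2, per_container=CPU_CONCURRENCY)`. The right values for your hardware may differ, so treat these constants as starting points.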
Please see the benchmarks page.
Yes, optimized models can be delivered upon request. Model optimization is free of charge on the Scale and Pro tiers. We ask for 5-10 examples of each requested improvement.
No, we do not offer the ability for customers to tune the models themselves. Instead, Private AI can adjust the models with just 5-10 examples of the modification in question.
Additionally, customers can customize behaviour via allow and block lists, which support regex functionality.
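Before adding a regular expression to an allow or block list, it can help to check locally which strings it matches. The sketch below uses Python's `re` module purely for illustration. The ticket-ID pattern and the `find_matches` helper are hypothetical, and the exact regex dialect and list format accepted by the container are not described here, so consult the API documentation before relying on any specific syntax.

```python
import re

# Hypothetical pattern: internal ticket IDs such as "TKT-00123"
# (the prefix "TKT-" followed by exactly five digits).
TICKET_ID = re.compile(r"\bTKT-\d{5}\b")


def find_matches(pattern, text):
    """Return every non-overlapping match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]
```

Running `find_matches(TICKET_ID, "See TKT-00123 and TKT-9")` shows that only the five-digit ID is matched, which is the kind of check worth doing before a pattern goes into production.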
Latency and throughput scale sublinearly as more compute resources are allocated to a single container. This is best illustrated in the benchmarks section. For this reason, we recommend that throughput-optimized deployments run each container on either a single logical CPU core or a low-cost inference GPU, and scale horizontally as required.
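When scaling horizontally, the number of container instances can be estimated from the target throughput and the measured throughput of one instance. The following is a minimal sketch of that arithmetic. The `replicas_needed` function and its `headroom` parameter are assumptions made for this example, and the per-instance throughput should come from your own measurements or the benchmarks page.

```python
import math


def replicas_needed(target_per_sec, per_replica_per_sec, headroom=0.2):
    """Estimate how many container instances are needed.

    target_per_sec: required throughput across the whole deployment.
    per_replica_per_sec: measured throughput of one small instance.
    headroom: extra capacity fraction for traffic spikes (an assumption).
    """
    if per_replica_per_sec <= 0:
        raise ValueError("per_replica_per_sec must be positive")
    return math.ceil(target_per_sec * (1 + headroom) / per_replica_per_sec)
```

Because each instance stays small, capacity can be added or removed one instance at a time as the load changes.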
Our container is optimized for scale, so: Lots! We have several customers that each process billions of API requests per month. Our largest deployment so far consisted of 400 GPU-equipped container instances.
Private AI is deployed with dozens of customers, ranging from startups to hospitals and Fortune 500 companies. We operate in multiple verticals such as insurance, health tech, ASR providers, cloud contact centres, and financial institutions.
Each release is tested using Private AI’s Continuous Integration (CI) system, which consists of:
- comprehensive unit tests
- integration tests covering API functionality
- soak test for container stability and memory leaks
After a release has passed the CI tests, it is run on Private AI’s demo server for a day and undergoes a final round of manual testing. During this time the container is monitored for any stability issues or memory leaks.
A release is first distributed to a handful of customers, before being rolled out to Private AI’s entire user base.
Private AI’s software is deployed as a single container that includes all necessary runtime files, including the ML models. This allows the container to run completely air-gapped. We recommend using a container orchestration service such as ECS or Kubernetes. Note that GPUs aren’t required.
Yes. However, Docker is what we use internally to build and test the container.
The container can run on any x86 machine (Intel or AMD CPU). No GPU is required, but NVIDIA GPUs are supported. Please see the
Container health can be monitored either via the
Updates are released via new versions of the container image. No additional downloads or configuration such as model files are required, as everything is packaged inside the container image. New container versions can be pulled from Azure Container Registry.
Updates are released monthly, and patch releases are made as needed. If you need something urgently, please let us know so that we can do a patch release for you.
Yes, our container phones home for authentication and usage reporting. However, this can be disabled on the Pro tier.
Yes, the container can run without external internet access, but only on the Pro tier.
This depends heavily on the implementation and volume, but a typical CPU deployment costs on the order of a few hundred USD per month. Large-scale deployments (more than 1B processed per month) usually cost ~1K USD per month.