Frequently Asked Questions
Below are answers to the questions we are asked most often. Please contact us if your question isn't covered here.
What file formats do your models accept?
Our system can process:
Support for DOCX and XLSX is coming soon. If the file type you need isn’t in the list, please contact us for the release timeline for that file type.
What’s the maximum length of text we can send your models?
There is no hardcoded maximum input length. In practice, the maximum length depends on the provisioned hardware and on any timeouts set by the user. Internally, we’ve tested inputs of up to 500K characters (approximately 100K English words).
What concurrency level should I use?
Please see the concurrency section for recommended concurrency levels on your chosen hardware. In general, a concurrency level of 1 per CPU container and 32 per GPU container works well.
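As a rough illustration of applying these starting points on the client side, the sketch below caps the number of in-flight requests with a thread pool. The `send_request` callable is a hypothetical, caller-supplied function that submits one text to the container; it is not part of our API. Multiplying the per-container level by the number of containers is also an assumption, and it only holds if a load balancer spreads requests evenly.

```python
from concurrent.futures import ThreadPoolExecutor

# General starting points quoted above; the concurrency section has
# hardware-specific recommendations.
CONCURRENCY_PER_CONTAINER = {"cpu": 1, "gpu": 32}


def client_concurrency(container_type, num_containers=1):
    """Total in-flight requests to allow across all containers (assumed even load)."""
    return CONCURRENCY_PER_CONTAINER[container_type] * num_containers


def process_all(texts, send_request, container_type="cpu", num_containers=1):
    """Send every text through send_request, never exceeding the concurrency cap.

    send_request is a hypothetical callable supplied by the caller.
    Results are returned in the same order as the inputs.
    """
    workers = client_concurrency(container_type, num_containers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_request, texts))
```

A thread pool is used here because the work is I/O-bound (waiting on HTTP responses); the pool size is the only thing that enforces the cap.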
What sort of throughput and latency should we expect?
Please see the benchmarks page.
Is model tuning available?
Yes, optimized models can be delivered upon request. Model optimization is free of charge on the Scale and Pro tiers. We ask for 5-10 examples of each requested improvement.
Can customers tune the models on-prem?
No, we do not offer the ability for customers to tune the models themselves. Instead, Private AI can adjust the models with just 5-10 examples of the modification in question.
Additionally, customers can customize behaviour via allow and block lists, which support regex functionality.
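To make the idea of regex-driven allow and block lists concrete, here is a purely conceptual sketch. It does not show our API or how the container implements these lists: the function name `apply_lists`, its parameters, and the `[BLOCKED]` marker are all hypothetical. It assumes that a block pattern marks text to be replaced and that an allow pattern exempts a match from replacement.

```python
import re


def apply_lists(text, block_patterns, allow_patterns, marker="[BLOCKED]"):
    """Replace every block-pattern match unless an allow pattern matches it fully.

    Conceptual illustration only; names and behaviour are assumptions.
    """
    if not block_patterns:
        return text

    # Combine the block patterns into one alternation so the text is scanned once.
    blocked = re.compile("|".join(f"(?:{p})" for p in block_patterns))
    allowed = [re.compile(p) for p in allow_patterns]

    def replace(match):
        value = match.group(0)
        # An allow-list entry wins over the block list for this exact value.
        if any(a.fullmatch(value) for a in allowed):
            return value
        return marker

    return blocked.sub(replace, text)


if __name__ == "__main__":
    sample = "Ticket ABC-1234 and ticket ABC-0000 were closed."
    print(apply_lists(sample, [r"ABC-\d{4}"], [r"ABC-0000"]))
    # prints: Ticket [BLOCKED] and ticket ABC-0000 were closed.
```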
Do latency and throughput scale linearly if we provide more resources?
No, latency and throughput scale sublinearly as more compute resources are allocated to a container. This is best illustrated in the benchmarks section. For this reason, we recommend that throughput-optimized deployments run each container on either a single logical CPU core or a low-cost inference GPU, and scale horizontally as required.
How much scale can this system handle?
Our container is optimized for scale, so: Lots! We have several customers that each process billions of API requests per month. Our largest deployment so far consisted of 400 GPU-equipped container instances.
Where is Private AI currently deployed?
Private AI is deployed with dozens of customers, ranging from startups to hospitals and Fortune 500 companies. We operate in multiple verticals such as insurance, health tech, ASR providers, cloud contact centres, and financial institutions.
How well is Private AI’s software tested?
Each release is tested using Private AI’s Continuous Integration (CI) system, which consists of:
- comprehensive unit tests
- integration tests covering API functionality
- soak tests for container stability and memory leaks
After a release has passed the CI tests, it is run on Private AI’s demo server for a day and undergoes a final round of manual testing. During this time the container is monitored for any stability issues or memory leaks.
A release is first distributed to a handful of customers before being rolled out to Private AI’s entire user base.
How do we deploy your models?
Private AI’s software is deployed as a single container that includes all necessary runtime files, including the ML models. This allows the container to run completely air-gapped. We recommend using a container orchestration service such as ECS or Kubernetes. Note that GPUs aren’t required.
Does the container work with other container runtimes?
Yes. However, Docker is what we use internally to build and test the container.
What resources do we need to provision to run your models?
The container can run on any x86 machine (Intel or AMD CPU). No GPU is required; however, NVIDIA GPUs are supported. Please see the
How should the container status be monitored?
Container health can be monitored either via the healthz route or via the container health check.
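A minimal sketch of polling the healthz route from a monitoring script is shown below. It assumes the route is served at the path `/healthz`, is reachable with a plain HTTP GET, and answers with a 2xx status when the container is healthy. The base URL `http://localhost:8080` in the usage example is a placeholder for wherever your container is exposed.

```python
import http.client
import urllib.request


def is_healthy(base_url, timeout=5.0):
    """Return True if GET <base_url>/healthz answers with a 2xx status.

    The path and the 2xx convention are assumptions; adjust to your deployment.
    """
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx), refused
        # connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder address; replace with your container's host and port.
    print("healthy" if is_healthy("http://localhost:8080") else "unhealthy")
```

Treating every failure as "unhealthy" keeps the check simple; an orchestrator's own health check can then decide when to restart the container.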
How are updates released?
Updates are released via new versions of the container image. No additional downloads or configuration such as model files are required, as everything is packaged inside the container image. New container versions can be pulled from Azure Container Registry.
How often are updates released?
Updates are released monthly, and patch releases are made as needed. If you need something urgently, please let us know so that we can do a patch release for you.
Does your container call home?
Yes, our container phones home for authentication and usage reporting. However, this can be disabled on the Pro tier.
Can the container be air-gapped?
Yes, the container can run without external internet access, but only on the Pro tier.
What’s the typical compute cost to run your models in our environment?
This depends heavily on the implementation and volume, but a typical CPU deployment costs on the order of a few hundred USD per month. Large-scale deployments (1B+ processed per month) usually cost about 1K USD per month.