Frequently Asked Questions

Below are answers to some of the questions we are asked most often. Please contact us if your question isn't answered here.

Software Functionality

What file formats do your models accept?

Check out our Supported File Types page for a list of file formats and additional details.

If your desired file type isn’t in the list, please contact us to ask about our release timeline for it.

What’s the maximum length of text we can send your models?

There is no hard-coded maximum input length. In practice, the maximum length depends on the provisioned hardware and any timeouts set by the user. Internally, we’ve tested inputs of up to 500K characters (approximately 100K English words).

Performance

What concurrency level should I use?

Please see the concurrency section for the recommended concurrency levels on your chosen hardware. In general, a concurrency level of 4 per CPU container and 32 per GPU container works well.
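One way for a client to stay at the recommended concurrency level is to cap the number of in-flight requests with a semaphore. The sketch below illustrates this with Python's asyncio, assuming that "concurrency level" means simultaneous requests sent to one container. `send_request` is a hypothetical stand-in for your real HTTP call to the container (here it only sleeps), and `MAX_CONCURRENCY = 4` assumes a single CPU container.

```python
import asyncio
from typing import Awaitable, Callable

# Starting points from this FAQ: 4 per CPU container, 32 per GPU container.
MAX_CONCURRENCY = 4


async def send_request(text: str) -> str:
    """Hypothetical stand-in for the real HTTP call to the container."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    return text.upper()


async def process_all(
    texts: list[str],
    limit: int = MAX_CONCURRENCY,
    send: Callable[[str], Awaitable[str]] = send_request,
) -> list[str]:
    """Send every text, keeping at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text: str) -> str:
        async with semaphore:  # waits while `limit` requests are already running
            return await send(text)

    # gather returns results in the same order as `texts`
    return list(await asyncio.gather(*(bounded(t) for t in texts)))


if __name__ == "__main__":
    print(asyncio.run(process_all([f"document {i}" for i in range(10)])))
```

For a GPU container, the same sketch would be called with `limit=32`.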

What sort of throughput and latency should we expect?

Please see the benchmarks page.

Is model tuning available?

Yes, optimized models can be delivered on request. Model optimization is free of charge on the Scale plan. To do this, we ask for 5-10 examples of each requested improvement.

Can customers tune the models on-prem?

No, we do not offer the ability for customers to tune the models themselves. Instead, Private AI can adjust the models with just 5-10 examples of the modification in question.

Additionally, customers can customize behaviour via allow and block lists, which support regex functionality.
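Conceptually, an allow list names text that should never be redacted and a block list names text that should always be redacted, with both expressed as regular expressions. The sketch below illustrates that idea in plain Python over a list of detected spans. It is a hypothetical illustration only, not Private AI's API: the `Span` format, the full-match rule for the allow list, and the precedence (block list applied after the allow list) are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical detection format: character offsets into the text plus a label."""
    start: int
    end: int
    label: str


def apply_lists(
    text: str,
    detected: list[Span],
    allow_patterns: list[str],
    block_patterns: list[str],
) -> list[Span]:
    """Filter detections with an allow list, then add block-list matches."""
    allow = [re.compile(p) for p in allow_patterns]
    # Allow list: drop any detection whose exact text fully matches an allowed pattern.
    kept = {
        s for s in detected
        if not any(r.fullmatch(text[s.start:s.end]) for r in allow)
    }
    # Block list: flag every match, whether or not it was detected.
    for pattern in block_patterns:
        for m in re.finditer(pattern, text):
            kept.add(Span(m.start(), m.end(), "BLOCKED"))
    # Sort by position (label breaks ties) so the output order is deterministic.
    return sorted(kept, key=lambda s: (s.start, s.end, s.label))


if __name__ == "__main__":
    text = "Call Alice at 555-0100 about ticket ABC-123."
    detected = [Span(5, 10, "NAME"), Span(14, 22, "PHONE")]
    print(apply_lists(text, detected, [r"Alice"], [r"[A-Z]{3}-\d{3}"]))
```

In this example, the allow list keeps "Alice" visible, while the block list flags the ticket ID "ABC-123" even though it was never detected as an entity.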

Does latency and throughput scale linearly if we provide more resources?

Latency and throughput scale sub-linearly when more compute resources are allocated, as best illustrated in the benchmarks section. For this reason, for throughput-optimized deployments we recommend running each container with either a single logical CPU core or a low-cost inference GPU, and scaling horizontally as required.
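Scaling horizontally turns capacity planning into simple arithmetic: divide the target load by what one container sustains and round up. The sketch below assumes you have measured single-container throughput on your own workload (for example, against the benchmarks page). The `headroom` parameter is an added assumption for reserving spare capacity, and the numbers in the example are placeholders, not benchmark results.

```python
import math


def containers_needed(
    target_rps: float, per_container_rps: float, headroom: float = 0.2
) -> int:
    """Containers required to sustain `target_rps` when scaling horizontally.

    per_container_rps: throughput measured for ONE container on your workload.
    headroom: spare capacity to reserve, e.g. 0.2 plans for 20% above the target.
    """
    if per_container_rps <= 0:
        raise ValueError("per_container_rps must be positive")
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    return math.ceil(target_rps * (1 + headroom) / per_container_rps)


if __name__ == "__main__":
    # Placeholder figures only; substitute your own measurements.
    print(containers_needed(target_rps=40, per_container_rps=5, headroom=0.25))  # 10
```

This rule relies on each added container contributing roughly the same throughput as the first one, so it is worth confirming with a load test at the planned scale.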

How much scale can this system handle?

Our container is optimized for scale, so the short answer is: a lot! We have several customers that each process billions of API requests per month, and our largest deployment so far consisted of 400 GPU-equipped container instances.

Where is Private AI currently deployed?

Private AI is deployed at dozens of customers, ranging from startups to hospitals and Fortune 500 companies. We operate in multiple verticals, such as insurance, health tech, ASR providers, cloud contact centres, and financial institutions.

How well is Private AI’s software tested?

Each release is tested using Private AI’s Continuous Integration (CI) system, which consists of:

comprehensive unit tests
integration tests covering API functionality
soak tests for container stability and memory leaks

After a release has passed the CI tests, it is run on Private AI’s demo server for a day and undergoes a final round of manual testing. During this time the container is monitored for any stability issues or memory leaks.

A release is first distributed to a handful of customers before being rolled out to Private AI’s entire user base.

Installation

How do we deploy your models?

Private AI’s software is deployed as a single container that bundles all necessary runtime files, including the ML models. This allows the container to run completely air gapped. We recommend using a container orchestration service such as ECS or Kubernetes. Note that GPUs aren’t required.

Does the container work with other container runtimes?

Yes. However, we build and test the container internally with Docker.

What resources do we need to provision to run your models?

The container can run on any x86 machine (Intel or AMD CPU). No GPU is required; however, NVIDIA GPUs are supported. Please see the section for more details.

How should the container status be monitored?

Container health can be monitored either via the healthz route or via the container health check.
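For a script or external monitor, the healthz route can be polled over HTTP. Below is a minimal sketch using only the Python standard library. It assumes the container is reachable at `http://localhost:8080`, that the route is served at `/healthz`, and that it answers HTTP 200 when healthy; the host, port, and path are assumptions to check against your deployment.

```python
import urllib.request

HEALTH_URL = "http://localhost:8080/healthz"  # assumed host, port and path


def is_healthy(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health route answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (raised for 4xx/5xx) are OSError subclasses,
        # as are connection-refused errors and socket timeouts.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```

An orchestrator such as Kubernetes can run the same kind of check natively, so a script like this is mainly useful for ad hoc checks or external monitoring.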

How are updates released?

Updates are released via new versions of the container image. No additional downloads or configuration such as model files are required, as everything is packaged inside the container image. New container versions can be pulled from Azure Container Registry.

How often are updates released?

Updates are released monthly, and patch releases are made as needed. If you need something urgently, please let us know so that we can do a patch release for you.

Does your container call home?

Yes, our container phones home for authentication and usage reporting. However, this can be disabled on the Scale plan.

What do the authentication and usage reporting messages look like?

Here is a sample of the authentication request sent when the container starts:

POST https://apim-auth-prod.azure-api.net:443/license-verification/license_status
headers = 'ocp-apim-subscription-key:privateaikey'
data =
{
  "id": "123",
  "expires_at": "2023-12-31T00:00:00+00:00",
  "quota": 1000,
  "metering_id": "test-customer"
}

Here is a sample of the usage statistics, which are batched and sent every 5 minutes (if there has been any usage activity):

POST https://app.amberflo.io:443/ingest/
headers = 'Accept: application/json', 'Content-Type: application/json', 'X-API-KEY: customerkey', 'Content-Encoding: gzip', 'User-Agent: Amberflo.io SDK 3.0.0 Python 3.8.17'
data =
[
  {
    "uniqueId": "03a781eb-9c57-4b4e-8c5e-4655b4c92490",
    "meterApiName": "api-calls",
    "meterValue": 10,
    "customerId": "customer_id",
    "meterTimeInMillis": 1687207778956.456,
    "dimensions": {
      "Accuracy": "high",
      "Synthetic": "false",
      "SessionId": "6d6db44e-24a7-406e-9197-89bba8095e73"
    }
  },
  {
    "uniqueId": "1f1aedcc-a26e-48a8-820a-9511946006ac",
    "meterApiName": "api-chars",
    "meterValue": 567,
    "customerId": "customer_id",
    "meterTimeInMillis": 1687207778956.456,
    "dimensions": {
      "Accuracy": "high",
      "Synthetic": "false",
      "SessionId": "6d6db44e-24a7-406e-9197-89bba8095e73"
    }
  },
  {
    "uniqueId": "ccec78a3-b480-4c82-99df-a07e82d84875",
    "meterApiName": "api-words",
    "meterValue": 123,
    "customerId": "customer_id",
    "meterTimeInMillis": 1687207778956.456,
    "dimensions": {
      "Accuracy": "high",
      "Synthetic": "false",
      "SessionId": "6d6db44e-24a7-406e-9197-89bba8095e73"
    }
  }
]
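If you capture these usage batches (for example at an egress proxy) and want to check what is being reported, each batch can be decoded and totalled per meter. Below is a minimal sketch, assuming the body is a gzip-compressed JSON list of meter records like the sample above (as the `Content-Encoding: gzip` header suggests). The function name is hypothetical, and the example records are trimmed to the two fields the sketch reads.

```python
import gzip
import json
from collections import defaultdict


def summarize_usage(body: bytes, gzipped: bool = True) -> dict[str, float]:
    """Total `meterValue` per `meterApiName` in one usage batch."""
    raw = gzip.decompress(body) if gzipped else body
    records = json.loads(raw)
    totals: defaultdict[str, float] = defaultdict(float)
    for record in records:
        totals[record["meterApiName"]] += record["meterValue"]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"meterApiName": "api-calls", "meterValue": 10},
        {"meterApiName": "api-chars", "meterValue": 567},
        {"meterApiName": "api-words", "meterValue": 123},
    ]
    body = gzip.compress(json.dumps(sample).encode("utf-8"))
    print(summarize_usage(body))  # {'api-calls': 10.0, 'api-chars': 567.0, 'api-words': 123.0}
```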

Can the container be air gapped?

Yes, the container can run without external internet access, but only on the Scale plan.

What’s the typical compute cost to run your models in our environment?

This depends heavily on the implementation and volume, but a typical CPU deployment costs on the order of a few hundred USD per month. Large-scale deployments (over 1B processed per month) usually cost around 1K USD per month.

PrivateGPT

Where is my data stored?

This depends on the flavour of PrivateGPT you use:

PrivateGPT Headless: The container is completely stateless; no data is stored whatsoever, and none is shared with Private AI.
PrivateGPT UI: Chat history and embeddings are stored within your browser and within your company's cloud environment. No data is shared with Private AI.
PrivateGPT UI Demo: chat.private-ai.com is intended for demonstration purposes. Chat history is stored within your browser, but embeddings corresponding to any files you have uploaded to the system are stored in Private AI's cloud environment.

The PrivateGPT UI is built on the Microsoft Azure OpenAI Service. Microsoft stores the de-identified prompts for 30 days to monitor for abuse and misuse. For more details, please visit the Azure OpenAI Service privacy page.

How is PrivateGPT installed?

The PrivateGPT Headless or API version can be installed using Private AI's Docker container. Please see the Quickstart Guide for further details.

The PrivateGPT UI version is installed via a Terraform script. Please see Management and Installation for further details.

Can the PrivateGPT UI be reskinned or customized to meet my organization's requirements?

Yes, please contact us for instructions.