Release Notes

Below are the release notes for the Private AI container. To update, please grab a new version of the image.

2.12.0 (2022/07/27)


  • New Inference Pipeline

    This release features the debut of our new inference pipeline. The main feature of the new pipeline is that it can operate on text that is not whitespace-separated. This brings a number of benefits, including better performance around punctuation and control characters, and it enables new languages such as Mandarin (simplified).

  • Prometheus Endpoint

    A Prometheus metrics endpoint is now available at /metrics. See the API reference for details.
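As a rough illustration of consuming this endpoint, the sketch below parses Prometheus text exposition output into a dictionary. The metric names in the sample are hypothetical and not taken from the container, and the base URL (localhost:8080) simply mirrors the curl examples elsewhere in these notes; the API reference remains the authority on what is actually exposed.

```python
import urllib.request


def parse_metrics(body: str) -> dict:
    """Parse Prometheus text exposition format into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Optional
    trailing timestamps are not handled in this sketch.
    """
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; everything before
        # it is the metric name plus any {label="..."} set.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(base_url: str = "http://localhost:8080") -> dict:
    """Scrape /metrics once. The base URL is an assumption."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical sample; the real metric names may differ.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{route="/deidentify_text"} 42
example_latency_seconds_sum 1.5
"""

metrics = parse_metrics(SAMPLE)
```

In practice a Prometheus server would scrape the endpoint directly; a parser like this is mainly useful for quick checks from a script.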

  • New Language Support

    The following languages have been added to Core support:

    • Ukrainian
    • Hindi

    In addition to this, we have added Extended support for the following 5 languages:

    • Estonian
    • Malay
    • Punjabi
    • Tamil
    • Vietnamese

    We have also added Beta support for Mandarin (simplified).

  • Improved Models

    This release features a number of PII detection improvements:

    • German NUMERICAL_PII detection has been improved.
    • Improved performance on medical questionnaires and customer onboarding forms.
    • Multilingual chat performance has been improved, particularly in Spanish.
    • Postal address detection performance has been improved for addresses in the United Kingdom, Australia and New Zealand.
    • PASSWORD and CVV detection performance has been improved.
    • PHI Attributes / symptoms detection has been improved.
    • General improvements for EHRs and ASR transcripts.

  • Security Patch

    Several container image dependencies and Python libraries have been updated to address security recommendations.

2.11.1 (2022/06/01)


  • Security Patch

    Several libraries received patch updates to address security recommendations and have been included in this release.

  • Improved Models

    Detection of medical entities such as CONDITION, INJURY and MEDICAL_MISC has been improved.

    NUMERICAL_PII detection has been improved, particularly in multilingual models.

  • Container Options

    The startup resource check can now be disabled.

2.11.0 (2022/05/10)

What’s new in 2.11.0?

  • New language support

    Tagalog has been moved from extended to core support. For the full list, please see the supported languages page.

  • New Entity Types

    VEHICLE_ID has been added in this release. This entity type covers vehicle identifiers such as license plate numbers, vehicle serial numbers and vehicle identification numbers.


  • Model Improvements

    PII detection error has been reduced by approximately 10%, particularly around CREDIT_CARD, CREDIT_CARD_EXPIRATION and CVV. Australian and New Zealand address recognition has also been improved.

    Performance on disfluent ASR transcripts (particularly around passwords), chat logs and medical patient records has been improved.

    CPU model processing speed has increased by approximately 8%, whilst GPU processing speed has been improved by up to 35%, depending on the chosen accuracy mode.

  • Service health monitoring

    The /healthz endpoint is more robust for detecting the overall health of the API service.

  • Improved error messages

    Error messages when either the key or text fields are missing are now more specific.

  • Security updates

    Libraries have been updated based on security recommendations from our regular vulnerability scans.

  • Documentation revamp

    Our public documentation has been updated to include new guides, updated install instructions and sample configurations.

  • Other

    ENABLED_CLASSES can now be set via an environment variable, similar to LOG_LEVEL.

2.10.0 (2022/03/14)

What’s new in 2.10.0?

  • 33 Supported Languages

    Our system can now detect PII in 33 different languages, with more coming soon. For the full list, please see the supported languages page.

  • New Entities

    2.10 includes the following new entities:

    • GENDER_SEXUALITY : Terms indicating gender identity or sexual orientation, including slang terms. E.g.: “female”, “bisexual”, “trans”
    • MARITAL_STATUS : Terms indicating marital status. E.g.: “single”, “common-law”, “ex-wife”, “married”
    • LOCATION_COORDINATE : A subclass of LOCATION. A geographic position referred to using latitude, longitude, and/or elevation coordinates. E.g.: “We’re at: [40.748440 and -73.984559] ”

    The NUMERICAL_PII class now includes MAC addresses and cookie IDs.

    These entities will be listed in the docs shortly.

  • Complete HIPAA Support

    With this release we have complete support for all the entities listed under the HIPAA Safe Harbor rule. Health plan beneficiary numbers and medical record numbers have been added to HEALTHCARE_NUMBER, whilst medical device serial numbers have been added to NUMERICAL_PII.

  • Entity Sets

    enabled_classes now supports entity sets. This way, you can simply include the name of the regulation that you want to comply with, and we will enable the entities that are listed in that regulation for you. The regulations that are implemented in this release are:

    • GDPR
    • CPRA
    • HIPAA
    • Quebec Privacy Act
    • PCI

    Example command:

    curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "Hi Anwar", "key": "<customer key>", "enabled_classes": ["GDPR"]}'

    The docs will be updated to include this functionality and the entities included in each entity set shortly.
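For readers calling the API from code rather than curl, the following is a minimal sketch of the same request built with the Python standard library. The helper name build_deidentify_request is hypothetical. Only the "GDPR" identifier appears in the example command above, so the exact strings for the other entity sets are not assumed here.

```python
import json
import urllib.request


def build_deidentify_request(text, key, enabled_classes,
                             base_url="http://localhost:8080"):
    """Build (but do not send) a POST to /deidentify_text."""
    payload = {"text": text, "key": key, "enabled_classes": enabled_classes}
    return urllib.request.Request(
        base_url + "/deidentify_text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_deidentify_request("Hi Anwar", "<customer key>", ["GDPR"])
# To send it against a running container:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```

Building the request separately from sending it makes the payload easy to inspect or log before any text leaves the caller.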

  • Docker image version logging

    We are now logging the version of the Docker image in our logs. This allows us to provide better customer support based on the version that appears in the logs.


  • Better models

    2.10 features improved PII detection models, particularly around credit card numbers, verification codes, social security numbers, US postal addresses and email addresses in emails and resumes.

  • TIME entity adjustment

    We have adjusted the TIME entity to no longer include ASR transcript timestamps.

  • Better API error messages

    In order to improve error handling and make debugging easier, we have reworked our API error messages to be more detailed and understandable. Error messages (but not potentially sensitive payloads) are now also logged to console.

  • Redaction marker label calculation

    We have improved how the redaction marker that is used in the redacted text is calculated.

  • Resource validation system

    In 2.9 we introduced checks that validate that the container has been provided with enough resources. In this release, we have further expanded and improved these checks to be able to detect memory and GPU resources more accurately.

  • Health check system

    We have improved the health check endpoint in the GPU build to return the health of the GPU inference engine as well.

    We have improved the process monitoring inside the GPU build to eliminate the possibility of dead containers that are still running.

    We have updated the health check route in the CPU build to be completely asynchronous.

  • RAM usage

    The container printed the RAM usage on every API call. This has now been moved to ‘debug’ log level.

  • Docs

    We have added a new page to our documentation titled “Deployment Considerations”, which aims to help users deploy the Docker image in production environments.

    Other notable changes are:

    • Added a new page that lists supported languages
    • Updated the list of supported entities

  • Web Demo

    We have made a small improvement to the UI of the web demo by changing the model options from a drop down list to radio buttons.

    Web demo now has unique PII markers disabled by default. This change will be reflected in the upcoming API refactor.

2.9.1 (2022/02/24)

  • Logging Improvements

    RAM usage is now logged at debug level instead of info.

  • Container Health

    The latency of the healthz route has been improved.

    A Docker container health check has been implemented for improved AWS ECS use.

2.9.0 (2022/01/18)

What’s new in 2.9.0?

  • New PII Classes

    Passport numbers are now recognized as a separate entity type, PASSPORT_NUMBER, instead of NUMERICAL_PII.

    POLITICAL_AFFILIATION has been added and covers terms referring to a political party, movement, or ideology (e.g., Republican, liberal).

    We now support deidentification of IPv6 addresses in addition to IPv4 addresses. Any IPv6 address found in the text will be labelled as IP_ADDRESS.

  • Container Startup Resource Validation

    Based on user feedback, we have implemented hardware resource validation that runs on container startup. It validates that the container has access to an NVIDIA GPU and/or enough RAM. If validation fails, the container prints a helpful, detailed error message (rather than the default “Killed” message printed by Docker) that guides the user through resolving the resource-related issue.

  • Docker Hub Repository

    Starting with release 2.9.0, the container can be pulled from a private Docker Hub repository. Please contact us if you would like to receive the container via this repository, instead of the existing encrypted Docker image export.


  • Model Improvements

    This release includes improved models. Improvements include:

    • Better performance on ASR system transcripts, particularly around disfluencies
    • Improved Driver License detection
    • Better performance on SMS message style conversations

  • Improved Documentation

    We have spent some time improving our documentation as well. The noteworthy improvements are:

    • The table of contents is now clearer and easier to navigate.
    • A new detailed introduction page.
    • Detailed installation instructions.
    • Updated API reference.
    • Updated Web Demo to showcase Multilingual PII Redaction and Synthetic Personal Data Generation in addition to English PII Redaction.

  • Fixes

    We fixed an issue where the built-in labels that use regex patterns would override the custom labels defined in block_list.

    We tuned the models to fix an issue where some non-PII words that are following PII words would be labelled as part of the PII word.

    We have removed a warning message that would show up on container startup due to an internal library incorrectly assessing the ML dependencies.

    Synthetic PII generation now works when the custom block_list feature is used.

2.8.0 (2021/12/20)

What’s new in 2.8.0?

  • New Entity Types

    Added DRIVER_LICENSE entity type. Driver's licenses will now be picked up in this class instead of NUMERICAL_PII.


  • Improved backup authentication mechanism fail-over logic.
  • Updated API server. This was a dependency and security upgrade.
  • GPU inference server errors now return 500 instead of 503.

Deprecation Notice: We’ve rearranged the plumbing on our authentication system. Releases prior to 2.3.0 will no longer authenticate as of 31st December 2021.

2.7.1 (2021/10/28)

  • Linked Batch Processing

    This release adds the link_batch option. When enabled, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log. Please visit the API reference for implementation details.
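A minimal sketch of what such a request body might look like for a short chat log follows. It assumes link_batch is a boolean field that sits next to "text" and "key" in the /deidentify_text payload; that shape is an assumption, the helper name is hypothetical, and the API reference should be consulted for the actual details.

```python
import json


def build_linked_batch_payload(messages, key):
    """Build a /deidentify_text body for a sequence of short messages.

    Assumes link_batch is a boolean field next to "text" and "key";
    the API reference is the authority on the actual shape.
    """
    if not messages:
        raise ValueError("messages must contain at least one string")
    return {"text": list(messages), "key": key, "link_batch": True}


# Each SMS-style message is one entry in the batch, in conversation order.
chat = ["Hi, what's your name?", "Anwar", "And where do you live?", "Perth"]
body = json.dumps(build_linked_batch_payload(chat, "<key>"))
```

Keeping the messages in conversation order matters here, since the point of linking is that a reply such as "Anwar" is interpreted in the context of the question before it.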

2.7.0 (2021/10/29)

What’s new in 2.7.0?

Breaking change: The default accuracy mode has been changed from standard to high.

  • Added the LOG_LEVEL environment variable, which controls logging verbosity. The environment variable can be set to info, warning or error. The default is info.


  • Model Improvements

    This release features improved PII detection models:

    • Numerical PII detection has been further refined, particularly around SSNs and credit card numbers
    • Further improvements for chat transcripts
    • Further improvements for OCR documents, particularly receipts
    • Further improvements for JSON files

  • Authentication

    The backup authentication mechanism has been moved to a completely new system, improving redundancy.

  • Usage Reporting

    The get_usage route now returns the current month's usage instead of the current week's.

2.6.1 (2021/09/27)

  • Improved Models

    This release fixes phone numbers and credit card numbers occasionally being detected as SSNs. Additionally, performance on ASR transcripts and the various ways they transcribe numbers has been improved.

2.6.0 (2021/09/21)


  • Improved Models

    This release features improved PII detection models, particularly surrounding English and Portuguese.

    Optimizations for a number of popular ASR systems have been added in this release. In particular, the optimizations cover how the systems transcribe numbers.

2.5.0 (2021/08/20)

What’s new in 2.5.0?

  • New Entity Types

    The DATE class has been split into DATE and DATE_INTERVAL. DATE_INTERVAL covers broader references such as 'last summer', whilst DATE remains targeted at specific references like '21/8/2019'.

  • Batch Processing

    Support for batch processing has been added. To use batch processing, simply submit a list of text strings:

    curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": ["My password is: 4XDX63F8O1", "My password is: 33LMVLLDHNasdfsda"], "key": "<key>"}'


  • Multilingual Improvements

    This release features improved PII detection models, particularly surrounding English, Italian and Korean.

  • Image Size

    Container image size has been further reduced.

2.4.0 (2021/07/21)

What’s new in 2.4.0?

  • Custom Redaction Markers

    Added support for custom redaction markers.

  • Allow Lists

    Added support for allow lists - any entities matching entries in the allow list will be discarded.

  • New Entity Types

    Added new location classes:

    • LOCATION_ADDRESS : A street address, e.g. '48 Bristol Ave, 6157, Perth, Australia'
    • LOCATION_CITY : A city, e.g. 'Perth' or 'Toronto'
    • LOCATION_COUNTRY : A country, e.g. 'Spain'
    • LOCATION_ZIP : A zip or postal code, e.g. '10405'
    • LOCATION_STATE : A reference to a state within a country, e.g. 'California'

    NOTE: These entities are subclasses of LOCATION - the LOCATION label remains unchanged and will appear along with the above entities


  • Model Improvements

    This release features improved PII detection and synthetic PII generation models, particularly surrounding Spanish, Italian and Korean.

  • Phone Number Improvements

    Improved phone number post-processing, particularly around bracket handling and '+' in international dialling codes

  • Best Label Calculation

    Improved automatic calculation of the number of processing threads to use whilst executing the ML models.

2.3.1 (2021/07/16)

  • CPU Performance Improvement

    Patch release to address CPU utilisation

2.3.0 (2021/06/25)

What’s new in 2.3.0?

  • New Languages

    Added support for Korean

  • New Entity Types

    Added ROUTING_NUMBER, which is a number associated with a bank or financial institution (e.g., 012345678).

    Added BANK_ACCOUNT, which is a bank account or bank card number (e.g., 012345-67).


  • Improved Models

    This release features improved PII detection models, trained on ~50% more data than 2.2.0.

    We have improved PHI detection performance. More to come in the next release.

  • Authentication

    This release now authenticates with our revamped authentication system. No changes on the user side are required.

2.2.2 (2021/06/03)

  • New Accuracy Mode

    Added a new accuracy mode that is approximately 4x faster than standard. In order to use this model, please set accuracy_mode to fast.

2.2.1 (2021/05)

  • Improved Models

    Improved SSN detection in ASR transcripts

    Improved PHI detection

2.2.0 (2021/04/29)

What’s new in 2.2.0?

  • Multilingual Support

    This release adds support for Spanish, French, Italian, German and Portuguese. To enable it, please see the API Reference for details.

  • Synthetic PII Generation

    Beta release of synthetic PII generation. In addition to identifying and redacting PII, Private AI can now also generate synthetic PII. To try it out, please set fake_entity_accuracy_mode to standard:

    $ curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "so, it expires the 21st; and the 3 digits on the back", "fake_entity_accuracy_mode": "standard", "key": "<key>"}'
    {
        "result": "so, it expires the [CREDIT_CARD_EXPIRATION_1]; and the 3 digits on the back",
        "result_fake": "so, it expires the 20th; and the 3 digits on the back",
        "pii": [
            {
                "marker": "CREDIT_CARD_EXPIRATION_1",
                "text": "21st",
                "best_label": "CREDIT_CARD_EXPIRATION",
                "stt_idx": 19,
                "end_idx": 23,
                "labels": {"CREDIT_CARD_EXPIRATION": 0.8895},
                "fake_text": "20th",
                "fake_stt_idx": 19,
                "fake_end_idx": 23
            }
        ],
        "api_calls_used": 1,
        "output_checks_passed": true
    }
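To show how the offsets in this sample relate to the returned text, here is a small sketch that checks each fake span against result_fake. The values are copied from the sample output above; the nesting (one object per entry in "pii") and the end-exclusive reading of the offsets are assumptions that happen to fit the sample, and check_fake_offsets is a hypothetical helper.

```python
response = {
    "result": "so, it expires the [CREDIT_CARD_EXPIRATION_1]; and the 3 digits on the back",
    "result_fake": "so, it expires the 20th; and the 3 digits on the back",
    "pii": [
        {
            "marker": "CREDIT_CARD_EXPIRATION_1",
            "text": "21st",
            "best_label": "CREDIT_CARD_EXPIRATION",
            "stt_idx": 19,
            "end_idx": 23,
            "labels": {"CREDIT_CARD_EXPIRATION": 0.8895},
            "fake_text": "20th",
            "fake_stt_idx": 19,
            "fake_end_idx": 23,
        }
    ],
    "api_calls_used": 1,
    "output_checks_passed": True,
}


def check_fake_offsets(resp):
    """Return (marker, fake_text) pairs after verifying each fake span."""
    pairs = []
    for entity in resp["pii"]:
        start, end = entity["fake_stt_idx"], entity["fake_end_idx"]
        # Slice the synthetic text with end-exclusive character offsets.
        span = resp["result_fake"][start:end]
        if span != entity["fake_text"]:
            raise ValueError(f"offset mismatch for {entity['marker']}: {span!r}")
        pairs.append((entity["marker"], span))
    return pairs


print(check_fake_offsets(response))  # [('CREDIT_CARD_EXPIRATION_1', '20th')]
```

The stt_idx and end_idx fields appear to work the same way against the submitted text: the span is four characters long, matching the detected "21st".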


  • Customizable API Port

    API port can now be customized. See the Environment Variables section for details.

    The health check is now served on port 8080, the same port as the main deidentify_text route.

  • Revamped API Serving

    The API serving infrastructure has been completely rebuilt.

    The authentication request timeout has been shortened.

2.1.3 (2021/04/13)

  • Improved Models

    Improved credit card handling in ASR transcripts

2.1.2 (2021/03/02)

  • Added ZODIAC_SIGN, which covers Zodiac signs such as "Aries" or "Taurus".
  • This release features improved PII detection, particularly surrounding SSN, DOB and NUMERICAL_PII.
  • Passport numbers, vehicle license plate numbers and vehicle serial numbers are now recognised as NUMERICAL_PII.

2.1.1 (2021/02/26)

  • Improved Models

    Further PII detection improvements targeted at numerical entity detection.

2.1.0 (2021/02/18)


  • Improved Models

    This release improves PII detection accuracy, via model updates and improved training data.

    Additionally an improvement was made in an edge case where model output is highly ambiguous.

2.0.1 (2021/01/25)

  • Improved Models

    Improved PII detection models.

  • Reduced Image Size

    Further reduced Docker image size.

2.0.0 (2021/01/14)

What’s new in 2.0.0?

  • Revamped API

    The 2.0.0 release features a revamped API interface, based on recent customer feedback

  • New Entity Types

    New entity types:

    • FILENAME : Name of a computer file, e.g., bradtaxreturns.txt, koalabear.jpg
    • ORIGIN : Origin encompasses nationalities, ethnicities, and races. E.g., Canadian, American, Caucasian

    Added PHI entity types:

    • BLOOD_TYPE : Blood type, e.g., O-
    • CONDITION : A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
    • DRUG : Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol
    • INJURY : Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages and dislocations.
    • MEDICAL_PROCESS : Medical process, including treatments, procedures and tests. E.g., ‘heart surgery’, ‘CT scan’.
    • PHYSICAL_ATTRIBUTE : A body attribute, e.g. I’m 190cm tall.
    • STATISTICS : Medical statistics, such as how many people in a specific country have a disease or what percentage of people were cured of it. E.g., 20 percent of people have arrhythmia


  • New Inference Engine

    New inference engine, which is significantly faster than previous releases

  • Reduced Container Image Size

    Docker image size has been drastically reduced

1.5.1 (2020/12/08)

  • Improved Models

    Improved credit card number and SSN detection in chat logs.

1.5.0 (2020/11/19)

What’s new in 1.5.0?

  • New Accuracy Mode

    The previous standard accuracy model has been renamed fast. In its place, we have introduced a new standard model that is ~2x slower but has far better detection performance.


  • Improved Models

    Improved model accuracy via additional training data.

  • Runtime Performance Improvements

    Reduced latency in fast mode by ~15%, from 60ms to 52ms on our single-core GCP N2 Cascade Lake test instance.

    Dramatically reduced RAM usage for all models.

    Reduced Docker image size.

1.4.2 (2020/11/06)

  • Phone Number Improvements

    Improved support for 7 digit phone numbers

1.4.1 (2020/11/04)

  • SSN Improvements

    Improved SSN detection

1.4.0 (2020/10/23)

What’s new in 1.4.0?

  • New Entity Types

    Added DOB entity type, which covers Date of Birth (e.g., Date of Birth: March 7, 1961)

    Added CVV, which covers credit card verification codes (e.g., CVV: 080)

    Added CREDIT_CARD_EXPIRATION, which is the expiration date of a credit card (e.g., Expires: 2/28)

    Added PASSWORD entity type, which covers account passwords, pins, access keys, or verification answers (e.g., 27%alfalfa, 1234)


  • Improved Models

    Adjusted entity types to give better per class accuracy.

    Improved SSN and credit card detection.

  • Health Route

    Added last_auth_call_successful into healthz response.
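As an illustration of how a monitoring script might use this field, the sketch below inspects a healthz response body. It assumes the body is a JSON object in which last_auth_call_successful is a boolean; the rest of the response shape is not described in these notes, and auth_ok is a hypothetical helper.

```python
import json


def auth_ok(healthz_body: str) -> bool:
    """Return True only if the body reports a successful last auth call.

    A missing field or an unparseable body is treated as unhealthy,
    which is a conservative choice made for this sketch.
    """
    try:
        data = json.loads(healthz_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return data.get("last_auth_call_successful") is True


print(auth_ok('{"last_auth_call_successful": true}'))
```

Treating anything other than an explicit true as a failure avoids reporting a healthy service when the response format is not what the script expects.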

1.3.2 (2020/10/12)

  • Authentication

    Added backup authentication mechanism.

1.3.1 (2020/10/05)

  • Large Input Handling

    Improved handling of ultra large inputs (>100K words).

1.3.0 (2020/09/25)

What’s new in 1.3.0?

  • New Entity Types

    Added USERNAME entity type, which covers user names or handles (e.g., privateairocks, @_PrivateAI).

    Added RELIGION entity type, which covers terms indicating religious affiliation (e.g., Hindu).

1.2.0 (2020/08/14)

What’s new in 1.2.0?

  • New Entity Types

    Added AGE entity type, which is a number or phrase associated with an age (e.g., 27)

  • New Accuracy Mode

    Added best accuracy mode. To use it, please set accuracy_mode to best.

1.1.0 (2020/07/05)

What’s new in 1.1.0?

  • Credit Card Number Support

    Added support for credit card numbers

1.0.0 (2020/06/15)

Initial container release

For release notes older than 1.0.0, please contact us.

© Copyright 2022, Private AI.