Release Notes

Below are the release notes for the Private AI container. To update, please grab a new version of the image.

2.14.1 (2022/11/30)

  • Improved PCI detection in French and Spanish
  • / , \ and $ characters are no longer stripped from entities. For example, Visit us at facebook.com/user123/ is now redacted as Visit us at [URL_1] instead of Visit us at [URL_1]/
  • Tuned RAM check thresholds for machines with 8GB RAM
  • Language Support: Added Extended Support for Japanese

2.14.0 (2022/11/11)

What’s new in 2.14.0?

  • New Language Support

    The following languages have been added to Extended support:

    • Luxembourgish
    • Swahili
  • Entity Types
    • NAME_GIVEN , which encompasses name(s) given to an individual, usually at birth, often first/middle names in Western cultures, middle/last names in Eastern cultures.
    • NAME_FAMILY , which encompasses names indicating a person’s family or community, often a last name in Western cultures, first name in Eastern cultures.
    • MEDICAL_MISC entity type has been deleted.

Improvements

  • Improved Models
    • Improved detection of names spelled out in all caps by ASR systems.
    • NAME : Improved name subclass detection / classification in English.
    • EMAIL_ADDRESS : More robustness around partial / unformatted emails in English.
    • CREDIT_CARD : improvement around mentions of the last 4 digits only in English.
    • Enhanced detection of NAMEs and other entities when spelled-out in a transcript (e.g., “c as in charlie …”)
    • Improvements to detection of PASSWORD, including verification answers
    • Improved handling of eponymous medical conditions in English.
    • Improvements to PHI detection in English.
    • Improvements to PHI detection in Spanish.
    • Improvements to all personal number classes such as PASSPORT , CREDIT_CARD and SSN including international variants in French, German, Italian, Tagalog and Ukrainian.
    • Improved PII detection in text containing facerolls and typos.
    • Improvements to PII detection in Tagalog data containing profanities / toxic material.
    • Improved detection of ambiguous LOCATION / ORGANIZATION mentions, as well as ambiguous NAMEs
    • Improved PII detection in text containing control characters
    • General improvements to:
      • Russian
      • Spanish
  • Miscellaneous
    • Container startup memory check is now performed on container start, instead of after loading models
    • Fixed handling of null strings

3.0.0 beta 2 (2022/11/09)

We are proud to announce the first beta release for the 3rd major version of Private AI's solution. Note that 3.0 does not maintain backwards compatibility. Instead, Private AI will continue to do 2.X releases with updated models and potential security fixes until 3 months after the 3.0 final release.

Please don't use this release in production. As this is a beta release, there will still be breaking changes.

What’s new in 3.0.0 beta 2?

  • Container Distribution

    Starting with 3.0.0, we will be distributing our container exclusively through the Azure Container Registry. If you currently use Docker Hub, simply replace your old docker login with a new one. For example:

    docker login -u INSERT_UNIQUE_CLIENT_ID -p INSERT_UNIQUE_CLIENT_PW crprivateaiprod.azurecr.io
  • Licensing Change

    We have changed our licensing system from an API Key to a license file. For the full 3.0 release, you will only need a license file but for beta2, you will still be required to use the API key with your license file.

    In order to run the container with the license file, run the following:

    docker run --rm -v "full path to license.json":/app/license/license.json -p 8080:8080 -it crprivateaiprod.azurecr.io/deid:<version>

    Once you have the container up and running with the new license file, you can send the container a request like this:

    curl --request POST --url http://localhost:8080/v3/process_text   --header 'Content-Type: application/json'   --header 'x-api-key: <key>'   --data '{"text": ["Hello John"]}'

    Note that for the 3.0 beta, you must still supply your API key in the POST request. This will be removed in the 3.0 final release.
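The curl request above can also be assembled from Python. The following is a minimal sketch using only the standard library; the helper name `build_process_text_request` and the default `base_url` are illustrative assumptions, not part of the Private AI API. The request is built but not sent, so it can be inspected without a running container.

```python
import json
import urllib.request


def build_process_text_request(texts, api_key, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for /v3/process_text.

    `texts` must be a list of strings, even for a single input.
    The API key is passed as a header, as required during the 3.0 beta.
    """
    body = json.dumps({"text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v3/process_text",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_process_text_request(["Hello John"], api_key="<key>")

# To send it against a running container:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```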

  • New API Interface

    3.0.0 introduces many changes to the API. The full spec is posted here

    Key changes:

    • The endpoint is now called /v3/process_text (deidentify_text is no longer supported)
    • text field is required to be a list by default, even with a single string
    • key field has been removed from the body and is now a required request header: X-API-KEY
    • accuracy_mode is now called accuracy and can be found one layer down in the entity_detection dictionary settings
    • return_entities parameter allows you to configure whether to include identified entities in the response
    • “Entity” is now the established term for recognized PII, PHI and PCI

    Example conversions from 2.0 request payload to 3.0:

###
Example with enabled_classes
###
2.0:
{"text": "Hello there John!", 
  "key":<My_api_key>, 
  "accuracy_mode":"high", 
  "enabled_classes":["NAME"]
}

3.0:
**headers= "X-API-KEY":<My_api_key>

{"text": ["Hello there John!"],
  "entity_detection":
    {"accuracy": "high",
      "entity_types": [{
        "type": "enable",
        "value":["NAME"]
      }]
    }
}

-----------------------------------------------------------------------------------------
###
Example with inclusion of all entity types in entity marker
###
2.0:
{"text": "Hello there Pieter!", 
"key":<My_api_key>,
"accuracy_mode":"standard",
"marker_format": "[ALL_CLASS_NAMES]"
}

3.0:
**headers= "X-API-KEY":<My_api_key>

{"text": ["Hello there Pieter!"],
  "entity_detection":
    {"accuracy": "standard"},
  "processed_text":
    {"type": "MARKER",
    "pattern": "[ALL_ENTITY_TYPES]"}
}

-----------------------------------------------------------------------------------------
###
Example with disabling unique_pii_markers through MARKER definition 
###
2.0:
{"text": "Hello there Paul!", 
"key":<My_api_key>,
"accuracy_mode":"high_multilingual",
"unique_pii_markers": false
}

3.0:
**headers= "X-API-KEY":<My_api_key>

{"text": ["Hello there Paul!"],
  "entity_detection":
    {"accuracy": "high_multilingual"},
  "processed_text":
    {"type": "MARKER",
    "pattern": "[ENTITY_TYPE]"}
}

Note that there is no longer a 'unique_pii_markers' parameter; the same behaviour is set by using the "[ENTITY_TYPE]" pattern.
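The three conversions above follow a regular pattern, which can be sketched as a small Python helper. This is an illustration only: the function name `convert_v2_to_v3` is hypothetical, it handles just the fields shown in these examples, and the marker translation assumes that `ALL_CLASS_NAMES` is the only placeholder that was renamed.

```python
def convert_v2_to_v3(v2_payload):
    """Return (headers, body) for a 3.0 request equivalent to a 2.0 payload.

    Only the fields shown in the examples above are handled.
    """
    # The key moves from the body to a request header.
    headers = {"X-API-KEY": v2_payload["key"]}

    # text must be a list in 3.0, even for a single string.
    text = v2_payload["text"]
    body = {"text": text if isinstance(text, list) else [text]}

    # accuracy_mode and enabled_classes move under entity_detection.
    entity_detection = {}
    if "accuracy_mode" in v2_payload:
        entity_detection["accuracy"] = v2_payload["accuracy_mode"]
    if "enabled_classes" in v2_payload:
        entity_detection["entity_types"] = [
            {"type": "enable", "value": list(v2_payload["enabled_classes"])}
        ]
    if entity_detection:
        body["entity_detection"] = entity_detection

    # Marker settings move under processed_text.
    pattern = None
    if "marker_format" in v2_payload:
        # Assumption: only this placeholder name changed between versions.
        pattern = v2_payload["marker_format"].replace(
            "ALL_CLASS_NAMES", "ALL_ENTITY_TYPES"
        )
    elif v2_payload.get("unique_pii_markers") is False:
        pattern = "[ENTITY_TYPE]"
    if pattern is not None:
        body["processed_text"] = {"type": "MARKER", "pattern": pattern}

    return headers, body


headers, body = convert_v2_to_v3(
    {"text": "Hello there Paul!", "key": "<My_api_key>",
     "accuracy_mode": "high_multilingual", "unique_pii_markers": False}
)
```

Any 2.0 field not listed here is silently ignored by this sketch, so check the full 3.0 spec before relying on it for other parameters.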
  • File Support for PDFs / Images

    We now support file redaction using the new endpoint: /v3/process_file

    • Passing in a PDF or JPEG / JPG file to this endpoint will return a link to the format-preserved redacted file
    • Further details available here

    Here's an example of how to start up the container and process a file:

    To run the container:

    docker run --rm -p 8080:8080 -v <path to license>:/app/license/license.json \
    -e PAI_OUTPUT_FILE_DIR=<path to files for processing> \
    -v "<path to files for processing>":<path to files for processing> \
    -it deid:<version>

    For example, if your license file is in your home directory, the input directory you wish to mount is called inputfiles and the output directory is inputfiles/output:

    docker run --rm -p 8080:8080 -v /home/<username>/license.json:/app/license/license.json \
    -e PAI_OUTPUT_FILE_DIR=/home/<username>/inputfiles/output \
    -v /home/<username>/inputfiles:/home/<username>/inputfiles \
    -it crprivateaiprod.azurecr.io/deid:3.0.0beta2-full_cpu

    Note that the output directory must reside within the input directory. Also ensure that the full path to the directory specified exists and has read / write permissions.

    You can then make a request like this, for any file located within the input volume:

    curl --request POST --url http://localhost:8080/v3/process_file --header 'Content-Type: application/json' --header 'x-api-key: <key>'   --data '{"uri": "/home/<username>/inputfiles/testing.pdf"}'

    The API request is synchronous, and will return once the redacted file is written to the output directory. Also note that for the 3.0 beta, you must still supply your API key in the POST request. This will be removed in the 3.0 final release.
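Because the output directory must reside within the input directory, and the file must be inside the mounted input volume, it can help to check the paths before calling /v3/process_file. The following is a minimal sketch; the helper names and the example username `alice` are hypothetical. The comparison is purely lexical, so it does not resolve symlinks or `..` segments.

```python
import json
from pathlib import PurePosixPath


def is_inside(parent_dir, child):
    """Return True if `child` is located somewhere below `parent_dir`."""
    return PurePosixPath(parent_dir) in PurePosixPath(child).parents


def build_process_file_body(input_dir, output_dir, file_path):
    """Validate the directory layout and return the JSON request body."""
    if not is_inside(input_dir, output_dir):
        raise ValueError("output directory must reside within the input directory")
    if not is_inside(input_dir, file_path):
        raise ValueError("file must be located within the input volume")
    return json.dumps({"uri": file_path})


request_body = build_process_file_body(
    input_dir="/home/alice/inputfiles",
    output_dir="/home/alice/inputfiles/output",
    file_path="/home/alice/inputfiles/testing.pdf",
)
```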

  • Application version endpoint

    Sending a GET request to the container root endpoint http://container-address:8080 will return a response providing information about the application version:

    {"app_version": "3.0.0"}
  • Language Support

    The following languages have been added to Extended support:

    • Luxembourgish
    • Swahili
  • Synthetic Entity Generation

    Synthetic entity generation is now supported in every language Private AI supports.

    Quality of generated entities has been improved, particularly around matching the formatting and length of the original entity.

  • Environment Variables

    All previous environment variables are now prefixed with “PAI_” to better distinguish Private AI-specific variables:

    • PAI_LOG_LEVEL
    • PAI_PORT
    • PAI_ACCURACY_MODE
    • PAI_SYNTHETIC_PII_ACCURACY_MODES
    • PAI_ALLOW_LIST
    • PAI_MARKER_FORMAT
    • PAI_NUM_THREADS
    • PAI_LOG_PII_STATS
    • PAI_LOG_DEIDENTIFIED_OUTPUT
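When migrating an existing deployment, the rename can be applied mechanically. The following sketch assumes that each old variable name is simply the new name without the prefix (as with LOG_LEVEL); the helper names `to_pai_env` and `docker_env_flags` are hypothetical. Variables that are not in the list are left untouched.

```python
# Assumption: the pre-3.0 names are the names listed above minus the prefix.
OLD_NAMES = [
    "LOG_LEVEL",
    "PORT",
    "ACCURACY_MODE",
    "SYNTHETIC_PII_ACCURACY_MODES",
    "ALLOW_LIST",
    "MARKER_FORMAT",
    "NUM_THREADS",
    "LOG_PII_STATS",
    "LOG_DEIDENTIFIED_OUTPUT",
]


def to_pai_env(old_env):
    """Rename known container variables to their prefixed form."""
    renamed = {}
    for name, value in old_env.items():
        if name in OLD_NAMES:
            renamed["PAI_" + name] = value
        else:
            renamed[name] = value
    return renamed


def docker_env_flags(env):
    """Turn a mapping into `-e NAME=value` arguments for docker run."""
    flags = []
    for name, value in sorted(env.items()):
        flags += ["-e", f"{name}={value}"]
    return flags


new_env = to_pai_env({"LOG_LEVEL": "info", "NUM_THREADS": "4"})
flags = docker_env_flags(new_env)
```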

Improvements

  • Model Improvements
    • Improved detection of names spelled out in all caps by ASR systems.
    • NAME : Improved name subclass detection / classification in English.
    • EMAIL_ADDRESS : More robustness around partial / unformatted emails in English.
    • CREDIT_CARD : improvement around mentions of the last 4 digits only in English.
    • MEDICAL_MISC entity type has been deleted.
    • Enhanced detection of NAMEs and other entities when spelled-out in a transcript (e.g., “c as in charlie …”)
    • Improvements to detection of PASSWORD, including verification answers
    • Improved handling of eponymous medical conditions in English.
    • Improvements to PHI detection in English.
    • Improvements to PHI detection in Spanish.
    • Improvements to all personal number classes such as PASSPORT , CREDIT_CARD and SSN including international variants in French, German, Italian, Tagalog and Ukrainian.
    • Improved PII detection in text containing facerolls and typos.
    • Improvements to PII detection in Tagalog data containing profanities / toxic material.
    • Improved detection of ambiguous LOCATION / ORGANIZATION mentions, as well as ambiguous NAMEs
    • Improved PII detection in text containing control characters
    • General improvements to:
      • Russian
      • Spanish

2.13.1 (2022/09/26)

  • Emoji Improvements

    Processing of non-English text containing emojis has been improved

2.13.0 (2022/09/08)

What’s new in 2.13.0?

  • Second Generation Synthetic PII

    This release features the debut of our second generation synthetic PII system. The system has been rebuilt from the ground up and leverages a new approach developed by Private AI. The new system features the following improvements:

    • Increased PII realism, including greater variety of generated terms and less generation of common terms such as "John" or "Paul".
    • Better generation of numerical PII, particularly around the correct number of digits.

    Note that the CPU containers are now approximately 700MB larger due to this change and that the new synthetic PII system is slower than the first generation. Private AI will be releasing optimizations for both container size and processing time in subsequent releases, along with GPU support.

  • New Language Support

    The following languages have been added to Extended support:

    • Belarusian
    • Icelandic
    • Indonesian
    • Khmer
    • Thai

    We have also added Beta support for Japanese.

  • New Entity Types
    • NAME_GIVEN , which encompasses name(s) given to an individual, usually at birth, often first/middle names in Western cultures, middle/last names in Eastern cultures.
    • NAME_FAMILY , which encompasses names indicating a person’s family or community, often a last name in Western cultures, first name in Eastern cultures.
  • Disable GPU
    • PAI_DISABLE_GPU_CHECK allows users to disable the startup check for GPU on the container and run the GPU container using CPU only.

Improvements

  • Best Label Calculation

    The best label calculation has been updated to prefer the most granular entity type. For example, Hello John will become Hello [NAME_GIVEN] instead of Hello [NAME]. Similarly, I live in Toronto will be I live in [LOCATION_CITY] instead of I live in [LOCATION]. When an entity spans multiple words that have additional, nested labels, the existing behaviour is retained: namely, the most general entity type, covering the entire span, is used. For example, Hello John Doe will be Hello [NAME] and I live in Toronto, Canada will be I live in [LOCATION].
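The rule described above can be illustrated with a small Python sketch. This is not Private AI's implementation: the `PARENT` table, the `(label, start, end)` tuples and the function names are assumptions made for the example. The idea is to consider only labels that cover the entire entity span and, among those, pick the most granular one.

```python
# Hypothetical subclass table: child label -> parent label.
PARENT = {
    "NAME_GIVEN": "NAME",
    "NAME_FAMILY": "NAME",
    "LOCATION_CITY": "LOCATION",
    "LOCATION_COUNTRY": "LOCATION",
}


def depth(label):
    """Number of ancestors a label has; a higher value is more granular."""
    levels = 0
    while label in PARENT:
        label = PARENT[label]
        levels += 1
    return levels


def best_label(span, labels):
    """Pick the most granular label among those covering the whole span.

    `span` is (start, end); `labels` is a list of (label, start, end).
    """
    covering = [label for label, start, end in labels if (start, end) == span]
    return max(covering, key=depth)


# "Hello John": both labels cover the single word, the granular one wins.
single = best_label((6, 10), [("NAME", 6, 10), ("NAME_GIVEN", 6, 10)])

# "Hello John Doe": only NAME covers the full span, so it is retained.
multi = best_label(
    (6, 14),
    [("NAME", 6, 14), ("NAME_GIVEN", 6, 10), ("NAME_FAMILY", 11, 14)],
)
```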

  • Improved Models

    This release features a number of PII detection improvements:

    • Further improvements to the character-level recognition that was introduced in 2.12.
    • False Positive reduction for CONDITION , DRUG , MEDICAL_PROCESS in English.
    • CREDIT_CARD , PHONE_NUMBER , EMAIL_ADDRESS , BANK_ACCOUNT , PASSPORT_NUMBER , SSN improvements in Spanish.
    • CONDITION , DRUG , MEDICAL_PROCESS improvements in Spanish.
    • NAME , LOCATION , ORGANIZATION , POLITICAL_AFFILIATION improvements in German, French, Italian and Polish.
    • Improved performance across all entity types in Tagalog.
  • Miscellaneous

    Improved log messages on container startup.

2.12.0 (2022/07/27)

What’s new in 2.12.0?

  • New Inference Pipeline

    This release features the debut of our new inference pipeline. The main feature of the new pipeline is that it is able to operate on non-whitespace separated text. This has a number of benefits, including better performance around punctuation and control characters, and it enables new languages such as Mandarin (simplified).

  • Prometheus Endpoint

    A Prometheus metrics endpoint is now available at /metrics. See the API reference for details.

  • New Language Support

    The following languages have been added to Core support:

    • Ukrainian
    • Hindi

    In addition to this, we have added Extended support for the following 5 languages:

    • Estonian
    • Malay
    • Punjabi
    • Tamil
    • Vietnamese

    We have also added Beta support for Mandarin (simplified)

Improvements

  • Improved Models

    This release features a number of PII detection improvements:

    • German NUMERICAL_PII detection has been improved.
    • Improved performance on medical questionnaires and customer onboarding forms.
    • Multilingual chat performance has been improved, particularly in Spanish.
    • Postal address detection performance has been improved for addresses in the United Kingdom, Australia and New Zealand.
    • PASSWORD and CVV detection performance has been improved.
    • PHI Attributes / symptoms detection has been improved.
    • General improvements for EHRs and ASR transcripts.
  • Security Patch

    Several container image dependencies and Python libraries have been updated to address security recommendations.

2.11.1 (2022/06/01)

Improvements

  • Security Patch

    Several libraries received patch updates to address security recommendations and have been included in this release.

  • Improved Models

    Improvements have been made to the detection of medical entities such as CONDITION, INJURY and MEDICAL_MISC.

    Improvements have been made to NUMERICAL_PII, particularly in multilingual models

  • Container Options

    Allow startup resource check to be disabled.

2.11.0 (2022/05/10)

What’s new in 2.11.0?

  • New language support

    Tagalog has been moved from extended to core support. For the full list, please see the supported languages page.

  • New Entity Types

    VEHICLE_ID has been added in this release. This entity type covers vehicle identification numbers such as license plate numbers, vehicle serial and vehicle identification numbers.

Improvements

  • Model Improvements

    PII detection error has been reduced by approximately 10%, particularly around CREDIT_CARD, CREDIT_CARD_EXPIRATION and CVV. Australian and New Zealand address recognition have also been improved.

    Performance on disfluent ASR transcripts (particularly around passwords), chat logs and medical patient records has been improved.

    CPU model processing speed has increased by approximately 8%, whilst GPU processing speed has been improved by up to 35%, depending on the chosen accuracy mode.

  • Service health monitoring

    The /healthz endpoint is more robust for detecting the overall health of the API service.

  • Improved error messages

    Error messages when either the key or text fields are missing are now more specific.

  • Security updates

    Libraries have been updated based on security recommendations from our regular vulnerability scans.

  • Documentation revamp

    Our public documentation has been updated to include new guides, updated install instructions and sample configurations.

  • Other

    ENABLED_CLASSES can now be set via an environment variable, similar to LOG_LEVEL.

2.10.0 (2022/03/14)

What’s new in 2.10.0?

  • 33 Supported Languages

    Our system can now detect PII in 33 different languages, with more coming soon. For the full list, please see the supported languages page.

  • New Entities

    2.10 includes the following new entities:

    • GENDER_SEXUALITY : Terms indicating gender identity or sexual orientation, including slang terms. E.g.: “female”, “bisexual”, “trans”
    • MARITAL_STATUS : Terms indicating marital status. E.g.: “single”, “common-law”, “ex-wife”, “married”
    • LOCATION_COORDINATE : A subclass of LOCATION. A geographic position referred to using latitude, longitude, and/or elevation coordinates. E.g.: “We’re at: [40.748440 and -73.984559] ”

    The NUMERICAL_PII class now includes MAC addresses and cookie IDs.

    These entities will be listed in the docs shortly.

  • Complete HIPAA Support

    With this release we have complete support for all the entities listed under the HIPAA Safe Harbor rule. Health plan beneficiary numbers and medical record numbers have been added to HEALTHCARE_NUMBER, whilst medical device serial numbers have been added to NUMERICAL_PII.

  • Entity Sets

    enabled_classes now supports entity sets. This way, you can simply include the name of the regulation that you want to comply with, and we will enable the entities that are listed in that regulation for you. The regulations that are implemented in this release are:

    • GDPR
    • CPRA
    • HIPAA
    • Quebec Privacy Act
    • PCI

    Example command:

    curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "Hi Anwar", "key": "<customer key>", "enabled_classes": ["GDPR"]}'

    The docs will be updated to include this functionality and the entities included in each entity set shortly.

  • Docker image version logging

    We are now logging the version of the docker image in our logs. This allows us to provide better customer support based on the version that appears in the logs.

Improvements

  • Better models

    2.10 features improved PII detection models, particularly around credit card numbers, verification codes, social security numbers, US postal addresses, email addresses in emails and resumes.

  • TIME entity adjustment

    We have adjusted the TIME entity to no longer include ASR transcript timestamps.

  • Better API error messages

    In order to improve error handling and make debugging easier, we have reworked our API error messages to be more detailed and understandable. Error messages (but not potentially sensitive payloads) are now also logged to console.

  • Redaction marker label calculation

    We have improved how the redaction marker that is used in the redacted text is calculated.

  • Resource validation system

    In 2.9 we introduced checks that validate that the container has been provided with enough resources. In this release, we have further expanded and improved these checks to be able to detect memory and GPU resources more accurately.

  • Health check system

    We have improved the health check endpoint in the GPU build to return the health of the GPU inference engine as well.

    We have improved the process monitoring inside the GPU build to eliminate the possibility of having dead containers that are still running.

    We have updated the health check route in the CPU build to be completely asynchronous.

  • RAM usage

    The container printed the RAM usage on every API call. This has now been moved to ‘debug’ log level.

  • Docs

    We have added a new page in our documentation titled “Deployment Considerations”, which aims to help users deploy the docker image in production environments.

    Other notable changes are:

    • Added a new page that lists supported languages
    • Updated the list of supported entities
  • Web Demo

    We have made a small improvement to the UI of the web demo by changing the model options from a drop down list to radio buttons.

    Web demo now has unique PII markers disabled by default. This change will be reflected in the upcoming API refactor.

2.9.1 (2022/02/24)

  • Logging Improvements

    RAM usage is now logged on debug level instead of info

  • Container Health

    healthz route latency improved

    Docker container health check has been implemented, for improved AWS ECS use

2.9.0 (2022/01/18)

What’s new in 2.9.0?

  • New PII Classes

    Passport numbers are now recognized as a separate entity type, PASSPORT_NUMBER instead of NUMERICAL_PII.

    POLITICAL_AFFILIATION has been added and covers terms referring to a political party, movement, or ideology (e.g., Republican, liberal)

    We now support deidentification of IPv6 addresses in addition to IPv4 addresses. Any IPv6 address that is found in the text will be labelled as IP_ADDRESS.

  • Container Startup Resource Validation

    Based on our user feedback, we have implemented a hardware resource validation that runs on container startup. This implementation validates that the container has access to an NVIDIA GPU and/or enough RAM on startup. If the implementation fails to validate these requirements, it prints a helpful and detailed error message (rather than the default “Killed” message printed by Docker) which guides the user on how to solve these resource related issues.

  • Docker Hub Repository

    Starting with release 2.9.0, the container can be pulled from a private Docker Hub repository. Please contact us if you would like to receive the container via this repository, instead of the existing encrypted Docker image export.

Improvements

  • Model Improvements

    This release includes improved models. Improvements include:

    • Better performance on ASR system transcripts, particularly around disfluencies
    • Improved Driver License detection
    • Better performance on SMS message style conversations
  • Improved Documentation

    We have spent some time improving our documentation as well. The noteworthy improvements are:

    • The table of contents is now clearer and easier to navigate.
    • A new detailed introduction page.
    • Detailed installation instructions.
    • Updated API reference.
    • Updated Web Demo to showcase Multilingual PII Redaction and Synthetic Personal Data Generation in addition to English PII Redaction.
  • Fixes

    We fixed an issue where the built-in labels that use regex patterns would override the custom labels defined in block_list.

    We tuned the models to fix an issue where some non-PII words that are following PII words would be labelled as part of the PII word.

    We have removed a warning message that would show up on container startup due to an internal library incorrectly assessing the ML dependencies.

    Synthetic PII generation now works when the custom block_list feature is used.

2.8.0 (2021/12/20)

What’s new in 2.8.0?

  • New Entity Types

    Added DRIVER_LICENSE entity type. Driver's licenses will now be picked up in this class instead of NUMERICAL_PII.

Improvements

  • Improved backup authentication mechanism fail-over logic.
  • Updated API server. This was a dependency and security upgrade.
  • GPU inference server errors now return 500 instead of 503.

Deprecation Notice: We’ve rearranged the plumbing on our authentication system. Releases prior to 2.3.0 will no longer authenticate as of 31st December 2021.

2.7.1 (2021/10/28)

  • Linked Batch Processing

    This release adds the link_batch option. When enabled, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log. Please visit the API reference for implementation details.
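As an illustration, a linked batch request for a short chat log might be assembled as follows. This is a sketch under assumptions: it presumes that `link_batch` is a boolean field in the request body next to `text` and `key`, which should be confirmed against the API reference; the helper name and the sample messages are hypothetical.

```python
import json


def build_linked_batch_payload(messages, api_key):
    """Build a 2.x deidentify_text body that links a batch of short inputs."""
    return {
        "text": list(messages),   # batch input: one string per message
        "key": api_key,
        "link_batch": True,       # assumed boolean; shares context across inputs
    }


chat_log = ["Hi, can I get your name?", "Sure, it's Anwar", "Thanks!"]
payload = build_linked_batch_payload(chat_log, api_key="<key>")
request_body = json.dumps(payload)
```

Without linking, a short reply such as the second message is processed in isolation, so the surrounding question cannot inform detection.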

2.7.0 (2021/10/29)

What’s new in 2.7.0?

Breaking change: The default accuracy mode has been changed from standard to high.

  • Added the LOG_LEVEL environment variable, which controls logging verbosity. The environment variable can be set to info , warning or error . Default is info .

Improvements

  • Model Improvements

    This release features improved PII detection models:

    • Numerical PII detection has been further refined, particularly around SSNs and credit card numbers
    • Further improvements for chat transcripts
    • Further improvements for OCR documents, particularly receipts
    • Further improvements for JSON files
  • Authentication

    The backup authentication mechanism has been moved to a completely new system, improving redundancy

  • Usage Reporting

    The get_usage route now returns the current month's usage, instead of current week.

2.6.1 (2021/09/27)

  • Improved Models

    This release fixes phone numbers and credit card numbers occasionally being detected as SSNs. Additionally, performance around ASR transcripts and the various ways they transcribe numbers was improved

2.6.0 (2021/09/21)

Improvements

  • Improved Models

    This release features improved PII detection models, particularly surrounding English and Portuguese.

    Optimizations for a number of popular ASR systems have been added in this release. In particular, the optimizations cover how the systems transcribe numbers.

2.5.0 (2021/08/20)

What’s new in 2.5.0?

  • New Entity Types

    The DATE class has been split into DATE and DATE_INTERVAL. DATE_INTERVAL covers broader references such as 'last summer', whilst DATE remains targeted at specific references like '21/8/2019'

  • Batch Processing

    Support for batch processing has been added. To use batch processing, simply submit a list of text strings:

    curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": ["My password is: 4XDX63F8O1", "My password is: 33LMVLLDHNasdfsda"], "key": <key>}'

Improvements

  • Multilingual Improvements

    This release features improved PII detection models, particularly surrounding English, Italian and Korean.

  • Image Size

    Container image size has been further reduced.

2.4.0 (2021/07/21)

What’s new in 2.4.0?

  • Custom Redaction Markers

    Added support for custom redaction markers.

  • Allow Lists

    Added support for allow lists - any entities matching entries in the allow list will be discarded.
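Conceptually, an allow list acts as a filter over detected entities. The sketch below illustrates the idea only and is not the container's matching logic: it assumes exact, case-sensitive matching on the entity text, and the entity dictionaries and function name are hypothetical.

```python
def discard_allowed(entities, allow_list):
    """Drop every detected entity whose text appears in the allow list."""
    allowed = set(allow_list)
    return [entity for entity in entities if entity["text"] not in allowed]


detected = [
    {"text": "Toronto", "best_label": "LOCATION_CITY"},
    {"text": "John", "best_label": "NAME"},
]
remaining = discard_allowed(detected, allow_list=["Toronto"])
```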

  • New Entity Types

    Added new location classes:

    • LOCATION_ADDRESS : A street address, e.g. '48 Bristol Ave, 6157, Perth, Australia'
    • LOCATION_CITY : A city, e.g. 'Perth' or 'Toronto'
    • LOCATION_COUNTRY : A country, e.g. 'Spain'
    • LOCATION_ZIP : A zip or postal code, e.g. '10405'
    • LOCATION_STATE : A reference to a state within a country, e.g. 'California'

    NOTE: These entities are subclasses of LOCATION - the LOCATION label remains unchanged and will appear along with the above entities

Improvements

  • Model Improvements

    This release features improved PII detection and synthetic PII generation models, particularly surrounding Spanish, Italian and Korean.

  • Phone Number Improvements

    Improved phone number post-processing, particularly around bracket handling and '+' in international dialling codes

  • Best Label Calculation

    Improved automatic calculation of the number of processing threads to use whilst executing the ML models.

2.3.1 (2021/07/16)

  • CPU Performance Improvement

    Patch release to address CPU utilisation

2.3.0 (2021/06/25)

What’s new in 2.3.0?

  • New Languages

    Added support for Korean

  • New Entity Types

    Added ROUTING_NUMBER, which is a number associated with a bank or financial institution (e.g., 012345678).

    Added BANK_ACCOUNT, which is a bank account or bank card number (e.g., 012345-67).

Improvements

  • Improved Models

    This release features improved PII detection models, trained on ~50% more data than 2.2.0.

    We have improved PHI detection performance. More to come in the next release.

  • Authentication

    This release now authenticates with our revamped authentication system. No changes on the user side are required.

2.2.2 (2021/06/03)

  • New Accuracy Mode

    Added a new accuracy mode that is approximately 4x faster than standard. In order to use this model, please set accuracy_mode to fast.

2.2.1 (2021/05)

  • Improved Models

    Improved SSN detection in ASR transcripts

    Improved PHI detection

2.2.0 (2021/04/29)

What’s new in 2.2.0?

  • Multilingual Support

    This release adds support for Spanish, French, Italian, German and Portuguese. To enable it, please see the API Reference for details

  • Synthetic PII Generation

    Beta release of synthetic PII generation. In addition to identifying and redacting PII, Private AI can now also generate synthetic PII. To try it out, please set fake_entity_accuracy_mode to standard:

    $ curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "so, it expires the 21st; and the 3 digits on the back", "fake_entity_accuracy_mode": "standard", "key": <key>}'
    {
    "result": "so, it expires the [CREDIT_CARD_EXPIRATION_1]; and the 3 digits on the back",
    "result_fake": "so, it expires the 20th; and the 3 digits on the back",
    "pii": [
      {
        "marker": "CREDIT_CARD_EXPIRATION_1",
        "text": "21st",
        "best_label": "CREDIT_CARD_EXPIRATION",
        "stt_idx": 19,
        "end_idx": 23,
        "labels": {"CREDIT_CARD_EXPIRATION": 0.8895},
        "fake_text": "20th",
        "fake_stt_idx": 19,
        "fake_end_idx": 23
      }
    ],
    "api_calls_used": 1,
    "output_checks_passed": true
    }
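The offsets in the response can be used to rebuild both outputs from the original text. The sketch below assumes that `end_idx` is exclusive, that markers are wrapped in square brackets as in the `result` field, and that the input contained "21st" (matching the `text` field and offsets in the response); the function name `apply_entities` is hypothetical.

```python
def apply_entities(original, entities, replacement_key):
    """Replace each entity span in `original` with the chosen replacement.

    Spans are processed from right to left so earlier offsets stay valid.
    """
    output = original
    for entity in sorted(entities, key=lambda e: e["stt_idx"], reverse=True):
        start, end = entity["stt_idx"], entity["end_idx"]
        if original[start:end] != entity["text"]:
            raise ValueError("offsets do not match the entity text")
        output = output[:start] + entity[replacement_key] + output[end:]
    return output


original = "so, it expires the 21st; and the 3 digits on the back"
pii = [
    {
        "marker": "CREDIT_CARD_EXPIRATION_1",
        "text": "21st",
        "stt_idx": 19,
        "end_idx": 23,
        "fake_text": "20th",
    }
]

# Add a bracketed form of the marker so both outputs use the same helper.
for entity in pii:
    entity["bracketed_marker"] = "[" + entity["marker"] + "]"

redacted = apply_entities(original, pii, "bracketed_marker")
synthetic = apply_entities(original, pii, "fake_text")
```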

Improvements

  • Customizable API Port

    API port can now be customized. See the Environment Variables section for details.

    Health check port is now on port 8080, same as the main deidentify_text route

  • Revamped API Serving

    The API serving infrastructure has been completely rebuilt

    Shortened authentication request timeout

2.1.3 (2021/04/13)

  • Improved Models

    Improved credit card handling in ASR transcripts

2.1.2 (2021/03/02)

  • Added ZODIAC_SIGN , which covers Zodiac Signs such as "Aries" or "Taurus".
  • This release features improved PII detection, particularly surrounding SSN , DOB and NUMERICAL_PII .
  • Added passport numbers, vehicle license plate numbers and vehicle serial numbers to NUMERICAL_PII .
  • Passport numbers and vehicle serial numbers are now recognised as NUMERICAL_PII .

2.1.1 (2021/02/26)

  • Improved Models

    Further PII detection improvements targeted at numerical entity detection.

2.1.0 (2021/02/18)

Improvements

  • Improved Models

    This release improves PII detection accuracy, via model updates and improved training data.

    Additionally an improvement was made in an edge case where model output is highly ambiguous.

2.0.1 (2021/01/25)

  • Improved Models

    Improved PII detection models.

  • Reduced Image Size

    Further reduced Docker image size.

2.0.0 (2021/01/14)

What’s new in 2.0.0?

  • Revamped API

    The 2.0.0 release features a revamped API interface, based on recent customer feedback

  • New Entity Types

    New entity types:

    • FILENAME : Name of a computer file, e.g., bradtaxreturns.txt, koalabear.jpg
    • ORIGIN : Origin encompasses nationalities, ethnicities, and races. E.g., Canadian, american, caucasian

    Added PHI entity types:

    • BLOOD_TYPE : Blood type, e.g., O-
    • CONDITION : A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
    • DRUG : Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol
    • INJURY : Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages and dislocations.
    • MEDICAL_PROCESS : Medical process, including treatments, procedures and tests. E.g., ‘heart surgery’, ‘CT scan’.
    • PHYSICAL_ATTRIBUTE : A body attribute, e.g. I’m 190cm tall.
    • STATISTICS : Statistics such as how many people in a specific country have a disease or what percentage of people were cured of a disease. E.g., 20 percent of people have arrhythmia

Improvements

  • New Inference Engine

    New inference engine, which is significantly faster than previous releases

  • Reduced Container Image Size

    Docker image size has been drastically reduced

1.5.1 (2020/12/08)

  • Improved Models

    Improved credit card number and SSN detection in chat logs.

1.5.0 (2020/11/19)

What’s new in 1.5.0?

  • New Accuracy Mode

    The previous standard accuracy model is now fast. In its place, we have introduced a new model ~2x slower but with far better performance.

Improvements

  • Improved Models

    Improved model accuracy via additional training data.

  • Runtime Performance Improvements

    Reduced latency by ~15% in fast mode, from 60ms to 52ms on our single core GCP N2 Cascade Lake test instance.

    Dramatically reduced RAM usage for all models.

    Reduced Docker image size.

1.4.2 (2020/11/6)

  • Phone Number Improvements

    Improved support for 7 digit phone numbers

1.4.1 (2020/11/4)

  • SSN Improvements

    Improved SSN detection

1.4.0 (2020/10/23)

What’s new in 1.4.0?

  • New Entity Types

    Added DOB entity type, which covers Date of Birth (e.g., Date of Birth: March 7, 1961)

    Added CVV, which covers credit card verification codes (e.g., CVV: 080)

    Added CREDIT_CARD_EXPIRATION, which is the expiration date of a credit card (e.g., Expires: 2/28)

    Added PASSWORD entity type, which covers account passwords, pins, access keys, or verification answers (e.g., 27%alfalfa, 1234)

Improvements

  • Improved Models

    Adjusted entity types to give better per class accuracy.

    Improved SSN and credit card detection.

  • Health Route

    Added last_auth_call_successful into healthz response.

1.3.2 (2020/10/12)

  • Authentication

    Added backup authentication mechanism.

1.3.1 (2020/10/05)

  • Large Input Handling

    Improved handling of ultra large inputs (>100K words).

1.3.0 (2020/09/25)

What’s new in 1.3.0?

  • New Entity Types

    Added USERNAME entity type, which covers a user name or handle (e.g., privateairocks, @_PrivateAI).

    Added RELIGION entity type, which covers terms indicating religious affiliation (e.g., Hindu).

1.2.0 (2020/08/14)

What’s new in 1.2.0?

  • New Entity Types

    Added AGE entity type, which is a number or phrase associated with an age (e.g., 27)

  • New Accuracy Mode

    Added best accuracy mode. To use it, please set accuracy_mode to best.

1.1.0 (2020/07/05)

What’s new in 1.1.0?

  • Credit Card Number Support

    Added support for credit card numbers

1.0.0 (2020/06/15)

Initial container release

For release notes older than 1.0.0, please contact us.

© Copyright 2022, Private AI.