Release Notes
Below are the release notes for the Private AI container. To update, please grab a new version of the image.
3.4.1.1 (2023/10/04)
What’s new in 3.4.1.1?
-
Model Improvements
- Improved detection of NUMERICAL_PII and MONEY entities related to cryptocurrency wallet IDs, transaction hashes, and cryptocurrency names / amounts
2.14.6 (2023/10/02)
What’s new in 2.14.6?
-
General Information
- Please note that this release is for legacy users only and is NOT for users already on V3 of Private AI
-
Model Improvements
- Improved PCI detection (in particular, CREDIT_CARDs) in French
3.4.1 (2023/09/22)
What’s new in 3.4.1?
-
New Language Support
- We now provide extended support for Cantonese
-
Model Improvements
- Improvements to PII detection in Dutch, with particular attention to SSN (Burgerservicenummer / Citizen Service Number and the Belgian NISS) and NUMERICAL_PII such as organization numbers (e.g., Ondernemingsnummer, Identificatienummer) and VAT numbers (e.g., BTW Identificatienummer, BTW Nummer)
3.4.0 (2023/09/15)
What’s new in 3.4.0?
-
New Language Support
- We now provide Core Support for Dutch and Japanese
- Extended Support has also been added for Afrikaans
-
General Improvements
- DICOM file support is now available
- PNG file support is now available
- BMP file support is now available
- XML file support has been improved
- Audio support has been improved and can now be deployed in a single container
-
Model Improvements
- Improvements to multilingual PII detection, with a particular focus on PCI entity types, in: French, German, Spanish, and Portuguese
- Fine-tuning of recently added classes: NAME_MEDICAL_PROFESSIONAL and ORGANIZATION_MEDICAL_FACILITY
3.3.4 (2023/09/02)
What’s new in 3.3.4?
-
General Improvements
- Improved OCR support and general performance improvements with PDFs
- General Office document support improvements
- webm format support for audio files
3.3.3 (2023/08/15)
What’s new in 3.3.3?
-
General Improvements
- General performance improvement and reduced memory footprint
- Various library updates based on security recommendations
- File processing now supports disabling entities being returned in response
-
New Entity Types
- NAME_MEDICAL_PROFESSIONAL: detects the names and professional titles of medical professionals such as doctors and nurses (e.g., Dr. Kay Martinez, MD)
- ORGANIZATION_MEDICAL_FACILITY: detects the names of medical facilities such as hospitals and clinics (e.g., Victoria General Hospital, Union Family Health Clinic)
-
-
Model Improvements
- Improved detection of PII in medical records and in .xml processed as plain text
- Improved detection of ACCOUNT_NUMBER, particularly in French
- Improved detection of HEALTHCARE_NUMBER in English
3.3.2 (2023/07/12)
What’s new in 3.3.2?
-
General Improvements
- Significant performance improvement with OCR related tasks
- Image blurring has improved significantly
3.3.1 (2023/07/12)
What’s new in 3.3.1?
-
General Improvements
- Various library updates based on security recommendations
3.3.0 (2023/07/12)
What’s new in 3.3.0?
-
General Improvements
- File redaction for PDFs responds with numbered entities for the entire document rather than per page.
- PDF and image processing have speed improvements on the GPU container
- Doc / DocX file processing now returns redacted main file contents in response
- General updates to libraries based on security recommendations
-
Model Improvements
- General improvements to PII detection in: English, French, Japanese, Korean, Portuguese, Russian, Tagalog, Ukrainian
- Improved detection of numerical classes in: English, Korean, Spanish, Russian
- Improved detection of PHI classes in English
- Improvements to the ACCOUNT_NUMBER entity in English and Spanish
3.2.1 (2023/06/03)
What’s new in 3.2.1?
-
General Improvements
- The Re-identification route has been improved to handle additional use cases.
-
New Language Support
- Extended support has been added for Bambara
3.2.0 (2023/05/25)
What’s new in 3.2.0?
-
New Features
- Re-identification endpoint now available. This endpoint allows a user to pass previously de-identified text to be re-identified. Further details on how to use this new endpoint can be found on the API Reference
- You can now configure our solution to redact only entities protected by Japan's Act on the Protection of Personal Information (APPI) or APPI's sensitive personal data designation. See our documentation for details on how to implement and our supported entities list for the entities covered by APPI and APPI_SENSITIVE
-
Model Improvements
- Improved detection of numerical entity classes in English (e.g., BANK_ACCOUNT, ACCOUNT_NUMBER, CREDIT_CARD, CREDIT_CARD_EXPIRATION)
- Improved precision in detecting PHI classes in English (e.g., CONDITION, DOSE, DRUG, and MEDICAL_PROCESS)
- Improved PII & PCI detection in Japanese, Polish, Portuguese, Russian, Spanish, Ukrainian
-
Better Image and PDF Processing (Again!)
PDF and image processing performance has once again been improved.
-
New File Formats
The following file formats are now supported in the /process/file/uri and /process/file/base64 endpoints:
- .eml
- .txt
- .xls / .xlsx
- .ppt
3.1.1 (2023/04/18)
What’s new in 3.1.1?
-
New Entity Types
- ACCOUNT_NUMBER captures the number associated with a client’s account (e.g., Policy No. 10042992, Member ID: HZ-5235-001)
- DURATION captures mentions of periods of time, specified as a number and a unit of time (e.g., 8 months, 2 years)
-
-
New Language Support
- Added Core Support for Mandarin (simplified script)
-
Model Improvements
- Improved detection of PCI classes in English, including optimization for South African English, Italian, Spanish (in particular: BANK_ACCOUNT, CREDIT_CARD)
- Improved detection of PHI classes in English
- Improved detection of PII in English clickstream data sets
- Improved detection of PII in Mandarin (simplified), Tagalog, French
2.14.5 (2023/04/18)
What’s new in 2.14.5?
-
Model Improvements
- Improved detection of PCI classes in English, including optimization for South African English, Italian, Spanish (in particular: BANK_ACCOUNT, CREDIT_CARD)
- Improved detection of PHI classes in English
- Improved detection of PII in English clickstream data sets
- Improved detection of PII in Mandarin (simplified), Tagalog, French
3.1.0 (2023/04/03)
What’s new in 3.1.0?
-
New File Formats
The following file formats are now supported in the /process/file/uri and /process/file/base64 endpoints:
- .doc
- .docx
- .xml
- .json
-
Language Detection
The /process/text endpoint returns a language_detected attribute which specifies ISO 639-1 language labels in the response. For more information, please have a look at the process text documentation
-
Better Image and PDF Processing
PDF and image processing has been greatly improved in both accuracy and throughput performance.
-
Model Improvements
- Improved detection of PCI and other numerical classes in English (in particular: CREDIT_CARD, CREDIT_CARD_EXPIRATION, CVV, HEALTHCARE_NUMBER, VEHICLE_ID)
- Improved detection of PCI classes in French and Spanish (in particular: BANK_ACCOUNT, CREDIT_CARD, CREDIT_CARD_EXPIRATION, CVV)
3.0.0 (2023/03/12)
We are proud to announce the 3rd major version of Private AI's solution. Note that 3.0 does not maintain backwards compatibility. Instead, Private AI will continue to do 2.X releases with updated models and potential security fixes until 3 months after this release.
What’s new in 3.0?
Starting with 3.0, we will be distributing our container exclusively through the Azure Container Registry. Login credentials and sample commands to download the container image can be found in the customer portal and will look like:
docker login -u INSERT_UNIQUE_CLIENT_ID -p INSERT_UNIQUE_CLIENT_PW crprivateaiprod.azurecr.io
-
Licensing Change
We have changed our licensing system from an API Key to a license file. In order to run the container with the license file, run the following:
docker run --rm -v "full path to license.json":/app/license/license.json \
  -p 8080:8080 -it crprivateaiprod.azurecr.io/deid:<version>
Once you have the container up and running with the new license file, you can send the container a request like this:
curl --request POST --url http://localhost:8080/v3/process/text --header 'Content-Type: application/json' \
  --data '{"text": ["Hello John"]}'
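The same request can also be prepared from Python. The following is a minimal sketch using only the standard library: the helper name build_process_text_request is hypothetical, and the base URL assumes the container is listening on localhost:8080 as in the docker run command above. The request is built but not sent, so the snippet runs without a container.

```python
import json
import urllib.request


def build_process_text_request(texts, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /v3/process/text route."""
    # The 3.0 API expects "text" to be a list, even for a single string.
    if isinstance(texts, str):
        texts = [texts]
    body = json.dumps({"text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v3/process/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_process_text_request("Hello John")

# To send it against a running container:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```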
-
New API Interface
3.0 introduces many changes to the API, please see the new API Reference for details. Key changes:
- deidentify_text is now called /v3/process/text
- Endpoints in general now follow the standard of process/type/subtype
- text field is required to be a list by default, even with a single string
- key field has been removed from the body and is now in the request header: X-API-KEY. It is only required when using our demo endpoint
- accuracy_mode is now called accuracy and can be found one layer down in the entity_detection dictionary settings
- return_entities parameter allows you to configure whether to include identified entities in the response
- unique_pii_markers has been removed. Instead, please set pattern inside the marker parameters to BEST_ENTITY_TYPE
- Entity is established in nomenclature to recognize PII, PHI, PCI
Example conversions from V2 request payload to 3.0:
### Example with enabled_classes ###
2.0:
{"text": "Hello there John!",
 "key":<My_api_key>,
 "accuracy_mode":"high",
 "enabled_classes":["NAME"]
}
3.0:
{"text": ["Hello there John! I live in Newark"],
 "entity_detection": {"accuracy": "high",
                      "entity_types": [{"type": "ENABLE", "value":["NAME"]}]
 }
}
-----------------------------------------------------------------------------------------
### Example with inclusion of all entity types in entity marker ###
2.x:
{"text": "Hello there Pieter!",
 "key":<My_api_key>,
 "accuracy_mode":"standard",
 "marker_format": "[ALL_CLASS_NAMES]"
}
3.0:
{"text": ["Hello there Pieter!"],
 "entity_detection": {"accuracy": "standard"},
 "processed_text": {"type": "MARKER", "pattern": "[ALL_ENTITY_TYPES]"}
}
-----------------------------------------------------------------------------------------
### Example with disabling unique_pii_markers through MARKER definition ###
2.0:
{"text": "Hello there Paul!",
 "key":<My_api_key>,
 "accuracy_mode":"high_multilingual",
 "unique_pii_markers": false
}
3.0:
{"text": ["Hello there Paul!"],
 "entity_detection": {"accuracy": "high_multilingual"},
 "processed_text": {"type": "MARKER", "pattern": "[BEST_ENTITY_TYPE]"}
}
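The conversions above can be captured in a small helper. This is only a sketch built from the three examples and the key changes listed in these notes: the function name convert_v2_payload is hypothetical, it covers just the fields shown here, and it raises an error for anything else rather than guessing a mapping. Consult the API Reference for the full 3.0 schema.

```python
def convert_v2_payload(v2_body):
    """Translate a 2.x deidentify_text body into a (3.0 body, headers) pair."""
    v2 = dict(v2_body)  # work on a copy so the caller's dict is untouched
    text = v2.pop("text")
    # 3.0 requires "text" to be a list, even for a single string.
    body = {"text": [text] if isinstance(text, str) else list(text)}
    headers = {"Content-Type": "application/json"}

    # The key moves from the body to the X-API-KEY request header.
    if "key" in v2:
        headers["X-API-KEY"] = v2.pop("key")

    # accuracy_mode and enabled_classes move under entity_detection.
    entity_detection = {}
    if "accuracy_mode" in v2:
        entity_detection["accuracy"] = v2.pop("accuracy_mode")
    if "enabled_classes" in v2:
        entity_detection["entity_types"] = [
            {"type": "ENABLE", "value": list(v2.pop("enabled_classes"))}
        ]
    if entity_detection:
        body["entity_detection"] = entity_detection

    # Marker settings move under processed_text.
    pattern = None
    if v2.pop("unique_pii_markers", True) is False:
        pattern = "[BEST_ENTITY_TYPE]"
    marker_format = v2.pop("marker_format", None)
    if marker_format == "[ALL_CLASS_NAMES]":
        pattern = "[ALL_ENTITY_TYPES]"
    elif marker_format is not None:
        raise ValueError(f"no known 3.0 mapping for marker_format {marker_format!r}")
    if pattern is not None:
        body["processed_text"] = {"type": "MARKER", "pattern": pattern}

    if v2:
        raise ValueError(f"unhandled 2.x fields: {sorted(v2)}")
    return body, headers


body, headers = convert_v2_payload(
    {
        "text": "Hello there Paul!",
        "key": "my-api-key",  # placeholder value
        "accuracy_mode": "high_multilingual",
        "unique_pii_markers": False,
    }
)
```

Here body matches the third 3.0 example above, and the key ends up in headers["X-API-KEY"], which is only needed for the demo endpoint.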
-
-
File Support for Audio / PDFs / Images
3.0 supports file redaction using a unified endpoint, which works either with URIs or base64-encoded files: /v3/process/files/uri and /v3/process/files/base64. Please see the Quickstart Guide for details.
-
Application version endpoint
Sending a GET request to the container root endpoint http://container-address:8080 will return a response providing information about the application version:
{"app_version": "3.0.0"}
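A client that must work with both 2.x and 3.x containers could use this response to decide which payload format to send. The sketch below parses the documented response body; the helper name parse_app_version is hypothetical, and it assumes the version string consists only of dot-separated integers, as in the release numbers listed in these notes.

```python
import json


def parse_app_version(response_text):
    """Return the app_version field of the root endpoint response as a tuple of ints."""
    version = json.loads(response_text)["app_version"]
    return tuple(int(part) for part in version.split("."))


# Response body as documented above.
version = parse_app_version('{"app_version": "3.0.0"}')

# Tuples compare element by element, so this is a simple major-version check.
uses_v3_api = version >= (3, 0, 0)
```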
-
Synthetic Entity Generation
Synthetic entity generation is now supported across each language Private AI supports.
Quality of generated entities has been improved, particularly around matching the formatting and length of the original entity.
-
Environment Variables
All previous environment variables are now prefixed with “PAI” to better differentiate PAI specific variables. You can find the full list of environment variables in Running the Container.
-
PII Metrics
In 3.0, non-airgapped users can enable PII metrics gathering for reporting purposes. In order to do this, add PAI_ENABLE_PII_COUNT_METERING=True as an environment variable. You'll be able to see the number of PII entities captured in your license usage, and we will further improve this feature to provide a granular view of the entity types captured, along with other reporting features.
Please note that this feature is OFF by default and requires explicit configuration to gather this data. Any usage prior to enabling this feature is NOT captured and cannot be reported on retroactively.
2.14.3 (2023/03/07)
- Improvements to numerical entity detection and classification, specifically: NUMERICAL_PII, BANK_ACCOUNT, PHONE_NUMBER, CREDIT_CARD, CREDIT_CARD_EXPIRATION and CVV.
- Improvements to PII detection within ASR transcripts, including variable casing (lower/upper/sentence case) for named entities.
- Improvements to ORGANIZATION detection.
- Better recognition of emergency phone numbers.
- GPU container image size has been reduced.
2.14.2 (2023/01/17)
- Improvements to PHONE_NUMBER detection, particularly in ASR transcripts in which entities may have unusual formatting.
- Improvements to CREDIT_CARD detection in ASR transcripts, which may contain spelling and formatting anomalies.
- Optimizations for detecting PII entities in HR documents, such as CVs and resumes.
- General improvements to PII detection in Spanish text.
- Resolved an issue where redaction markers in previously redacted data were sometimes captured as PII.
- The trailing period in company names such as ACME Co. is now included in the entity.
2.14.1 (2022/11/30)
- Improved PCI detection in French and Spanish
- The /, \ and $ characters are no longer stripped from entities. For example, Visit us at facebook.com/user123/ is now redacted as Visit us at [URL_1] instead of Visit us at [URL_1]/.
- Tuned RAM check thresholds for machines with 8GB RAM.
- Language Support: Added Extended Support for Japanese.
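The effect of the character-stripping change in this release can be illustrated with a toy redaction routine. This is a sketch only, not Private AI's implementation: the redact function and the (start, end, label) span format are hypothetical, and the markers are numbered per occurrence for simplicity.

```python
def redact(text, entities):
    """Replace each (start, end, label) span with a numbered marker."""
    counters = {}
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        counters[label] = counters.get(label, 0) + 1
        pieces.append(text[cursor:start])               # untouched text before the entity
        pieces.append(f"[{label}_{counters[label]}]")   # marker replacing the entity
        cursor = end
    pieces.append(text[cursor:])                         # untouched text after the last entity
    return "".join(pieces)


text = "Visit us at facebook.com/user123/"
start = text.index("facebook")

# Entity span keeps the trailing slash, as described for 2.14.1.
print(redact(text, [(start, len(text), "URL")]))      # Visit us at [URL_1]

# Entity span with the trailing slash stripped, as in earlier releases.
print(redact(text, [(start, len(text) - 1, "URL")]))  # Visit us at [URL_1]/
```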
2.14.0 (2022/11/11)
What’s new in 2.14.0?
-
New Language Support
The following languages have been added to Extended support:
- Luxembourgish
- Swahili
-
Entity Types
- NAME_GIVEN, which encompasses name(s) given to an individual, usually at birth, often first/middle names in Western cultures, middle/last names in Eastern cultures.
- NAME_FAMILY, which encompasses names indicating a person’s family or community, often a last name in Western cultures, first name in Eastern cultures.
- MEDICAL_MISC entity type has been deleted.
-
Improvements
-
Improved Models
- Improved detection of names spelled out in all caps by ASR systems.
- NAME: Improved name subclass detection / classification in English.
- EMAIL_ADDRESS: More robustness around partial / unformatted emails in English.
- CREDIT_CARD: improvement around mentions of the last 4 digits only in English.
- Enhanced detection of NAMEs and other entities when spelled-out in a transcript (e.g., “c as in charlie …”)
- Improvements to detection of PASSWORD, including verification answers
- Improved handling of eponymous medical conditions in English.
- Improvements to PHI detection in English.
- Improvements to PHI detection in Spanish.
- Improvements to all personal number classes such as PASSPORT, CREDIT_CARD and SSN including international variants in French, German, Italian, Tagalog and Ukrainian.
- Improved PII detection in text containing facerolls and typos.
- Improvements to PII detection in Tagalog data containing profanities / toxic material.
- Improved detection of ambiguous LOCATION / ORGANIZATION mentions, as well as ambiguous NAMEs
- Improved PII detection in text containing control characters
-
General improvements to:
- Russian
- Spanish
-
Miscellaneous
- Container startup memory check is now performed on container start, instead of after loading models
- Fixed handling of null strings
2.13.1 (2022/09/26)
-
Emoji Improvements
Processing of non-English text containing emojis has been improved
2.13.0 (2022/09/08)
What’s new in 2.13.0?
-
Second Generation Synthetic PII
This release features the debut of our second generation synthetic PII system. The system has been rebuilt from the ground up and leverages a new approach developed by Private AI. The new system features the following improvements:
- Increased PII realism, including greater variety of generated terms and less generation of common terms such as "John" or "Paul".
- Better generation of numerical PII, particularly around the correct number of digits.
Note that the CPU containers are now approximately 700MB larger due to this change and that the new synthetic PII system is slower than the first generation. Private AI will be releasing optimizations for both container size and processing time in subsequent releases, along with GPU support.
-
New Language Support
The following languages have been added to Extended support:
- Belarusian
- Icelandic
- Indonesian
- Khmer
- Thai
We have also added Beta support for Japanese.
-
New Entity Types
- NAME_GIVEN, which encompasses name(s) given to an individual, usually at birth, often first/middle names in Western cultures, middle/last names in Eastern cultures.
- NAME_FAMILY, which encompasses names indicating a person’s family or community, often a last name in Western cultures, first name in Eastern cultures.
-
-
Disable GPU
- PAI_DISABLE_GPU_CHECK allows users to disable the startup check for GPU on the container and run the GPU container using CPU only.
-
Improvements
-
Best Label Calculation
The best label calculation has been updated to prefer the most granular entity type. For example, Hello John will become Hello [NAME_GIVEN] instead of Hello [NAME]. Similarly, I live in Toronto will be I live in [LOCATION_CITY] instead of I live in [LOCATION]. When an entity spans multiple words that have additional, nested labels, the existing behaviour is retained: namely, the most general entity type, covering the entire span, is used. For example, Hello John Doe will be Hello [NAME] and I live in Toronto, Canada will be I live in [LOCATION].
-
Improved Models
This release features a number of PII detection improvements:
- Further improvements to the character-level recognition that was introduced in 2.12.
- False Positive reduction for CONDITION, DRUG, MEDICAL_PROCESS in English.
- CREDIT_CARD, PHONE_NUMBER, EMAIL_ADDRESS, BANK_ACCOUNT, PASSPORT_NUMBER, SSN improvements in Spanish.
- CONDITION, DRUG, MEDICAL_PROCESS improvements in Spanish.
- NAME, LOCATION, ORGANIZATION, POLITICAL_AFFILIATION improvements in German, French, Italian and Polish.
- Improved performance across all entity types in Tagalog.
-
Miscellaneous
Improved log messages on container startup.
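The best label calculation rule described in this release can be sketched in a few lines. This is an illustration under stated assumptions, not the actual implementation: the PARENT table is a hypothetical hierarchy limited to the labels used in the examples, and best_label expects at least one predicted label to cover the whole entity span.

```python
# Hypothetical subclass -> parent relationships, limited to the labels named above.
PARENT = {
    "NAME_GIVEN": "NAME",
    "NAME_FAMILY": "NAME",
    "LOCATION_CITY": "LOCATION",
    "LOCATION_COUNTRY": "LOCATION",
}


def depth(label):
    """Number of ancestors a label has; a deeper label is more granular."""
    steps = 0
    while label in PARENT:
        label = PARENT[label]
        steps += 1
    return steps


def best_label(span, labels):
    """Pick the marker label for one entity.

    span   -- (start, end) character offsets of the whole entity
    labels -- (label, start, end) predictions that fall inside that span
    """
    # Nested labels that cover only part of the entity are ignored, so a
    # multi-word entity falls back to the general label covering all of it.
    covering = [name for name, start, end in labels if (start, end) == span]
    # Among labels covering the whole entity, prefer the most granular one.
    return max(covering, key=depth)


# "Hello John": NAME and NAME_GIVEN both cover "John".
single = best_label((6, 10), [("NAME", 6, 10), ("NAME_GIVEN", 6, 10)])

# "Hello John Doe": only NAME covers the full span "John Doe".
multi = best_label(
    (6, 14),
    [("NAME", 6, 14), ("NAME_GIVEN", 6, 10), ("NAME_FAMILY", 11, 14)],
)
```

With these inputs, single is NAME_GIVEN and multi is NAME, matching the Hello John and Hello John Doe examples.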
2.12.0 (2022/07/27)
What’s new in 2.12.0?
-
New Inference Pipeline
This release features the debut of our new inference pipeline. The main feature of the new pipeline is that it is able to operate on non-whitespace separated text. This has a number of benefits, including better performance around punctuation and control characters, and it enables new languages such as Mandarin (simplified).
-
Prometheus Endpoint
A Prometheus metrics endpoint is now available at /metrics. See the API reference for details.
-
New Language Support
The following languages have been added to Core support:
- Ukrainian
- Hindi
In addition to this, we have added Extended support for the following 5 languages:
- Estonian
- Malay
- Punjabi
- Tamil
- Vietnamese
We have also added Beta support for Mandarin (simplified)
Improvements
-
Improved Models
This release features a number of PII detection improvements:
- German NUMERICAL_PII detection has been improved.
- Improved performance on medical questionnaires and customer onboarding forms.
- Multilingual chat performance has been improved, particularly in Spanish.
- Postal address detection performance has been improved for addresses in the United Kingdom, Australia and New Zealand.
- PASSWORD and CVV detection performance has been improved.
- PHI Attributes / symptoms detection has been improved.
- General improvements for EHRs and ASR transcripts.
-
Security Patch
Several container image dependencies and Python libraries have been updated to address security recommendations.
2.11.1 (2022/06/01)
Improvements
-
Security Patch
Several libraries received patch updates to address security recommendations and have been included in this release.
-
Improved Models
Improvements have been made to the detection of medical entities such as CONDITION, INJURY and MEDICAL_MISC.
Improvements have been made to NUMERICAL_PII, particularly in multilingual models
-
Container Options
Allow startup resource check to be disabled.
2.11.0 (2022/05/10)
What’s new in 2.11.0?
-
New language support
Tagalog has been moved from extended to core support. For the full list, please see the supported languages page.
-
New Entity Types
VEHICLE_ID
has been added in this release. This entity type covers vehicle identification numbers such as license plate numbers, vehicle serial and vehicle identification numbers.
Improvements
-
Model Improvements
PII detection error has been reduced by approximately 10%, particularly around CREDIT_CARD, CREDIT_CARD_EXPIRATION and CVV. Australian and New Zealand address recognition has also been improved.
Performance on disfluent ASR transcripts (particularly around passwords), chat logs and medical patient records has been improved.
CPU model processing speed has increased by approximately 8%, whilst GPU processing speed has been improved by up to 35%, depending on the chosen accuracy mode.
-
Service health monitoring
The /healthz endpoint is more robust for detecting the overall health of the API service.
-
Improved error messages
Error messages when either the key or text fields are missing are now more specific.
-
Security updates
Libraries have been updated based on security recommendations from our regular vulnerability scans.
-
Documentation revamp
Our public documentation has been updated to include new guides, updated install instructions and sample configurations.
-
Other
ENABLED_CLASSES can now be set via an environment variable, similar to LOG_LEVEL.
2.10.0 (2022/03/14)
What’s new in 2.10.0?
-
33 Supported Languages
Our system can now detect PII in 33 different languages, with more coming soon. For the full list, please see the supported languages page.
-
New Entities
2.10 includes the following new entities:
- GENDER_SEXUALITY: Terms indicating gender identity or sexual orientation, including slang terms. E.g.: “female”, “bisexual”, “trans”
- MARITAL_STATUS: Terms indicating marital status. E.g.: “single”, “common-law”, “ex-wife”, “married”
- LOCATION_COORDINATE: A subclass of LOCATION. A geographic position referred to using latitude, longitude, and/or elevation coordinates. E.g.: “We’re at: [40.748440 and -73.984559]”
The NUMERICAL_PII class now includes MAC addresses and cookie IDs.
These entities will be listed in the docs shortly.
-
-
Complete HIPAA Support
With this release we have complete support for all the entities listed under the HIPAA Safe Harbor rule. Health plan beneficiary numbers and medical record numbers have been added to HEALTHCARE_NUMBER, whilst medical device serial numbers have been added to NUMERICAL_PII.
-
Entity Sets
enabled_classes now supports entity sets. This way, you can simply include the name of the regulation that you want to comply with, and we will enable the entities that are listed in that regulation for you. The regulations that are implemented in this release are:
- GDPR
- CPRA
- HIPAA
- Quebec Privacy Act
- PCI
Example command:
curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "Hi Anwar", "key": "<customer key>", "enabled_classes": ["GDPR"]}'
The docs will be updated to include this functionality and the entities included in each entity set shortly.
-
Docker image version logging
We are now logging the version of the docker image in our logs. This allows us to provide better customer support based on the version that appears in the logs.
Improvements
-
Better models
2.10 features improved PII detection models, particularly around credit card numbers, verification codes, social security numbers, US postal addresses, email addresses in emails and resumes.
-
TIME entity adjustment
We have adjusted the TIME entity to no longer include ASR transcript timestamps.
-
Better API error messages
In order to improve error handling and make debugging easier, we have reworked our API error messages to be more detailed and understandable. Error messages (but not potentially sensitive payloads) are now also logged to console.
-
Redaction marker label calculation
We have improved how the redaction marker that is used in the redacted text is calculated.
-
Resource validation system
In 2.9 we introduced checks that validate that the container has been provided with enough resources. In this release, we have further expanded and improved these checks to be able to detect memory and GPU resources more accurately.
-
Health check system
We have improved the health check endpoint in the GPU build to return the health of the GPU inference engine as well.
We have improved the process monitoring inside the GPU build to eliminate the possibility of having dead containers that are still running.
We have updated the health check route in the CPU build to be completely asynchronous.
-
RAM usage
The container printed the RAM usage on every API call. This has now been moved to ‘debug’ log level.
-
Docs
We have added a new page in our documentation titled “Deployment Considerations”, which aims to help users deploy the docker image in production environments.
Other notable changes are:
- Adding a new page that lists supported languages
- Update the list of supported entities
-
Web Demo
We have made a small improvement to the UI of the web demo by changing the model options from a drop down list to radio buttons.
Web demo now has unique PII markers disabled by default. This change will be reflected in the upcoming API refactor.
2.9.1 (2022/02/24)
-
Logging Improvements
RAM usage is now logged on debug level instead of info
-
Container Health
healthz route latency improved
Docker container health check has been implemented, for improved AWS ECS use
2.9.0 (2022/01/18)
What’s new in 2.9.0?
-
New PII Classes
Passport numbers are now recognized as a separate entity type, PASSPORT_NUMBER instead of NUMERICAL_PII.
POLITICAL_AFFILIATION has been added and covers terms referring to a political party, movement, or ideology (e.g., Republican, liberal)
We now support IPv6 address deidentification in addition to IPv4 addresses. Any IPv6 address that is found in the text will be labelled as IP_ADDRESS.
-
Container Startup Resource Validation
Based on our user feedback, we have implemented a hardware resource validation that runs on container startup. This implementation validates that the container has access to an NVIDIA GPU and/or enough RAM on startup. If the implementation fails to validate these requirements, it prints a helpful and detailed error message (rather than the default “Killed” message printed by Docker) which guides the user on how to solve these resource related issues.
-
Docker Hub Repository
Starting with release 2.9.0, the container can be pulled from a private Docker Hub repository. Please contact us if you would like to receive the container via this repository, instead of the existing encrypted Docker image export.
Improvements
-
Model Improvements
This release includes improved models. Improvements include:
- Better performance on ASR system transcripts, particularly around disfluencies
- Improved Driver License detection
- Better performance on SMS message style conversations
-
Improved Documentation
We have spent some time improving our documentation as well. The noteworthy improvements are:
- The table of contents is now more clear and easier to navigate.
- A new detailed introduction page.
- Detailed installation instructions.
- Updated API reference.
- Updated Web Demo to showcase Multilingual PII Redaction and Synthetic Personal Data Generation in addition to English PII Redaction.
-
Fixes
We fixed an issue where the built-in labels that use regex patterns would override the custom labels defined in block_list.
We tuned the models to fix an issue where some non-PII words that are following PII words would be labelled as part of the PII word.
We have removed a warning message that would show up on container startup due to an internal library incorrectly assessing the ML dependencies.
Synthetic PII generation now works when the custom block_list feature is used.
2.8.0 (2021/12/20)
What’s new in 2.8.0?
-
New Entity Types
Added DRIVER_LICENSE entity type. Driver's licenses will now be picked up in this class instead of NUMERICAL_PII.
Improvements
- Improved backup authentication mechanism fail-over logic.
- Updated API server. This was a dependency and security upgrade.
- GPU inference server errors now return 500 instead of 503.
Deprecation Notice: We’ve rearranged the plumbing on our authentication system. Releases prior to 2.3.0 will no longer authenticate as of 31st December 2021.
2.7.1 (2021/10/28)
-
Linked Batch Processing
This release adds the link_batch option. When enabled, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log. Please visit the API reference for implementation details.
2.7.0 (2021/10/29)
What’s new in 2.7.0?
Breaking change: The default accuracy mode has been changed from standard to high.
- Added the LOG_LEVEL environment variable, which controls logging verbosity. The environment variable can be set to info, warning or error. Default is info.
Improvements
-
Model Improvements
This release features improved PII detection models:
- Numerical PII detection has been further refined, particularly around SSNs and credit card numbers
- Further improvements for chat transcripts
- Further improvements for OCR documents, particularly receipts
- Further improvements for JSON files
-
Authentication
The backup authentication mechanism has been moved to a completely new system, improving redundancy
-
Usage Reporting
The get_usage route now returns the current month's usage, instead of the current week's.
2.6.1 (2021/09/27)
-
Improved Models
This release fixes phone numbers and credit card numbers occasionally being detected as SSNs. Additionally, performance around ASR transcripts and the various ways they transcribe numbers was improved
2.6.0 (2021/09/21)
Improvements
-
Improved Models
This release features improved PII detection models, particularly surrounding English and Portuguese.
Optimizations for a number of popular ASR systems have been added in this release. In particular, the optimizations cover how the systems transcribe numbers.
2.5.0 (2021/08/20)
What’s new in 2.5.0?
-
New Entity Types
The DATE class has been split into DATE and DATE_INTERVAL. DATE_INTERVAL covers broader references such as 'last summer', whilst DATE remains targeted at specific references like '21/8/2019'
-
Batch Processing
Support for batch processing has been added. To use batch processing, simply submit a list of text strings:
curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": ["My password is: 4XDX63F8O1", "My password is: 33LMVLLDHNasdfsda"], "key": <key>}'
Improvements
-
Multilingual Improvements
This release features improved PII detection models, particularly surrounding English, Italian and Korean.
-
Image Size
Container image size has been further reduced.
2.4.0 (2021/07/21)
What’s new in 2.4.0?
-
Custom Redaction Markers
Added support for custom redaction markers.
-
Allow Lists
Added support for allow lists - any entities matching entries in the allow list will be discarded.
-
New Entity Types
Added new location classes:
- LOCATION_ADDRESS: A street address, e.g. '48 Bristol Ave, 6157, Perth, Australia'
- LOCATION_CITY: A city, e.g. 'Perth' or 'Toronto'
- LOCATION_COUNTRY: A country, e.g. 'Spain'
- LOCATION_ZIP: A zip or postal code, e.g. '10405'
- LOCATION_STATE: A reference to a state within a country, e.g. 'California'
NOTE: These entities are subclasses of LOCATION - the LOCATION label remains unchanged and will appear along with the above entities
Improvements
-
Model Improvements
This release features improved PII detection and synthetic PII generation models, particularly surrounding Spanish, Italian and Korean.
-
Phone Number Improvements
Improved phone number post-processing, particularly around bracket handling and '+' in international dialling codes
-
Best Label Calculation
Improved automatic calculation of the number of processing threads to use whilst executing the ML models.
2.3.1 (2021/07/16)
-
CPU Performance Improvement
Patch release to address CPU utilisation
2.3.0 (2021/06/25)
What’s new in 2.3.0?
-
New Languages
Added support for Korean
-
New Entity Types
Added ROUTING_NUMBER, which is a number associated with a bank or financial institution (e.g., 012345678).
Added BANK_ACCOUNT, which is a bank account or bank card number (e.g., 012345-67).
Improvements
-
Improved Models
This release features improved PII detection models, trained on ~50% more data than 2.2.0.
We have improved PHI detection performance. More to come in the next release.
-
Authentication
This release now authenticates with our revamped authentication system. No changes on the user side are required.
2.2.2 (2021/06/03)
-
New Accuracy Mode
Added a new accuracy mode that is approximately 4x faster than standard. To use it, please set accuracy_mode to fast.
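As an illustration, the mode could be selected per request by adding accuracy_mode to the JSON body. The sketch below only builds the body. The helper name build_request_body is hypothetical, and placing accuracy_mode at the top level next to text and key is an assumption based on the other request examples in these notes.

```python
import json


def build_request_body(text, key, accuracy_mode=None):
    """Build a deidentify_text request body (hypothetical helper).

    accuracy_mode is only included when given; what the container does
    when the field is omitted is not covered here.
    """
    body = {"text": text, "key": key}
    if accuracy_mode is not None:
        body["accuracy_mode"] = accuracy_mode
    return body


# Request the faster mode introduced in this release.
fast_body = build_request_body("My phone number is 555-0100", "<key>", accuracy_mode="fast")
print(json.dumps(fast_body))
```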
2.2.1 (2021/05)
-
Improved Models
Improved SSN detection in ASR transcripts
Improved PHI detection
2.2.0 (2021/04/29)
What’s new in 2.2.0?
-
Multilingual Support
This release adds support for Spanish, French, Italian, German and Portuguese. To enable it, please see the API Reference for details
-
Synthetic PII Generation
Beta release of synthetic PII generation. In addition to identifying and redacting PII, Private AI can now also generate synthetic PII. To try it out, please set fake_entity_accuracy_mode to standard:
$ curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "so, it expires the 21st; and the 3 digits on the back", "fake_entity_accuracy_mode": "standard", "key": <key>}'
{
  "result": "so, it expires the [CREDIT_CARD_EXPIRATION_1]; and the 3 digits on the back",
  "result_fake": "so, it expires the 20th; and the 3 digits on the back",
  "pii": [
    {
      "marker": "CREDIT_CARD_EXPIRATION_1",
      "text": "21st",
      "best_label": "CREDIT_CARD_EXPIRATION",
      "stt_idx": 19,
      "end_idx": 23,
      "labels": {"CREDIT_CARD_EXPIRATION": 0.8895},
      "fake_text": "20th",
      "fake_stt_idx": 19,
      "fake_end_idx": 23
    }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
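To show how the response above might be consumed, the following sketch pairs each detected entity with its synthetic replacement. It assumes the response carries the fields shown in the example (result_fake, pii, marker, text, fake_text, fake_stt_idx, fake_end_idx). The function name synthetic_replacements is hypothetical.

```python
def synthetic_replacements(response):
    """Return (original, synthetic) text pairs from a deidentify_text response.

    Also checks that each synthetic span really sits at the reported
    offsets inside result_fake.
    """
    fake_text = response["result_fake"]
    pairs = []
    for entity in response["pii"]:
        start, end = entity["fake_stt_idx"], entity["fake_end_idx"]
        if fake_text[start:end] != entity["fake_text"]:
            raise ValueError(f"offsets do not match for {entity['marker']}")
        pairs.append((entity["text"], entity["fake_text"]))
    return pairs


# Response copied from the example above, trimmed to the fields used here.
example = {
    "result_fake": "so, it expires the 20th; and the 3 digits on the back",
    "pii": [
        {
            "marker": "CREDIT_CARD_EXPIRATION_1",
            "text": "21st",
            "fake_text": "20th",
            "fake_stt_idx": 19,
            "fake_end_idx": 23,
        }
    ],
}
print(synthetic_replacements(example))
```

The offset check is a cheap sanity test: if the reported indices ever stop lining up with result_fake, the function fails loudly instead of returning mismatched pairs.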
Improvements
-
Customizable API Port
API port can now be customized. See the Environment Variables section for details.
Health check port is now on port 8080, same as the main deidentify_text route
-
Revamped API Serving
The API serving infrastructure has been completely rebuilt
Shortened authentication request timeout
2.1.3 (2021/04/13)
-
Improved Models
Improved credit card handling in ASR transcripts
2.1.2 (2021/03/02)
-
Added ZODIAC_SIGN, which covers zodiac signs such as "Aries" or "Taurus".
-
This release features improved PII detection, particularly surrounding SSN, DOB and NUMERICAL_PII.
-
Added passport numbers, vehicle license plate numbers and vehicle serial numbers to NUMERICAL_PII.
-
Passport numbers and vehicle serial numbers are now recognised as NUMERICAL_PII.
2.1.1 (2021/02/26)
-
Improved Models
Further PII detection improvements targeted at numerical entity detection.
2.1.0 (2021/02/18)
Improvements
-
Improved Models
This release improves PII detection accuracy, via model updates and improved training data.
Additionally, handling of an edge case where model output is highly ambiguous was improved.
2.0.1 (2021/01/25)
-
Improved Models
Improved PII detection models.
-
Reduced Image Size
Further reduced Docker image size.
2.0.0 (2021/01/14)
What’s new in 2.0.0?
-
Revamped API
The 2.0.0 release features a revamped API interface, based on recent customer feedback
-
New Entity Types
New entity types:
-
FILENAME: Name of a computer file, e.g., bradtaxreturns.txt, koalabear.jpg
-
ORIGIN: Origin encompasses nationalities, ethnicities, and races. E.g., Canadian, american, caucasian
Added PHI entity types:
-
BLOOD_TYPE: Blood type, e.g., O-
-
CONDITION: A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
-
DRUG: Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol
-
INJURY: Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages and dislocations.
-
MEDICAL_PROCESS: Medical process, including treatments, procedures and tests. E.g., ‘heart surgery’, ‘CT scan’.
-
PHYSICAL_ATTRIBUTE: A body attribute, e.g. I’m 190cm tall.
-
STATISTICS: How many people in a specific country have the disease or what percentage of people were cured of a disease, for example. E.g., 20 percent of people have arrhythmia
-
Improvements
-
New Inference Engine
New inference engine, which is significantly faster than previous releases
-
Reduced Container Image Size
Docker image size has been drastically reduced
1.5.1 (2020/12/08)
-
Improved Models
Improved credit card number and SSN detection in chat logs.
1.5.0 (2020/11/19)
What’s new in 1.5.0?
-
New Accuracy Mode
The previous standard accuracy model is now fast. In its place, we have introduced a new model that is ~2x slower but with far better performance.
Improvements
-
Improved Models
Improved model accuracy via additional training data.
-
Runtime Performance Improvements
Reduced latency by ~15% on fast mode, from 60ms to 52ms on our single core GCP N2 Cascade Lake test instance.
Dramatically reduced RAM usage for all models.
Reduced Docker image size.
1.4.2 (2020/11/06)
-
Phone Number Improvements
Improved support for 7 digit phone numbers
1.4.1 (2020/11/04)
-
SSN Improvements
Improved SSN detection
1.4.0 (2020/10/23)
What’s new in 1.4.0?
-
New Entity Types
Added DOB entity type, which covers Date of Birth (e.g., Date of Birth: March 7, 1961)
Added CVV, which covers credit card verification codes (e.g., CVV: 080)
Added CREDIT_CARD_EXPIRATION, which is the expiration date of a credit card (e.g., Expires: 2/28)
Added PASSWORD entity type, which covers account passwords, pins, access keys, or verification answers (e.g., 27%alfalfa, 1234)
Improvements
-
Improved Models
Adjusted entity types to give better per class accuracy.
Improved SSN and credit card detection.
-
Health Route
Added last_auth_call_successful to the healthz response.
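A deployment script could use this field to tell a container that is up from one that is up but failing authentication. The sketch below is illustrative only: it assumes the healthz route returns a JSON object in which last_auth_call_successful is a boolean, and the function name auth_is_healthy is hypothetical.

```python
import json


def auth_is_healthy(healthz_body):
    """Return True only if the health response reports a successful auth call.

    healthz_body is the raw response text. A missing field is treated as
    unhealthy, so a response without the field never counts as authenticated.
    """
    status = json.loads(healthz_body)
    return status.get("last_auth_call_successful") is True


print(auth_is_healthy('{"last_auth_call_successful": true}'))
print(auth_is_healthy('{"last_auth_call_successful": false}'))
```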
1.3.2 (2020/10/12)
-
Authentication
Added backup authentication mechanism.
1.3.1 (2020/10/05)
-
Large Input Handling
Improved handling of ultra large inputs (>100K words).
1.3.0 (2020/09/25)
What’s new in 1.3.0?
-
New Entity Types
Added USERNAME entity type, which covers user names or handles (e.g., privateairocks, @_PrivateAI).
Added RELIGION entity type, which covers terms indicating religious affiliation (e.g., Hindu).
1.2.0 (2020/08/14)
What’s new in 1.2.0?
-
New Entity Types
Added AGE entity type, which is a number or phrase associated with an age (e.g., 27)
-
New Accuracy Mode
Added best accuracy mode. To use it, please set accuracy_mode to best.
1.1.0 (2020/07/05)
What’s new in 1.1.0?
-
Credit Card Number Support
Added support for credit card numbers
1.0.0 (2020/06/15)
Initial container release
For release notes older than 1.0.0, please contact us.