Release Notes
Below are the release notes for the Private AI container. To update, please grab a new version of the image.
4.0.0 (2024/11/28)
What's new in 4.0.0?
attention
4.0.0 introduces breaking changes. See below for details.
- Breaking Changes
- /v3 has been removed from all URIs in the API
- Terms denoting gender identity and sexual orientation, previously redacted under a single class, GENDER_SEXUALITY, will now be redacted under two separate classes: GENDER and SEXUALITY, respectively. Please see our Supported Entity Types page for descriptions and examples of these entity types
- The entity type groupings HIPAA and PHI have been renamed, and the entity types included in each set have also changed. HIPAA has been renamed to HIPAA_SAFE_HARBOR and, accordingly, only includes entity types covered by the HIPAA Safe Harbor provision. The PHI grouping has been renamed to HEALTH_INFORMATION. For further information on using these selective entity groupings, please see our documentation on Customizing Detection, as well as our Supported Entity Types for details on which entity types are included in each set
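For client code that still uses the pre-4.0.0 names, the renames listed under Breaking Changes can be sketched as a small lookup. This is only an illustration: the helper names `migrate_entity_names` and `strip_v3_prefix` are hypothetical and not part of the Private AI API. Because the contents of the renamed groupings have also changed, a rename alone may not reproduce the previous redaction behaviour.

```python
# Hypothetical migration helpers; only the name mappings come from the
# 4.0.0 breaking changes, everything else is an illustrative assumption.
RENAMED_GROUPS = {"HIPAA": "HIPAA_SAFE_HARBOR", "PHI": "HEALTH_INFORMATION"}
SPLIT_TYPES = {"GENDER_SEXUALITY": ["GENDER", "SEXUALITY"]}


def migrate_entity_names(names):
    """Return 4.0.0 names for a list of pre-4.0.0 entity type or group names."""
    migrated = []
    for name in names:
        if name in SPLIT_TYPES:
            replacements = SPLIT_TYPES[name]
        else:
            replacements = [RENAMED_GROUPS.get(name, name)]
        for new_name in replacements:
            # Preserve order and avoid duplicates after the split.
            if new_name not in migrated:
                migrated.append(new_name)
    return migrated


def strip_v3_prefix(path):
    """Drop a leading /v3 segment from a URI path; other paths are unchanged."""
    if path == "/v3" or path.startswith("/v3/"):
        return path[len("/v3"):] or "/"
    return path


print(migrate_entity_names(["NAME", "GENDER_SEXUALITY", "HIPAA", "PHI"]))
print(strip_v3_prefix("/v3/process/text"))
```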
New Features
- Coreference Resolution: In real-world data, a single entity can be referred to in many ways: variations in spelling, nicknames, or abbreviations can create challenges for accurate processing. Our Coreference Resolution feature solves this by ensuring that all references to a person or organization are consistently linked (see Coreference Resolution)
- Container Playground: Visualize Private AI's capabilities with the Playground deployed within your organization
- Checksum Validation: Verify credit card numbers using checksum formulas such as Luhn, ensuring accurate identification, categorization and removal of these entities
- Confidential Company Information: Identify and secure sensitive corporate information, including logos, signatures, file names, locations and financial data (Beta)
- Signature Detection: Handwritten signatures can now be detected in images and documents.
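As background for the Checksum Validation item, the Luhn formula it names can be sketched in a few lines. This is the standard public algorithm, not Private AI's implementation, and the function name `luhn_is_valid` is chosen here for illustration.

```python
def luhn_is_valid(number: str) -> bool:
    """Check a card-like number against the Luhn checksum.

    All non-digit characters (spaces, hyphens, and so on) are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # well-known test number
print(luhn_is_valid("4111 1111 1111 1112"))  # last digit altered
```

A passing checksum only shows that a digit string is well formed; it does not show that the card exists.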
- New Entity Type
- We've added Beta support for a new health information entity type: EFFECT. This class captures terms referring to medical symptoms and side effects (e.g., "worsening cough", "elevated heart rate"). Please refer to our Supported Entity Types page for more information on enabling Beta entity types and our Health Information entity types.
- Model Improvements
- English Models
- Interactive voice response (IVR) codes are no longer captured as PHONE_NUMBER (e.g., "Press 2 for Cardiology")
- Improved detection of abbreviated gender markers on licenses (e.g., "M", "F")
- Improved detection of USERNAME entities and numerical corporate identifiers (NUMERICAL_PII) in English structured and unstructured text
- Improved detection of numerical entity types in contexts where only the last four digits are present
- Improved detection of PHONE_NUMBERs
- Improved detection of generational suffixes appearing after names (e.g., "junior", "Sr.", "III"), as well as detection of names that are ambiguous with placenames and other nouns (e.g., "London", "Black")
- Improved detection of individual names written in non-Latin characters appearing in English contexts
- Improved detection of email addresses when spelled out (e.g., "N for Nelly, a for apple")
- Multilingual Models
- Interactive voice response (IVR) codes are no longer captured as PHONE_NUMBER (e.g., "Press 2 for Cardiology")
- Improved detection of AGE entities in German text
- Improvements to PII detection in Japanese, specifically targeting: unstructured text in professional emails, customer service call transcripts, text containing spelled-out names and email addresses, LOCATION entities and related subclasses (e.g., LOCATION_CITY, LOCATION_ADDRESS), PCI entities (e.g., CREDIT_CARD, BANK_ACCOUNT), DURATION entities, and company names (ORGANIZATION) in Japanese text (including Japanese and English mixed text)
- Improved detection of PII in German medical texts
- Added support for Quebecois RAMQ numbers, redacted as HEALTHCARE_NUMBER
- Improved detection in unstructured French text of health information and French Canadian NAS (SIN) numbers, redacted as SSN
- Improved detection of PII in French and Spanish customer support transcripts
- Improved detection of direct identifiers and numerical entity types in Portuguese and Russian unstructured text
- Improved detection of PII in Mandarin (simplified script) targeting: GENDER, SEXUALITY, MARITAL_STATUS, and PHONE_NUMBER as well as other direct identifiers
3.9.0 (2024/08/12)
What’s new in 3.9.0?
- New Entity Types
- We've added support for GENDER and SEXUALITY entity types. Both classes are currently captured under the hybrid class GENDER_SEXUALITY. The hybrid class will be removed in the 4.0 release. Please see our Supported Entities page for descriptions and examples of these entity types.
- Model Improvements
- Improved detection of company and organization names in Japanese and English text
- Improved detection of PII in English clinical notes
- Improved detection of French Canadian Social Insurance numbers (captured as SSN)
- Improved detection of Portuguese tax ID numbers
- Improved detection of ACCOUNT_NUMBER, ORGANIZATION_MEDICAL_FACILITY, LOCATION_ADDRESS_STREET, NAME_FAMILY, NAME_GIVEN and DURATION entity types in Mandarin and Korean text
- Improved detection of numerical entity types in Spanish text
- General Improvements
- The NER route is available. The ner/text route provides the raw output of the entity detection engine and is recommended if details about all entities discovered in a text fragment, including overlapping ones, are required. With the ner/text route you will be able to answer questions like "Does this text contain zip codes?" or "Does it contain a complete address?" This extra flexibility implies that you should be ready to implement your own post-processing logic
- Audio files are now supported on the GPU container
- CSV file processing bug fix
3.8.2 (2024/05/23)
What’s new in 3.8.2?
attention
On May 24th, we uncovered a bug in our PDF redaction module that could allow PII to leak through in the invisible text layer re-inserted into the new document. The PII is only accessible by searching the redacted document or by copying the invisible text layer. Our testing revealed that the issue occurs on only a few percent of PDFs, predominantly around slanted text and tables. As such we believe the likelihood of a leak is small in ordinary scenarios. However, as we take data privacy very seriously, we strongly suggest users of our PDF processing to update to 3.8.2 immediately
- Model Improvements
- Improved PII detection in Japanese call transcripts
- Improved detection of PII within structured tables in PDFs (MONEY mentions, in particular)
- Improvements to CVV detection in Russian text
- General Improvements
- PDF invisible text layer issue addressed. Please see the top of the 3.8.2 notes for details
- Increased coverage of supported PowerPoint elements, particularly around text in separate containers on pages. Please see the PowerPoint page for further details
- Security updates
3.8.1 (2024/05/07)
What’s new in 3.8.1?
- Model Improvements
- Note: The following model improvements for this point release are only included in the high and high_multilingual accuracy modes
- Improved detection of PII in Japanese ASR call transcripts, especially for EMAIL_ADDRESS, LOCATION_ADDRESS (and related entity types), and NAME
- Improved detection of MONEY entities in structured data and PDF tables
- Better performance on PCI entity types (e.g., ROUTING_NUMBER) and other numerical classes (e.g., SSN, NUMERICAL_PII) in English text
- Improved performance on PII detection in English clinical notes and other medical data types
- Better detection of partial CREDIT_CARD numbers (e.g., the last four digits only) in English, German, French, Portuguese, and Japanese
- Improvements to the DRIVER_LICENSE and VEHICLE_ID classes in Spanish
- General Improvements
- Security updates
- Fixed issue with copyable Japanese text in de-identified PDFs
- The gpu-text container no longer has a strict 4GB shared memory requirement; this requirement is only for the gpu container
3.8.0 (2024/03/28)
What’s new in 3.8.0?
attention
3.8.0 includes a container log warning that strongly recommends 64 GB of RAM for anyone utilizing the file support endpoints.
- Model Improvements
- Improved detection of PII in structured data
- Improved detection of PII in French and Portuguese text, particularly with respect to numerical entity types such as SSN and HEALTHCARE_NUMBER
- Translated Redaction Labels
- Redaction markers are now supported in core languages. Please see the languages page to see which are supported.
- Websocket (Beta)
- The websocket endpoint now retains context. This can be enabled / disabled via the PAI_WS_LINK_BATCH environment variable by setting it to true / false. The default is true
- The context window can also be adjusted via the PAI_WS_CONTEXT_SIZE environment variable. The default size is 50.
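One way to pass the two websocket variables to the container is through `-e` flags on `docker run`. The sketch below only builds those flags; the helper name `websocket_env_args` is hypothetical, and the unit of the context size is not stated in these notes, so the value is passed through as a plain integer.

```python
def websocket_env_args(link_batch=True, context_size=50):
    """Build `docker run` -e flags for the websocket context settings.

    Defaults mirror the documented defaults (true and 50).
    """
    if context_size < 0:
        raise ValueError("context_size must not be negative")
    return [
        "-e", f"PAI_WS_LINK_BATCH={'true' if link_batch else 'false'}",
        "-e", f"PAI_WS_CONTEXT_SIZE={context_size}",
    ]


# Example: disable context retention for a stateless deployment.
print(" ".join(websocket_env_args(link_batch=False)))
```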
- Other Improvements
- Image processing now supports redaction with black boxes.
3.7.3 (2024/03/14)
What’s new in 3.7.3?
- New Entity Type
- We've added support for a new entity type, LOCATION_ADDRESS_STREET, which is a subclass of our existing LOCATION_ADDRESS. Whereas LOCATION_ADDRESS captures a full address, LOCATION_ADDRESS_STREET captures only the street name and number of an address, plus unit numbers, if relevant. Please see our Supported Entities page for examples of both categories.
- Model Improvements
- Improved detection of numerical entity types in French, Spanish, and English, especially SSN and CREDIT_CARD
- Better detection of BANK_ACCOUNT, MEDICAL_PROCESS, and TIME in Dutch
- Improved detection of numerical entity types written in words (as in ASR transcripts) for Japanese, French, Spanish, and Portuguese text
- Note: Model improvements for this point release are only included in the high and high_multilingual accuracy modes
- Other General Improvements
- Docx document processing has improved and can now handle embedded hyperlinks, text boxes and shapes with text data
3.7.2 (2024/03/01)
What’s new in 3.7.2?
- Model Improvements
- Improvements to detection of partial credit card numbers (i.e., "the last four digits are ...") and social security numbers
- Better detection of numerical entity types (e.g., SSN, CREDIT_CARD) written in words (e.g., "one, two, three"), a common format used by ASR tools, especially in multilingual text (Spanish, Dutch, Korean, German, Italian)
- Improved detection of MONEY entities in English and all numerical entity types in French
- General improvements to PII detection in Mandarin (simplified script), especially for NAME, LOCATION, MONEY, DRUG, and DATE
- Improved PII detection in Spanish, especially with respect to regional equivalents of SSN, CVV, ACCOUNT_NUMBER, PASSPORT_NUMBER, and DOB
- Note: Model improvements for this point release are only included in the high and high_multilingual accuracy modes
- Other General Improvements
- Improved support for Japanese fonts in PDF file processing
- Security updates
3.7.1 (2024/02/09)
What’s new in 3.7.1?
- Model Improvements
- Improved PII detection in tabular data with abbreviated field / column header names
- New Language Support
- We now provide extended support for Georgian
- Other General Improvements
- Security updates
3.7.0 (2024/02/02)
What’s new in 3.7.0?
attention
3.7.0 introduces a breaking change in model behaviour. Medical codes previously redacted as NUMERICAL_PII will no longer be detected, unless the new Beta entity type MEDICAL_CODE is explicitly enabled in your POST request (details on this entity type below).
- New Beta Entity Type
- We've added Beta support for a MEDICAL_CODE entity type to our English models, covering medical classification systems such as ICD-10, NDC, SNOMED, etc. Please see our Supported Entities page for more information on how to enable Beta entity types.
Ex.: 1981-03-11T04:11:32-03:00 Forearm sprain SNOMED-CT 70704007
- Model Improvements
- English:
- Improved detection of BANK_ACCOUNT and ROUTING_NUMBER, including regional variants such as UK sort code and Australian BSB
- Enhanced detection of SSN in ASR call transcripts
- Improved support for PII detection in tabular data and unstructured text containing mathematical formulas
- Better PHI detection in ASR-transcribed clinical note dictations
- Japanese:
- Improved average recall of PII entity types
- Spanish & Portuguese:
- Improved detection of NAME_MEDICAL_PROFESSIONAL and ORGANIZATION_MEDICAL_FACILITY
- Allow Filter Logic (Beta)
We've introduced an AllowTextFilter parameter under entity_types/filter that applies a regex filter on the text payload as a whole and not just the entities detected (which is how the AllowFilter parameter currently functions). This filter functionality is flagged as beta in the 3.7.0 release and is not recommended for production use.
- New Audio Options
For audio file processing, we've introduced two new parameters to adjust the audio redaction bleep frequency and gain. These parameters can be adjusted under AudioOptions in the process_file routes using bleep_frequency and bleep_gain.
More information can be found on the API spec
- Other General Improvements
- Improvements to ASR engine to provide better overall audio redaction of detected entities
- Processed Docx files with entities detected in footers could previously cause issues; this has been fixed
- Processed Docx formatting issues including tables, checkboxes and spacing have been addressed
3.6.3 (2024/01/12)
What’s new in 3.6.3?
- Improved performance of Standard ASR
- GPU container can now be run as a non-root user
3.6.2 (2024/01/03)
What’s new in 3.6.2?
- Bug fixes for .ppt and .pptx files:
- Issue with delimiters in text being improperly redacted
- Issue with redaction in slide notes
- Better error handling for unsupported images embedded within the files
- Image resizing support within .ppt and .pptx files
- Introduction of PAI_MAX_IMAGE_PIXELS environment variable to configure max allowed pixels in images processed
3.6.1 (2023/12/21)
What’s new in 3.6.1?
- Security patch for the transformer library
3.6.0 (2023/12/20)
attention
3.6.0 introduces a breaking change: The automatic English/multilingual accuracy mode selection introduced in 3.5.0 is now used by default. To retain previous behaviour, please set accuracy under the entity_detection payload configuration to high.
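A request body that pins the accuracy mode might be assembled as in the sketch below. The nesting of accuracy under entity_detection follows the 3.0 payload examples further down in these notes; the helper name `with_accuracy` is hypothetical.

```python
import json


def with_accuracy(payload, accuracy="high"):
    """Return a copy of a request payload with entity_detection.accuracy set."""
    updated = dict(payload)
    # Copy the nested dict so the caller's payload is left untouched.
    detection = dict(updated.get("entity_detection", {}))
    detection["accuracy"] = accuracy
    updated["entity_detection"] = detection
    return updated


request_body = with_accuracy({"text": ["Hello John"]})
print(json.dumps(request_body))
```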
What’s new in 3.6.0?
- Websocket Endpoint (Beta)
- A websocket endpoint has been introduced in this version: /ws
- More information can be found here
- General Improvements
- The high_automatic accuracy mode introduced in 3.5.0 is now the default model when processing data with the container. This means that if the standard_high_multilingual or high_multilingual models are available in your container instance and the container detects a non-English language, it will automatically use the *-multilingual model to process the data. To retain previous behaviour, please set accuracy to high
- Various improvements to PDF and other document types, specifically:
- Post processed visual distortions / black outs on PDFs no longer occurring
- DICOM files support 16 bit images
- Office documents processing speed improvements
- Office documents have improved entity numbering
- The process/file/base64 endpoint now supports the filetype as well as the mimetype. E.g. pdf and application/pdf can be used for a base64-encoded pdf file.
- The project_id character limit has been increased from 32 characters to 60
- Better reference tracking for entities referred to with different names, e.g. "Gary", "Gary's" and "G-A-R-Y" can all be linked with the same marker (NAME_1)
- The CPU container RAM requirement when audio file support is enabled has been raised to 16GB.
- Support for containerised Azure OCR has been added
- Support for Audio Distortion has been added. More information can be found here.
- Model Improvements
- Enhanced detection of “spelled-out” entities, commonly occurring in call transcripts (e.g., “g as in golf, a as in apple, r for red, y for yellow”, “G-A-R-Y”)
- Improvements to DURATION detection in English, developing multilingual support (with a focus on Spanish, German, and Dutch)
- Improved PII / PHI detection in healthcare data, in particular: single word responses in patient forms and DICOM attributes
- Support added for Irish eircode (postal code) detection in English text
- Improvements to PHI detection in Dutch, English, Italian, Ukrainian (focused on: CONDITION, BLOOD_TYPE, INJURY)
- Improved detection of German, Korean, and Italian LOCATIONs and addresses
3.5.0 (2023/11/14)
What’s new in 3.5.0?
- Company Confidential Information Preview
- This new feature allows users to detect and redact company confidential information. It is enabled through entity configuration in this release. Please reach out to support@private-ai.com for more information!
- General Improvements
- Improved OCR performance (again!)
- Office file support improvement (Doc / DocX, PPT / PPTX etc.)
- The container OpenAPI spec now generates a v2 compatible schema for a seamless integration with API tools
- New Beta Entities
- We've introduced beta entities which capture Confidential Company Information (CCI). Please see our Supported Entities page for more information on how to enable these classes and what they cover.
3.4.3 (2023/10/25)
What’s new in 3.4.3?
- General Improvements
- Improved OCR performance
- Improved Japanese OCR image / file redaction
- The multilingual model is now auto-selected if a non-English language is detected and the English model is not explicitly selected
- General performance improvements
- Model Improvements
- Tagalog: Improvements in accuracy for PHONE_NUMBER detection
- English: Improvements in accuracy for PCI classes (in particular: CREDIT_CARD_EXPIRATION, CVV, ROUTING_NUMBER) and other numerical classes (ACCOUNT_NUMBER, NUMERICAL_PII, PHONE_NUMBER, VEHICLE_ID)
- Added support for DUNS number detection (classified as NUMERICAL_PII)
3.4.2 (2023/10/11)
What’s new in 3.4.2?
- General Improvements
- Doc / Docx files now process contents within tables
- Additional configuration with best label matching is now available in the Process text endpoint. Find more details on Enable Non-Max Suppression in the process text documentation.
- Model Improvements
- Improved detection of ACCOUNT_NUMBER entity, particularly in contexts where it may be ambiguous with other numerical classes such as BANK_ACCOUNT and CREDIT_CARD
3.4.1.1 (2023/10/04)
What’s new in 3.4.1.1?
- Model Improvements
- Improved detection of NUMERICAL_PII and MONEY entities related to cryptocurrency wallet IDs, transaction hashes, and cryptocurrency names / amounts
2.14.6 (2023/10/02)
What’s new in 2.14.6?
- General Information
- Please note that this release is for legacy users only and is NOT for users already on V3 of Private AI
- Model Improvements
- Improved PCI detection (in particular, CREDIT_CARDs) in French
3.4.1 (2023/09/22)
What’s new in 3.4.1?
- New Language Support
- We now provide extended support for Cantonese
- Model Improvements
- Improvements to PII detection in Dutch, with particular attention to SSN (Burgerservicenummer / Citizen Service Number and the Belgian NISS) and NUMERICAL_PII such as organization numbers (e.g., Ondernemingsnummer, Identificatienummer) and VAT numbers (e.g., BTW Identificatienummer, BTW Nummer)
3.4.0 (2023/09/15)
What’s new in 3.4.0?
- New Language Support
- We now provide Core Support for Dutch and Japanese
- Extended Support has also been added for Afrikaans
- General Improvements
- DICOM file support is now available
- PNG file support is now available
- BMP file support is now available
- XML file support has been improved
- Audio support has been improved and can now be deployed in a single container
- Model Improvements
- Improvements to multilingual PII detection, with a particular focus on PCI entity types, in: French, German, Spanish, and Portuguese
- Fine-tuning of recently-added classes: NAME_MEDICAL_PROFESSIONAL and ORGANIZATION_MEDICAL_FACILITY
3.3.4 (2023/09/02)
What’s new in 3.3.4?
- General Improvements
- Improved OCR support and general performance improvements with PDFs
- General Office document support improvements
- webm format support for audio files
3.3.3 (2023/08/15)
What’s new in 3.3.3?
- General Improvements
- General performance improvement and reduced memory footprint
- Various library updates based on security recommendations
- File processing now supports disabling entities being returned in response
- New Entity Types
- NAME_MEDICAL_PROFESSIONAL: detects the names and professional titles of medical professionals such as doctors and nurses (e.g., Dr. Kay Martinez, MD)
- ORGANIZATION_MEDICAL_FACILITY: detects the names of medical facilities such as hospitals and clinics (e.g., Victoria General Hospital, Union Family Health Clinic)
- Model Improvements
- Improved detection of PII in medical records and in .xml processed as plain text
- Improved detection of ACCOUNT_NUMBER, particularly in French
- Improved detection of HEALTHCARE_NUMBER in English
3.3.2 (2023/07/12)
What’s new in 3.3.2?
- General Improvements
- Significant performance improvement with OCR related tasks
- Image blurring has improved significantly
3.3.1 (2023/07/12)
What’s new in 3.3.1?
- General Improvements
- Various library updates based on security recommendations
3.3.0 (2023/07/12)
What’s new in 3.3.0?
- General Improvements
- File redaction for PDFs responds with numbered entities for the entire document rather than per page.
- PDF and image processing have speed improvements on the GPU container
- Doc / DocX file processing now returns redacted main file contents in response
- General updates to libraries based on security recommendations
- Model Improvements
- General improvements to PII detection in: English, French, Japanese, Korean, Portuguese, Russian, Tagalog, Ukrainian
- Improved detection of numerical classes in: English, Korean, Spanish, Russian
- Improved detection of PHI classes in: English
- Improvements to the ACCOUNT_NUMBER entity in English and Spanish
3.2.1 (2023/06/03)
What’s new in 3.2.1?
- General Improvements
- The Re-identification route has been improved to handle additional use cases.
- New Language Support
- Extended support has been added for Bambara
3.2.0 (2023/05/25)
What’s new in 3.2.0?
- New Features
- Re-identification endpoint now available. This endpoint allows a user to pass previously de-identified text to be re-identified. Further details on how to use this new endpoint can be found on the API Reference
- You can now configure our solution to redact only entities protected by Japan's Act on the Protection of Personal Information (APPI) or APPI's sensitive personal data designation. See our documentation for details on how to implement and our supported entities list for the entities covered by APPI and APPI_SENSITIVE
- Model Improvements
- Improved detection of numerical entity classes in English (e.g., BANK_ACCOUNT, ACCOUNT_NUMBER, CREDIT_CARD, CREDIT_CARD_EXPIRATION)
- Improved precision in detecting PHI classes in English (e.g., CONDITION, DOSE, DRUG, and MEDICAL_PROCESS)
- Improved PII & PCI detection in Japanese, Polish, Portuguese, Russian, Spanish, Ukrainian
- Better Image and PDF Processing (Again!)
PDF and image processing has once again been improved performance-wise.
- New File Formats
The following file formats are now supported in the /process/file/uri and /process/file/base64 endpoints:
- .eml
- .txt
- .xls / .xlsx
- .ppt
3.1.1 (2023/04/18)
What’s new in 3.1.1?
- New Entity Types
- ACCOUNT_NUMBER captures the number associated with a client’s account (e.g., Policy No. 10042992, Member ID: HZ-5235-001)
- DURATION captures mentions of periods of time, specified as a number and a unit of time (e.g., 8 months, 2 years)
- New Language Support
- Added Core Support for Mandarin (simplified script)
- Model Improvements
- Improved detection of PCI classes in English, including optimization for South African English, Italian, Spanish (in particular: BANK_ACCOUNT, CREDIT_CARD)
- Improved detection of PHI classes in English
- Improved detection of PII in English clickstream data sets
- Improved detection of PII in Mandarin (simplified), Tagalog, French
2.14.5 (2023/04/18)
What’s new in 2.14.5?
- Model Improvements
- Improved detection of PCI classes in English, including optimization for South African English, Italian, Spanish (in particular: BANK_ACCOUNT, CREDIT_CARD)
- Improved detection of PHI classes in English
- Improved detection of PII in English clickstream data sets
- Improved detection of PII in Mandarin (simplified), Tagalog, French
3.1.0 (2023/04/03)
What’s new in 3.1.0?
- New File Formats
The following file formats are now supported in the /process/file/uri and /process/file/base64 endpoints:
- .doc
- .docx
- .xml
- .json
- Language Detection
The /process/text endpoint returns a language_detected attribute which specifies ISO 639-1 language labels in the response. For more information, please have a look at the process text documentation
- Better Image and PDF Processing
PDF and image processing has been greatly improved in both accuracy and throughput performance.
- Model Improvements
- Improved detection of PCI and other numerical classes in English (in particular: CREDIT_CARD, CREDIT_CARD_EXPIRATION, CVV, HEALTHCARE_NUMBER, VEHICLE_ID)
- Improved detection of PCI classes in French and Spanish (in particular: BANK_ACCOUNT, CREDIT_CARD, CREDIT_CARD_EXPIRATION, CVV)
3.0.0 (2023/03/12)
We are proud to announce the 3rd major version of Private AI's solution. Note that 3.0 does not maintain backwards compatibility. Instead, Private AI will continue to do 2.X releases with updated models and potential security fixes until 3 months after this release.
What’s new in 3.0?
Starting with 3.0, we will be distributing our container exclusively through the Azure Container Registry. Login credentials and sample commands to download the container image can be found in the customer portal and will look like:
docker login -u INSERT_UNIQUE_CLIENT_ID -p INSERT_UNIQUE_CLIENT_PW crprivateaiprod.azurecr.io
- Licensing Change
We have changed our licensing system from an API Key to a license file. In order to run the container with the license file, run the following:
docker run --rm -v "full path to license.json":/app/license/license.json \
  -p 8080:8080 -it crprivateaiprod.azurecr.io/deid:<version>
Once you have the container up and running with the new license file, you can send the container a request like this:
curl --request POST --url http://localhost:8080/v3/process/text --header 'Content-Type: application/json' \
  --data '{"text": ["Hello John"]}'
- New API Interface
3.0 introduces many changes to the API; please see the new API Reference for details. Key changes:
- deidentify_text is now called /v3/process/text
- Endpoints in general now follow the standard of process/type/subtype
- text field is required to be a list by default, even with a single string
- key field has been removed from the body and is now in the request header: X-API-KEY. It is only required when using our cloud API
- accuracy_mode is now called accuracy and can be found one layer down in the entity_detection dictionary settings
- return_entities parameter allows you to configure whether to include identified entities in the response
- unique_pii_markers has been removed. Instead, please set pattern inside the marker parameters to BEST_ENTITY_TYPE
- Entity is established in nomenclature to recognize PII, PHI, PCI
Example conversions from V2 request payload to 3.0:
### Example with enabled_classes ###
2.0:
{"text": "Hello there John!", "key":<My_api_key>, "accuracy_mode":"high", "enabled_classes":["NAME"] }
3.0:
{"text": ["Hello there John! I live in Newark"], "entity_detection": {"accuracy": "high", "entity_types": [{"type": "ENABLE", "value":["NAME"]}] } }
-----------------------------------------------------------------------------------------
### Example with inclusion of all entity types in entity marker ###
2.x:
{"text": "Hello there Pieter!", "key":<My_api_key>, "accuracy_mode":"standard", "marker_format": "[ALL_CLASS_NAMES]" }
3.0:
{"text": ["Hello there Pieter!"], "entity_detection": {"accuracy": "standard"}, "processed_text": {"type": "MARKER", "pattern": "[ALL_ENTITY_TYPES]"} }
-----------------------------------------------------------------------------------------
### Example with disabling unique_pii_markers through MARKER definition ###
2.0:
{"text": "Hello there Paul!", "key":<My_api_key>, "accuracy_mode":"high_multilingual", "unique_pii_markers": false }
3.0:
{"text": ["Hello there Paul!"], "entity_detection": {"accuracy": "high_multilingual"}, "processed_text": {"type": "MARKER", "pattern": "[BEST_ENTITY_TYPE]"} }
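The three conversions above can be sketched as one function. This is an illustration rather than an official migration tool: `convert_v2_payload` is a hypothetical name, it covers only the fields shown in these examples, and letting marker_format take precedence over unique_pii_markers is an assumption.

```python
def convert_v2_payload(v2):
    """Translate a 2.x request body into a (3.0 body, headers) pair."""
    text = v2["text"]
    # 3.0 requires text to be a list, even for a single string.
    body = {"text": text if isinstance(text, list) else [text]}

    # The key moves from the body to the X-API-KEY request header.
    headers = {}
    if "key" in v2:
        headers["X-API-KEY"] = v2["key"]

    detection = {}
    if "accuracy_mode" in v2:
        detection["accuracy"] = v2["accuracy_mode"]
    if "enabled_classes" in v2:
        detection["entity_types"] = [
            {"type": "ENABLE", "value": list(v2["enabled_classes"])}
        ]
    if detection:
        body["entity_detection"] = detection

    pattern = None
    if "marker_format" in v2:
        # Only the one marker token shown in the examples is translated.
        pattern = v2["marker_format"].replace("ALL_CLASS_NAMES", "ALL_ENTITY_TYPES")
    elif v2.get("unique_pii_markers") is False:
        pattern = "[BEST_ENTITY_TYPE]"
    if pattern is not None:
        body["processed_text"] = {"type": "MARKER", "pattern": pattern}
    return body, headers


body, headers = convert_v2_payload(
    {"text": "Hello there Paul!", "key": "my-api-key",
     "accuracy_mode": "high_multilingual", "unique_pii_markers": False}
)
print(body)
print(headers)
```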
- File Support for Audio / PDFs / Images
3.0 supports file redaction using a unified endpoint, which works either with URIs or base64-encoded files: /v3/process/files/uri and /v3/process/files/base64. Please see the Quickstart Guide for details.
- Application version endpoint
Sending a GET request to the container root endpoint http://container-address:8080 will return a response providing information about the application version: {"app_version": "3.0.0"}
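A client could use this endpoint to check which version it is talking to before sending requests. The sketch below uses only the Python standard library; the function names are hypothetical, and http://container-address:8080 is the placeholder address from the text above, not a real host.

```python
import json
from urllib.request import urlopen


def parse_app_version(response_body):
    """Turn the root endpoint's JSON body into a tuple of ints."""
    version = json.loads(response_body)["app_version"]
    return tuple(int(part) for part in version.split("."))


def fetch_app_version(base_url="http://container-address:8080", timeout=5.0):
    """GET the container root endpoint and return the parsed version."""
    with urlopen(base_url, timeout=timeout) as response:
        return parse_app_version(response.read().decode("utf-8"))


# Parsing the sample response shown above; no network access is needed here.
version = parse_app_version('{"app_version": "3.0.0"}')
print(version >= (3, 0, 0))
```

Tuples compare element by element, so a four-part version such as 3.4.1.1 still orders correctly against three-part ones.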
- Synthetic Entity Generation
Synthetic entity generation is now supported across each language Private AI supports.
Quality of generated entities has been improved, particularly around matching the formatting and length of the original entity.
- Environment Variables
All previous environment variables are now prefixed with “PAI” to better differentiate PAI specific variables. You can find the full list of environment variables in Environment Variables.
- PII Metrics
In 3.0, non-airgapped users can enable PII metrics gathering for reporting purposes. In order to do this, add PAI_ENABLE_PII_COUNT_METERING=True as an environment variable. You'll be able to see the number of PII captured by your license usage and we will be further improving this feature to provide you with a granular view on entity types captured and other reporting features.
Please note that this feature is OFF by default and requires explicit configuration to gather this data. Any usage prior to enabling this feature is NOT captured and cannot be reported on retroactively.
2.14.3 (2023/03/07)
- Improvements to numerical entity detection and classification, specifically: NUMERICAL_PII, BANK_ACCOUNT, PHONE_NUMBER, CREDIT_CARD, CREDIT_CARD_EXPIRATION and CVV.
- Improvements to PII detection within ASR transcripts, including variable casing (lower/upper/sentence case) for named entities.
- Improvements to ORGANIZATION detection.
- Better recognition of emergency phone numbers.
- GPU container image size has been reduced.
2.14.2 (2023/01/17)
-
Improvements to PHONE_NUMBER detection, particularly in ASR transcripts in which entities may have unusual formatting.
-
Improvements to CREDIT_CARD detection in ASR transcripts, which may contain spelling and formatting anomalies.
- Optimizations for detecting PII entities in HR documents, such as CVs and resumes.
- General improvements to PII detection in Spanish text.
- Resolved an issue where redaction markers in previously redacted data were sometimes captured as PII.
-
The trailing period in company names such as ACME Co. is now included in the entity.
2.14.1 (2022/11/30)
- Improved PCI detection in French and Spanish
-
/, \ and $ characters are no longer stripped from entities. For example, Visit us at facebook.com/user123/ is now redacted as Visit us at [URL_1] instead of Visit us at [URL_1]/.
- Tuned RAM check thresholds for machines with 8GB RAM.
- Language Support: Added Extended Support for Japanese.
2.14.0 (2022/11/11)
What’s new in 2.14.0?
-
New Language Support
The following languages have been added to Extended support:
- Luxembourgish
- Swahili
-
Entity Types
-
NAME_GIVEN, which encompasses name(s) given to an individual, usually at birth, often first/middle names in Western cultures, middle/last names in Eastern cultures.
-
NAME_FAMILY, which encompasses names indicating a person’s family or community, often a last name in Western cultures, first name in Eastern cultures.
-
MEDICAL_MISC entity type has been deleted.
-
Improvements
-
Improved Models
- Improved detection of names spelled out in all caps by ASR systems.
-
NAME: Improved name subclass detection / classification in English.
-
EMAIL_ADDRESS: More robustness around partial / unformatted emails in English.
-
CREDIT_CARD: Improved handling of mentions of only the last 4 digits in English.
- Enhanced detection of NAMEs and other entities when spelled out in a transcript (e.g., “c as in charlie …”)
- Improvements to detection of PASSWORD, including verification answers
- Improved handling of eponymous medical conditions in English.
- Improvements to PHI detection in English.
- Improvements to PHI detection in Spanish.
-
Improvements to all personal number classes such as PASSPORT, CREDIT_CARD and SSN, including international variants in French, German, Italian, Tagalog and Ukrainian.
- Improved PII detection in text containing facerolls and typos.
- Improvements to PII detection in Tagalog data containing profanities / toxic material.
- Improved detection of ambiguous LOCATION / ORGANIZATION mentions, as well as ambiguous NAMEs
- Improved PII detection in text containing control characters
- General improvements to:
- Russian
- Spanish
-
Miscellaneous
- Container startup memory check is now performed on container start, instead of after loading models
- Fixed handling of null strings
2.13.1 (2022/09/26)
-
Emoji Improvements
Processing of non-English text containing emojis has been improved
2.13.0 (2022/09/08)
What’s new in 2.13.0?
-
Second Generation Synthetic PII
This release features the debut of our second generation synthetic PII system. The system has been rebuilt from the ground up and leverages a new approach developed by Private AI. The new system features the following improvements:
- Increased PII realism, including greater variety of generated terms and less generation of common terms such as "John" or "Paul".
- Better generation of numerical PII, particularly around the correct number of digits.
Note that the CPU containers are now approximately 700MB larger due to this change and that the new synthetic PII system is slower than the first generation. Private AI will be releasing optimizations for both container size and processing time in subsequent releases, along with GPU support.
-
New Language Support
The following languages have been added to Extended support:
- Belarusian
- Icelandic
- Indonesian
- Khmer
- Thai
We have also added Beta support for Japanese.
-
New Entity Types
-
NAME_GIVEN, which encompasses name(s) given to an individual, usually at birth, often first/middle names in Western cultures, middle/last names in Eastern cultures.
-
NAME_FAMILY, which encompasses names indicating a person’s family or community, often a last name in Western cultures, first name in Eastern cultures.
-
-
Disable GPU
-
PAI_DISABLE_GPU_CHECK allows users to disable the startup check for GPU on the container and run the GPU container using CPU only.
-
Improvements
-
Best Label Calculation
The best label calculation has been updated to prefer the most granular entity type. For example, Hello John will become Hello [NAME_GIVEN] instead of Hello [NAME]. Similarly, I live in Toronto will be I live in [LOCATION_CITY] instead of I live in [LOCATION]. When an entity spans multiple words that have additional, nested labels, the existing behaviour is retained: namely, the most general entity type, covering the entire span, is used. For example, Hello John Doe will be Hello [NAME] and I live in Toronto, Canada will be I live in [LOCATION].
-
Improved Models
This release features a number of PII detection improvements:
- Further improvements to the character-level recognition that was introduced in 2.12.
-
False Positive reduction for CONDITION, DRUG, MEDICAL_PROCESS in English.
-
CREDIT_CARD, PHONE_NUMBER, EMAIL_ADDRESS, BANK_ACCOUNT, PASSPORT_NUMBER, SSN improvements in Spanish.
-
CONDITION, DRUG, MEDICAL_PROCESS improvements in Spanish.
-
NAME, LOCATION, ORGANIZATION, POLITICAL_AFFILIATION improvements in German, French, Italian and Polish.
- Improved performance across all entity types in Tagalog.
-
Miscellaneous
Improved log messages on container startup.
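The best label rule described under Best Label Calculation in this release can be illustrated with a small sketch. This is not the container's implementation: it assumes labels are available as (name, start, end) character spans and that subtype names extend their parent's name (NAME, NAME_GIVEN), and `best_label` is a hypothetical function.

```python
def best_label(entity_span, labels):
    """Pick the marker label for one detected entity.

    entity_span: (start, end) character offsets of the whole entity.
    labels: list of (name, start, end) tuples, including nested sub-labels.
    """
    start, end = entity_span
    # Only labels that cover the entire span are candidates, so a multi-word
    # entity whose nested labels cover just part of it keeps the general type.
    covering = [name for name, s, e in labels if s <= start and e >= end]
    if not covering:
        raise ValueError("no label covers the whole entity span")
    # Assumption: the longest covering name is the most granular subtype.
    return max(covering, key=len)
```

With the spans for "Hello John", both NAME and NAME_GIVEN cover the entity and the more granular one is chosen; for "Hello John Doe", only NAME covers the full span, so it is kept.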
2.12.0 (2022/07/27)
What’s new in 2.12.0?
-
New Inference Pipeline
This release features the debut of our new inference pipeline. The main feature of the new pipeline is that it can operate on text that is not whitespace-separated. This brings a number of benefits, including better performance around punctuation and control characters, and it enables new languages such as Mandarin (simplified).
-
Prometheus Endpoint
A Prometheus metrics endpoint is now available at /metrics. See the API reference for details.
-
New Language Support
The following languages have been added to Core support:
- Ukrainian
- Hindi
In addition to this, we have added Extended support for the following 5 languages:
- Estonian
- Malay
- Punjabi
- Tamil
- Vietnamese
We have also added Beta support for Mandarin (simplified)
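For the /metrics endpoint introduced in this release, the scraped text can be read without a Prometheus server. The sketch below parses the standard Prometheus text exposition format, assuming samples carry no trailing timestamp; `parse_metrics` is a hypothetical helper, and no metric names exposed by the container are assumed.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample line ('name{labels} value') to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, value = line.rsplit(None, 1)
        samples[name] = float(value)
    return samples
```

The input would be the body of a GET request to /metrics on the container.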
Improvements
-
Improved Models
This release features a number of PII detection improvements:
-
German NUMERICAL_PII detection has been improved.
- Improved performance on medical questionnaires and customer onboarding forms.
- Multilingual chat performance has been improved, particularly in Spanish.
- Postal address detection performance has been improved for addresses in the United Kingdom, Australia and New Zealand.
-
PASSWORD and CVV detection performance has been improved.
- PHI Attributes / symptoms detection has been improved.
- General improvements for EHRs and ASR transcripts.
-
Security Patch
Several container image dependencies and Python libraries have been updated to address security recommendations.
2.11.1 (2022/06/01)
Improvements
-
Security Patch
Several libraries received patch updates to address security recommendations and have been included in this release.
-
Improved Models
Improvements have been made to the detection of medical entities such as CONDITION, INJURY and MEDICAL_MISC.
Improvements have also been made to NUMERICAL_PII, particularly in multilingual models.
-
Container Options
Allow startup resource check to be disabled.
2.11.0 (2022/05/10)
What’s new in 2.11.0?
-
New language support
Tagalog has been moved from extended to core support. For the full list, please see the supported languages page.
-
New Entity Types
VEHICLE_ID has been added in this release. This entity type covers vehicle identifiers such as license plate numbers, vehicle serial numbers and vehicle identification numbers.
Improvements
-
Model Improvements
PII detection error has been reduced by approximately 10%, particularly around CREDIT_CARD, CREDIT_CARD_EXPIRATION and CVV. Australian and New Zealand address recognition has also been improved.
Performance on disfluent ASR transcripts (particularly around passwords), chat logs and medical patient records has been improved.
CPU model processing speed has increased by approximately 8%, whilst GPU processing speed has been improved by up to 35%, depending on the chosen accuracy mode.
-
Service health monitoring
The /healthz endpoint is now more robust at detecting the overall health of the API service.
-
Improved error messages
Error messages when either the key or text fields are missing are now more specific.
-
Security updates
Libraries have been updated based on security recommendations from our regular vulnerability scans.
-
Documentation revamp
Our public documentation has been updated to include new guides, updated install instructions and sample configurations.
-
Other
ENABLED_CLASSES can now be set via an environment variable, similar to LOG_LEVEL.
2.10.0 (2022/03/14)
What’s new in 2.10.0?
-
33 Supported Languages
Our system can now detect PII in 33 different languages, with more coming soon. For the full list, please see the supported languages page.
-
New Entities
2.10 includes the following new entities:
-
GENDER_SEXUALITY: Terms indicating gender identity or sexual orientation, including slang terms. E.g.: “female”, “bisexual”, “trans”
-
MARITAL_STATUS: Terms indicating marital status. E.g.: “single”, “common-law”, “ex-wife”, “married”
-
LOCATION_COORDINATE: A subclass of LOCATION. A geographic position referred to using latitude, longitude, and/or elevation coordinates. E.g.: “We’re at: [40.748440 and -73.984559]”
The NUMERICAL_PII class now includes MAC addresses and cookie IDs.
These entities will be listed in the docs shortly.
-
-
Complete HIPAA Support
With this release we have complete support for all the entities listed under the HIPAA Safe Harbor rule. Health plan beneficiary numbers and medical record numbers have been added to HEALTHCARE_NUMBER, whilst medical device serial numbers have been added to NUMERICAL_PII.
-
Entity Sets
enabled_classes now supports entity sets. This way, you can simply include the name of the regulation that you want to comply with, and we will enable the entities that are listed in that regulation for you. The regulations that are implemented in this release are:
- GDPR
- CPRA
- HIPAA
- Quebec Privacy Act
- PCI
Example command:
curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "Hi Anwar", "key": "<customer key>", "enabled_classes": ["GDPR"]}'
The docs will be updated to include this functionality and the entities included in each entity set shortly.
-
Docker image version logging
We are now logging the version of the Docker image in our logs. This allows us to provide better customer support based on the version that appears in the logs.
Improvements
-
Better models
2.10 features improved PII detection models, particularly around credit card numbers, verification codes, social security numbers, US postal addresses, email addresses in emails and resumes.
-
TIME entity adjustment
We have adjusted the TIME entity to no longer include ASR transcript timestamps.
-
Better API error messages
In order to improve error handling and make debugging easier, we have reworked our API error messages to be more detailed and understandable. Error messages (but not potentially sensitive payloads) are now also logged to console.
-
Redaction marker label calculation
We have improved how the redaction marker that is used in the redacted text is calculated.
-
Resource validation system
In 2.9 we introduced checks that validate that the container has been provided with enough resources. In this release, we have further expanded and improved these checks to be able to detect memory and GPU resources more accurately.
-
Health check system
We have improved the health check endpoint in the GPU build to return the health of the GPU inference engine as well.
We have improved the process monitoring inside the GPU build to eliminate the possibility of having dead containers that are still running.
We have updated the health check route in the CPU build to be completely asynchronous.
-
RAM usage
The container printed the RAM usage on every API call. This has now been moved to ‘debug’ log level.
-
Docs
We have added a new page to our documentation titled “Deployment Considerations”, which aims to help users deploy the Docker image in production environments.
Other notable changes are:
- A new page that lists supported languages
- An updated list of supported entities
-
Web Demo
We have made a small improvement to the UI of the web demo by changing the model options from a drop down list to radio buttons.
Web demo now has unique PII markers disabled by default. This change will be reflected in the upcoming API refactor.
2.9.1 (2022/02/24)
-
Logging Improvements
RAM usage is now logged on debug level instead of info
-
Container Health
healthz route latency has been improved.
A Docker container health check has been implemented, for improved AWS ECS use.
2.9.0 (2022/01/18)
What’s new in 2.9.0?
-
New PII Classes
Passport numbers are now recognized as a separate entity type, PASSPORT_NUMBER, instead of NUMERICAL_PII.
POLITICAL_AFFILIATION has been added and covers terms referring to a political party, movement, or ideology (e.g., Republican, liberal).
We now support deidentification of IPv6 addresses in addition to IPv4 addresses. Any IPv6 address found in the text will be labelled as IP_ADDRESS.
-
Container Startup Resource Validation
Based on our user feedback, we have implemented a hardware resource validation that runs on container startup. This implementation validates that the container has access to an NVIDIA GPU and/or enough RAM on startup. If the implementation fails to validate these requirements, it prints a helpful and detailed error message (rather than the default “Killed” message printed by Docker) which guides the user on how to solve these resource related issues.
-
Docker Hub Repository
Starting with release 2.9.0, the container can be pulled from a private Docker Hub repository. Please contact us if you would like to receive the container via this repository, instead of the existing encrypted Docker image export.
Improvements
-
Model Improvements
This release includes improved models. Improvements include:
- Better performance on ASR system transcripts, particularly around disfluencies
- Improved Driver License detection
- Better performance on SMS message style conversations
-
Improved Documentation
We have spent some time improving our documentation as well. The noteworthy improvements are:
- The table of contents is now clearer and easier to navigate.
- A new detailed introduction page.
- Detailed installation instructions.
- Updated API reference.
- Updated Web Demo to showcase Multilingual PII Redaction and Synthetic Personal Data Generation in addition to English PII Redaction.
-
Fixes
We fixed an issue where the built-in labels that use regex patterns would override the custom labels defined in block_list.
We tuned the models to fix an issue where some non-PII words following PII words would be labelled as part of the PII word.
We have removed a warning message that would show up on container startup due to an internal library incorrectly assessing the ML dependencies.
Synthetic PII generation now works when the custom block_list feature is used.
2.8.0 (2021/12/20)
What’s new in 2.8.0?
-
New Entity Types
Added DRIVER_LICENSE entity type. Driver's licenses will now be picked up in this class instead of NUMERICAL_PII.
Improvements
- Improved backup authentication mechanism fail-over logic.
- Updated API server. This was a dependency and security upgrade.
- GPU inference server errors now return 500 instead of 503.
Deprecation Notice: We’ve rearranged the plumbing on our authentication system. Releases prior to 2.3.0 will no longer authenticate as of 31st December 2021.
2.7.1 (2021/10/28)
-
Linked Batch Processing
This release adds the link_batch option. When enabled, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log. Please visit the API reference for implementation details.
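As an illustration, a short chat log could be submitted with link_batch enabled as sketched below. This assumes link_batch is a boolean field sent alongside text and key in the request body, like the other 2.x options; the API reference is the authority on the exact form. `build_linked_batch` is a hypothetical helper.

```python
import json


def build_linked_batch(messages: list, key: str) -> str:
    """Build a JSON request body that asks for the inputs to share context."""
    payload = {
        "text": list(messages),  # one entry per chat message, in order
        "key": key,
        "link_batch": True,  # assumed boolean flag, per the description above
    }
    return json.dumps(payload)
```

The returned string can be sent as the body of a POST request with a content-type of application/json.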
2.7.0 (2021/10/29)
What’s new in 2.7.0?
Breaking change: The default accuracy mode has been changed from standard to high.
-
Added the LOG_LEVEL environment variable, which controls logging verbosity. The environment variable can be set to info, warning or error. Default is info.
Improvements
-
Model Improvements
This release features improved PII detection models:
- Numerical PII detection has been further refined, particularly around SSNs and credit card numbers
- Further improvements for chat transcripts
- Further improvements for OCR documents, particularly receipts
- Further improvements for JSON files
-
Authentication
The backup authentication mechanism has been moved to a completely new system, improving redundancy
-
Usage Reporting
The
get_usage
route now returns the current month's usage, instead of current week.
2.6.1 (2021/09/27)
-
Improved Models
This release fixes phone numbers and credit card numbers occasionally being detected as SSNs. Additionally, performance around ASR transcripts and the various ways they transcribe numbers was improved
2.6.0 (2021/09/21)
Improvements
-
Improved Models
This release features improved PII detection models, particularly surrounding English and Portuguese.
Optimizations for a number of popular ASR systems have been added in this release. In particular, the optimizations cover how the systems transcribe numbers.
2.5.0 (2021/08/20)
What’s new in 2.5.0?
-
New Entity Types
The DATE class has been split into DATE and DATE_INTERVAL. DATE_INTERVAL covers broader references such as 'last summer', whilst DATE remains targeted at specific references like '21/8/2019'.
-
Batch Processing
Support for batch processing has been added. To use batch processing, simply submit a list of text strings:
curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": ["My password is: 4XDX63F8O1", "My password is: 33LMVLLDHNasdfsda"], "key": <key>}'
Improvements
-
Multilingual Improvements
This release features improved PII detection models, particularly surrounding English, Italian and Korean.
-
Image Size
Container image size has been further reduced.
2.4.0 (2021/07/21)
What’s new in 2.4.0?
-
Custom Redaction Markers
Added support for custom redaction markers.
-
Allow Lists
Added support for allow lists - any entities matching entries in the allow list will be discarded.
-
New Entity Types
Added new location classes:
-
LOCATION_ADDRESS: A street address, e.g. '48 Bristol Ave, 6157, Perth, Australia'
-
LOCATION_CITY: A city, e.g. 'Perth' or 'Toronto'
-
LOCATION_COUNTRY: A country, e.g. 'Spain'
-
LOCATION_ZIP: A zip or postal code, e.g. '10405'
-
LOCATION_STATE: A reference to a state within a country, e.g. 'California'
NOTE: These entities are subclasses of LOCATION - the LOCATION label remains unchanged and will appear along with the above entities
-
Improvements
-
Model Improvements
This release features improved PII detection and synthetic PII generation models, particularly surrounding Spanish, Italian and Korean.
-
Phone Number Improvements
Improved phone number post-processing, particularly around bracket handling and '+' in international dialling codes
-
Best Label Calculation
Improved automatic calculation of the number of processing threads to use whilst executing the ML models.
2.3.1 (2021/07/16)
-
CPU Performance Improvement
Patch release to address CPU utilisation
2.3.0 (2021/06/25)
What’s new in 2.3.0?
-
New Languages
Added support for Korean
-
New Entity Types
Added ROUTING_NUMBER, which is a number associated with a bank or financial institution (e.g., 012345678).
Added BANK_ACCOUNT, which is a bank account or bank card number (e.g., 012345-67).
Improvements
-
Improved Models
This release features improved PII detection models, trained on ~50% more data than 2.2.0.
We have improved PHI detection performance. More to come in the next release.
-
Authentication
This release now authenticates with our revamped authentication system. No changes on the user side are required.
2.2.2 (2021/06/03)
-
New Accuracy Mode
Added a new accuracy mode that is approximately 4x faster than standard. In order to use this model, please set accuracy_mode to fast.
2.2.1 (2021/05)
-
Improved Models
Improved SSN detection in ASR transcripts
Improved PHI detection
2.2.0 (2021/04/29)
What’s new in 2.2.0?
-
Multilingual Support
This release adds support for Spanish, French, Italian, German and Portuguese. To enable it, please see the API Reference for details
-
Synthetic PII Generation
Beta release of synthetic PII generation. In addition to identifying and redacting PII, Private AI can now also generate synthetic PII. To try it out, please set fake_entity_accuracy_mode to standard:
$ curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "so, it expires the 21st; and the 3 digits on the back", "fake_entity_accuracy_mode": "standard", "key": <key>}'
{
  "result": "so, it expires the [CREDIT_CARD_EXPIRATION_1]; and the 3 digits on the back",
  "result_fake": "so, it expires the 20th; and the 3 digits on the back",
  "pii": [
    {
      "marker": "CREDIT_CARD_EXPIRATION_1",
      "text": "21st",
      "best_label": "CREDIT_CARD_EXPIRATION",
      "stt_idx": 19,
      "end_idx": 23,
      "labels": {"CREDIT_CARD_EXPIRATION": 0.8895},
      "fake_text": "20th",
      "fake_stt_idx": 19,
      "fake_end_idx": 23
    }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
Improvements
-
Customizable API Port
API port can now be customized. See the Environment Variables page for details.
Health check port is now on port 8080, same as the main deidentify_text route
-
Revamped API Serving
The API serving infrastructure has been completely rebuilt
Shortened authentication request timeout
2.1.3 (2021/04/13)
-
Improved Models
Improved credit card handling in ASR transcripts
2.1.2 (2021/03/02)
-
Added ZODIAC_SIGN, which covers Zodiac Signs such as "Aries" or "Taurus".
-
This release features improved PII detection, particularly surrounding SSN, DOB and NUMERICAL_PII.
-
Added passport numbers, vehicle license plate numbers and vehicle serial numbers to NUMERICAL_PII.
-
Passport numbers and vehicle serial numbers are now recognised as NUMERICAL_PII.
2.1.1 (2021/02/26)
-
Improved Models
Further PII detection improvements targeted at numerical entity detection.
2.1.0 (2021/02/18)
Improvements
-
Improved Models
This release improves PII detection accuracy, via model updates and improved training data.
Additionally an improvement was made in an edge case where model output is highly ambiguous.
2.0.1 (2021/01/25)
-
Improved Models
Improved PII detection models.
-
Reduced Image Size
Further reduced Docker image size.
2.0.0 (2021/01/14)
What’s new in 2.0.0?
-
Revamped API
The 2.0.0 release features a revamped API interface, based on recent customer feedback
-
New Entity Types
New entity types:
-
FILENAME: Name of a computer file, e.g., bradtaxreturns.txt, koalabear.jpg
-
ORIGIN: Origin encompasses nationalities, ethnicities, and races. E.g., Canadian, american, caucasian
Added PHI entity types:
-
BLOOD_TYPE: Blood type, e.g., O-
-
CONDITION: A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
-
DRUG: Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol
-
INJURY: Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages and dislocations.
-
MEDICAL_PROCESS: Medical process, including treatments, procedures and tests. E.g., ‘heart surgery’, ‘CT scan’.
-
PHYSICAL_ATTRIBUTE: A body attribute, e.g. I’m 190cm tall.
-
STATISTICS: How many people in a specific country have the disease or what percentage of people were cured of a disease, for example. E.g., 20 percent of people have arrhythmia
-
Improvements
-
New Inference Engine
New inference engine, which is significantly faster than previous releases
-
Reduced Container Image Size
Docker image size has been drastically reduced
1.5.1 (2020/12/08)
-
Improved Models
Improved credit card number and SSN detection in chat logs.
1.5.0 (2020/11/19)
What’s new in 1.5.0?
-
New Accuracy Mode
The previous standard accuracy model is now fast. In its place, we have introduced a new model that is ~2x slower but with far better performance.
Improvements
-
Improved Models
Improved model accuracy via additional training data.
-
Runtime Performance Improvements
Reduced latency by ~15% on fast mode: 60ms to 52ms on our single core GCP N2 Cascade Lake test instance.
Dramatically reduced RAM usage for all models.
Reduced Docker image size.
1.4.2 (2020/11/6)
-
Phone Number Improvements
Improved support for 7 digit phone numbers
1.4.1 (2020/11/4)
-
SSN Improvements
Improved
SSN
detection
1.4.0 (2020/10/23)
What’s new in 1.4.0?
-
New Entity Types
Added DOB entity type, which covers Date of Birth (e.g., Date of Birth: March 7, 1961)
Added CVV, which covers credit card verification codes (e.g., CVV: 080)
Added CREDIT_CARD_EXPIRATION, which is the expiration date of a credit card (e.g., Expires: 2/28)
Added PASSWORD entity type, which covers account passwords, pins, access keys, or verification answers (e.g., 27%alfalfa, 1234)
Improvements
-
Improved Models
Adjusted entity types to give better per class accuracy.
Improved SSN and credit card detection.
-
Health Route
Added last_auth_call_successful into healthz response.
1.3.2 (2020/10/12)
-
Authentication
Added backup authentication mechanism.
1.3.1 (2020/10/05)
-
Large Input Handling
Improved handling of ultra large inputs (>100K words).
1.3.0 (2020/09/25)
What’s new in 1.3.0?
-
New Entity Types
Added USERNAME entity type, which covers a user name or handle (e.g., privateairocks, @_PrivateAI).
Added RELIGION entity type, which covers terms indicating religious affiliation (e.g., Hindu).
1.2.0 (2020/08/14)
What’s new in 1.2.0?
-
New Entity Types
Added AGE entity type, which is a number or phrase associated with an age (e.g., 27)
-
New Accuracy Mode
Added best accuracy mode. To use it, please set accuracy_mode to best.
1.1.0 (2020/07/05)
What’s new in 1.1.0?
-
Credit Card Number Support
Added support for credit card numbers
1.0.0 (2020/06/15)
Initial container release
For release notes older than 1.0.0, please contact us.