Deidentify Text

Remove identifiers from one or more strings. Examples of how to de-identify text in several scenarios are available in our examples repository.


Request
Request Body schema: application/json
required
required
string or Array of strings

UTF-8 encoded message(s) to de-identify, e.g. "My name is Adam" or ["I live at", "263 Spadina Av"]. Request processing time increases linearly with input text length, so the maximum length depends on the provisioned hardware and any timeouts set by the user. Private AI has tested inputs of up to 500K characters on the CPU and GPU containers.

key
required
string (Key)

License key provided to you by Private AI. Note that this field will be moving to the API header in the upcoming API refactor.
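As a minimal sketch of the two accepted input shapes, the snippet below builds a request body for a single string and for a batch, then prepares (but does not send) a POST to the deidentify_text path. The name of the text field is not shown in this section, so TEXT_FIELD = "text" is an assumption, as are the base URL and the placeholder license key.

```python
import json
import urllib.request

# Assumption: the text field's name is not shown in this section; "text" is a placeholder.
TEXT_FIELD = "text"


def build_body(text, key):
    """Return a request body for one string or a list of strings."""
    is_batch = isinstance(text, list) and all(isinstance(t, str) for t in text)
    if not (isinstance(text, str) or is_batch):
        raise TypeError("text must be a string or a list of strings")
    return {TEXT_FIELD: text, "key": key}


def build_request(base_url, body):
    """Prepare a POST request object; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/deidentify_text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


single = build_body("My name is Adam", key="<YOUR_LICENSE_KEY>")
batch = build_body(["I live at", "263 Spadina Av"], key="<YOUR_LICENSE_KEY>")

# Hypothetical base URL for a locally running container.
request = build_request("http://localhost:8080", batch)
```

Passing the prepared object to urllib.request.urlopen(request, timeout=...) would send it; since processing time grows with input length, the timeout is worth sizing to the longest expected input.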

unique_pii_markers
boolean (Unique Pii Markers)
Default: true

Specifies whether PII markers in the text should uniquely identify PII.

enabled_classes
Array of strings (Enabled Classes)

Controls which types of PII are removed. See Supported Entity Types for the list of possible entities.

allow_list
Array of strings (Allow List)

Any entities in this list will be ignored by the Private AI system. This feature does not support regex patterns, and matching is case-insensitive. For example, if the allow list is ["maxim", "Kandeep"], the matches that will be discarded include "maxim", "MAxim", "MAXIM", "kandeep" and "kANdeep". This option can also be set via an environment variable. See Environment Variables.
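The matching rule described above can be illustrated with a small sketch. It is not Private AI's implementation; in particular, it assumes the comparison is made against the whole detected entity rather than a substring, which this section does not state. The helper name is_allowed is hypothetical.

```python
def is_allowed(entity_text, allow_list):
    """Case-insensitive, literal (non-regex) comparison against the allow list."""
    allowed = {item.casefold() for item in allow_list}
    return entity_text.casefold() in allowed


allow_list = ["maxim", "Kandeep"]

for candidate in ["maxim", "MAxim", "MAXIM", "kandeep", "kANdeep", "max.*"]:
    # "max.*" is treated as a literal string, so it does not match anything.
    print(candidate, is_allowed(candidate, allow_list))
```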

marker_format
string (Marker Format)

Specifies a custom redaction marker format. The format must contain one of the keywords "CLASS_NAME" or "ALL_CLASS_NAMES". "CLASS_NAME" is replaced by the entity class that best represents the entity, while "ALL_CLASS_NAMES" is replaced with all the labels applicable to the entity. Valid examples of marker formats are "<<CLASS_NAME>>", "-CLASS_NAME-" and "[ALL_CLASS_NAMES]". This option can also be set via an environment variable. See Environment Variables.
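The keyword substitution can be sketched as follows. This is an illustration only: the helper name render_marker is hypothetical, and the comma used to join multiple labels is an assumption, since the separator is not specified here. Note that "ALL_CLASS_NAMES" contains "CLASS_NAME" as a substring, so the longer keyword is checked first.

```python
def render_marker(fmt, best_label, all_labels):
    """Fill a marker format with the best label or with all applicable labels."""
    # Check the longer keyword first: "ALL_CLASS_NAMES" contains "CLASS_NAME".
    if "ALL_CLASS_NAMES" in fmt:
        # Assumption: labels are joined with a comma; the real separator is not documented here.
        return fmt.replace("ALL_CLASS_NAMES", ",".join(all_labels))
    if "CLASS_NAME" in fmt:
        return fmt.replace("CLASS_NAME", best_label)
    raise ValueError('format must contain "CLASS_NAME" or "ALL_CLASS_NAMES"')


labels = ["NAME", "ORGANIZATION"]
print(render_marker("<<CLASS_NAME>>", "NAME", labels))
print(render_marker("-CLASS_NAME-", "NAME", labels))
print(render_marker("[ALL_CLASS_NAMES]", "NAME", labels))
```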

accuracy_mode
string (Accuracy Mode)
Default: "high"

Selects the model used to identify PII in the input text. By default, the "high" accuracy model is used. Although the models used by the Private AI solution are highly optimized (~25X faster than a reference transformer implementation), in high-throughput cases it is possible to trade accuracy for speed by selecting either the "standard" or "standard_high" accuracy mode. Multilingual support is available on the Scale & Pro tiers via the multilingual models, namely "standard_high_multilingual" (GPU container only) and "high_multilingual". The multilingual models process all supported languages, including English, without the need to specify the language. It is advisable to use the English-only models where possible, as they perform slightly better on English.
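A client-side check of the mode names listed above might look like the sketch below. The helper check_accuracy_mode and its gpu_container flag are hypothetical; the service performs its own validation, and tier restrictions are not modelled here.

```python
ENGLISH_MODES = {"standard", "standard_high", "high"}
MULTILINGUAL_MODES = {"standard_high_multilingual", "high_multilingual"}
GPU_ONLY_MODES = {"standard_high_multilingual"}


def check_accuracy_mode(mode="high", gpu_container=False):
    """Return the mode if it is one of the documented names and fits the container type."""
    if mode not in ENGLISH_MODES | MULTILINGUAL_MODES:
        raise ValueError(f"unknown accuracy mode: {mode!r}")
    if mode in GPU_ONLY_MODES and not gpu_container:
        raise ValueError(f"{mode!r} is documented as GPU container only")
    return mode


payload_fragment = {"accuracy_mode": check_accuracy_mode("high_multilingual")}
```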

fake_entity_accuracy_mode
string (Fake Entity Accuracy Mode)

(Beta) Enables fake entity generation using the specified model. Only the "standard" mode is currently supported. This feature is only available on the Pro tier.

preserve_relationships
boolean (Preserve Relationships)
Default: true

(Beta) Specifies whether multiple instances of the same entity should receive the same generated fake entity. For example, with relationships preserved: "Hi John and Rosha, John nice to meet you" -> "Hi Harry and Alev, Harry nice to meet you". Without relationships preserved: "Hi John and Rosha, John nice to meet you" -> "Hi Harry and Alev, Sulav nice to meet you". This field has no effect when fake_entity_accuracy_mode is not set.
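The difference between the two settings can be sketched with a toy assignment function. This only illustrates the behaviour described above and is not how the service generates fake entities; the function name assign_fakes and the fixed pool of replacement names are assumptions made for the example.

```python
def assign_fakes(mentions, fake_pool, preserve_relationships=True):
    """Map each detected mention to a fake value drawn in order from fake_pool."""
    pool = iter(fake_pool)
    seen = {}
    result = []
    for mention in mentions:
        if preserve_relationships and mention in seen:
            # Reuse the fake already assigned to this entity.
            result.append(seen[mention])
            continue
        fake = next(pool)
        seen[mention] = fake
        result.append(fake)
    return result


mentions = ["John", "Rosha", "John"]
pool = ["Harry", "Alev", "Sulav"]

print(assign_fakes(mentions, pool, preserve_relationships=True))
print(assign_fakes(mentions, pool, preserve_relationships=False))
```

With relationships preserved, both occurrences of "John" map to the same replacement; without, the second occurrence draws a new one.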

link_batch
boolean (Link Batch)
Default: false

When set to true, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log.
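A request body for a short chat log might be assembled as in the sketch below. The messages are invented for illustration, and TEXT_FIELD = "text" is an assumption because the name of the text field is not shown in this section.

```python
import json

# Assumption: the text field's name is not shown in this section; "text" is a placeholder.
TEXT_FIELD = "text"

chat_log = [
    "Hi, who is this?",
    "It's Adam",
    "Where do you live?",
    "263 Spadina Av",
]

body = {
    TEXT_FIELD: chat_log,
    "key": "<YOUR_LICENSE_KEY>",
    # Share context across the batch so short replies are read alongside their neighbours.
    "link_batch": True,
}

encoded = json.dumps(body)
```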

object (Block List)

The block list feature allows for PII detection functionality to be customized using Python regular expressions. It is possible to extend existing entity types or define new entity types.

Regex patterns are passed as a JSON object with the key representing a label and the value a regex pattern, for example {"CUSTOM_LABEL": "custom"}.

Since this feature uses regex patterns, you can pass either a plain word (e.g. the, word, custom) or any valid Python regex pattern. Note that special characters in regex patterns must be escaped when used in JSON objects. For example, to send the regex pattern r"\b\w{4}\b", which matches every 4-character word, you need to send it as "\\b\\w{4}\\b". A complete JSON grammar is found here: https://www.json.org/json-en.html. More information on how to write Python regular expressions is found here: https://docs.python.org/3/library/re.html
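In practice a JSON library handles the escaping. The sketch below uses Python's standard json module to serialize the pattern from the example above and confirms that it round-trips unchanged; the sample sentence is invented for illustration.

```python
import json
import re

pattern = r"\b\w{4}\b"  # every 4-character word

# json.dumps doubles each backslash, producing the escaped form required in JSON.
encoded = json.dumps({"CUSTOM_LABEL": pattern})
print(encoded)  # {"CUSTOM_LABEL": "\\b\\w{4}\\b"}

# Decoding restores the original pattern exactly.
decoded = json.loads(encoded)["CUSTOM_LABEL"]

# Local check of what the pattern matches.
matches = re.findall(decoded, "This word has four letters")
print(matches)  # ['This', 'word', 'four']
```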

It is important to note also that only non-overlapping matches are returned.
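Python's re module shows what non-overlapping means: scanning resumes after the end of each match, so a pattern that could match at overlapping positions is reported fewer times than might be expected. The sketch assumes the service follows the same semantics as re.finditer, which this section implies but does not state.

```python
import re

text = "aaaa"

# "aa" could start at positions 0, 1 and 2, but only non-overlapping matches are returned.
spans = [m.span() for m in re.finditer("aa", text)]
print(spans)  # [(0, 2), (2, 4)]
```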

Lastly, for supported entity types, if you would like the model to pick up only the entities specified in the block list, you can use the enabled classes feature together with the block list feature. To do so, define a list of enabled classes that does not include the supported label you are adding to the block list. For example, if you would like the label "ORGANIZATION" to only pick up Microsoft, you can define the enabled classes as ["NAME", "LOCATION", "AGE", ...] (omitting ORGANIZATION) and the block list as {"ORGANIZATION": "Microsoft"}.
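The combination described above can be sketched as a small helper that drops the label from the enabled classes and adds it to the block list. The helper name is hypothetical, and the field name "block_list" is an assumption, since the field's name is not shown in this section. Only the three classes named in the example are used; a real request would list every class that should stay enabled.

```python
import json

# Assumption: the block list field's name is not shown in this section.
BLOCK_LIST_FIELD = "block_list"


def restrict_label_to_block_list(enabled_classes, label, pattern):
    """Build the request fields that limit `label` to matches of `pattern`."""
    return {
        # Omit the label so the model itself no longer detects it.
        "enabled_classes": [c for c in enabled_classes if c != label],
        # The block list then supplies the only source of that label.
        BLOCK_LIST_FIELD: {label: pattern},
    }


fields = restrict_label_to_block_list(
    ["NAME", "LOCATION", "AGE", "ORGANIZATION"], "ORGANIZATION", "Microsoft"
)
print(json.dumps(fields))
```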

block_list_max_likelihood
number (Block List Max Likelihood)
Default: 1

request_version
integer (Request Version)
Default: 1

Responses
200

Successful Response

400

Bad Request

500

Internal Server Error

4XX

Client Error

post/deidentify_text
Response samples
application/json
{
  "result": "string",
  "result_fake": "string",
  "pii": [ ],
  "api_calls_used": 0,
  "output_checks_passed": true
}
© Copyright 2024 Private AI.