Deidentify Text

Remove identifiers from a string or multiple strings. You can find examples of how to deidentify text in several different scenarios in our examples repository.

Request
Request Body schema: application/json
text
required
Text (string) or Array of Text (strings) (Text)

UTF-8 encoded message(s) to de-identify.

key
required
string (Key)

License key provided to you by Private AI. Note that this field will be moving to the API header in the upcoming API refactor.
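As a minimal sketch of the two required fields, the helper below builds a JSON body from a message (or a list of messages) and a license key. It assumes the message field is named "text", following the naming pattern of the other fields; the function name build_request_body and the placeholder key are hypothetical.

```python
import json


def build_request_body(text, key):
    """Return a JSON request body containing the two required fields."""
    # "text" may be one string or a list of strings.
    is_single = isinstance(text, str)
    is_batch = isinstance(text, list) and all(isinstance(t, str) for t in text)
    if not (is_single or is_batch):
        raise TypeError("text must be a string or a list of strings")
    # The field name "text" is an assumption; "key" is documented above.
    return json.dumps({"text": text, "key": key})


single = build_request_body("Hello, my name is John.", "YOUR_LICENSE_KEY")
batch = build_request_body(["Hi John", "Hi Rosha"], "YOUR_LICENSE_KEY")
print(single)
print(batch)
```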

unique_pii_markers
boolean (Unique Pii Markers)
Default: true

Specifies whether PII markers in the text should uniquely identify PII.

enabled_classes
Array of strings (Enabled Classes)

Controls which types of PII are removed. See Supported Entity Types for the list of possible entities.

allow_list
Array of strings (Allow List)

Any entities in this list will be discarded. Note that this feature does not support regex patterns and the match is case-insensitive. If the allow list is ["maxim", "Kandeep"], possible matches that will be discarded are "maxim", "MAxim", "MAXIM", "kandeep", "kANdeep". It is also possible to set this option via environment variable. See Environment Variables.
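The matching rule described above can be illustrated with a small local sketch. It is not the service's implementation: it assumes the whole entity text is compared against each allow list entry, and the helper name is_allow_listed is hypothetical.

```python
def is_allow_listed(entity_text, allow_list):
    """Return True if entity_text matches an allow list entry, ignoring case."""
    # Entries are compared literally (no regex), after case folding.
    allowed = {entry.casefold() for entry in allow_list}
    return entity_text.casefold() in allowed


allow_list = ["maxim", "Kandeep"]
for candidate in ["maxim", "MAxim", "MAXIM", "kandeep", "kANdeep", "Rosha"]:
    print(candidate, is_allow_listed(candidate, allow_list))
```

In this sketch an entry such as "max.*" is treated as a literal string, so it does not match "maxim".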

marker_format
string (Marker Format)

Specify a custom redaction marker format. The format must always contain "CLASS_NAME", which will be replaced by the entity class. E.g. "<<CLASS_NAME>>", "-CLASS_NAME-". It is also possible to set this option via environment variable. See Environment Variables.
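To make the substitution concrete, here is a rough local sketch of how a marker format expands for a given entity class. The helper render_marker is hypothetical, and the sketch ignores any numbering that unique_pii_markers may add to the marker.

```python
def render_marker(marker_format, class_name):
    """Expand a marker format by substituting the entity class name."""
    # The format must contain the literal placeholder "CLASS_NAME".
    if "CLASS_NAME" not in marker_format:
        raise ValueError('marker_format must contain "CLASS_NAME"')
    return marker_format.replace("CLASS_NAME", class_name)


print(render_marker("<<CLASS_NAME>>", "LOCATION"))
print(render_marker("-CLASS_NAME-", "NAME"))
```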

accuracy_mode
string (Accuracy Mode)
Default: "high"

Selects the model used to identify PII in the input text. By default, the "high" accuracy model is used. Whilst the models used by the Private AI solution are highly optimized (~25X faster than a reference transformer implementation), in high-throughput cases it is possible to trade accuracy for speed by selecting either the "standard" or "standard_high" accuracy modes. Multilingual support can be enabled by using one of the multilingual models, namely "standard_high_multilingual" (GPU container only) and "high_multilingual". The multilingual models process all supported languages including English, without the need to specify language. It is advisable to use the English-only models where possible, as they perform slightly better on English.
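The mode names above can be checked on the client before a request is sent. The sketch below is one possible approach, not part of the API: the helper names and the gpu_container flag are hypothetical, and the selection policy simply follows the advice to prefer English-only models where possible.

```python
ENGLISH_MODES = ("standard", "standard_high", "high")
MULTILINGUAL_MODES = ("standard_high_multilingual", "high_multilingual")
GPU_ONLY_MODES = ("standard_high_multilingual",)


def validate_accuracy_mode(mode, gpu_container=False):
    """Raise ValueError for unknown modes or GPU-only modes on a CPU container."""
    if mode not in ENGLISH_MODES + MULTILINGUAL_MODES:
        raise ValueError(f"unknown accuracy_mode: {mode!r}")
    if mode in GPU_ONLY_MODES and not gpu_container:
        raise ValueError(f"{mode!r} requires the GPU container")
    return mode


def pick_accuracy_mode(english_only):
    """Hypothetical policy: use the default English model unless other languages are expected."""
    return "high" if english_only else "high_multilingual"


print(validate_accuracy_mode(pick_accuracy_mode(english_only=True)))
print(validate_accuracy_mode(pick_accuracy_mode(english_only=False)))
```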

fake_entity_accuracy_mode
string (Fake Entity Accuracy Mode)

(Beta) Enable fake entity generation using the specified model. Currently this feature is in beta and only supports mode "standard".

preserve_relationships
boolean (Preserve Relationships)
Default: true

(Beta) Specifies whether multiple instances of the same entity should have the same generated fake entity or not. For example, preserve relationships: "Hi John and Rosha, John nice to meet you" -> "Hi Harry and Alev, Harry nice to meet you". No preserve relationships: "Hi John and Rosha, John nice to meet you" -> "Hi Harry and Alev, Sulav nice to meet you". This field has no effect when fake_entity_accuracy_mode is not set.
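The difference between the two settings can be reproduced with a toy substitution. This is only an illustration of the behaviour described above, not how the service generates fake entities: the function replace_names and the fixed lists of detected names and fake names are hypothetical.

```python
import re


def replace_names(text, names, fakes, preserve_relationships=True):
    """Toy model: swap each detected name for a fake one, optionally consistently."""
    fake_iter = iter(fakes)
    mapping = {}

    def substitute(match):
        name = match.group(0)
        if preserve_relationships:
            # Reuse the same fake name for repeated occurrences of a name.
            if name not in mapping:
                mapping[name] = next(fake_iter)
            return mapping[name]
        # Otherwise draw a fresh fake name for every occurrence.
        return next(fake_iter)

    pattern = re.compile("|".join(re.escape(n) for n in names))
    return pattern.sub(substitute, text)


text = "Hi John and Rosha, John nice to meet you"
fakes = ["Harry", "Alev", "Sulav"]
print(replace_names(text, ["John", "Rosha"], fakes, preserve_relationships=True))
print(replace_names(text, ["John", "Rosha"], fakes, preserve_relationships=False))
```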

link_batch
boolean (Link Batch)
Default: false

When set to true, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log.
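For a chat log, a request body with linked batch inputs might look like the sketch below. It assumes the message field is named "text"; the sample messages and the placeholder key are made up for illustration.

```python
import json

# Each short message is one batch item; link_batch asks the engine to share
# context between them.
chat_log = [
    "Hi, is this Anna?",
    "Yes, speaking.",
    "Great, can I reach you at this number tomorrow?",
]

body = {
    "text": chat_log,           # assumed field name for the input messages
    "key": "YOUR_LICENSE_KEY",  # placeholder license key
    "link_batch": True,
}
encoded = json.dumps(body)
print(encoded)
```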

block_list
object (Block List)

The block list feature allows you to extend the functionality of the Private AI models by using regular expressions. This way, you can define a Python regex pattern that will be used to identify additional tokens with the given PII label.

The block list feature supports multiple regex patterns. These are passed as a JSON object with the key representing a label and the value a regex pattern, for example {"CUSTOM_LABEL": "custom"}. It is possible to pass multiple LABEL-REGEX pairs to the object. So the following example is also a valid use case: {"CUSTOM1": "custom", "CUSTOM2": "other"}.

Since this feature uses regex patterns, you can either pass a word (e.g. the, word, custom, etc.) or you can pass a valid Python regex pattern. It is important to note that regex patterns may require escaping when used in JSON objects. To give an example, if you would like to send the regex pattern r"\b\w{4}\b" which will catch every 4-character word, you need to send it as "\\b\\w{4}\\b". A complete JSON grammar is found here: https://www.json.org/json-en.html. More information on how to write a Python regex is found here: https://docs.python.org/3/library/re.html
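In Python, the escaping described above does not need to be done by hand: serializing the pattern with the standard json module doubles the backslashes. The sketch below assumes the field is named "block_list"; the sample sentence is made up.

```python
import json
import re

pattern = r"\b\w{4}\b"  # matches every 4-character word

# json.dumps escapes each backslash, producing the form required in JSON.
encoded = json.dumps({"block_list": {"CUSTOM_LABEL": pattern}})
print(encoded)  # {"block_list": {"CUSTOM_LABEL": "\\b\\w{4}\\b"}}

# Decoding the JSON restores the original regex pattern.
decoded = json.loads(encoded)["block_list"]["CUSTOM_LABEL"]
matches = re.findall(decoded, "this word has four letters")
print(matches)  # ['this', 'word', 'four']
```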

It is important to note also that only non-overlapping matches are returned.
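Python's own re module shows what non-overlapping matching means in practice. The sketch below assumes the block list behaves like re.finditer in this respect, which the text above suggests but does not state outright.

```python
import re

# "aba" occurs at index 0 and again at index 2, but the two occurrences
# overlap, so scanning resumes after the first match and finds only one.
spans = [m.span() for m in re.finditer("aba", "ababa")]
print(spans)  # [(0, 3)]
```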

Lastly, for supported labels, if you would like the model to pick up only the tokens from the block list, you can use the enabled classes feature together with the block list feature. This can be done by defining a list of enabled classes and not including the supported label you are adding to the block list. For example, if you would like the label "ORGANIZATION" to only pick up Microsoft, you can define the enabled classes as ["NAME", "LOCATION", "AGE", ...] (and omitting ORGANIZATION) and the block list as {"ORGANIZATION": "Microsoft"}.
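The combination described above can be assembled programmatically. In this sketch the helper restrict_label_to_block_list is hypothetical, the field name "block_list" is assumed, and the short class list stands in for the full list of supported entity types.

```python
def restrict_label_to_block_list(all_classes, label, pattern):
    """Build options so that `label` is only produced by the block list pattern."""
    # Leave the label out of enabled_classes so the model does not predict it.
    enabled = [c for c in all_classes if c != label]
    return {"enabled_classes": enabled, "block_list": {label: pattern}}


# Illustrative subset only; see Supported Entity Types for the real list.
classes = ["NAME", "LOCATION", "AGE", "ORGANIZATION"]
options = restrict_label_to_block_list(classes, "ORGANIZATION", "Microsoft")
print(options)
```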

block_list_max_likelihood
number (Block List Max Likelihood)
Default: 1
Responses
200

Successful Response

400

Bad Request

POST /deidentify_text
Request samples
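A request could be prepared as in the sketch below, which uses only the Python standard library. The base URL is a hypothetical placeholder for wherever the service is deployed, the field name "text" is assumed, and the helper make_deidentify_request is not part of any official client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical deployment address


def make_deidentify_request(text, key, base_url=BASE_URL, **options):
    """Prepare (but do not send) a POST request to the deidentify_text endpoint."""
    payload = {"text": text, "key": key, **options}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/deidentify_text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_deidentify_request("Hi John", "YOUR_LICENSE_KEY", accuracy_mode="high")
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a reachable deployment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```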
Response samples
application/json
{
  "result": "string",
  "result_fake": "string",
  "pii": [ ],
  "api_calls_used": 0,
  "output_checks_passed": true
}
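Reading the fields of a response like the sample above could look like the following sketch. It handles only the single-object shape shown in the sample; the shape returned for batch input is not specified here, and the helper summarize_response is hypothetical.

```python
import json

SAMPLE_RESPONSE = """
{
  "result": "string",
  "result_fake": "string",
  "pii": [],
  "api_calls_used": 0,
  "output_checks_passed": true
}
"""


def summarize_response(raw):
    """Pull the documented fields out of a JSON response body."""
    data = json.loads(raw)
    return {
        "redacted": data["result"],
        "fake": data.get("result_fake"),
        "pii_count": len(data.get("pii", [])),
        "api_calls_used": data["api_calls_used"],
        "checks_ok": data["output_checks_passed"],
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary)
```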
© Copyright 2024 Private AI.