Deidentify Text

Remove identifiers from a string or multiple strings. Examples of how to deidentify text in several different scenarios are available in our examples repository.

Request

Request Body schema: application/json
required

text required	Text (string) or Array of Text (strings) (Text) UTF-8 encoded message(s) to de-identify.
key required	string (Key) License key provided to you by Private AI. Note that this field will be moving to the API header in the upcoming API refactor.
unique_pii_markers	boolean (Unique Pii Markers) Default: true Specifies whether PII markers in the text should uniquely identify PII.
enabled_classes	Array of strings (Enabled Classes) Controls which types of PII are removed. See Supported Entity Types for the list of possible entities.
allow_list	Array of strings (Allow List) Any entities in this list will be discarded. Note that this feature does not support regex patterns and the match is case-insensitive. If the allow list is `["maxim", "Kandeep"]`, possible matches that will be discarded are `"maxim"`, `"MAxim"`, `"MAXIM"`, `"kandeep"`, `"kANdeep"`. It is also possible to set this option via environment variable. See Environment Variables.
marker_format	string (Marker Format) Specify a custom redaction marker format. The format must always contain `"CLASS_NAME"`, which will be replaced by the entity class. E.g. `"<<CLASS_NAME>>"`, `"-CLASS_NAME-"`. It is also possible to set this option via environment variable. See Environment Variables.
accuracy_mode	string (Accuracy Mode) Default: "high" Selects the model used to identify PII in the input text. By default, the `"high"` accuracy model is used. While the models used by the Private AI solution are highly optimized (~25X faster than a reference transformer implementation), in high-throughput cases it is possible to trade accuracy for speed by selecting either the `"standard"` or `"standard_high"` accuracy modes. Multilingual support can be enabled by using one of the multilingual models, namely `"standard_high_multilingual"` (GPU container only) and `"high_multilingual"`. The multilingual models process all supported languages, including English, without the need to specify the language. It is advisable to use the English-only models where possible, as they perform slightly better on English.
fake_entity_accuracy_mode	string (Fake Entity Accuracy Mode) (Beta) Enable fake entity generation using the specified model. Currently this feature is in beta and only supports the `"standard"` mode.
preserve_relationships	boolean (Preserve Relationships) Default: true (Beta) Specifies whether multiple instances of the same entity should have the same generated fake entity or not. For example, with preserve relationships: `"Hi John and Rosha, John nice to meet you" -> "Hi Harry and Alev, Harry nice to meet you"`. Without preserve relationships: `"Hi John and Rosha, John nice to meet you" -> "Hi Harry and Alev, Sulav nice to meet you"`. This field has no effect when `fake_entity_accuracy_mode` is not set.
link_batch	boolean (Link Batch) Default: false When set to true, batch inputs will be joined together internally in the Private AI inference engine, to share context between the different inputs. This is useful when processing a sequence of short inputs, such as an SMS chat log.
block_list	object (Block List) The block list feature allows you to extend the functionality of the Private AI models by using regular expressions. You define a Python regex pattern that is used to identify additional tokens with the given PII label. The block list feature supports multiple regex patterns. These are passed as a JSON object with the key representing a label and the value a regex pattern, for example `{"CUSTOM_LABEL": "custom"}`. It is possible to pass multiple `LABEL-REGEX` pairs in the object, so the following example is also valid: `{"CUSTOM1": "custom", "CUSTOM2": "other"}`. Since this feature uses regex patterns, you can pass either a plain word (e.g. the, word, custom) or a valid Python regex pattern. Note that regex patterns may require escaping when used in JSON objects. For example, to send the regex pattern `r"\b\w{4}\b"`, which catches every 4-character word, you need to send it as `"\\b\\w{4}\\b"`. A complete JSON grammar is found here: https://www.json.org/json-en.html. More information on how to write a Python regex is found here: https://docs.python.org/3/library/re.html. Note also that only non-overlapping matches are returned. Lastly, for supported labels, if you would like the model to pick up only the tokens from the block list, you can use the enabled classes feature together with the block list feature. This is done by defining a list of enabled classes that does not include the supported label you are adding to the block list. For example, if you would like the label `"ORGANIZATION"` to only pick up Microsoft, you can define the enabled classes as `["NAME", "LOCATION", "AGE", ...]` (omitting `ORGANIZATION`) and the block list as `{"ORGANIZATION": "Microsoft"}`.
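
The JSON escaping rule for block list patterns can be checked locally before a request is sent. The following is a minimal sketch using only Python's standard `json` and `re` modules; the label `CUSTOM_LABEL` is the placeholder from the description above and the sample sentence is made up for illustration. It shows that serializing the raw pattern doubles each backslash, and that local matching returns non-overlapping matches.

```python
import json
import re

# Raw Python regex that matches every 4-character word.
pattern = r"\b\w{4}\b"

# Block list object: label -> regex pattern (the label is a placeholder).
block_list = {"CUSTOM_LABEL": pattern}

# json.dumps escapes each backslash, giving the form that goes on the wire.
encoded = json.dumps(block_list)
print(encoded)  # {"CUSTOM_LABEL": "\\b\\w{4}\\b"}

# Decoding restores the original pattern, so the round trip is lossless.
assert json.loads(encoded)["CUSTOM_LABEL"] == pattern

# Local check of what the pattern matches; re.finditer yields
# non-overlapping matches, scanning left to right.
sample = "This text will have some four char words"
matches = [m.group() for m in re.finditer(pattern, sample)]
print(matches)  # ['This', 'text', 'will', 'have', 'some', 'four', 'char']
```

Building the request body with a JSON library, rather than by hand, avoids having to apply the escaping manually.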

Responses

200

Successful Response

400

Bad Request

post/deidentify_text

Request samples
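
A request to this endpoint can be assembled roughly as follows. This is a sketch under stated assumptions, not an official client: the base URL `http://localhost:8080` is a hypothetical deployment address, the helper names `build_request` and `deidentify` are invented for illustration, and the `text` field name is inferred from the schema title. Only the standard library is used, and the request is built without being sent.

```python
import json
import urllib.request

# Assumption: address of your own deployment; replace as needed.
BASE_URL = "http://localhost:8080"


def build_request(text, key, **options):
    """Build a POST request for /deidentify_text without sending it."""
    # "text" may be a single string or a list of strings.
    payload = {"text": text, "key": key}
    # Optional fields (accuracy_mode, marker_format, ...) pass through as-is.
    payload.update(options)
    return urllib.request.Request(
        BASE_URL + "/deidentify_text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deidentify(text, key, **options):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, key, **options)) as resp:
        return json.load(resp)


# Build (but do not send) a batch request for a short chat log.
request = build_request(
    ["Hi John", "Nice to meet you, Rosha"],
    key="<YOUR_LICENSE_KEY>",
    link_batch=True,
    marker_format="<<CLASS_NAME>>",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `deidentify(...)` with the same arguments would send the request to a running service and return the parsed response body.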

Response samples

application/json

{
  "result": "string",
  "result_fake": "string",
  "pii": [
    {
      "marker": "string",
      "text": "string",
      "best_label": "string",
      "stt_idx": 0,
      "end_idx": 0,
      "labels": {
        "property1": 0,
        "property2": 0
      },
      "fake_text": "string",
      "fake_stt_idx": 0,
      "fake_end_idx": 0
    }
  ],
  "api_calls_used": 0,
  "output_checks_passed": true
}
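
One way to consume a response of this shape is sketched below. The response body here is a hand-written stand-in that mirrors the schema, not real API output, and the helper `summarize` is hypothetical. The sketch also assumes that `labels` maps label names to numeric scores and that a false `output_checks_passed` should be treated as a failure; neither is stated above.

```python
import json

# Hand-written stand-in that mirrors the schema; values are NOT real API output.
raw = """
{"result": "Hi [NAME_1]",
 "pii": [{"marker": "NAME_1", "text": "John", "best_label": "NAME",
          "stt_idx": 3, "end_idx": 7,
          "labels": {"NAME": 0.9, "LOCATION": 0.1}}],
 "api_calls_used": 1,
 "output_checks_passed": true}
"""
response = json.loads(raw)


def summarize(resp):
    """Return (text, best_label, top_score) for each detected entity."""
    rows = []
    for entity in resp.get("pii", []):
        labels = entity.get("labels", {})
        # Assumption: label values are numeric scores.
        top_score = max(labels.values()) if labels else None
        rows.append((entity["text"], entity["best_label"], top_score))
    return rows


# Assumption: a false flag means the output should not be trusted as-is.
if not response["output_checks_passed"]:
    raise RuntimeError("output checks failed; treat the result with caution")

summary = summarize(response)
print(summary)  # [('John', 'NAME', 0.9)]
```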
