Getting Started

This guide walks through the main features of the Private AI API. It focuses on pure text applications, but is easily extended to processing files.

info

This guide relies on Private AI's cloud API. Please sign up for a free API key to run the code examples.

Alternatively, if you are using the container instead of the cloud API, please follow the container quickstart first and adjust the API endpoint in each example as required.
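
As a minimal sketch of what adjusting the endpoint can look like, the snippet below keeps the base URL in one place so that switching between the cloud API and a container is a one-line change. The helper name `process_text_url` and the container address `http://localhost:8080` are assumptions made for illustration; use the address given in the container quickstart for your deployment.

```python
# Cloud base URL, as used throughout this guide.
CLOUD_BASE_URL = "https://api.private-ai.com/community/v4"
# Hypothetical container address, shown only for illustration.
CONTAINER_BASE_URL = "http://localhost:8080"


def process_text_url(base_url: str) -> str:
    """Return the process/text endpoint for a given base URL."""
    # Strip a trailing slash so the result never contains "//process".
    return base_url.rstrip("/") + "/process/text"


print(process_text_url(CLOUD_BASE_URL))
```

Passing `CONTAINER_BASE_URL` instead of `CLOUD_BASE_URL` is then the only change needed in the examples that follow.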

Basic Use

The process/text endpoint accepts a list of text strings and replaces each piece of PII found with a redaction marker. A simple request looks like this:

Request Body | cURL | Python | Python Client

{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ]
}

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ]
}'

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
        ]
    },
)

results = r.json()

print(results)

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."])
response = client.process_text(text_request)

print(response.processed_text)

The response contains two main outputs:

processed_text, the redacted, masked or synthetic text, as defined by the processed_text object in the request
entities, a list of the PII entities found, which is useful for PII detection and NER (Named Entity Recognition)

[
  {
    "processed_text": "Thank you for calling the [ORGANIZATION_1]. My name is miss [NAME_GIVEN_1], and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
    "entities": [
      {
        "processed_text": "ORGANIZATION_1",
        "text": "Georgia Division of Transportation",
        "location": {
          "stt_idx": 26,
          "end_idx": 60,
          "stt_idx_processed": 26,
          "end_idx_processed": 42
        },
        "best_label": "ORGANIZATION",
        "labels": {
          "LOCATION_STATE": 0.2403,
          "LOCATION": 0.2342,
          "ORGANIZATION": 0.8967
        }
      },
      {
        "processed_text": "NAME_GIVEN_1",
        "text": "Johanna",
        "location": {
          "stt_idx": 78,
          "end_idx": 85,
          "stt_idx_processed": 60,
          "end_idx_processed": 74
        },
        "best_label": "NAME_GIVEN",
        "labels": {
          "NAME_GIVEN": 0.9127,
          "NAME": 0.9018
        }
      },
      {
        "processed_text": "SSN_1",
        "text": "614-5555 01",
        "location": {
          "stt_idx": 203,
          "end_idx": 214,
          "stt_idx_processed": 192,
          "end_idx_processed": 199
        },
        "best_label": "SSN",
        "labels": {
          "SSN": 0.913
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 215,
    "languages_detected": {
      "en": 0.920992910861969
    }
  }
]
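
To illustrate how the entities list can be consumed, here is a small sketch that walks one result and pairs each label with the text it covers. It assumes, based on the sample response above, that stt_idx and end_idx behave like Python slice bounds on the input string. The helper name `list_entities` and the abridged `sample_result` are introduced here for illustration only.

```python
from typing import Dict, List, Tuple

original = (
    "Thank you for calling the Georgia Division of Transportation. "
    "My name is miss Johanna, and it is a pleasure assisting you today. "
    "For security reasons, may I please have your Social Security number? "
    "Yes, 614-5555 01."
)

# Abridged copy of the response above: only the fields used below.
sample_result = {
    "entities": [
        {"text": "Georgia Division of Transportation", "best_label": "ORGANIZATION",
         "location": {"stt_idx": 26, "end_idx": 60}},
        {"text": "Johanna", "best_label": "NAME_GIVEN",
         "location": {"stt_idx": 78, "end_idx": 85}},
        {"text": "614-5555 01", "best_label": "SSN",
         "location": {"stt_idx": 203, "end_idx": 214}},
    ]
}


def list_entities(text: str, result: Dict) -> List[Tuple[str, str]]:
    """Return (best_label, matched_text) pairs, checking offsets against the input."""
    found = []
    for entity in result["entities"]:
        loc = entity["location"]
        span = text[loc["stt_idx"]:loc["end_idx"]]
        # Fail loudly if the offsets do not point at the reported text.
        if span != entity["text"]:
            raise ValueError(f"offset mismatch for {entity['best_label']}")
        found.append((entity["best_label"], span))
    return found


for label, span in list_entities(original, sample_result):
    print(label, "->", span)
```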

Processing Related Examples

If the strings in the list are related, for example consecutive fragments of the same message, set link_batch to true like this:

Request Body | cURL | Python | Python Client

{
   "text": [
      "My phone number is",
      "2345435"
   ],
   "link_batch": true
}

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR API KEY>' \
--data '{
   "text": [
      "My phone number is",
      "2345435"
   ], 
   "link_batch": true
}'

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "My phone number is",
            "2345435",
        ],
        "link_batch": True,
    },
)

results = r.json()
print(results)

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["My phone number is", "2345435"], link_batch=True)
response = client.process_text(text_request)

print(response.processed_text)

This ensures that the inputs are joined before going to the PII detection system. This way the model sees "My phone number is 2345435" instead of "My phone number is" and, separately, "2345435", allowing the phone number to be identified correctly:

Redacted Text | Full Response

["My phone number is", "[PHONE_NUMBER_1]"]

[
  {
    "processed_text": "My phone number is",
    "entities": [],
    "entities_present": false,
    "characters_processed": 18,
    "languages_detected": {
      "en": 0.8986189365386963
    }
  },
  {
    "processed_text": "[PHONE_NUMBER_1]",
    "entities": [
      {
        "processed_text": "PHONE_NUMBER_1",
        "text": "2345435",
        "location": {
          "stt_idx": 0,
          "end_idx": 7,
          "stt_idx_processed": 0,
          "end_idx_processed": 16
        },
        "best_label": "PHONE_NUMBER",
        "labels": {
          "PHONE_NUMBER": 0.9166
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 7,
    "languages_detected": {}
  }
]
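
In the sample response above, a linked batch still comes back as one result per input string, in the same order. Assuming that holds in general, the sketch below pairs each input with its processed text. The helper name `pair_with_inputs` and the abridged `results` list are introduced here for illustration.

```python
from typing import Dict, List, Tuple

inputs = ["My phone number is", "2345435"]

# Abridged copy of the full response above: only the fields used below.
results = [
    {"processed_text": "My phone number is", "entities_present": False},
    {"processed_text": "[PHONE_NUMBER_1]", "entities_present": True},
]


def pair_with_inputs(texts: List[str], items: List[Dict]) -> List[Tuple[str, str, bool]]:
    """Pair each input string with its processed text and entities_present flag."""
    # Assumption: the API returns exactly one result per input, in order.
    if len(texts) != len(items):
        raise ValueError("expected one result per input string")
    return [
        (src, item["processed_text"], item["entities_present"])
        for src, item in zip(texts, items)
    ]


for src, processed, has_pii in pair_with_inputs(inputs, results):
    print(repr(src), "->", repr(processed), "| PII found:", has_pii)
```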

Customizing Entity Detection With Selective Redaction

The above example identifies and removes all non-beta entity types. The types of PII that are identified can be customized using Entity Selectors. For example, to only redact the SSN:

Request Body | cURL | Python | Python Client

{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ],
    "entity_detection": {
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "SSN"
                ]
            }
        ]
    }
}

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ],
    "entity_detection": {
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "SSN"
                ]
            }
        ]
    }
}'

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
        ],
        "entity_detection": {"entity_types": [{"type": "ENABLE", "value": ["SSN"]}]},
    },
)

results = r.json()
print(results)

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="ENABLE", value=["SSN"])])
text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."], entity_detection=entity_detection_object)
response = client.process_text(text_request)

print(response.processed_text)

The result of this selective redaction is below:

Redacted Text | Full Response

Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].

[
  {
    "processed_text": "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
    "entities": [
      {
        "processed_text": "SSN_1",
        "text": "614-5555 01",
        "location": {
          "stt_idx": 203,
          "end_idx": 214,
          "stt_idx_processed": 203,
          "end_idx_processed": 210
        },
        "best_label": "SSN",
        "labels": {
          "SSN": 0.913
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 215,
    "languages_detected": {
      "en": 0.920992910861969
    }
  }
]
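
When the set of entity types to redact changes between calls, it can be convenient to build the entity_detection object programmatically. The sketch below produces the same structure as the request body shown earlier in this section. The helper name `enable_only` is an assumption made for illustration and is not part of the Private AI client.

```python
import json
from typing import Dict, Iterable


def enable_only(entity_types: Iterable[str]) -> Dict:
    """Build an entity_detection object that enables only the given entity types."""
    return {"entity_types": [{"type": "ENABLE", "value": list(entity_types)}]}


payload = {
    "text": ["Yes, 614-5555 01."],
    "entity_detection": enable_only(["SSN"]),
}

# The resulting dictionary can be passed as json=payload to requests.post.
print(json.dumps(payload, indent=4))
```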

Adding Allow & Block Lists via Regexes

It is also possible to customize PII detection and de-identification/redaction via regex-based Filters, allowing for custom behaviour on specific entity types such as employee IDs, internal database IDs, and other data unique to a company.

The example below combines the Entity Selectors presented above with Filters to provide fine-grained control and customization. In this hypothetical HR claim scenario, an employee has a medical injury and requires accommodation. The request demonstrates:

Two regex-based block filters that define custom entity types for employee IDs and business units, overriding Private AI's default entity types.
Disabling the INJURY entity type, so that the injury, which could be important information for an insurance claim that the employer might have to make, is left unredacted.
Passing the conversation as a list in the text field, as you would expect from a conversational use case, and setting link_batch to true so that the context of redactions is kept across the entire thread.
Disabling numbering of redaction markers by setting the marker pattern to [BEST_ENTITY_TYPE].

Request Body | cURL | Python | Python Client

{
    "text": [
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won't be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You're all set!",
        "Thanks so much Carole!"
    ],
    "link_batch": true,
    "entity_detection": {
        "entity_types": [
            {
                "type": "DISABLE",
                "value": [
                    "INJURY"
                ]
            }
        ],
        "filter": [
            {
                "type": "BLOCK",
                "entity_type": "EMPLOYEE_ID",
                "pattern": "GID-\\d{5}"
            },
            {
                "type": "BLOCK",
                "entity_type": "BUSINESS_UNIT",
                "pattern": "Best Corp"
            }
        ],
        "return_entity": true
    },
    "processed_text": {
        "type": "MARKER",
        "pattern": "[BEST_ENTITY_TYPE]"
    }
}

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'\''m waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won'\''t be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You'\''re all set!",
        "Thanks so much Carole!"
    ],
    "link_batch": true,
    "entity_detection": {
        "entity_types": [
            {
                "type": "DISABLE",
                "value": [
                    "INJURY"
                ]
            }
        ],
        "filter": [
            {
                "type": "BLOCK",
                "entity_type": "EMPLOYEE_ID",
                "pattern": "GID-\\d{5}"
            },
            {
                "type": "BLOCK",
                "entity_type": "BUSINESS_UNIT",
                "pattern": "Best Corp"
            }
        ],
        "return_entity": true
    },
    "processed_text": {
        "type": "MARKER",
        "pattern": "[BEST_ENTITY_TYPE]"
    }
}'

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Hello Xavier, can you tell me your employee ID?",
            "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
            "Okay, thanks Xavier, why are you calling today?",
            "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
            "Oh, so sorry to hear that! How can we help?",
            "I won't be able to come back to the office in NYC for a while",
            "No problem Xavier, I will enter a short term work from home for you. You're all set!",
            "Thanks so much Carole!",
        ],
        "link_batch": True,
        "entity_detection": {
            "entity_types": [{"type": "DISABLE", "value": ["INJURY"]}],
            "filter": [
                {
                    "type": "BLOCK",
                    "entity_type": "EMPLOYEE_ID",
                    "pattern": "GID-\\d{5}",
                },
                {
                    "type": "BLOCK",
                    "entity_type": "BUSINESS_UNIT",
                    "pattern": "Best Corp",
                },
            ],
            "return_entity": True,
        },
        "processed_text": {"type": "MARKER", "pattern": "[BEST_ENTITY_TYPE]"},
    },
)

results = r.json()
print(results)

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

filter_employee = request_objects.filter_selector_obj(type="BLOCK", entity_type="EMPLOYEE_ID", pattern="GID-\\d{5}")
filter_bu = request_objects.filter_selector_obj(type="BLOCK", entity_type="BUSINESS_UNIT", pattern="Best Corp")
entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="DISABLE", value=["INJURY"])],
                                                               filter=[filter_employee, filter_bu],
                                                               return_entity=True)

processed_text_object = request_objects.processed_text_obj(type="MARKER", pattern="[BEST_ENTITY_TYPE]")

text_request = request_objects.process_text_obj(text=[
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won't be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You're all set!",
        "Thanks so much Carole!"
    ],
    link_batch=True,
    entity_detection=entity_detection_object,
    processed_text=processed_text_object,
)
response = client.process_text(text_request)

print(response.processed_text)

The above request yields this response:

['Hello [NAME_GIVEN], can you tell me your employee ID?', 'Yep, my [BUSINESS_UNIT] ID is [EMPLOYEE_ID], and my SIN is [SSN]', 'Okay, thanks [NAME_GIVEN], why are you calling today?', "I broke my right leg on the [DATE] and I'm waiting for my [MEDICAL_PROCESS] results. [NAME_MEDICAL_PROFESSIONAL], [ORGANIZATION_MEDICAL_FACILITY].", 'Oh, so sorry to hear that! How can we help?', "I won't be able to come back to the office in [LOCATION_CITY] for a while", "No problem [NAME_GIVEN], I will enter a short term work from home for you. You're all set!", 'Thanks so much [NAME_GIVEN]!']
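
Before sending block filters to the API, it can help to preview what their patterns match. The sketch below does this locally with Python's re module, on the assumption that the API's regex engine treats these simple patterns the same way; more complex patterns may behave differently. The helper name `preview_filters` is introduced here for illustration.

```python
import re
from typing import Dict, List, Tuple

# The two block filters from the request above.
FILTERS = [
    {"type": "BLOCK", "entity_type": "EMPLOYEE_ID", "pattern": "GID-\\d{5}"},
    {"type": "BLOCK", "entity_type": "BUSINESS_UNIT", "pattern": "Best Corp"},
]


def preview_filters(text: str, filters: List[Dict]) -> List[Tuple[str, str]]:
    """Return (entity_type, matched_text) for every local match of each pattern."""
    hits = []
    for flt in filters:
        for match in re.finditer(flt["pattern"], text):
            hits.append((flt["entity_type"], match.group(0)))
    return hits


line = "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283"
print(preview_filters(line, FILTERS))
```

This only previews the custom patterns; it does not reproduce the detection that Private AI performs for its built-in entity types.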

LLMs: PrivateGPT

PrivateGPT pairs Private AI's redaction engine with a re-identification function to provide a seamless user experience with cloud-based LLMs, without sharing sensitive data and PII with the LLM:

[Figure: Overview of the PrivateGPT by Private AI workflow]

To learn more about this functionality, please visit the PrivateGPT User Guide and the LLM integration guide.
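
The re-identification idea can be sketched in a few lines. This is not PrivateGPT's implementation, only an illustration under two assumptions: the redaction response uses numbered markers, as in the Basic Use example, and the LLM reply repeats those markers verbatim. The helper names `build_marker_map` and `reidentify`, and the sample `llm_reply`, are hypothetical.

```python
from typing import Dict, List


def build_marker_map(entities: List[Dict]) -> Dict[str, str]:
    """Map each redaction marker, e.g. "[NAME_GIVEN_1]", to its original text."""
    return {f"[{entity['processed_text']}]": entity["text"] for entity in entities}


def reidentify(text: str, marker_map: Dict[str, str]) -> str:
    """Replace every known marker in the text with the original value."""
    for marker, original in marker_map.items():
        text = text.replace(marker, original)
    return text


# Abridged entities from the Basic Use response.
entities = [
    {"processed_text": "ORGANIZATION_1", "text": "Georgia Division of Transportation"},
    {"processed_text": "NAME_GIVEN_1", "text": "Johanna"},
]

# Hypothetical reply from an LLM that only ever saw the redacted text.
llm_reply = "[NAME_GIVEN_1] works at the [ORGANIZATION_1]."

print(reidentify(llm_reply, build_marker_map(entities)))
```

Because the mapping stays on the caller's side, the LLM provider only receives the redacted text.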

Generating Synthetic Entities (Beta)

In addition to replacing PII entities with redaction markers, tokens and mask characters, Private AI can generate fake or synthetic replacements for each entity. This is done using an ML-based approach that produces realistic examples that fit the context of the surrounding text. This has a number of advantages:

Unlike other synthetic data generators, which generate completely new data, data with synthetic PII is mostly the original data. This minimizes the chance that the synthetic data generator introduces biases into the data, maximizing its utility for downstream tasks like sentiment analysis.
Our PII detection engine leads the market, although it isn't perfect. Synthetic PII ensures that any PII the detector misses is hidden among realistic, fake PII, providing a higher level of protection against re-identification.
Less impact on downstream ML systems: Synthetic entities look more like natural text than redaction markers or hashing.

To generate synthetic PII, set the type of the processed_text object in the API request to SYNTHETIC.

Request Body | cURL | Python | Python Client

{
    "text": [
      "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
    ],
    "processed_text": {
      "type": "SYNTHETIC"
    }
}

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
      "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
    ],
    "processed_text": {
      "type": "SYNTHETIC"
    }
}'

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
        ],
        "processed_text": {"type": "SYNTHETIC"},
    },
)

results = r.json()
print(results)

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."],
                                                processed_text=request_objects.processed_text_obj(type="SYNTHETIC"))
response = client.process_text(text_request)

print(response.processed_text)

This yields the following response:

Redacted Text | Full Response

Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.

[
  {
    "processed_text": "Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.",
    "entities": [
      {
        "processed_text": "Ben",
        "text": "May",
        "location": {
          "stt_idx": 18,
          "end_idx": 21,
          "stt_idx_processed": 18,
          "end_idx_processed": 21
        },
        "best_label": "NAME_GIVEN",
        "labels": {
          "NAME_GIVEN": 0.9234,
          "NAME": 0.8903
        }
      },
      {
        "processed_text": "Michael Morley",
        "text": "Jessica Parker",
        "location": {
          "stt_idx": 40,
          "end_idx": 54,
          "stt_idx_processed": 40,
          "end_idx_processed": 54
        },
        "best_label": "NAME",
        "labels": {
          "NAME_GIVEN": 0.4595,
          "NAME": 0.9178,
          "NAME_FAMILY": 0.4567
        }
      },
      {
        "processed_text": "Ekshaku, Sweden",
        "text": "Toronto, Canada",
        "location": {
          "stt_idx": 67,
          "end_idx": 82,
          "stt_idx_processed": 67,
          "end_idx_processed": 82
        },
        "best_label": "LOCATION",
        "labels": {
          "LOCATION_CITY": 0.3177,
          "LOCATION": 0.9268,
          "LOCATION_COUNTRY": 0.3185
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 83,
    "languages_detected": {
      "en": 0.8507365584373474
    }
  }
]
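
To see which original values were swapped for which synthetic ones, the entities list can be turned into a lookup table. The sketch below assumes, based on the sample response above, that stt_idx_processed and end_idx_processed behave like Python slice bounds on processed_text. The helper name `synthetic_mapping` and the abridged `sample_result` are introduced here for illustration.

```python
from typing import Dict

# Abridged copy of the response above: only the fields used below.
sample_result = {
    "processed_text": "Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.",
    "entities": [
        {"processed_text": "Ben", "text": "May",
         "location": {"stt_idx_processed": 18, "end_idx_processed": 21}},
        {"processed_text": "Michael Morley", "text": "Jessica Parker",
         "location": {"stt_idx_processed": 40, "end_idx_processed": 54}},
        {"processed_text": "Ekshaku, Sweden", "text": "Toronto, Canada",
         "location": {"stt_idx_processed": 67, "end_idx_processed": 82}},
    ],
}


def synthetic_mapping(result: Dict) -> Dict[str, str]:
    """Map each original entity text to its synthetic replacement."""
    mapping = {}
    for entity in result["entities"]:
        loc = entity["location"]
        span = result["processed_text"][loc["stt_idx_processed"]:loc["end_idx_processed"]]
        # Fail loudly if the processed offsets do not point at the replacement.
        if span != entity["processed_text"]:
            raise ValueError(f"offset mismatch for {entity['text']!r}")
        mapping[entity["text"]] = span
    return mapping


print(synthetic_mapping(sample_result))
```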