Redact Entities Selectively
PrivateGPT defaults to redacting all of Private AI's 50+ Supported Entity Types. However, a term that counts as Personally Identifiable Information (PII) in one context, and is therefore redacted, may in another context be essential to ChatGPT’s interpretation of your prompt. It is sometimes desirable, or even necessary, to leave certain terms unredacted in order to preserve the context ChatGPT needs to perform a given task.
For this reason, PrivateGPT lets you redact information selectively by toggling individual entity types on and off in the entity menu at the right-hand side of the interface. The utility of this feature is illustrated with some specific use cases below.
Entity Subclasses
When toggling individual entities, it's important to note that two of Private AI's supported entity types (namely, NAME and LOCATION) have associated subclasses. Disabling the parent classes will not automatically disable the subclasses (nor vice versa). Each entity type must be toggled on and off separately.
Subclasses of NAME
The parent class NAME has two subclasses: NAME_GIVEN and NAME_FAMILY. When a full name appears in a prompt, including both given and family names, the entity is redacted as NAME. The example below shows a prompt sent with default settings: Privacy Mode Enabled and all entity types selected. Here, the full string Priya Chaudhry is redacted as NAME_1.
In the next prompt, the user has disabled the NAME entity type in an attempt to allow names to remain unredacted in their prompt. As you can see in the screenshot, although NAME has been disabled, names are still detected as the two subclasses. Below, Priya is now captured as NAME_GIVEN_1 and Chaudhry as NAME_FAMILY_1.
To achieve the desired result, you must disable NAME as well as both of its subclasses, NAME_GIVEN and NAME_FAMILY. In the example below, all three classes are deselected in the entity menu at the right, and so no NAME entities are redacted in the prompt sent to ChatGPT.
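The toggling rule above can be sketched as a small model in Python. This is only an illustration of the behaviour described here, not a PrivateGPT or Private AI API: the `SUBCLASSES` table and the `disable_family` helper are hypothetical names introduced for the example, and only the entity labels themselves come from this page.

```python
# Illustrative model only: these names are NOT part of any PrivateGPT API.
# Parent entity types and their subclasses, as listed on this page.
SUBCLASSES = {
    "NAME": ["NAME_GIVEN", "NAME_FAMILY"],
    "LOCATION": [
        "LOCATION_ADDRESS", "LOCATION_CITY", "LOCATION_COORDINATE",
        "LOCATION_COUNTRY", "LOCATION_STATE", "LOCATION_ZIP",
    ],
}


def disable_family(enabled, parent):
    """Return a copy of `enabled` without `parent` or any of its subclasses."""
    to_remove = {parent, *SUBCLASSES.get(parent, [])}
    return set(enabled) - to_remove


# A small subset of entity types, all enabled as in the default settings.
enabled = {"NAME", "NAME_GIVEN", "NAME_FAMILY", "OCCUPATION"}

# Toggling off only the parent leaves both subclasses active,
# so names would still be redacted as NAME_GIVEN / NAME_FAMILY.
only_parent_off = enabled - {"NAME"}

# Toggling off the parent and its subclasses leaves no NAME types active.
family_off = disable_family(enabled, "NAME")

print(sorted(only_parent_off))
print(sorted(family_off))
```

In the interface itself, the equivalent of `disable_family` is simply deselecting all three checkboxes by hand.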
Subclasses of LOCATION
The parent class LOCATION has six subclasses: LOCATION_ADDRESS, LOCATION_CITY, LOCATION_COORDINATE, LOCATION_COUNTRY, LOCATION_STATE, and LOCATION_ZIP. Like the NAME subclasses, these entity types can all be toggled individually.
The prompt below is sent with default settings (Privacy Mode Enabled, all entity types selected). PrivateGPT redacts the address with the most specific label that applies to the full string, namely LOCATION_ADDRESS.
ChatGPT is unable to supply an informative answer to the question with the address redacted. To allow the address to pass through unredacted, you might first try to disable LOCATION_ADDRESS. While this entity type will no longer be detected in your prompts, the LOCATION label will still apply to the address, as shown below.
Disabling both LOCATION and LOCATION_ADDRESS still does not give the desired result. Now, sub-parts of the address are detected as other location subclasses, namely LOCATION_CITY, LOCATION_STATE, and LOCATION_ZIP.
To ensure that an address like this remains fully unredacted, disable LOCATION and all of its subclasses before entering your prompt. In this final example, all location entity types have been deselected in the entity menu, and the full address information is included in the prompt sent to ChatGPT, so that it can provide an answer to the question.
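The fallback sequence walked through in this section can be summarised with a toy function. It is a simplified model of the observed behaviour for one full street address, under the assumption that the address contains a city, state, and ZIP code; it says nothing about how the detection engine works internally, and `labels_for_address` is a hypothetical name.

```python
# Toy model of the label fallback described above for a full street address.
# Hypothetical helper, not part of PrivateGPT.
ALL_LOCATION = {
    "LOCATION", "LOCATION_ADDRESS", "LOCATION_CITY",
    "LOCATION_COORDINATE", "LOCATION_COUNTRY",
    "LOCATION_STATE", "LOCATION_ZIP",
}


def labels_for_address(enabled):
    """Return the labels that would still redact the address, given `enabled`."""
    # The most specific label for the full string wins while it is enabled.
    if "LOCATION_ADDRESS" in enabled:
        return ["LOCATION_ADDRESS"]
    # Otherwise the parent class still covers the whole address.
    if "LOCATION" in enabled:
        return ["LOCATION"]
    # Otherwise sub-parts are picked up by whichever subclasses remain enabled.
    parts = ["LOCATION_CITY", "LOCATION_STATE", "LOCATION_ZIP"]
    return [p for p in parts if p in enabled]


step1 = labels_for_address(ALL_LOCATION)
step2 = labels_for_address(ALL_LOCATION - {"LOCATION_ADDRESS"})
step3 = labels_for_address(ALL_LOCATION - {"LOCATION_ADDRESS", "LOCATION"})
step4 = labels_for_address(set())  # every location type deselected

for step in (step1, step2, step3, step4):
    print(step)
```

Only the last step returns an empty list, which corresponds to the address passing through to ChatGPT unredacted.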
Role Prompting
Assigning a role or persona to ChatGPT via your prompt can help provide the AI model with more context. This additional context can help the AI model to better understand your request, and thus increase the quality of its response. However, many roles that you might want to specify are things like professions or job titles (e.g., “Write me an email as if you were a lawyer …”, “Explain this topic to me as if you were a scientist …”). Because terms like these may be PII in certain contexts, PrivateGPT redacts them by default. This is shown in the example below, where the term lawyer is redacted in the prompt sent to ChatGPT.
This is a prime example of how selectively disabling individual entities can help you send all necessary information in your prompt, while still redacting any PII. In this case, simply deselect OCCUPATION in the entity menu before entering your prompt.
With OCCUPATION deselected, PrivateGPT will no longer redact terms referring to occupations or job titles, although it will still detect all other selected entity types in your prompt. ChatGPT will receive the essential content of the request, and provide a high-quality response.
Translation Tasks
If you would like PrivateGPT to perform a task that involves translation, you will need to first disable the LANGUAGE entity type. Otherwise, ChatGPT will not know which language you want the text translated into. In the example below, on the left, the word 'Spanish' is redacted, so ChatGPT has resorted to guessing and has translated the response into French. After disabling LANGUAGE by deselecting it in the entity menu, the desired translation is returned, as shown in the example on the right.
More Selective Redaction
Here's another example where redacting the full set of entities obscures the prompt too much for ChatGPT to be able to provide a useful response. Without knowing the relevant locations, medical conditions, and dates from the original prompt, ChatGPT will not be able to provide an accurate answer to the question posed, as seen in the response below.
After disabling LOCATION and its subclasses (see above for more info on entity subclasses), CONDITION, and DATE, however, ChatGPT has all of the information required to interpret and respond effectively to the question. The PII input by the user (e.g., their employer, job title, and budget) is still redacted and never seen by ChatGPT.
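As a quick reference, the use cases on this page can be collected into a lookup from task to the entity types worth deselecting. The sketch below is a hypothetical checklist written in Python; the task keys and the `entities_to_disable` helper are invented for illustration, and only the entity labels are taken from this page.

```python
# Hypothetical checklist: which entity types to deselect for each use case.
# Task names are invented for this example; entity labels come from this page.
LOCATION_FAMILY = {
    "LOCATION", "LOCATION_ADDRESS", "LOCATION_CITY",
    "LOCATION_COORDINATE", "LOCATION_COUNTRY",
    "LOCATION_STATE", "LOCATION_ZIP",
}

TASK_ENTITIES = {
    "role_prompting": {"OCCUPATION"},
    "translation": {"LANGUAGE"},
    "location_details": LOCATION_FAMILY,
    "medical_conditions": {"CONDITION"},
    "dates": {"DATE"},
}


def entities_to_disable(*tasks):
    """Return the union of entity types to deselect for the given tasks."""
    result = set()
    for task in tasks:
        result |= TASK_ENTITIES[task]
    return result


# The final example on this page needs locations, conditions, and dates.
final_example = entities_to_disable(
    "location_details", "medical_conditions", "dates"
)
print(sorted(final_example))
```

Anything not returned by the helper, such as OCCUPATION in the final example, stays selected and continues to be redacted.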