Processing TXT Files
Private AI supports scanning TXT files for PII (Personally Identifiable Information) and creating de-identified or redacted copies. Private AI's supported entity types apply to every file type, and localized variants of PII, PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. See Supported Languages and Supported Entity Types for more detail.
If you'd like to try it yourself, please visit our free interactive web demo. No code or account is necessary.
How TXT Files Are Processed
TXT files are processed by reading their contents verbatim and passing them through Private AI's text module. The resulting file contains the labelled and redacted version of the original contents.
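The flow described above (read the file verbatim, run the text through a de-identification step, write the result) can be sketched locally as follows. This is only an illustration: `redact_text` is a hypothetical stand-in that masks email addresses with a regular expression, and the `[EMAIL_ADDRESS_n]` marker format is assumed for the example. The real detection happens inside Private AI's text module and covers far more entity types.

```python
import re
from pathlib import Path

# Hypothetical stand-in for Private AI's text module: it only detects email
# addresses and replaces each distinct one with a numbered marker.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_text(text: str) -> str:
    """Replace each distinct email address with [EMAIL_ADDRESS_n]."""
    markers: dict = {}

    def replace(match: re.Match) -> str:
        value = match.group(0)
        if value not in markers:
            markers[value] = f"[EMAIL_ADDRESS_{len(markers) + 1}]"
        return markers[value]

    return EMAIL_RE.sub(replace, text)


def process_txt(path_in: str, path_out: str) -> None:
    """Read a UTF-8 TXT file verbatim, redact it, and write the result."""
    # Decode the raw bytes directly so line endings are left untouched.
    text = Path(path_in).read_bytes().decode("utf-8")
    redacted = redact_text(text)
    Path(path_out).write_bytes(redacted.encode("utf-8"))
```

Reading bytes and decoding explicitly (rather than opening in text mode) keeps the file's original line endings, which matches the "verbatim" behaviour described above.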
Constraints
- Private AI currently only supports UTF-8 encoding for text files.
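Because only UTF-8 is accepted, it can help to validate (or convert) a file before uploading it. The sketch below uses only the Python standard library; the helper names `is_utf8` and `convert_to_utf8` are hypothetical, and the conversion assumes you already know the file's original encoding (for example cp1252). Neither helper is part of Private AI's API.

```python
from pathlib import Path


def is_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(path_in: str, path_out: str, source_encoding: str) -> None:
    """Re-encode a file from a known source encoding to UTF-8.

    The source encoding must be known in advance (e.g. "cp1252");
    detecting it automatically is out of scope for this sketch.
    """
    text = Path(path_in).read_bytes().decode(source_encoding)
    Path(path_out).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("sample.txt", "sample.utf8.txt", "cp1252")` would write a UTF-8 copy suitable for upload, assuming the original really was cp1252.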
Support Matrix
| | CPU Container | GPU Container | Community API | Professional API | PrivateGPT UI |
|---|---|---|---|---|---|
| Supported? | Yes | Yes | No | Yes | No |
Sample Request
Please sign up for a free API key to run this code.
Request body (the file content goes in "data" as a base64-encoded string):
{
  "file": {
    "data": "<base64-encoded file content>",
    "content_type": "text/plain"
  },
  "entity_detection": {
    "return_entity": true
  }
}
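The `data` field in the body above carries the raw file bytes as base64 text. A small helper that builds this body from a local file might look like the following; `build_txt_payload` is a hypothetical name, and the field names are copied from the request body above.

```python
import base64
from pathlib import Path


def build_txt_payload(path: str, return_entity: bool = True) -> dict:
    """Build the JSON body for the process/files/base64 endpoint."""
    raw = Path(path).read_bytes()
    # Base64-encode the raw bytes, then decode to str so the body is JSON-serializable.
    data = base64.b64encode(raw).decode("ascii")
    return {
        "file": {"data": data, "content_type": "text/plain"},
        "entity_detection": {"return_entity": return_entity},
    }
```

The returned dict can be passed directly as `json=` to `requests.post`, which serializes the Python `True` as the JSON `true` the API expects.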
# Encode sample.txt, send it, and decode the redacted result (base64 -w 0 is GNU coreutils syntax).
echo '{
  "file": {"data": "'$(base64 -w 0 sample.txt)'",
           "content_type": "text/plain"},
  "entity_detection": {"return_entity": true}
}' \
| curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: <YOUR KEY HERE>' \
  -d @- \
| jq -r .processed_file \
| base64 -d > 'sample.redacted.txt'
import base64

import requests

# Download the sample file and base64-encode its raw bytes.
file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.txt"
filename_out = "/path/to/output/sample.redacted.txt"
file_response = requests.get(file_url)
file_response.raise_for_status()
file_content = file_response.content
file_content_base64 = base64.b64encode(file_content).decode()

url = "https://api.private-ai.com/community/v4/process/files/base64"
headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}
payload = {
    "file": {
        "data": file_content_base64,
        "content_type": "text/plain",
    },
    "entity_detection": {
        "return_entity": True
    }
}

# Send the request and fail loudly on HTTP errors.
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# processed_file is base64-encoded; decode it and write the redacted copy.
with open(filename_out, "wb") as f:
    f.write(base64.b64decode(response.json()["processed_file"]))
import base64

from privateai_client import PAIClient
from privateai_client.objects import request_objects

filename_in = "sample.txt"
filename_out = "sample.redacted.txt"
file_type = "text/plain"

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

# Read the input file and base64-encode it as an ASCII string.
with open(filename_in, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

# Wrap the encoded file in the client's request objects and send it.
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

# Decode the base64 processed_file and write the redacted copy.
with open(filename_out, "wb") as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)
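Every example above ends the same way: the base64 `processed_file` field is decoded and written to disk. If you also want to know whether anything was detected, a small helper can do both. This is a sketch that assumes the response has already been parsed into a Python dict (for example with `response.json()` in the `requests` example); `save_processed_file` is a hypothetical name, and the `entities_present` field is the one listed in the sample response.

```python
import base64
from pathlib import Path


def save_processed_file(response_json: dict, path_out: str) -> bool:
    """Write the redacted file from a process/files/base64 response.

    Returns the response's entities_present flag (False if the field is absent).
    """
    # processed_file holds the redacted file content as base64 text.
    redacted = base64.b64decode(response_json["processed_file"], validate=True)
    Path(path_out).write_bytes(redacted)
    return bool(response_json.get("entities_present", False))
```

Passing `validate=True` makes malformed base64 raise an error instead of silently dropping characters, mirroring the client-library example.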
Sample Response
{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}