Processing PowerPoint (PPT/PPTX) Files
Private AI supports scanning Microsoft PowerPoint PPT and PPTX files for PII and creating de-identified or redacted copies. All supported entity types are detected in both file types, including localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities. See the Supported Languages and Supported Entity Types pages for details.
If you'd like to try it yourself, please visit our free interactive web demo. No code or account is necessary.
How PPTX Files Are Processed
PPTX files are processed by extracting each element and handling it according to the table below. The de-identified or redacted file is then created following the behaviour specified in the table.
Property Type | Details | Behaviour |
---|---|---|
Core properties | Author, Category, Comments, Content Status, Identifier, Keywords, Language, Last Modified By, Subject, Title, Version | Redact |
Speaker notes | Any content in the speaker notes | Redact |
Tables | Table objects with text and images | Redact |
Images | See the Images page for details on image processing | Redact; unsupported image types are removed |
Text boxes | Main slide content | Redact |
Embedded links | Hyperlinks to internet pages or documents | Remove |
External elements | Tables and charts embedded from another document or file, such as an Excel chart | Remove external file, redact cached values |
Embedded audio & video | Videos and audio clips | Remove |
Review comments | Comments from document reviews | Redact |
Shape objects | Shapes containing text | Redact |
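To see which parts of a presentation the table rows correspond to, it can help to look inside the file. A PPTX file is a ZIP archive of XML parts, and the sketch below groups the archive entries into rough categories. The path prefixes follow the usual Office Open XML package layout, which is an assumption about any given file; the function name `classify_pptx_parts` and the category labels are illustrative and are not part of Private AI's API or a description of its internals.

```python
import io
import zipfile
from collections import defaultdict
from typing import Dict, List

# Typical Office Open XML part locations (assumed layout: real packages
# resolve parts through relationships, so names can differ).
PART_PREFIXES = [
    ("docProps/core.xml", "Core properties"),
    ("ppt/notesSlides/", "Speaker notes"),
    ("ppt/media/", "Images, audio & video"),
    ("ppt/embeddings/", "External elements"),
    ("ppt/comments/", "Review comments"),
    ("ppt/slides/", "Slide content (text boxes, tables, shapes)"),
]


def classify_pptx_parts(pptx_bytes: bytes) -> Dict[str, List[str]]:
    """Group the entries of a PPTX archive by the kind of content they hold."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".rels"):
                continue  # relationship files are skipped in this sketch
            for prefix, label in PART_PREFIXES:
                if name.startswith(prefix):
                    groups[label].append(name)
                    break
    return dict(groups)
```

Calling `classify_pptx_parts(open("sample.pptx", "rb").read())` returns a dictionary mapping each label to the matching part names, which gives a quick view of what a given deck contains before it is submitted.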
You can configure the OCR System by setting it as an Environment Variable or sending it in the request object. Check out our OCR Guide to further understand the OCR modes and their usage.
Constraints
- If a piece of PII text has more than one style, the redaction marker will be in the first style.
- We recommend using Microsoft PowerPoint to open the processed PPT/PPTX files. Other editors may not give ideal results.
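The first constraint can be illustrated with a toy model of styled text. The sketch below treats a paragraph as a list of runs, each with its own style, and replaces a character span with a marker that takes the style of the first run it touches. The `Run` class, the `redact_span` function, and the `[NAME_1]` marker are hypothetical names used only for illustration; this is not Private AI's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    text: str
    style: str  # stand-in for real run formatting, e.g. "bold"


def redact_span(runs: List[Run], start: int, end: int, marker: str) -> List[Run]:
    """Replace characters [start, end) of the concatenated run text with marker.

    The marker is emitted once, in the style of the first run the span overlaps.
    """
    result = []
    offset = 0
    marker_placed = False
    for run in runs:
        run_start, run_end = offset, offset + len(run.text)
        offset = run_end
        if not (run_start < end and run_end > start):
            result.append(run)  # run lies entirely outside the span
            continue
        # Keep the parts of this run that fall before and after the span
        before = run.text[: max(0, start - run_start)]
        after = run.text[min(len(run.text), end - run_start):]
        if before:
            result.append(Run(before, run.style))
        if not marker_placed:
            result.append(Run(marker, run.style))
            marker_placed = True
        if after:
            result.append(Run(after, run.style))
    return result
```

For example, if "John " is bold and "Smith" is italic, redacting the full name yields a single bold marker, and the italic styling of the second run is lost.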
How PPT Files Are Processed
PPT files are processed by converting them into PPTX files, following the process described above, and then converting the result back to PPT.
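Because PPT and PPTX are different container formats, a client has to send a `content_type` that matches the file. The following sketch guesses the type from the file's leading bytes: PPTX files are ZIP archives and legacy PPT files are OLE compound files. The function name is hypothetical, and while `application/vnd.ms-powerpoint` is the standard MIME type for PPT files, whether the API expects exactly that value is an assumption to check against the supported file types documentation.

```python
PPTX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.presentationml.presentation"
)
PPT_CONTENT_TYPE = "application/vnd.ms-powerpoint"  # assumed to be accepted by the API

ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header


def guess_powerpoint_content_type(data: bytes) -> str:
    """Guess the content type from the container signature.

    The signature identifies only the container (any ZIP or OLE file matches),
    so this does not prove the file is actually a presentation.
    """
    if data.startswith(ZIP_MAGIC):
        return PPTX_CONTENT_TYPE
    if data.startswith(OLE_MAGIC):
        return PPT_CONTENT_TYPE
    raise ValueError("Not a ZIP or OLE container; cannot be PPTX or PPT")
```

Checking the signature rather than the file extension avoids sending a renamed file with the wrong content type.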
Support Matrix
| | CPU Container | GPU Container | Community API | Professional API | PrivateGPT UI |
|---|---|---|---|---|---|
| Supported? | Yes | Yes | Base64 Only | Yes | No |
Sample Request
Please sign up for a free API key to run this code.
# Request body (Python dict notation; file_content_base64 holds the Base64-encoded file)
{
    "file": {
        "data": file_content_base64,
        "content_type": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    },
    "entity_detection": {
        "return_entity": True
    }
}
# cURL: Base64-encode the file, POST it, then decode the redacted file from the response.
# -w 0 is the GNU coreutils flag that disables line wrapping in the Base64 output.
echo '{
    "file": {"data": "'$(base64 -w 0 sample.pptx)'",
             "content_type": "application/vnd.openxmlformats-officedocument.presentationml.presentation"},
    "entity_detection": {"return_entity": true}
}' \
| curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
    -H 'Content-Type: application/json' \
    -H 'x-api-key: <YOUR KEY HERE>' \
    -d @- \
| jq -r .processed_file \
| base64 -d > 'sample.redacted.pptx'
# Python, using the requests library
import base64

import requests

# Download the sample presentation and Base64-encode it for the JSON payload
file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.pptx"
filename_out = "/path/to/output/sample.redacted.pptx"
file_content = requests.get(file_url).content
file_content_base64 = base64.b64encode(file_content).decode()

url = "https://api.private-ai.com/community/v4/process/files/base64"
headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}
payload = {
    "file": {
        "data": file_content_base64,
        "content_type": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    },
    "entity_detection": {
        "return_entity": True
    }
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail on an HTTP error rather than on a missing key below

# Decode the redacted presentation and write it to disk
with open(filename_out, "wb") as f:
    f.write(base64.b64decode(response.json()["processed_file"]))
# Python, using the Private AI client library
import base64

from privateai_client import PAIClient
from privateai_client.objects import request_objects

filename_in = "sample.pptx"
filename_out = "sample.redacted.pptx"
file_type = "application/vnd.openxmlformats-officedocument.presentationml.presentation"
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

# Read the presentation and Base64-encode it
with open(filename_in, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

# Build and send the request
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

# Decode the redacted file and write it to disk
with open(filename_out, "wb") as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)
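Once a response comes back, the two fields most callers need are `processed_file` and `languages_detected`, both listed under Sample Response. The following is a minimal sketch of a helper that works on the already-parsed JSON body; the function name `summarize_response` is hypothetical, and it assumes the body has the shape shown in the sample response.

```python
import base64
from typing import Optional, Tuple


def summarize_response(body: dict) -> Tuple[bytes, Optional[str]]:
    """Return the decoded redacted file and the highest-scoring language key.

    Assumes `body` follows the sample response: `processed_file` is a Base64
    string and `languages_detected` maps language keys to numeric scores.
    """
    file_bytes = base64.b64decode(body["processed_file"], validate=True)
    languages = body.get("languages_detected") or {}
    top_language = max(languages, key=languages.get) if languages else None
    return file_bytes, top_language
```

With the requests sample this would be called as `summarize_response(response.json())`, and the returned bytes can be written to the output file directly.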
Sample Response
{
    "processed_file": "Base64 Encoded File Content of the Redacted File",
    "processed_text": "string",
    "entities": "List[Entity]",
    "entities_present": true,
    "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}