Processing Image Files

Private AI supports scanning images for PII and creating de-identified or redacted copies. Private AI's supported entity types are detected across every supported file type, including localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities. Our Supported Languages and Supported Entity Types page provides a more detailed look.

info

If you'd like to try it yourself, please visit our free interactive web demo. No code or account is necessary.

How Images Are Processed

Image files are processed as follows:

1. Non-text PII, such as faces and license plates, is detected.
2. The image is also run through an OCR system to detect any text present.
3. The output of the OCR system is passed to a PII detection module.
4. Any PII found in the previous steps is de-identified via blurring or black box redaction.
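
To make the final step concrete, here is a minimal sketch of black box redaction over a grayscale pixel grid. It is an illustration only, not Private AI's implementation: the `Box` format, the `black_box_redact` function, and the idea that the detection steps hand over pixel bounding boxes are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def black_box_redact(pixels: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Return a copy of a grayscale pixel grid with every box filled with 0 (black)."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    redacted = [row[:] for row in pixels]  # leave the input untouched
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before filling it.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                redacted[y][x] = 0
    return redacted


# A 4x3 all-white image with one detected region covering columns 1-2, rows 0-1.
image = [[255] * 4 for _ in range(3)]
result = black_box_redact(image, [(1, 0, 3, 2)])
```

Blurring would follow the same shape, replacing the fill with an average over neighbouring pixels.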

info

You can configure the OCR system either by setting an environment variable or by specifying it in the request object. See our OCR Guide for the available OCR modes and their usage.

Constraints

There is limited support for 8-bit PNG images.
Multi-page TIFF images are only supported in versions 4.2.1 and later. Earlier versions process and return only the first page of a multi-page TIFF.
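
If you want to flag potentially affected PNG files before sending them, the sketch below reads the bit depth and color type from a PNG's IHDR header using only the standard library. It assumes that "8-bit PNG" refers to palette-indexed images (PNG color type 3, often called PNG-8); the page does not define the term, so treat `is_palette_png` as a hypothetical helper and adjust the check if the constraint means something else.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bit_depth_and_color_type(data: bytes):
    """Return (bit_depth, color_type) from a PNG's IHDR chunk, or None if not a PNG."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # IHDR data starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth, color_type


def is_palette_png(data: bytes) -> bool:
    """Assumption: an "8-bit PNG" is a palette-indexed PNG (color type 3)."""
    info = png_bit_depth_and_color_type(data)
    return info is not None and info[1] == 3
```

For example, `is_palette_png(open("sample.png", "rb").read())` checks a local file, where `sample.png` is a placeholder path.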

Supported File Types

File Type	Extension	Content Type	Added In
JPEG image	`.jpg`, `.jpeg`	`image/jpg`, `image/jpeg`	3.0.0
TIFF image	`.tif`, `.tiff`	`image/tif`, `image/tiff`	3.0.0
PNG image	`.png`	`image/png`	3.4.0
BMP image	`.bmp`	`image/bmp`, `image/x-ms-bmp`	3.4.0
GIF image	`.gif`	`image/gif`	4.2.1
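
The table above can be turned into a small lookup for filling in the `content_type` field of a request. This is a sketch: `CONTENT_TYPES` and `content_type_for` are hypothetical names, and where the table lists two content types for an extension the example simply picks one of them.

```python
from pathlib import Path

# One content type per extension, taken from the table of supported file types.
CONTENT_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".gif": "image/gif",
}


def content_type_for(filename: str) -> str:
    """Map a file name to a supported content type, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image extension: {suffix!r}") from None
```

Rejecting unknown extensions locally avoids sending a file type the service does not list as supported.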

Support Matrix

	CPU Container	GPU Container	Community API	Professional API
Supported	Yes	Yes	Up to 10 MiB	No
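
A client can check the Community API size limit before uploading. The sketch below assumes the 10 MiB limit applies to the raw file size before base64 encoding, which this page does not state; the constant and function names are hypothetical. The second helper shows how much larger the base64-encoded payload will be.

```python
COMMUNITY_API_MAX_BYTES = 10 * 1024 * 1024  # 10 MiB


def fits_community_limit(file_bytes: bytes) -> bool:
    """Assumption: the limit is measured on the raw file, not the base64 text."""
    return len(file_bytes) <= COMMUNITY_API_MAX_BYTES


def base64_length(raw_size: int) -> int:
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)
```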

Sample Request

info

Connect with one of our privacy experts to run this code.

Request Body

{
    "file": {
        "data": "<BASE64-ENCODED FILE CONTENT>",
        "content_type": "image/jpeg"
    },
    "entity_detection": {
        "return_entity": true
    }
}

cURL

# base64 -w 0 (GNU coreutils) disables line wrapping in the encoded output
echo '{
          "file": {"data": "'"$(base64 -w 0 sample.jpeg)"'",
          "content_type": "image/jpeg"},
          "entity_detection": {"return_entity": true}
      }' \
| curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
       -H 'Content-Type: application/json' \
       -H 'x-api-key: <YOUR KEY HERE>' \
       -d @- \
       | jq -r .processed_file \
       | base64 -d > 'sample.redacted.jpeg'

Python

import base64

import requests

file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.jpeg"
filename_out = "/path/to/output/sample.redacted.jpeg"

# Download the sample image and base64-encode it for the JSON payload
file_content = requests.get(file_url).content
file_content_base64 = base64.b64encode(file_content).decode()

url = "https://api.private-ai.com/community/v4/process/files/base64"

headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

payload = {
    "file": {
        "data": file_content_base64,
        "content_type": "image/jpeg",
    },
    "entity_detection": {
        "return_entity": True
    }
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail early on HTTP errors instead of a KeyError below

# Decode the redacted image from the response and write it to disk
with open(filename_out, "wb") as f:
    f.write(base64.b64decode(response.json()["processed_file"]))

Python Client

import base64

from privateai_client import PAIClient
from privateai_client.objects import request_objects

filename_in = "sample.jpeg"
filename_out = "sample.redacted.jpeg"

file_type = "image/jpeg"
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

# Read the input image and base64-encode it as an ASCII string
with open(filename_in, "rb") as image_file:
    file_data = base64.b64encode(image_file.read())
    file_data = file_data.decode("ascii")

# Build the request object and send it
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

# Decode the redacted image from the response and write it to disk
with open(filename_out, "wb") as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)

Sample Response

{
    "processed_file": "Base64 Encoded File Content of the Redacted File",
    "processed_text": "string",
    "entities": "List[Entity]",
    "entities_present": true,
    "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}