Processing JSON Files

Private AI can scan JSON files for PII and create de-identified or redacted copies. The supported entity types work across every file type and cover localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities. Our Supported Languages and Supported Entity Types pages provide a more detailed look.

How JSONs Are Processed

Like XML files, JSON files are processed using the method described in the Structured Data Guide. The output file keeps its original format, with redaction markers of the form "[LABEL_X]" in place of the detected PII. The input file must be valid JSON.
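
Since the input is expected to be valid JSON, it can help to check a file locally before uploading it. The following is a minimal sketch using only Python's standard library. The helper names `is_valid_json` and `_reject_constant` are hypothetical and not part of the Private AI API, and the check assumes the file is UTF-8 encoded.

```python
import json


def _reject_constant(name: str):
    # Python's json module accepts NaN and Infinity by default;
    # they are not part of standard JSON, so reject them here.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_valid_json(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and parses as standard JSON."""
    try:
        json.loads(raw.decode("utf-8"), parse_constant=_reject_constant)
    except ValueError:
        # Covers UnicodeDecodeError and json.JSONDecodeError,
        # both of which are subclasses of ValueError.
        return False
    return True
```

Running this check before Base64-encoding the file avoids a round trip to the API for input that would not parse.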

Constraints

Note: Consider writing a handler for your specific application, using the Structured Data Guide, to work around any of the constraints listed below.

  • The file processing routes are synchronous; files larger than 5 MB may take a long time to process.
  • Only values are redacted. Key names are assumed never to contain PII and are not redacted.
  • Entity numbering is consistent only within an individual value, not across the entire file, so the same entity may receive different numbers in different values.
  • The JSON must consist entirely of human-readable content. Encoded content such as Base64 is not natively supported and may lead to inaccurate PII detection; process such content separately.
  • Because of the way the data is formatted, deeply hierarchical JSON structures (more than 10 levels) take proportionally longer to process.
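
The constraints above are the kind of limits a custom handler would work around. The sketch below shows one possible shape for such a handler, assuming you redact each string value yourself. `nesting_depth`, `redact_values`, and the `redact` callable are hypothetical names, and `redact` stands in for whatever text de-identification call your application uses. It is not the method from the Structured Data Guide, only an illustration of walking values while leaving keys untouched.

```python
from typing import Any, Callable


def nesting_depth(node: Any) -> int:
    """Count container levels: a scalar is 0, {} or [] is 1, {"a": {}} is 2."""
    if isinstance(node, dict):
        return 1 + max((nesting_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((nesting_depth(v) for v in node), default=0)
    return 0


def redact_values(node: Any, redact: Callable[[str], str]) -> Any:
    """Return a copy of node with redact applied to every string value.

    Keys, numbers, booleans, and nulls are copied unchanged.
    """
    if isinstance(node, dict):
        return {key: redact_values(value, redact) for key, value in node.items()}
    if isinstance(node, list):
        return [redact_values(item, redact) for item in node]
    if isinstance(node, str):
        return redact(node)  # hypothetical per-value redaction call
    return node
```

A handler could use `nesting_depth` to decide whether a document exceeds 10 levels and should be split into subtrees before processing, and `redact_values` to apply redaction one value at a time.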

Support Matrix

              CPU Container   GPU Container   Community API   Professional API   PrivateGPT UI
Supported?    Yes             Yes             No              Yes                No

Sample Request

Note: Sign up for a free API key to run this code.

Request Body:
{
    "file": {
        "data": "<BASE64 ENCODED FILE CONTENT>",
        "content_type": "application/json"
    },
    "entity_detection": {
        "return_entity": true
    }
}

cURL:
# base64 -w 0 (GNU coreutils) disables line wrapping in the encoded output.
echo '{
          "file": {"data": "'$(base64 -w 0 sample.json)'",
          "content_type": "application/json"},
          "entity_detection": {"return_entity": true}
      }' \
| curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
       -H 'Content-Type: application/json' \
       -H 'x-api-key: <YOUR KEY HERE>' \
       -d @- \
       | jq -r .processed_file \
       | base64 -d > 'sample.redacted.json'

Python:
import base64

import requests

# Download the sample file and Base64-encode it for the request body.
file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.json"
filename_out = "/path/to/output/sample.redacted.json"
file_response = requests.get(file_url)
file_response.raise_for_status()
file_content_base64 = base64.b64encode(file_response.content).decode()

url = "https://api.private-ai.com/community/v4/process/files/base64"

headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

payload = {
    "file": {
        "data": file_content_base64,
        "content_type": "application/json",
    },
    "entity_detection": {
        "return_entity": True
    }
}

# Fail early on HTTP errors instead of trying to decode an error body.
response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# The redacted file is returned Base64-encoded in "processed_file".
with open(filename_out, "wb") as f:
    f.write(base64.b64decode(response.json()["processed_file"]))

Python Client:
import base64

from privateai_client import PAIClient
from privateai_client.objects import request_objects

filename_in = "sample.json"
filename_out = "sample.redacted.json"

file_type = "application/json"
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

# Read the input file and Base64-encode it as an ASCII string.
with open(filename_in, "rb") as input_file:
    file_data = base64.b64encode(input_file.read()).decode("ascii")

file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

# Decode the Base64-encoded redacted file and write it to disk.
with open(filename_out, "wb") as redacted_file:
    processed_file = base64.b64decode(resp.processed_file.encode("ascii"), validate=True)
    redacted_file.write(processed_file)
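
To inspect the redacted output in memory rather than writing it to disk, the `processed_file` field can be decoded and parsed back into JSON. This is a small sketch that assumes the response body has already been parsed into a Python dict and that the redacted file is UTF-8 encoded. `load_redacted_json` is a hypothetical helper name, not part of the Private AI client.

```python
import base64
import json
from typing import Any


def load_redacted_json(response_body: dict) -> Any:
    """Decode the Base64 `processed_file` field and parse it as JSON."""
    raw = base64.b64decode(response_body["processed_file"], validate=True)
    # Assumption: the redacted file is UTF-8 encoded JSON.
    return json.loads(raw.decode("utf-8"))
```

With the `requests` example, this could be called as `load_redacted_json(response.json())`.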

Sample Response

{
    "processed_file": "Base64 Encoded File Content of the Redacted File",
    "processed_text": "string",
    "entities": "List[Entity]",
    "entities_present": true,
    "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
© Copyright 2024 Private AI.