Processing Email (EML) Files

Private AI supports scanning Email (EML) files for PII and creating de-identified or redacted copies. Private AI's supported entity types work across every file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. Our Supported Languages and Supported Entity Types pages provide a more detailed look.

info

If you'd like to try it yourself, please visit our free interactive web demo. No code or account is necessary.

How EML Files Are Processed

attention

Email support is a new feature and, due to the complexity of the file type, we do not yet support all of its elements. While we work on expanding support, please consider rendering the email as a PDF and processing that instead. This ensures all content is processed and redacted.

Functionally, EML files are processed in the same way as XML files, following the defined standard for .eml files. The EML file is traversed as standard XML, with node attributes and text elements processed by Private AI's text module. Please see the Constraints section for the limitations of processing .eml files natively.

Constraints

Private AI currently only supports de-identifying the e-mail body and headers.
You may sometimes notice spurious detections in email headers, as they lack context when they are processed. To work around this, you can create an allow text filter that skips headers using the entity_types->filter parameter in the process/files endpoints, or consider writing your own handler to process the specific elements.
XML, JSON and other embedded data such as code snippets are not processed natively and may lead to inaccurate PII detection.
Remote elements such as images and links are not retrieved or processed. Please process these separately.
Email attachments are not processed. They are passed through unchanged to the processed file, as they are considered a remote element.
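
Because only the body and headers are de-identified and attachments pass through untouched, it can help to inspect an EML file locally before submitting it. The following is a minimal sketch using Python's standard-library email package. It is not part of Private AI's API, and the split_eml helper name is hypothetical.

```python
from email import message_from_bytes, policy


def split_eml(raw: bytes):
    """Split a raw EML message into headers, body text, and attachments.

    Illustrative helper only (hypothetical name); uses the standard library.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Header values, which may trigger spurious detections when processed
    # without context. Repeated header names collapse to the last value.
    headers = {name: str(value) for name, value in msg.items()}

    # Prefer the plain-text body, falling back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # Attachments are collected so they can be processed separately.
    attachments = [
        (part.get_filename(), part.get_content_type(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return headers, body, attachments
```

Each attachment returned here could then be submitted on its own with a matching content type, and the header dictionary shows which values might be worth covering with an allow filter.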

Support Matrix

	CPU Container	GPU Container	Community API	Professional API	PrivateGPT UI
Supported?	Yes	Yes	No	Yes	No

Sample Request

info

Please sign up for a free API key to run this code.

Request Body

{
  "file": {
    "data": "<base64-encoded file content>",
    "content_type": "message/rfc822"
  },
  "entity_detection": {
    "return_entity": true
  }
}

cURL

echo '{
        "file": {
          "data": "'$(base64 -w 0 sample.eml)'",
          "content_type": "message/rfc822"
        },
        "entity_detection": {"return_entity": true}
      }' \
| curl --request POST --url 'https://api.private-ai.com/community/v4/process/files/base64' \
       -H 'Content-Type: application/json' \
       -H 'x-api-key: <YOUR KEY HERE>' \
       -d @- \
       | jq -r .processed_file \
       | base64 -d > 'sample.redacted.eml'

Python

import base64

import requests

# Download the sample EML file and base64-encode it for the JSON payload
file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.eml"
filename_out = "/path/to/output/sample.redacted.eml"
file_content = requests.get(file_url).content
file_content_base64 = base64.b64encode(file_content).decode("ascii")

url = "https://api.private-ai.com/community/v4/process/files/base64"

headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

payload = {
    "file": {
        "data": file_content_base64,
        "content_type": "message/rfc822",
    },
    "entity_detection": {
        "return_entity": True
    }
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail early on HTTP errors instead of a KeyError below

# The redacted file is returned base64-encoded in "processed_file"
with open(filename_out, "wb") as f:
    f.write(base64.b64decode(response.json()["processed_file"]))

Python Client

import base64

from privateai_client import PAIClient
from privateai_client.objects import request_objects

filename_in = "sample.eml"
filename_out = "sample.redacted.eml"

file_type = "message/rfc822"
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

# Read the raw EML file and base64-encode it
with open(filename_in, "rb") as eml_file:
    file_data = base64.b64encode(eml_file.read())
    file_data = file_data.decode("ascii")

# Build the request and send it
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

# Decode the redacted file from the response and write it to disk
with open(filename_out, "wb") as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)
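
Once any of the requests above returns, the remaining response fields can be unpacked locally as well. The following is a minimal sketch, assuming the response is a JSON object with the top-level fields listed under Sample Response (processed_file, entities_present, languages_detected). The summarize_response helper is hypothetical and not part of the Private AI client.

```python
import base64


def summarize_response(resp: dict):
    """Return (redacted_bytes, entities_present, top_language) from a response dict.

    Hypothetical helper; field names are taken from the sample response.
    """
    # The redacted file is base64-encoded in the "processed_file" field.
    redacted = base64.b64decode(resp["processed_file"])

    # "languages_detected" maps language codes to scores; pick the highest one.
    languages = resp.get("languages_detected") or {}
    top_language = max(languages, key=languages.get) if languages else None

    return redacted, bool(resp.get("entities_present", False)), top_language
```

With the requests example, this could be called as summarize_response(response.json()) before writing the returned bytes to disk.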

Sample Response

{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}