Supported File Types

Private AI can de-identify a range of file types, listed in full below. New file types are added continually; please contact us if you need one that is not listed.

Private AI’s supported entity types work across every file type. Multilingual equivalents of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. The Supported Entity Types page describes the entities in more detail.

Document File Types

| File Type | Extension | Content Type | Added In |
| --- | --- | --- | --- |
| PDF doc | .pdf | application/pdf | 3.0.0 |
| JSON file | .json | application/json | 3.1.0 |
| XML file | .xml | application/xml | 3.1.0 |
| CSV file | .csv | text/csv | 3.1.0 |
| Word Doc | .doc | application/msword | 3.1.0 |
| Word Open XML Doc | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | 3.1.0 |
| Email file | .eml | message/rfc822 | 3.1.1 |
| Text file | .txt | text/plain | 3.1.1 |
| Excel workbook | .xls | application/vnd.ms-excel | 3.2.0 |
| Excel Open XML spreadsheet | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 3.2.0 |
| DICOM file | .dcm | application/dicom | 3.4.0 |

Image File Types

| File Type | Extension | Content Type | Added In |
| --- | --- | --- | --- |
| JPEG image | .jpg, .jpeg | image/jpg, image/jpeg | 3.0.0 |
| TIFF image | .tif, .tiff | image/tif, image/tiff | 3.0.0 |
| PNG image | .png | image/png | 3.4.0 |
| BMP image | .bmp | image/bmp, image/x-ms-bmp | 3.4.0 |

Audio File Types

| File Type | Extension | Content Type | Added In |
| --- | --- | --- | --- |
| WAVE audio file | .wav | audio/wav | 3.0.0 |
| MP3 audio file | .mp3 | audio/mpeg, audio/mp3 | 3.0.0 |
| MP4 audio file | .mp4 | audio/mp4 | 3.0.0 |
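
When submitting a file, it can help to derive the content type from the file extension before making a request. The following is a minimal sketch of such a lookup, transcribed from the document, image, and audio tables above. The names `CONTENT_TYPES` and `content_type_for` are hypothetical helpers, not part of any Private AI client, and picking the first listed content type when several are accepted is an assumption made for illustration.

```python
from pathlib import Path

# Extension -> accepted content types, transcribed from the tables above.
# Assumption: when several content types are listed, the first one is used.
CONTENT_TYPES = {
    ".pdf": ("application/pdf",),
    ".json": ("application/json",),
    ".xml": ("application/xml",),
    ".csv": ("text/csv",),
    ".doc": ("application/msword",),
    ".docx": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ),
    ".eml": ("message/rfc822",),
    ".txt": ("text/plain",),
    ".xls": ("application/vnd.ms-excel",),
    ".xlsx": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ),
    ".dcm": ("application/dicom",),
    ".jpg": ("image/jpg", "image/jpeg"),
    ".jpeg": ("image/jpg", "image/jpeg"),
    ".tif": ("image/tif", "image/tiff"),
    ".tiff": ("image/tif", "image/tiff"),
    ".png": ("image/png",),
    ".bmp": ("image/bmp", "image/x-ms-bmp"),
    ".wav": ("audio/wav",),
    ".mp3": ("audio/mpeg", "audio/mp3"),
    ".mp4": ("audio/mp4",),
}


def content_type_for(filename: str) -> str:
    """Return the first listed content type for a supported file name.

    Raises ValueError if the extension is not in the tables above.
    """
    ext = Path(filename).suffix.lower()
    try:
        return CONTENT_TYPES[ext][0]
    except KeyError:
        raise ValueError(f"unsupported file extension: {ext!r}") from None
```

For example, `content_type_for("report.docx")` returns the Word Open XML content type, while a file such as `animation.gif` raises `ValueError` because `.gif` is not in the list.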

Supported Languages

Note that while Private AI's text de-identification service supports more than 50 languages, the file processing service supports only the following: Dutch, English, French, German, Italian, Polish, Portuguese and Spanish.
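
A pipeline that routes content between text and file processing may want to check the language first. The snippet below is a small sketch of that check using the language names from the list above. `FILE_PROCESSING_LANGUAGES` and `supports_file_processing` are hypothetical names, and the sketch compares plain English language names because this page does not specify which language codes the service expects.

```python
# Languages supported by the file processing service, as listed above.
FILE_PROCESSING_LANGUAGES = frozenset(
    {
        "dutch",
        "english",
        "french",
        "german",
        "italian",
        "polish",
        "portuguese",
        "spanish",
    }
)


def supports_file_processing(language: str) -> bool:
    """Return True if the named language is on the file processing list."""
    return language.strip().lower() in FILE_PROCESSING_LANGUAGES
```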

Limitations

Private AI improves file processing support with every release. The current limitations are:

| Document Type | Limitation |
| --- | --- |
| XML file | Only element text and node attributes are redacted |
| Text file | Text encoding must be UTF-8 |
| CSV file | The data must be column-oriented and the headers must be on the first row |
| Word Doc | Only the document text and metadata are redacted |
| Word Open XML Doc | Only the document text and metadata are redacted |
| Email file | Only the email body is redacted |
| PDF | Attachments are not redacted |
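
Two of the limitations above, the UTF-8 requirement for text files and the first-row header requirement for CSV files, can be checked locally before a file is submitted. The following is a rough sketch of such pre-flight checks using only the Python standard library. The function names are hypothetical, and the header check is a heuristic (non-empty, unique, non-numeric cells in the first row), not a rule taken from the service.

```python
import csv
import io


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def _is_number(cell: str) -> bool:
    """Return True if the cell parses as a float."""
    try:
        float(cell)
    except ValueError:
        return False
    return True


def first_row_looks_like_header(csv_text: str) -> bool:
    """Heuristic: the first row holds non-empty, unique, non-numeric cells."""
    first = next(csv.reader(io.StringIO(csv_text)), None)
    if not first:
        return False
    cells = [cell.strip() for cell in first]
    if any(cell == "" for cell in cells) or len(set(cells)) != len(cells):
        return False
    return not any(_is_number(cell) for cell in cells)
```

A file that fails `is_utf8` can be re-encoded before upload. A CSV that fails the header heuristic is worth inspecting by hand, since the check can misjudge unusual but valid headers.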

© Copyright 2022, 2023 Private AI.