Supported File Types

Private AI can de-identify multiple file types. The complete list of supported file types is below. New file types are added continually; please contact us if you require a file type that is not listed.

Private AI’s supported entity types are detected in every file type, including multilingual equivalents of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities. Our Supported Entity Types page provides a more detailed look at entities.

Supported Languages

Note that while Private AI’s text de-identification service supports more than 50 languages, the file processing service supports a more restricted list of languages. See supported languages for more details.

Document File Types

| File Type | Extension | Content Type | PII Removal Type | Added In | Beta |
| --- | --- | --- | --- | --- | --- |
| PDF doc | .pdf | application/pdf | Blur | 3.0.0 | |
| JSON file | .json | application/json | Redaction Marker | 3.1.0 | |
| XML file | .xml | application/xml | Redaction Marker | 3.1.0 | |
| CSV file | .csv | text/csv | Redaction Marker | 3.1.0 | |
| Word Doc | .doc | application/msword | Redaction Marker + Blur | 3.1.0 | |
| Word Open XML Doc | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Redaction Marker + Blur | 3.1.0 | |
| Email file | .eml | message/rfc822 | Redaction Marker | 3.1.1 | |
| Text file | .txt | text/plain | Redaction Marker | 3.1.1 | |
| Excel workbook | .xls | application/vnd.ms-excel | Redaction Marker + Blur | 3.2.0 | |
| Excel Open XML spreadsheet | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Redaction Marker + Blur | 3.2.0 | |
| PowerPoint files | .ppt | application/vnd.ms-powerpoint | Redaction Marker + Blur | 3.5.0 | x |
| PowerPoint Open XML files | .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation | Redaction Marker + Blur | 3.5.0 | x |
| DICOM file (Beta) | .dcm | application/dicom | Redaction Marker + Blur | 3.4.0 | x |
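
As an illustration of how the "Added In" column can be read, the following Python sketch lists which document extensions are available for a given version. It assumes that "Added In" refers to the Private AI container version and that versions are three-part numeric strings; the names `DOCUMENT_TYPES_ADDED_IN`, `parse_version`, and `document_types_available` are hypothetical helpers, not part of any Private AI API.

```python
# "Added In" values copied from the Document File Types table.
DOCUMENT_TYPES_ADDED_IN = {
    ".pdf": "3.0.0",
    ".json": "3.1.0",
    ".xml": "3.1.0",
    ".csv": "3.1.0",
    ".doc": "3.1.0",
    ".docx": "3.1.0",
    ".eml": "3.1.1",
    ".txt": "3.1.1",
    ".xls": "3.2.0",
    ".xlsx": "3.2.0",
    ".ppt": "3.5.0",
    ".pptx": "3.5.0",
    ".dcm": "3.4.0",
}


def parse_version(version):
    """Turn '3.1.0' into (3, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def document_types_available(container_version):
    """Return the sorted extensions whose 'Added In' version is <= container_version.

    Assumption: 'Added In' is the container version that introduced the type.
    """
    target = parse_version(container_version)
    return sorted(
        ext
        for ext, added in DOCUMENT_TYPES_ADDED_IN.items()
        if parse_version(added) <= target
    )
```

Comparing tuples of integers rather than raw strings keeps a version such as 3.10.0 ordered after 3.5.0.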

Image File Types

| File Type | Extension | Content Type | PII Removal Type |
| --- | --- | --- | --- |
| JPEG image | .jpg, .jpeg | image/jpg, image/jpeg | Blur |
| TIFF image | .tif, .tiff | image/tif, image/tiff | Blur |
| PNG image | .png | image/png | Blur |
| BMP image | .bmp | image/bmp, image/x-ms-bmp | Blur |

Audio File Types

| File Type | Extension | Content Type | PII Removal Type |
| --- | --- | --- | --- |
| wave audio file | .wav | audio/wav | Bleep |
| mp3 audio file | .mp3 | audio/mpeg, audio/mp3 | Bleep |
| mp4 audio file | .mp4 | audio/mp4 | Bleep |
| m4a audio file | .m4a | audio/m4a | Bleep |
| webm audio file | .webm | audio/webm | Bleep |

VOX files

.vox files are not natively supported in the Private AI container, but they can be processed by first converting them to WAV or MP3 with a conversion tool such as SoX.

Because .vox files are headerless, you need to know the file's sample rate and encoding and specify them in the conversion command.

For example, to convert a mono, mu-law encoded .vox file with a sample rate of 8000 Hz to a WAV file, run:

    sox -t raw -r 8000 -c 1 -e mu-law myfile.vox myfile.wav
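
When many .vox files need converting, the same SoX invocation can be scripted. The following Python sketch builds the command shown above and runs it with `subprocess`. It assumes that `sox` is installed and on the `PATH`; `build_sox_command` and `convert_vox` are hypothetical helper names, and the default sample rate, channel count, and encoding simply mirror the example and must be adjusted to match your files.

```python
import shlex
import subprocess


def build_sox_command(src, dst, sample_rate=8000, channels=1, encoding="mu-law"):
    """Return the SoX argument list for converting a headerless file.

    The input is declared as raw audio because .vox files carry no header,
    so sample rate, channel count, and encoding are given explicitly.
    """
    return [
        "sox",
        "-t", "raw",
        "-r", str(sample_rate),
        "-c", str(channels),
        "-e", encoding,
        src,
        dst,
    ]


def convert_vox(src, dst, **options):
    """Run SoX and raise CalledProcessError if the conversion fails."""
    cmd = build_sox_command(src, dst, **options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `convert_vox("myfile.vox", "myfile.wav")` runs the same command as the example; the resulting .wav file can then be submitted for processing like any other supported audio file.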

PII Removal Types

| Method Name | Description |
| --- | --- |
| Blur | Applicable to image and PDF formats, PII is blurred out. |
| Redaction Marker | PII is replaced by markers, like [NAME_1]. This is the same behaviour as the text endpoint. |
| Redaction Marker + Blur | A combined mode for file formats that might include text and embedded images. |
| Bleep | Applicable to audio formats, PII is replaced with a bleep. |
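
To see at a glance how a given file will be de-identified, the tables on this page can be folded into a single lookup. The Python sketch below is only a client-side convenience built from those tables; `REMOVAL_TYPE_BY_EXTENSION` and `removal_type_for` are hypothetical names, and the service itself remains the authority on what it accepts.

```python
from pathlib import Path

# Extension -> PII removal type, copied from the tables on this page.
REMOVAL_TYPE_BY_EXTENSION = {
    # Document file types
    ".pdf": "Blur",
    ".json": "Redaction Marker",
    ".xml": "Redaction Marker",
    ".csv": "Redaction Marker",
    ".eml": "Redaction Marker",
    ".txt": "Redaction Marker",
    ".doc": "Redaction Marker + Blur",
    ".docx": "Redaction Marker + Blur",
    ".xls": "Redaction Marker + Blur",
    ".xlsx": "Redaction Marker + Blur",
    ".ppt": "Redaction Marker + Blur",
    ".pptx": "Redaction Marker + Blur",
    ".dcm": "Redaction Marker + Blur",
    # Image file types
    ".jpg": "Blur",
    ".jpeg": "Blur",
    ".tif": "Blur",
    ".tiff": "Blur",
    ".png": "Blur",
    ".bmp": "Blur",
    # Audio file types
    ".wav": "Bleep",
    ".mp3": "Bleep",
    ".mp4": "Bleep",
    ".m4a": "Bleep",
    ".webm": "Bleep",
}


def removal_type_for(filename):
    """Return the PII removal type for a file name, or None if it is not listed.

    Only the final extension is considered, and matching is case-insensitive.
    """
    return REMOVAL_TYPE_BY_EXTENSION.get(Path(filename).suffix.lower())
```

A `None` result covers formats such as .vox, which need to be converted to a supported format first.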
© Copyright 2024 Private AI.