Supported File Types

Private AI supports multiple file types for de-identification; the complete list is below. New file types are added continually. Please contact us if you require a file type that is not listed.

Private AI’s supported entity types are detected in every file type, including localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities. Our Supported Entity Types page describes the entities in more detail.

Supported Languages

Note that while Private AI’s text de-identification service supports more than 50 languages, the file processing service supports a more restricted list. See supported languages for more details.

Document File Types

| File Type | Extension | Content Type | Added In | Object Detection Support | Beta |
| PDF | .pdf | application/pdf | 3.0.0 | x | |
| JSON | .json | application/json | 3.1.0 | | |
| XML | .xml | application/xml | 3.1.0 | | |
| CSV | .csv | text/csv | 3.1.0 | | |
| Word | .doc | application/msword | 3.1.0 | x (partially) | |
| Word Open XML | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | 3.1.0 | x (partially) | |
| Text | .txt | text/plain | 3.1.1 | | |
| Excel | .xls | application/vnd.ms-excel | 3.2.0 | x (partially) | |
| Excel Open XML | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 3.2.0 | x (partially) | |
| PowerPoint | .ppt | application/vnd.ms-powerpoint | 3.5.0 | x (partially) | |
| PowerPoint Open XML | .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation | 3.5.0 | x (partially) | |
| DICOM | .dcm | application/dicom | 3.4.0 | x | |

A note on object detection support in Office documents

Object detection in Office files is partially supported. Embedded images in Office documents are processed for object detection and redaction only if they are compatible with our deidentifier. Non-compatible images are replaced with a black placeholder image. This ensures that sensitive data in unsupported image formats is never left in the output, although the whole image is blacked out rather than redacted with the precision available for supported formats.

Additionally:

  • The bounding box coordinates within the location field (x0, x1, y0, y1) are relative to the embedded image itself, unlike in PDFs, where they are relative to the document page.
  • The page field value remains 0 for Office files, as page numbering is not currently implemented for this file type.
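
As a rough illustration of the two points above, the following Python sketch labels an object-detection result with the coordinate frame its bounding box should be read in. The result layout (a dict whose location entry holds page, x0, x1, y0 and y1) and the helper names coordinate_frame and describe are assumptions made for this example, not a documented response schema.

```python
from pathlib import Path

# Office extensions taken from the Document File Types table.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def coordinate_frame(filename: str) -> str:
    """Return what x0, x1, y0, y1 are relative to for this file."""
    extension = Path(filename).suffix.lower()
    if extension in OFFICE_EXTENSIONS:
        return "embedded image"
    if extension == ".pdf":
        return "document page"
    raise ValueError(f"no bounding-box convention described for {extension!r}")


def describe(filename: str, result: dict) -> str:
    """Summarise one object-detection result (hypothetical dict layout)."""
    loc = result["location"]
    box = (loc["x0"], loc["x1"], loc["y0"], loc["y1"])
    if coordinate_frame(filename) == "embedded image":
        # The page value stays 0 for Office files, so it carries no position.
        return f"box {box} relative to the embedded image (page not available)"
    return f"box {box} relative to page {loc['page']}"
```

Code that overlays boxes on a rendered page can use a check like this to avoid treating Office coordinates as page coordinates.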

Image File Types

| File Type | Extension | Content Type | Added In | Object Detection Support |
| JPEG | .jpg, .jpeg | image/jpg, image/jpeg | 3.0.0 | x |
| TIFF | .tif, .tiff | image/tif, image/tiff | 3.0.0 | x |
| PNG | .png | image/png | 3.4.0 | x |
| BMP | .bmp | image/bmp, image/x-ms-bmp | 3.4.0 | x |
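
The document and image tables above can be captured as a small lookup, for instance to pick a content type before submitting a file. The Python sketch below is only one possible way to do this: the names SUPPORTED_TYPES and content_type_for are hypothetical, the "Added In" value is assumed to be the container version that introduced the file type, and where a table row lists two content types the sketch arbitrarily keeps one of them.

```python
from pathlib import Path

_OOXML = "application/vnd.openxmlformats-officedocument."

# Extension -> (content type, "Added In" version), copied from the tables.
SUPPORTED_TYPES = {
    ".pdf": ("application/pdf", "3.0.0"),
    ".json": ("application/json", "3.1.0"),
    ".xml": ("application/xml", "3.1.0"),
    ".csv": ("text/csv", "3.1.0"),
    ".doc": ("application/msword", "3.1.0"),
    ".docx": (_OOXML + "wordprocessingml.document", "3.1.0"),
    ".txt": ("text/plain", "3.1.1"),
    ".xls": ("application/vnd.ms-excel", "3.2.0"),
    ".xlsx": (_OOXML + "spreadsheetml.sheet", "3.2.0"),
    ".ppt": ("application/vnd.ms-powerpoint", "3.5.0"),
    ".pptx": (_OOXML + "presentationml.presentation", "3.5.0"),
    ".dcm": ("application/dicom", "3.4.0"),
    ".jpg": ("image/jpeg", "3.0.0"),
    ".jpeg": ("image/jpeg", "3.0.0"),
    ".tif": ("image/tiff", "3.0.0"),
    ".tiff": ("image/tiff", "3.0.0"),
    ".png": ("image/png", "3.4.0"),
    ".bmp": ("image/bmp", "3.4.0"),
}


def parse_version(version: str) -> tuple:
    """Turn '3.4.0' into (3, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def content_type_for(filename: str, container_version: str) -> str:
    """Return the content type for a file, or raise if it is unsupported."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_TYPES:
        raise ValueError(f"{extension!r} is not in the supported file type tables")
    content_type, added_in = SUPPORTED_TYPES[extension]
    # Assumption: a file type is unavailable before its "Added In" version.
    if parse_version(container_version) < parse_version(added_in):
        raise ValueError(f"{extension!r} requires version {added_in} or later")
    return content_type
```

For example, content_type_for("scan.png", "3.4.0") returns image/png, while the same call with version 3.3.0 raises a ValueError because PNG was added in 3.4.0.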

Audio File Types

| File Type | Extension | Content Type | Added In |
| wave | .wav | audio/wav | 3.0.0 |
| mp3 | .mp3 | audio/mpeg, audio/mp3 | 3.0.0 |
| mp4 | .mp4 | audio/mp4 | 3.0.0 |
| m4a | .m4a | audio/m4a | 3.5.0 |
| webm | .webm | audio/webm | 3.5.0 |

VOX files

.vox files are not natively supported in the Private AI container, but they can be processed by first converting them to WAV or MP3 with a conversion tool such as SoX.

Because .vox files are headerless, you must specify the sample rate and encoding yourself when converting.

For example, to convert a mono, mu-law encoded .vox file with a sample rate of 8000 Hz to a WAV file, run:

    sox -t raw -r 8000 -c 1 -e mu-law myfile.vox myfile.wav
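
If the conversion needs to run as part of a pipeline, the same SoX invocation can be wrapped in a few lines of Python. This is a minimal sketch: the function names build_sox_command and convert_vox are made up for this example, the defaults mirror the command shown above, and SoX is assumed to be installed and available on the PATH.

```python
import subprocess


def build_sox_command(vox_path, wav_path, sample_rate=8000, channels=1,
                      encoding="mu-law"):
    """Build the SoX argument list for a headerless .vox input.

    The sample rate, channel count and encoding must match how the
    .vox file was recorded, because the file itself does not say.
    """
    return [
        "sox",
        "-t", "raw",             # treat the input as headerless raw audio
        "-r", str(sample_rate),  # sample rate in Hz
        "-c", str(channels),     # number of channels
        "-e", encoding,          # sample encoding of the input
        vox_path,
        wav_path,
    ]


def convert_vox(vox_path, wav_path, **options):
    """Run SoX and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_sox_command(vox_path, wav_path, **options), check=True)
```

Calling convert_vox("myfile.vox", "myfile.wav") runs the same command as the example above; pass sample_rate, channels or encoding to match other recordings.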

© Copyright 2024 Private AI.