Audio File Deidentification

Private AI supports de-identification of audio files. The supported entity types apply to every file type, and multilingual equivalents of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. See the Supported Entity Types page for a detailed list of entities.

How audio files are handled

Private AI de-identifies audio files by detecting entities and masking them. The de-identified file has the same format as the input file. Optionally, the audio can also be distorted.

Parameters

The parameters below control the behaviour of the Audio De-identifier. Specify them under audio_options in the request.

| Parameter | Explanation | Default |
| --- | --- | --- |
| bleep_start_padding | Padding to add at the start of the bleep, in seconds. | 0.5 |
| bleep_end_padding | Padding to add at the end of the bleep, in seconds. | 0.2 |
| distort_audio | Whether to distort the given audio file. | False |
| distortion_steps | Controls how the audio is distorted. A value greater than 0 results in a higher tone and a value less than 0 results in a lower tone. | 0 |
| bleep_frequency | Frequency, in Hertz (Hz), of the sine wave used for the bleep sound. Higher values give a higher-pitched bleep and lower values a lower-pitched one. Choose a value that stays clear and balanced against the rest of the audio. | 600 |
| bleep_gain | Gain of the bleep sound, in decibels (dB), which controls its relative loudness. A value of 0.0 dB keeps the original amplitude of the bleep, positive values make it louder, and negative values make it quieter. | -3.0 |
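To make the bleep parameters more concrete, here is a minimal Python sketch of how a padded, gain-adjusted sine-wave bleep could be produced. It is not Private AI's implementation. The function names padded_interval and bleep_samples, the sample_rate argument, and the choice to widen the masked span outward (earlier start, later end) are all assumptions made for illustration. Only the default values, the sine-wave shape, and the dB gain come from the table above.

```python
import math


def padded_interval(start, end, start_padding=0.5, end_padding=0.2, duration=None):
    """Widen an entity's time span (in seconds) by the bleep paddings.

    Assumption: padding extends the bleep outward, i.e. it starts earlier
    and ends later than the detected entity.
    """
    new_start = max(0.0, start - start_padding)
    new_end = end + end_padding
    if duration is not None:
        new_end = min(duration, new_end)  # do not run past the end of the file
    return new_start, new_end


def bleep_samples(duration, frequency=600.0, gain_db=-3.0, sample_rate=16000):
    """Return sine-wave samples in [-1, 1] for a bleep of the given length.

    sample_rate is a hypothetical parameter, not part of audio_options.
    """
    amplitude = 10 ** (gain_db / 20)  # convert dB gain to a linear amplitude factor
    n = int(round(duration * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * frequency * i / sample_rate)
        for i in range(n)
    ]


# Example: an entity spoken between 1.0 s and 2.0 s, using the default options.
start, end = padded_interval(1.0, 2.0)
samples = bleep_samples(end - start)
```

With a gain of 0.0 dB the amplitude factor is 1.0, which matches the table's statement that 0.0 dB leaves the bleep's amplitude unchanged. The default of -3.0 dB scales it to roughly 0.71.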

Sample Request

import requests
import base64

# Download the sample audio file and encode it as a base64 string
file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.wav"
file_content = requests.get(file_url).content
file_content_base64 = base64.b64encode(file_content).decode("ascii")

url = "https://api.private-ai.com/deid/v3/process/files/base64"

payload = {
    "data": file_content_base64,
    "content_type": "audio/wav",
    "entity_detection": {
        "accuracy": "high",
        "return_entity": True
    },
    # Override the default bleep paddings (0.5 s and 0.2 s) with no padding
    "audio_options": {
        "bleep_start_padding": 0,
        "bleep_end_padding": 0
    }
}

# Send the file for de-identification
response = requests.post(url, json=payload)

Sample Response

{
  "processed_file": "Base64 Encoded File Content of the Redacted File",
  "processed_text": "string",
  "entities": "List[Entity]",
  "entities_present": true,
  "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}
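Because processed_file is returned as base64 text, it has to be decoded before it can be played or stored. The sketch below shows one way to do that. The helper name save_processed_file and the output path are hypothetical, and it assumes the parsed response body is a dictionary with the fields shown in the sample response.

```python
import base64


def save_processed_file(result, output_path):
    """Decode the base64 'processed_file' field and write it to disk.

    Assumption: 'result' is the parsed JSON body, shaped like the sample
    response. Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(result["processed_file"])
    with open(output_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

For example, after the sample request one might call save_processed_file(response.json(), "sample.redacted.wav"). Since the de-identified file keeps the format of the input, a .wav input yields a .wav output.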

Audio Support

| | CPU Container | GPU Container | Demo API | Prod API | PrivateGPT UI |
| --- | --- | --- | --- | --- | --- |
| Supported? | Yes | No | No | No | No |

Supported languages

See supported languages for details.

© Copyright 2024 Private AI.