Processing CSV Files

Private AI supports scanning Comma Separated Value (CSV) files for PII and creating de-identified or redacted copies. Private AI’s supported entity types apply across every supported file type, and localized variants of PII (Personally Identifiable Information), PHI (Protected Health Information), and PCI (Payment Card Industry) entities are detected. Our Supported Languages and Supported Entity Types pages provide a more detailed look.

How CSV Files Are Processed

Similar to Excel files, CSV files are processed using the method described for Tabular Data in the Structured Data Guide. The output file retains its original format with labels in place of the detected PII.

Constraints

Please consider writing a handler for your specific application using the Structured Data Guide to get around any of the constraints listed below.

The file processing routes are synchronous, so files larger than 10MB may take a long time to process.
The data in the CSV file must be row-oriented (i.e. each row represents a separate record), and the headers must be on the first row.
Files must adhere to the CSV file standards.
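
The constraints above can be checked locally before a file is submitted. The following is a minimal sketch using only the Python standard library; the function name `check_csv_constraints`, its parameters, and the warning messages are hypothetical and not part of the Private AI API. It assumes a UTF-8 encoded file and treats "row-oriented" as "every record has as many fields as the header row", which is only an approximation of the constraints listed above.

```python
import csv
import io

MAX_BYTES = 10 * 1024 * 1024  # the 10MB figure mentioned in the constraints


def check_csv_constraints(raw: bytes, encoding: str = "utf-8",
                          max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    if len(raw) > max_bytes:
        warnings.append("file is larger than the size threshold; "
                        "synchronous processing may be slow")

    try:
        text = raw.decode(encoding)
        # strict=True makes the csv module raise on malformed input
        rows = list(csv.reader(io.StringIO(text, newline=""), strict=True))
    except (UnicodeDecodeError, csv.Error) as exc:
        warnings.append(f"file could not be parsed as CSV: {exc}")
        return warnings

    if not rows:
        warnings.append("file is empty")
        return warnings

    header = rows[0]
    if any(not cell.strip() for cell in header):
        warnings.append("first row has empty cells; "
                        "headers are expected on the first row")

    # Assumption: each record should have one field per header column
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            warnings.append(
                f"row {row_no} has {len(row)} fields, expected {len(header)}")
            break
    return warnings
```

A call such as `check_csv_constraints(open("sample.csv", "rb").read())` returns an empty list when none of these checks fail. Passing the checks does not guarantee that the API will accept the file.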

Support Matrix

	CPU Container	GPU Container	Community API	Professional API	PrivateGPT UI
Supported?	Yes	Yes	No	Yes	No

Sample Request

Please sign up for a free API key to run this code.

Request Body

{
    "file": {
        "data": "<base64-encoded file content>",
        "content_type": "text/csv"
    },
    "entity_detection": {
        "return_entity": true
    }
}

cURL

# base64 -w 0 (GNU coreutils) disables line wrapping in the encoded output
echo '{
          "file": {"data": "'$(base64 -w 0 sample.csv)'",
          "content_type": "text/csv"},
          "entity_detection": {"return_entity": true}
      }' \
| curl --request POST --url 'https://api.private-ai.com/community/v3/process/files/base64' \
       -H 'Content-Type: application/json' \
       -H 'x-api-key: <YOUR KEY HERE>' \
       -d @- \
       | jq -r .processed_file \
       | base64 -d > 'sample.redacted.csv'

Python

import base64

import requests

# Download the sample CSV and base64-encode it for the JSON payload
file_url = "https://paidocumentation.blob.core.windows.net/$web/sample.csv"
filename_out = "/path/to/output/sample.redacted.csv"
file_content = requests.get(file_url).content
file_content_base64 = base64.b64encode(file_content).decode()

headers = {"Content-Type": "application/json", "x-api-key": "<INSERT API KEY>"}

url = "https://api.private-ai.com/community/v3/process/files/base64"

payload = {
    "file": {
        "data": file_content_base64,
        "content_type": "text/csv",
    },
    "entity_detection": {
        "return_entity": True
    }
}

response = requests.post(url, json=payload, headers=headers)
# Stop on an HTTP error instead of failing later on a missing key
response.raise_for_status()

# Decode the base64-encoded processed file and write it to disk
with open(filename_out, "wb") as f:
    f.write(base64.b64decode(response.json()["processed_file"]))

Python Client

import base64

from privateai_client import PAIClient
from privateai_client.objects import request_objects

filename_in = "sample.csv"
filename_out = "sample.redacted.csv"

file_type = "text/csv"
client = PAIClient(url="https://api.private-ai.com/community/", api_key="<YOUR API KEY>")

# Read the input file and base64-encode its content
with open(filename_in, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

# Build the request objects and send the file for processing
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)

# Decode the processed file and write it to disk
with open(filename_out, 'wb') as redacted_file:
    processed_file = resp.processed_file.encode("ascii")
    processed_file = base64.b64decode(processed_file, validate=True)
    redacted_file.write(processed_file)
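
The examples above only write `processed_file` to disk. The other fields listed under Sample Response can be read from the same JSON body, for instance the dictionary returned by `response.json()` in the requests-based example. The sketch below is illustrative only: the helper name `summarize_response` is hypothetical, and it assumes that `languages_detected` maps language codes to numeric scores, as the sample response suggests.

```python
import base64


def summarize_response(body: dict) -> dict:
    """Decode the processed file and pick out a few fields from a response body."""
    processed_bytes = base64.b64decode(body["processed_file"])
    languages = body.get("languages_detected") or {}
    # Language with the highest reported score, or None if none was reported
    top_language = max(languages, key=languages.get) if languages else None
    return {
        "processed_bytes": processed_bytes,
        "entities_present": bool(body.get("entities_present")),
        "top_language": top_language,
    }
```

With the requests-based example, `summarize_response(response.json())["entities_present"]` indicates whether any entity was detected, without decoding the file by hand.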

Sample Response

{
    "processed_file": "Base64 Encoded File Content of the Redacted File",
    "processed_text": "string",
    "entities": "List[Entity]",
    "entities_present": true,
    "languages_detected": {"lang_1": 0.67, "lang_2": 0.74}
}