Getting Started with Private AI

This guide walks through the main features of the Private AI API. It starts with basic text processing and then moves on to file processing; see Redacting PII in Files.

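Info

You can follow this guide hands-on using the Private AI cloud API. Sign up here to get a free API key.

To set up the Private AI container instead, see the Container Quickstart. When using a container, change the API endpoint in the sample code to match your container environment before running it.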

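Redacting Text

The process/text endpoint accepts a list of texts. PII in each text is detected and redacted in the form specified by the request parameters (MARKER, MASK, and so on), and the redacted text is returned in the response.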

Request Body:

{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ]
}

cURL:

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ]
}'

Python:

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
        ]
    },
)

results = r.json()

print(results)

Python Client:

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."])
response = client.process_text(text_request)

print(response.processed_text)
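
For scripts that cannot depend on third-party packages, the same call can be sketched with the Python standard library alone. This is a minimal illustration under assumptions, not part of the Private AI client: the helper names `build_process_text_request` and `process_text` and the environment variable `PRIVATE_AI_API_KEY` are hypothetical, and only the endpoint URL, the `x-api-key` header, and the `text` field come from the samples above.

```python
import json
import os
import urllib.request

# Endpoint taken from the samples above; use your container URL if you self-host.
PROCESS_TEXT_URL = "https://api.private-ai.com/community/v4/process/text"


def build_process_text_request(texts, api_key, url=PROCESS_TEXT_URL):
    """Build (but do not send) a POST request for the process/text endpoint."""
    body = json.dumps({"text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def process_text(texts, api_key, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_process_text_request(texts, api_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # PRIVATE_AI_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("PRIVATE_AI_API_KEY")
    if key:
        print(process_text(["My name is Johanna."], key))
```

Keeping request construction separate from sending makes the payload easy to inspect before any data leaves the machine.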

The response for this sample contains the following.

processed_text: the redacted text. The redaction style can be specified with the request parameter of the same name, processed_text.
entities: the detected PII spans, each with the entity type (label) it was classified as by named entity recognition (NER).


[
  {
    "processed_text": "Thank you for calling the [ORGANIZATION_1]. My name is miss [NAME_GIVEN_1], and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
    "entities": [
      {
        "processed_text": "ORGANIZATION_1",
        "text": "Georgia Division of Transportation",
        "location": {
          "stt_idx": 26,
          "end_idx": 60,
          "stt_idx_processed": 26,
          "end_idx_processed": 42
        },
        "best_label": "ORGANIZATION",
        "labels": {
          "LOCATION_STATE": 0.2403,
          "LOCATION": 0.2342,
          "ORGANIZATION": 0.8967
        }
      },
      {
        "processed_text": "NAME_GIVEN_1",
        "text": "Johanna",
        "location": {
          "stt_idx": 78,
          "end_idx": 85,
          "stt_idx_processed": 60,
          "end_idx_processed": 74
        },
        "best_label": "NAME_GIVEN",
        "labels": {
          "NAME_GIVEN": 0.9127,
          "NAME": 0.9018
        }
      },
      {
        "processed_text": "SSN_1",
        "text": "614-5555 01",
        "location": {
          "stt_idx": 203,
          "end_idx": 214,
          "stt_idx_processed": 192,
          "end_idx_processed": 199
        },
        "best_label": "SSN",
        "labels": {
          "SSN": 0.913
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 215,
    "languages_detected": {
      "en": 0.920992910861969
    }
  }
]
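
The location fields in the response above can be checked against the texts directly. The sketch below assumes the indices are zero-based character offsets with an exclusive end, which is consistent with the sample values; the helper name `entity_spans` is hypothetical, and the entity list is trimmed to the fields it uses.

```python
ORIGINAL = (
    "Thank you for calling the Georgia Division of Transportation. "
    "My name is miss Johanna, and it is a pleasure assisting you today. "
    "For security reasons, may I please have your Social Security number? "
    "Yes, 614-5555 01."
)
PROCESSED = (
    "Thank you for calling the [ORGANIZATION_1]. "
    "My name is miss [NAME_GIVEN_1], and it is a pleasure assisting you today. "
    "For security reasons, may I please have your Social Security number? "
    "Yes, [SSN_1]."
)
# Entities copied from the sample response, trimmed to the fields used here.
ENTITIES = [
    {"processed_text": "ORGANIZATION_1", "text": "Georgia Division of Transportation",
     "location": {"stt_idx": 26, "end_idx": 60,
                  "stt_idx_processed": 26, "end_idx_processed": 42}},
    {"processed_text": "NAME_GIVEN_1", "text": "Johanna",
     "location": {"stt_idx": 78, "end_idx": 85,
                  "stt_idx_processed": 60, "end_idx_processed": 74}},
    {"processed_text": "SSN_1", "text": "614-5555 01",
     "location": {"stt_idx": 203, "end_idx": 214,
                  "stt_idx_processed": 192, "end_idx_processed": 199}},
]


def entity_spans(original, processed, entities):
    """Return (marker name, original slice, processed slice) for each entity."""
    rows = []
    for entity in entities:
        loc = entity["location"]
        rows.append((
            entity["processed_text"],
            original[loc["stt_idx"]:loc["end_idx"]],
            processed[loc["stt_idx_processed"]:loc["end_idx_processed"]],
        ))
    return rows


for name, source_span, redacted_span in entity_spans(ORIGINAL, PROCESSED, ENTITIES):
    print(f"{name}: {source_span!r} -> {redacted_span!r}")
```

Note that the processed offsets cover the marker including its square brackets, while the entity's processed_text field holds the marker name without them.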

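Customizing Which Entities Are Detected

The samples so far redacted every entity type (except beta entities), but you can choose which entity types (types of PII) to redact or to exclude. See Entity Type Selection for details. The following example redacts only SSN.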

Request Body:

{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ],
    "entity_detection": {
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "SSN"
                ]
            }
        ]
    }
}

cURL:

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ],
    "entity_detection": {
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "SSN"
                ]
            }
        ]
    }
}'

Python:

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
        ],
        "entity_detection": {"entity_types": [{"type": "ENABLE", "value": ["SSN"]}]},
    },
)

results = r.json()
print(results)

Python Client:

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="ENABLE", value=["SSN"])])
text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."], entity_detection=entity_detection_object)
response = client.process_text(text_request)

print(response.processed_text)

The result looks like this.

Redacted Text:

Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].

Full Response:

[
  {
    "processed_text": "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
    "entities": [
      {
        "processed_text": "SSN_1",
        "text": "614-5555 01",
        "location": {
          "stt_idx": 203,
          "end_idx": 214,
          "stt_idx_processed": 203,
          "end_idx_processed": 210
        },
        "best_label": "SSN",
        "labels": {
          "SSN": 0.913
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 215,
    "languages_detected": {
      "en": 0.920992910861969
    }
  }
]
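
When several requests share the same selection logic, the entity_detection block can be assembled by a small helper. This is a sketch with hypothetical function names (`entity_type_selector`, `build_payload`); it only accepts the two selector types that appear in this guide, ENABLE and DISABLE, and makes no claim about other values the API may accept.

```python
def entity_type_selector(mode, entity_types):
    """Return one entry for entity_detection.entity_types."""
    # Only the two selector types shown in this guide are accepted here.
    if mode not in ("ENABLE", "DISABLE"):
        raise ValueError(f"unsupported selector type: {mode!r}")
    return {"type": mode, "value": list(entity_types)}


def build_payload(texts, selectors):
    """Assemble a process/text request body with entity type selection."""
    return {
        "text": list(texts),
        "entity_detection": {"entity_types": list(selectors)},
    }


payload = build_payload(
    ["May I please have your Social Security number? Yes, 614-5555 01."],
    [entity_type_selector("ENABLE", ["SSN"])],
)
print(payload)
```

The resulting dictionary has the same shape as the request body above and can be passed as the json argument of requests.post.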

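Adding Regex Filters

You can also use regular expressions to filter what is detected and redacted as PII. See Filter Configuration. This lets you define company-specific PII with a known format, such as employee IDs, internal database IDs, or document IDs, as targets for redaction (or as exclusions).

This example specifies entity type selection and filters together. It processes the log of an HR-related call in which an employee reports an injury and expects difficulty coming in to work.

Two regex filters redact anything matching their patterns as the custom entities EMPLOYEE_ID and BUSINESS_UNIT.
INJURY is disabled, on the scenario that injury details matter for paperwork such as insurance claims and should stay readable.
The text field is a list, but its items form a single spoken conversation and are related to one another, so link_batch is set to true.
A MARKER pattern without sequential numbering is specified.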

Request Body:

{
  "text": [
    "Hello Xavier, can you tell me your employee ID?",
    "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
    "Okay, thanks Xavier, why are you calling today?",
    "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
    "Oh, so sorry to hear that! How can we help?",
    "I won't be able to come back to the office in NYC for a while",
    "No problem Xavier, I will enter a short term work from home for you. You're all set!",
    "Thanks so much Carole!"
  ],
  "link_batch": true,
  "entity_detection": {
    "entity_types": [
      {
        "type": "DISABLE",
        "value": ["INJURY"]
      }
    ],
    "filter": [
      {
        "type": "BLOCK",
        "entity_type": "EMPLOYEE_ID",
        "pattern": "GID-\\d{5}"
      },
      {
        "type": "BLOCK",
        "entity_type": "BUSINESS_UNIT",
        "pattern": "Best Corp"
      }
    ],
    "return_entity": true
  },
  "processed_text": {
    "type": "MARKER",
    "pattern": "[BEST_ENTITY_TYPE]"
  }
}

cURL:

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '
{
    "text": [
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'\''m waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won'\''t be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You'\''re all set!",
        "Thanks so much Carole!"
    ],
    "link_batch": true,
    "entity_detection": {
        "entity_types": [
            {
                "type": "DISABLE",
                "value": [
                    "INJURY"
                ]
            }
        ],
        "filter": [
            {
                "type": "BLOCK",
                "entity_type": "EMPLOYEE_ID",
                "pattern": "GID-\\d{5}"
            },
            {
                "type": "BLOCK",
                "entity_type": "BUSINESS_UNIT",
                "pattern": "Best Corp"
            }
        ],
        "return_entity": true
    },
    "processed_text": {
        "type": "MARKER",
        "pattern": "[BEST_ENTITY_TYPE]"
    }
}'

Python:

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Hello Xavier, can you tell me your employee ID?",
            "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
            "Okay, thanks Xavier, why are you calling today?",
            "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
            "Oh, so sorry to hear that! How can we help?",
            "I won't be able to come back to the office in NYC for a while",
            "No problem Xavier, I will enter a short term work from home for you. You're all set!",
            "Thanks so much Carole!",
        ],
        "link_batch": True,
        "entity_detection": {
            "entity_types": [{"type": "DISABLE", "value": ["INJURY"]}],
            "filter": [
                {
                    "type": "BLOCK",
                    "entity_type": "EMPLOYEE_ID",
                    "pattern": "GID-\\d{5}",
                },
                {
                    "type": "BLOCK",
                    "entity_type": "BUSINESS_UNIT",
                    "pattern": "Best Corp",
                },
            ],
            "return_entity": True,
        },
        "processed_text": {"type": "MARKER", "pattern": "[BEST_ENTITY_TYPE]"},
    },
)

results = r.json()
print(results)

Python Client:

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

filter_employee = request_objects.filter_selector_obj(type="BLOCK", entity_type="EMPLOYEE_ID", pattern="GID-\\d{5}")
filter_bu = request_objects.filter_selector_obj(type="BLOCK", entity_type="BUSINESS_UNIT", pattern="Best Corp")
entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="DISABLE", value=["INJURY"])],
                                                               filter=[filter_employee, filter_bu],
                                                               return_entity=True)

processed_text_object = request_objects.processed_text_obj(type="MARKER", pattern="[BEST_ENTITY_TYPE]")

text_request = request_objects.process_text_obj(text=[
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won't be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You're all set!",
        "Thanks so much Carole!"
    ],
    link_batch=True,
    entity_detection=entity_detection_object,
    processed_text=processed_text_object,
)
response = client.process_text(text_request)

print(response.processed_text)

The result is as follows.


['Hello [NAME_GIVEN], can you tell me your employee ID?', 'Yep, my [BUSINESS_UNIT] ID is [EMPLOYEE_ID], and my SIN is [SSN]', 'Okay, thanks [NAME_GIVEN], why are you calling today?', "I broke my right leg on the [DATE] and I'm waiting for my [MEDICAL_PROCESS] results. [NAME_MEDICAL_PROFESSIONAL], [ORGANIZATION_MEDICAL_FACILITY].", 'Oh, so sorry to hear that! How can we help?', "I won't be able to come back to the office in [LOCATION_CITY] for a while", "No problem [NAME_GIVEN], I will enter a short term work from home for you. You're all set!", 'Thanks so much [NAME_GIVEN]!']
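
Before sending filters to the API, it can help to preview locally what each pattern matches. The sketch below uses Python's re module as a stand-in; the API's regex engine may differ in details, so treat the preview as an approximation. The helper name `preview_matches` is hypothetical. The last lines also show why the JSON samples write the pattern with a doubled backslash.

```python
import json
import re

# Same filters as in the request above; a raw string avoids doubling the backslash.
FILTERS = [
    {"type": "BLOCK", "entity_type": "EMPLOYEE_ID", "pattern": r"GID-\d{5}"},
    {"type": "BLOCK", "entity_type": "BUSINESS_UNIT", "pattern": "Best Corp"},
]


def preview_matches(text, filters):
    """Return (entity_type, matched_text, start, end) for every local regex match."""
    found = []
    for flt in filters:
        for match in re.finditer(flt["pattern"], text):
            found.append((flt["entity_type"], match.group(0), match.start(), match.end()))
    return found


sample = "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283"
for entity_type, matched, start, end in preview_matches(sample, FILTERS):
    print(f"{entity_type}: {matched!r} at {start}-{end}")

# JSON encoding escapes the backslash, giving the doubled form used in the request body.
print(json.dumps({"pattern": FILTERS[0]["pattern"]}))
```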

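Working with Large Language Models: PrivateGPT

PrivateGPT uses Private AI's de-identification and re-identification features to support safe, seamless interaction with cloud-hosted large language models.

PrivateGPT processing overview

For more detail, see the PrivateGPT User Guide and the guide to integrating with large language models.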
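
The round trip can be illustrated locally: redact the prompt, send only the redacted text to the model, then put the original values back into the reply. The sketch below is not PrivateGPT's implementation, which this guide does not describe. It assumes numbered markers such as [NAME_GIVEN_1], reuses the entities field from a process/text response, and uses a hard-coded string in place of a real model reply. The function name `reidentify` is hypothetical.

```python
def reidentify(text, entities):
    """Replace each [MARKER] in text with the original value recorded in entities."""
    restored = text
    for entity in entities:
        marker = "[" + entity["processed_text"] + "]"
        restored = restored.replace(marker, entity["text"])
    return restored


# Entities as returned by process/text for the first sample, trimmed to two fields.
entities = [
    {"processed_text": "ORGANIZATION_1", "text": "Georgia Division of Transportation"},
    {"processed_text": "NAME_GIVEN_1", "text": "Johanna"},
]

# Stand-in for a model reply that was generated from the redacted prompt only.
llm_reply = "[NAME_GIVEN_1] works at the [ORGANIZATION_1]."
print(reidentify(llm_reply, entities))
```

Unique numbered markers matter here: with a pattern that drops the numbering, two different names would share one marker and could not be restored unambiguously.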

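Generating Synthetic Entities (Beta)

Besides redaction with MARKER, tokens, or MASK characters, Private AI can redact by generating fake, synthetic replacements. A machine-learning approach generates realistic synthetic entities from the surrounding text. Potential benefits include:

Unlike systems that generate entirely new synthetic text, Private AI generates synthetic entities equivalent to the originals. This lowers the chance of losing the original context, so downstream tasks such as sentiment analysis should not be adversely affected.
Although Private AI is a market leader in PII detection, a 100% detection rate is hard to achieve. Combined with synthetic generation, any PII that slips through is harder to tell apart from the synthetic replacements, which gives stronger protection against re-identification attempts.
Synthetic generation yields natural-looking output, which reduces unexpected effects on machine learning systems further along the workflow.

To generate synthetic entities, set the type of the processed_text parameter to SYNTHETIC. (Currently supported as a beta feature for text processing only.)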

Request Body:

{
    "text": [
      "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
    ],
    "processed_text": {
      "type": "SYNTHETIC"
    }
}

cURL:

curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
      "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
    ],
    "processed_text": {
      "type": "SYNTHETIC"
    }
}'

Python:

import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
        ],
        "processed_text": {"type": "SYNTHETIC"},
    },
)

results = r.json()
print(results)

Python Client:

from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."],
                                                processed_text=request_objects.processed_text_obj(type="SYNTHETIC"))
response = client.process_text(text_request)

print(response.processed_text)

The result is as follows.

Redacted Text:

Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.

Full Response:

[
  {
    "processed_text": "Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.",
    "entities": [
      {
        "processed_text": "Ben",
        "text": "May",
        "location": {
          "stt_idx": 18,
          "end_idx": 21,
          "stt_idx_processed": 18,
          "end_idx_processed": 21
        },
        "best_label": "NAME_GIVEN",
        "labels": {
          "NAME_GIVEN": 0.9234,
          "NAME": 0.8903
        }
      },
      {
        "processed_text": "Michael Morley",
        "text": "Jessica Parker",
        "location": {
          "stt_idx": 40,
          "end_idx": 54,
          "stt_idx_processed": 40,
          "end_idx_processed": 54
        },
        "best_label": "NAME",
        "labels": {
          "NAME_GIVEN": 0.4595,
          "NAME": 0.9178,
          "NAME_FAMILY": 0.4567
        }
      },
      {
        "processed_text": "Ekshaku, Sweden",
        "text": "Toronto, Canada",
        "location": {
          "stt_idx": 67,
          "end_idx": 82,
          "stt_idx_processed": 67,
          "end_idx_processed": 82
        },
        "best_label": "LOCATION",
        "labels": {
          "LOCATION_CITY": 0.3177,
          "LOCATION": 0.9268,
          "LOCATION_COUNTRY": 0.3185
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 83,
    "languages_detected": {
      "en": 0.8507365584373474
    }
  }
]
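
In SYNTHETIC mode each entity's processed_text holds the replacement itself, so the output text can be rebuilt from the original text and the entity locations. The sketch below does that for the sample response, splicing from right to left so earlier offsets stay valid. It assumes zero-based offsets with an exclusive end, and the function name `apply_replacements` is hypothetical.

```python
ORIGINAL = "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."

# Entities from the sample response, trimmed to the fields used here.
ENTITIES = [
    {"processed_text": "Ben", "text": "May",
     "location": {"stt_idx": 18, "end_idx": 21}},
    {"processed_text": "Michael Morley", "text": "Jessica Parker",
     "location": {"stt_idx": 40, "end_idx": 54}},
    {"processed_text": "Ekshaku, Sweden", "text": "Toronto, Canada",
     "location": {"stt_idx": 67, "end_idx": 82}},
]


def apply_replacements(original, entities):
    """Splice each entity's replacement into the original text, right to left."""
    result = original
    ordered = sorted(entities, key=lambda e: e["location"]["stt_idx"], reverse=True)
    for entity in ordered:
        loc = entity["location"]
        result = result[:loc["stt_idx"]] + entity["processed_text"] + result[loc["end_idx"]:]
    return result


print(apply_replacements(ORIGINAL, ENTITIES))
```

In this sample every replacement happens to have the same length as the text it replaces, which is why the processed offsets equal the original ones; the right-to-left splice does not rely on that.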
