Getting Started with Private AI

This guide walks through the main features of the Private AI API. It starts with basic text processing and then moves on to file processing (see "Redacting PII in Files").

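Info

You can follow this guide hands-on with the Private AI cloud API. Sign up to get a free API key.

To set up the Private AI container instead, see the Container Quickstart. When using the container, change the API endpoint in the sample code to match your container environment before running it.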

Redacting Text

The process/text endpoint accepts a list of texts. It detects PII in each text and returns a response in which the PII is redacted in the form specified by the request parameters (MARKER, MASK, and so on).

Request Body:
{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ]
}
cURL:
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ]
}'
Python:
import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
        ]
    },
)

results = r.json()

print(results)
Python Client:
from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."])
response = client.process_text(text_request)

print(response.processed_text)

The response for this sample contains the following.

  • processed_text: the redacted text. The redaction style can be chosen with the request parameter of the same name, processed_text.
  • entities: the detected PII, showing which entity (label) each piece of text was classified as (NER, Named Entity Recognition).
[
  {
    "processed_text": "Thank you for calling the [ORGANIZATION_1]. My name is miss [NAME_GIVEN_1], and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
    "entities": [
      {
        "processed_text": "ORGANIZATION_1",
        "text": "Georgia Division of Transportation",
        "location": {
          "stt_idx": 26,
          "end_idx": 60,
          "stt_idx_processed": 26,
          "end_idx_processed": 42
        },
        "best_label": "ORGANIZATION",
        "labels": {
          "LOCATION_STATE": 0.2403,
          "LOCATION": 0.2342,
          "ORGANIZATION": 0.8967
        }
      },
      {
        "processed_text": "NAME_GIVEN_1",
        "text": "Johanna",
        "location": {
          "stt_idx": 78,
          "end_idx": 85,
          "stt_idx_processed": 60,
          "end_idx_processed": 74
        },
        "best_label": "NAME_GIVEN",
        "labels": {
          "NAME_GIVEN": 0.9127,
          "NAME": 0.9018
        }
      },
      {
        "processed_text": "SSN_1",
        "text": "614-5555 01",
        "location": {
          "stt_idx": 203,
          "end_idx": 214,
          "stt_idx_processed": 192,
          "end_idx_processed": 199
        },
        "best_label": "SSN",
        "labels": {
          "SSN": 0.913
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 215,
    "languages_detected": {
      "en": 0.920992910861969
    }
  }
]
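
The fields described above can be read with a few lines of Python. The following is a minimal sketch, not part of the Private AI client: the helper name summarize_entities is hypothetical, and the sample data is trimmed from the response shown above. It lists each detected entity and checks that the stt_idx and end_idx offsets slice the original input back to the detected text.

```python
from typing import Any

# Input text and a trimmed copy of the response from the example above.
SAMPLE_TEXT = (
    "Thank you for calling the Georgia Division of Transportation. "
    "My name is miss Johanna, and it is a pleasure assisting you today. "
    "For security reasons, may I please have your Social Security number? "
    "Yes, 614-5555 01."
)

SAMPLE_RESULT: dict[str, Any] = {
    "entities": [
        {
            "text": "Georgia Division of Transportation",
            "location": {"stt_idx": 26, "end_idx": 60},
            "best_label": "ORGANIZATION",
        },
        {
            "text": "Johanna",
            "location": {"stt_idx": 78, "end_idx": 85},
            "best_label": "NAME_GIVEN",
        },
        {
            "text": "614-5555 01",
            "location": {"stt_idx": 203, "end_idx": 214},
            "best_label": "SSN",
        },
    ],
}


def summarize_entities(
    original_text: str, result: dict[str, Any]
) -> list[tuple[str, str, bool]]:
    """Return (best_label, detected_text, offsets_match) for each entity."""
    rows = []
    for entity in result["entities"]:
        location = entity["location"]
        # stt_idx / end_idx refer to positions in the original input text.
        sliced = original_text[location["stt_idx"]:location["end_idx"]]
        rows.append((entity["best_label"], entity["text"], sliced == entity["text"]))
    return rows


for label, text, offsets_match in summarize_entities(SAMPLE_TEXT, SAMPLE_RESULT):
    print(f"{label}: {text!r} (offsets match input: {offsets_match})")
```

The stt_idx_processed and end_idx_processed fields could be checked the same way against processed_text.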

Linking Related Text

A list of texts sent to the endpoint can be processed as one related unit (a batch) by setting the link_batch parameter.

Request Body:
{
   "text": [
      "My phone number is",
      "2345435"
   ],
   "link_batch": true
}
cURL:
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR API KEY>' \
--data '{
   "text": [
      "My phone number is",
      "2345435"
   ],
   "link_batch": true
}'
Python:
import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "My phone number is",
            "2345435",
        ],
        "link_batch": True,
    },
)

results = r.json()
print(results)
Python Client:
from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["My phone number is", "2345435"], link_batch=True)
response = client.process_text(text_request)

print(response.processed_text)

With link_batch enabled, the Private AI language model treats "My phone number is" and "2345435" not as two separate elements but as a single unit, "My phone number is 2345435". Here this makes detection of the phone number more robust.

Redacted Text:
["My phone number is", "[PHONE_NUMBER_1]"]
Full Response:
[
  {
    "processed_text": "My phone number is",
    "entities": [],
    "entities_present": false,
    "characters_processed": 18,
    "languages_detected": {
      "en": 0.8986189365386963
    }
  },
  {
    "processed_text": "[PHONE_NUMBER_1]",
    "entities": [
      {
        "processed_text": "PHONE_NUMBER_1",
        "text": "2345435",
        "location": {
          "stt_idx": 0,
          "end_idx": 7,
          "stt_idx_processed": 0,
          "end_idx_processed": 16
        },
        "best_label": "PHONE_NUMBER",
        "labels": {
          "PHONE_NUMBER": 0.9166
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 7,
    "languages_detected": {}
  }
]
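
Judging from the sample above, the response holds one result per input element, in the same order. Assuming that ordering holds, the sketch below pairs each input with its redacted output. pair_inputs_with_outputs is a hypothetical helper, and the data is trimmed from the response above.

```python
from typing import Any


def pair_inputs_with_outputs(
    texts: list[str], results: list[dict[str, Any]]
) -> list[tuple[str, str, bool]]:
    """Pair each input text with its redacted text and entities_present flag."""
    # Assumption: results come back in the same order as the inputs.
    if len(texts) != len(results):
        raise ValueError("expected one result per input text")
    return [
        (text, result["processed_text"], result["entities_present"])
        for text, result in zip(texts, results)
    ]


texts = ["My phone number is", "2345435"]
results = [
    {"processed_text": "My phone number is", "entities_present": False},
    {"processed_text": "[PHONE_NUMBER_1]", "entities_present": True},
]

for original, redacted, has_pii in pair_inputs_with_outputs(texts, results):
    print(f"{original!r} -> {redacted!r} (PII found: {has_pii})")
```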

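Customizing Which Entities Are Detected

The samples so far redacted every entity type (except beta entities), but you can select (or exclude) the entity types (types of PII) to redact; see "Selecting Target Entities". The following example redacts only SSN.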
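
The entity_detection object used in the request below can also be assembled programmatically. This is a minimal sketch using plain dictionaries: the helper names entity_selector and build_payload are hypothetical, and only the ENABLE and DISABLE selector types that appear in this guide are accepted.

```python
import json
from typing import Any

# Selector types that appear in this guide; the API may accept others.
ALLOWED_SELECTOR_TYPES = ("ENABLE", "DISABLE")


def entity_selector(selector_type: str, labels: list[str]) -> dict[str, Any]:
    """Build one entry for entity_detection.entity_types."""
    if selector_type not in ALLOWED_SELECTOR_TYPES:
        raise ValueError(f"unsupported selector type: {selector_type}")
    return {"type": selector_type, "value": list(labels)}


def build_payload(texts: list[str], selectors: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a process/text request body with entity selection."""
    return {
        "text": list(texts),
        "entity_detection": {"entity_types": list(selectors)},
    }


payload = build_payload(
    ["Yes, 614-5555 01."],
    [entity_selector("ENABLE", ["SSN"])],
)
# json.dumps produces the JSON form that would be sent as the request body.
print(json.dumps(payload, indent=2))
```

Passing the resulting dictionary as the json argument of requests.post, as in the Python samples in this guide, would send it as the request body.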

Request Body:
{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ],
    "entity_detection": {
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "SSN"
                ]
            }
        ]
    }
}
cURL:
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
        "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
    ],
    "entity_detection": {
        "entity_types": [
            {
                "type": "ENABLE",
                "value": [
                    "SSN"
                ]
            }
        ]
    }
}'
Python:
import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
        ],
        "entity_detection": {"entity_types": [{"type": "ENABLE", "value": ["SSN"]}]},
    },
)

results = r.json()
print(results)
Python Client:
from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="ENABLE", value=["SSN"])])
text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."], entity_detection=entity_detection_object)
response = client.process_text(text_request)

print(response.processed_text)

The result looks like this.

Redacted Text:
Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].
Full Response:
[
  {
    "processed_text": "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
    "entities": [
      {
        "processed_text": "SSN_1",
        "text": "614-5555 01",
        "location": {
          "stt_idx": 203,
          "end_idx": 214,
          "stt_idx_processed": 203,
          "end_idx_processed": 210
        },
        "best_label": "SSN",
        "labels": {
          "SSN": 0.913
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 215,
    "languages_detected": {
      "en": 0.920992910861969
    }
  }
]

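Adding Regex Filters

Detection and redaction of PII can also be filtered with regular expressions (see "Filter Settings"). This lets you define PII with a company-specific format, such as employee IDs, internal database IDs, or document IDs, as a redaction target (or exclude it).

This example specifies both entity selection and filter settings. It processes a log of an HR call in which an employee reports an injury and expects to have difficulty coming in to work.

  • Two regex filters redact text matching their patterns as the custom entities EMPLOYEE_ID and BUSINESS_UNIT.
  • INJURY is left unredacted: in this scenario, details of the injury matter for administrative work such as insurance claims and need to stay readable.
  • text is a list here, but it is one continuous spoken exchange whose elements are related, so link_batch is set to true.
  • The MARKER pattern is one that does not append sequence numbers.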
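
Before sending filters like these, it can help to try the patterns locally. The sketch below uses Python's re module as a rough stand-in; the regex engine used by the API may differ in details, and preview_filter_matches is a hypothetical helper rather than part of any Private AI library. The patterns and the sample sentence come from the request that follows.

```python
import re

# Filter definitions mirroring the "filter" section of the request.
FILTERS = [
    {"type": "BLOCK", "entity_type": "EMPLOYEE_ID", "pattern": r"GID-\d{5}"},
    {"type": "BLOCK", "entity_type": "BUSINESS_UNIT", "pattern": "Best Corp"},
]


def preview_filter_matches(
    texts: list[str], filters: list[dict[str, str]]
) -> list[tuple[str, int, str]]:
    """Return (entity_type, text_index, matched_text) for every local match."""
    hits = []
    for filter_def in filters:
        compiled = re.compile(filter_def["pattern"])
        for index, text in enumerate(texts):
            for match in compiled.finditer(text):
                hits.append((filter_def["entity_type"], index, match.group(0)))
    return hits


sample = ["Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283"]
for entity_type, index, matched in preview_filter_matches(sample, FILTERS):
    print(f"text[{index}]: {matched!r} would be labelled {entity_type}")
```

In the JSON request body the backslash is doubled ("GID-\\d{5}") because JSON strings require it; the raw string above denotes the same pattern.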
Request Body:
{
  "text": [
    "Hello Xavier, can you tell me your employee ID?",
    "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
    "Okay, thanks Xavier, why are you calling today?",
    "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
    "Oh, so sorry to hear that! How can we help?",
    "I won't be able to come back to the office in NYC for a while",
    "No problem Xavier, I will enter a short term work from home for you. You're all set!",
    "Thanks so much Carole!"
  ],
  "link_batch": true,
  "entity_detection": {
    "entity_types": [
      {
        "type": "DISABLE",
        "value": ["INJURY"]
      }
    ],
    "filter": [
      {
        "type": "BLOCK",
        "entity_type": "EMPLOYEE_ID",
        "pattern": "GID-\\d{5}"
      },
      {
        "type": "BLOCK",
        "entity_type": "BUSINESS_UNIT",
        "pattern": "Best Corp"
      }
    ],
    "return_entity": true
  },
  "processed_text": {
    "type": "MARKER",
    "pattern": "[BEST_ENTITY_TYPE]"
  }
}
cURL:
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '
{
    "text": [
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'\''m waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won'\''t be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You'\''re all set!",
        "Thanks so much Carole!"
    ],
    "link_batch": true,
    "entity_detection": {
        "entity_types": [
            {
                "type": "DISABLE",
                "value": [
                    "INJURY"
                ]
            }
        ],
        "filter": [
            {
                "type": "BLOCK",
                "entity_type": "EMPLOYEE_ID",
                "pattern": "GID-\\d{5}"
            },
            {
                "type": "BLOCK",
                "entity_type": "BUSINESS_UNIT",
                "pattern": "Best Corp"
            }
        ],
        "return_entity": true
    },
    "processed_text": {
        "type": "MARKER",
        "pattern": "[BEST_ENTITY_TYPE]"
    }
}'
Python:
import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Hello Xavier, can you tell me your employee ID?",
            "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
            "Okay, thanks Xavier, why are you calling today?",
            "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
            "Oh, so sorry to hear that! How can we help?",
            "I won't be able to come back to the office in NYC for a while",
            "No problem Xavier, I will enter a short term work from home for you. You're all set!",
            "Thanks so much Carole!",
        ],
        "link_batch": True,
        "entity_detection": {
            "entity_types": [{"type": "DISABLE", "value": ["INJURY"]}],
            "filter": [
                {
                    "type": "BLOCK",
                    "entity_type": "EMPLOYEE_ID",
                    "pattern": "GID-\\d{5}",
                },
                {
                    "type": "BLOCK",
                    "entity_type": "BUSINESS_UNIT",
                    "pattern": "Best Corp",
                },
            ],
            "return_entity": True,
        },
        "processed_text": {"type": "MARKER", "pattern": "[BEST_ENTITY_TYPE]"},
    },
)

results = r.json()
print(results)
Python Client:
from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

filter_employee = request_objects.filter_selector_obj(type="BLOCK", entity_type="EMPLOYEE_ID", pattern="GID-\\d{5}")
filter_bu = request_objects.filter_selector_obj(type="BLOCK", entity_type="BUSINESS_UNIT", pattern="Best Corp")
entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="DISABLE", value=["INJURY"])],
                                                               filter=[filter_employee, filter_bu],
                                                               return_entity=True)

processed_text_object = request_objects.processed_text_obj(type="MARKER", pattern="[BEST_ENTITY_TYPE]")

text_request = request_objects.process_text_obj(text=[
        "Hello Xavier, can you tell me your employee ID?",
        "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
        "Okay, thanks Xavier, why are you calling today?",
        "I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
        "Oh, so sorry to hear that! How can we help?",
        "I won't be able to come back to the office in NYC for a while",
        "No problem Xavier, I will enter a short term work from home for you. You're all set!",
        "Thanks so much Carole!"
    ],
    link_batch=True,
    entity_detection=entity_detection_object,
    processed_text=processed_text_object,
)
response = client.process_text(text_request)

print(response.processed_text)

The result is as follows.

['Hello [NAME_GIVEN], can you tell me your employee ID?', 'Yep, my [BUSINESS_UNIT] ID is [EMPLOYEE_ID], and my SIN is [SSN]', 'Okay, thanks [NAME_GIVEN], why are you calling today?', "I broke my right leg on the [DATE] and I'm waiting for my [MEDICAL_PROCESS] results. [NAME_MEDICAL_PROFESSIONAL], [ORGANIZATION_MEDICAL_FACILITY].", 'Oh, so sorry to hear that! How can we help?', "I won't be able to come back to the office in [LOCATION_CITY] for a while", "No problem [NAME_GIVEN], I will enter a short term work from home for you. You're all set!", 'Thanks so much [NAME_GIVEN]!']

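Working with Large Language Models: PrivateGPT

PrivateGPT uses Private AI's redaction and re-identification features to support safe, seamless interaction with cloud-hosted large language models.

PrivateGPT processing overview

For more detail, see the PrivateGPT User Guide and "Integrating with Large Language Models".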
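
The round trip can be illustrated locally. The sketch below is not how PrivateGPT is implemented; it only shows the idea of restoring original values in a model's reply, using the entities list returned by process/text in the first example of this guide. The reidentify helper and the sample reply are hypothetical.

```python
def reidentify(reply: str, entities: list[dict[str, str]]) -> str:
    """Replace each [MARKER] in the reply with the original text it stands for."""
    restored = reply
    for entity in entities:
        # In the response, processed_text holds the marker name without brackets.
        marker = f"[{entity['processed_text']}]"
        restored = restored.replace(marker, entity["text"])
    return restored


# Trimmed from the entities list in the first example of this guide.
entities = [
    {"processed_text": "ORGANIZATION_1", "text": "Georgia Division of Transportation"},
    {"processed_text": "NAME_GIVEN_1", "text": "Johanna"},
]

# Hypothetical reply from a language model that only ever saw the redacted text.
llm_reply = "Hello [NAME_GIVEN_1], your request to [ORGANIZATION_1] was received."

print(reidentify(llm_reply, entities))
```

Numbered markers matter for this kind of mapping: each marker maps back to exactly one original value, so the reply can be restored without ambiguity.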

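Generating Synthetic Entities (Beta)

Besides redacting with markers, tokens, or mask characters, Private AI can redact by generating fake, synthetic replacements. A machine-learning-based approach generates realistic synthetic values from the surrounding text. This has the following potential advantages.

  1. Unlike systems that generate entirely unrelated replacements, Private AI generates synthetic values equivalent to the original ones. This lowers the chance of losing the original context, so downstream tasks such as sentiment analysis are less likely to be affected.
  2. Private AI is a market leader in PII detection, but a 100% detection rate is hard to achieve. Combining detection with synthetic generation gives stronger PII protection against re-identification attempts.
  3. Synthetic generation yields natural-looking output, which reduces unexpected effects on machine learning systems further along the workflow.

To generate synthetic entities, set the type of the processed_text parameter to SYNTHETIC. (Currently supported in beta, for text processing only.)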

Request Body:
{
    "text": [
      "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
    ],
    "processed_text": {
      "type": "SYNTHETIC"
    }
}
cURL:
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
    "text": [
      "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
    ],
    "processed_text": {
      "type": "SYNTHETIC"
    }
}'
Python:
import requests

r = requests.post(
    url="https://api.private-ai.com/community/v4/process/text",
    headers={"x-api-key": "<YOUR API KEY>"},
    json={
        "text": [
            "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
        ],
        "processed_text": {"type": "SYNTHETIC"},
    },
)

results = r.json()
print(results)
Python Client:
from privateai_client import PAIClient
from privateai_client import request_objects

client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")

text_request = request_objects.process_text_obj(text=["Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."],
                                                processed_text=request_objects.processed_text_obj(type="SYNTHETIC"))
response = client.process_text(text_request)

print(response.processed_text)

The result is as follows.

Redacted Text:
Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.
Full Response:
[
  {
    "processed_text": "Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.",
    "entities": [
      {
        "processed_text": "Ben",
        "text": "May",
        "location": {
          "stt_idx": 18,
          "end_idx": 21,
          "stt_idx_processed": 18,
          "end_idx_processed": 21
        },
        "best_label": "NAME_GIVEN",
        "labels": {
          "NAME_GIVEN": 0.9234,
          "NAME": 0.8903
        }
      },
      {
        "processed_text": "Michael Morley",
        "text": "Jessica Parker",
        "location": {
          "stt_idx": 40,
          "end_idx": 54,
          "stt_idx_processed": 40,
          "end_idx_processed": 54
        },
        "best_label": "NAME",
        "labels": {
          "NAME_GIVEN": 0.4595,
          "NAME": 0.9178,
          "NAME_FAMILY": 0.4567
        }
      },
      {
        "processed_text": "Ekshaku, Sweden",
        "text": "Toronto, Canada",
        "location": {
          "stt_idx": 67,
          "end_idx": 82,
          "stt_idx_processed": 67,
          "end_idx_processed": 82
        },
        "best_label": "LOCATION",
        "labels": {
          "LOCATION_CITY": 0.3177,
          "LOCATION": 0.9268,
          "LOCATION_COUNTRY": 0.3185
        }
      }
    ],
    "entities_present": true,
    "characters_processed": 83,
    "languages_detected": {
      "en": 0.8507365584373474
    }
  }
]
© Copyright 2024 Private AI.