Getting Started with Private AI
This guide walks through the main features of the Private AI API. It starts with basic text processing and then moves on to file processing (see Redacting PII in Files).
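Info
You can try everything in this guide against the Private AI cloud API as you read. Sign up here: Get a Free API Key.
To set up the Private AI container instead, see the Container Quickstart. When using the container, change the API endpoint in the sample code to match your container environment before running it.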
Redacting Text
The process/text endpoint accepts a list of texts. It detects the PII in each text, redacts it in the form specified by the request parameters (MARKER, MASK, and so on), and returns the result in the response.
{
"text": [
"Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
]
}
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
"text": [
"Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
]
}'
import requests
r = requests.post(
url="https://api.private-ai.com/community/v4/process/text",
headers={"x-api-key": "<YOUR API KEY>"},
json={
"text": [
"Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
]
},
)
results = r.json()
print(results)
from privateai_client import PAIClient
from privateai_client import request_objects
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")
text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."])
response = client.process_text(text_request)
print(response.processed_text)
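As noted above, the endpoint differs when you run against a container instead of the cloud API, so it can help to keep the base URL in one place. The following is a minimal standard-library sketch, assuming a container serves the same `process/text` path under its own base URL (the `http://localhost:8080` value below is only a hypothetical example). `build_process_text_request` is a hypothetical helper; the request is only constructed here, not sent.

```python
import json
import urllib.request

# Cloud endpoint used throughout this guide. Assumption: a container exposes the
# same process/text path under its own base URL, so only this value changes.
CLOUD_BASE_URL = "https://api.private-ai.com/community/v4/"


def build_process_text_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to <base_url>/process/text.

    Hypothetical helper: it mirrors the URL, headers, and JSON body used in the
    curl/requests samples above.
    """
    url = base_url.rstrip("/") + "/process/text"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_process_text_request(CLOUD_BASE_URL, "<YOUR API KEY>", {"text": ["My name is Johanna."]})
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30).read() returns the JSON response bytes.
```

Switching to a container then only means passing a different base URL, for example `build_process_text_request("http://localhost:8080", key, payload)`.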
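The response for this example contains the following fields:
- processed_text: the redacted text. The redaction style can be set with the request parameter of the same name, processed_text.
- entities: the detected PII, showing which entity type (label) each piece of text was classified as by named entity recognition (NER).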
[
{
"processed_text": "Thank you for calling the [ORGANIZATION_1]. My name is miss [NAME_GIVEN_1], and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
"entities": [
{
"processed_text": "ORGANIZATION_1",
"text": "Georgia Division of Transportation",
"location": {
"stt_idx": 26,
"end_idx": 60,
"stt_idx_processed": 26,
"end_idx_processed": 42
},
"best_label": "ORGANIZATION",
"labels": {
"LOCATION_STATE": 0.2403,
"LOCATION": 0.2342,
"ORGANIZATION": 0.8967
}
},
{
"processed_text": "NAME_GIVEN_1",
"text": "Johanna",
"location": {
"stt_idx": 78,
"end_idx": 85,
"stt_idx_processed": 60,
"end_idx_processed": 74
},
"best_label": "NAME_GIVEN",
"labels": {
"NAME_GIVEN": 0.9127,
"NAME": 0.9018
}
},
{
"processed_text": "SSN_1",
"text": "614-5555 01",
"location": {
"stt_idx": 203,
"end_idx": 214,
"stt_idx_processed": 192,
"end_idx_processed": 199
},
"best_label": "SSN",
"labels": {
"SSN": 0.913
}
}
],
"entities_present": true,
"characters_processed": 215,
"languages_detected": {
"en": 0.920992910861969
}
}
]
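The `location` offsets in this response can be checked directly against the strings: in this sample, `stt_idx`/`end_idx` index into the original text, and `stt_idx_processed`/`end_idx_processed` index into `processed_text`, where each marker appears with its surrounding square brackets. A minimal sketch that verifies this, using the sample response above copied in as a Python literal and trimmed to the fields it reads:

```python
# Original input and the processed_text returned in the sample response above.
ORIGINAL = ("Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, "
            "and it is a pleasure assisting you today. For security reasons, may I please have your "
            "Social Security number? Yes, 614-5555 01.")
PROCESSED = ("Thank you for calling the [ORGANIZATION_1]. My name is miss [NAME_GIVEN_1], "
             "and it is a pleasure assisting you today. For security reasons, may I please have your "
             "Social Security number? Yes, [SSN_1].")

# entities from the sample response (trimmed to the fields used).
ENTITIES = [
    {"processed_text": "ORGANIZATION_1", "text": "Georgia Division of Transportation",
     "location": {"stt_idx": 26, "end_idx": 60, "stt_idx_processed": 26, "end_idx_processed": 42}},
    {"processed_text": "NAME_GIVEN_1", "text": "Johanna",
     "location": {"stt_idx": 78, "end_idx": 85, "stt_idx_processed": 60, "end_idx_processed": 74}},
    {"processed_text": "SSN_1", "text": "614-5555 01",
     "location": {"stt_idx": 203, "end_idx": 214, "stt_idx_processed": 192, "end_idx_processed": 199}},
]


def check_locations(original: str, processed: str, entities: list[dict]) -> list[str]:
    """Return the markers whose offsets do not match the strings (empty if all match)."""
    mismatches = []
    for ent in entities:
        loc = ent["location"]
        in_original = original[loc["stt_idx"]:loc["end_idx"]]
        in_processed = processed[loc["stt_idx_processed"]:loc["end_idx_processed"]]
        # With MARKER output, the entity's processed_text appears wrapped in square brackets.
        if in_original != ent["text"] or in_processed != f"[{ent['processed_text']}]":
            mismatches.append(ent["processed_text"])
    return mismatches


print(check_locations(ORIGINAL, PROCESSED, ENTITIES))  # → []
```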
Linking Related Texts
With the link_batch parameter, the endpoint processes the list of texts you send as a single related unit (batch).
{
"text": [
"My phone number is",
"2345435"
],
"link_batch": true
}
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR API KEY>' \
--data '{
"text": [
"My phone number is",
"2345435"
],
"link_batch": true
}'
import requests
r = requests.post(
url="https://api.private-ai.com/community/v4/process/text",
headers={"x-api-key": "<YOUR API KEY>"},
json={
"text": [
"My phone number is",
"2345435",
],
"link_batch": True,
},
)
results = r.json()
print(results)
from privateai_client import PAIClient
from privateai_client import request_objects
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")
text_request = request_objects.process_text_obj(text=["My phone number is", "2345435"], link_batch=True)
response = client.process_text(text_request)
print(response.processed_text)
With link_batch, Private AI's language model treats "My phone number is" and "2345435" not as two separate elements but as a single text, "My phone number is 2345435". Here, this makes detecting 2345435 as a phone number more reliable.
["My phone number is", "[PHONE_NUMBER_1]"]
[
{
"processed_text": "My phone number is",
"entities": [],
"entities_present": false,
"characters_processed": 18,
"languages_detected": {
"en": 0.8986189365386963
}
},
{
"processed_text": "[PHONE_NUMBER_1]",
"entities": [
{
"processed_text": "PHONE_NUMBER_1",
"text": "2345435",
"location": {
"stt_idx": 0,
"end_idx": 7,
"stt_idx_processed": 0,
"end_idx_processed": 16
},
"best_label": "PHONE_NUMBER",
"labels": {
"PHONE_NUMBER": 0.9166
}
}
],
"entities_present": true,
"characters_processed": 7,
"languages_detected": {}
}
]
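As the sample response shows, each element of `text` still gets its own result object, and the `location` offsets are relative to that element (the phone number starts at `stt_idx` 0 of the second element). A minimal sketch that flattens such a response into `(element index, label, text)` tuples, using the sample above trimmed to the fields it reads; `flatten_entities` is a hypothetical helper:

```python
from typing import Any

# Trimmed copy of the link_batch sample response above.
RESULTS: list[dict[str, Any]] = [
    {"processed_text": "My phone number is", "entities": []},
    {
        "processed_text": "[PHONE_NUMBER_1]",
        "entities": [
            {"processed_text": "PHONE_NUMBER_1", "text": "2345435", "best_label": "PHONE_NUMBER",
             "location": {"stt_idx": 0, "end_idx": 7}},
        ],
    },
]


def flatten_entities(results: list[dict[str, Any]]) -> list[tuple[int, str, str]]:
    """List every detected entity together with the index of the text element it came from."""
    found = []
    for i, result in enumerate(results):
        for ent in result["entities"]:
            found.append((i, ent["best_label"], ent["text"]))
    return found


print(flatten_entities(RESULTS))  # → [(1, 'PHONE_NUMBER', '2345435')]
print(" ".join(r["processed_text"] for r in RESULTS))  # → My phone number is [PHONE_NUMBER_1]
```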
Customizing Which Entities Are Detected
The examples so far redacted all entity types (except beta entities), but you can also choose which entity types (PII types) to redact, or which to exclude (see Entity Selection). The following example redacts only SSN.
{
"text": [
"Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
],
"entity_detection": {
"entity_types": [
{
"type": "ENABLE",
"value": [
"SSN"
]
}
]
}
}
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
"text": [
"Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
],
"entity_detection": {
"entity_types": [
{
"type": "ENABLE",
"value": [
"SSN"
]
}
]
}
}'
import requests
r = requests.post(
url="https://api.private-ai.com/community/v4/process/text",
headers={"x-api-key": "<YOUR API KEY>"},
json={
"text": [
"Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."
],
"entity_detection": {"entity_types": [{"type": "ENABLE", "value": ["SSN"]}]},
},
)
results = r.json()
print(results)
from privateai_client import PAIClient
from privateai_client import request_objects
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")
entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="ENABLE", value=["SSN"])])
text_request = request_objects.process_text_obj(text=["Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, 614-5555 01."], entity_detection=entity_detection_object)
response = client.process_text(text_request)
print(response.processed_text)
The result looks like this:
Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].
[
{
"processed_text": "Thank you for calling the Georgia Division of Transportation. My name is miss Johanna, and it is a pleasure assisting you today. For security reasons, may I please have your Social Security number? Yes, [SSN_1].",
"entities": [
{
"processed_text": "SSN_1",
"text": "614-5555 01",
"location": {
"stt_idx": 203,
"end_idx": 214,
"stt_idx_processed": 203,
"end_idx_processed": 210
},
"best_label": "SSN",
"labels": {
"SSN": 0.913
}
}
],
"entities_present": true,
"characters_processed": 215,
"languages_detected": {
"en": 0.920992910861969
}
}
]
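When the same selection is reused across requests, building the `entity_detection` object in one place keeps the payloads consistent. A minimal sketch; `make_entity_detection` is a hypothetical helper that only produces the JSON shapes shown in this guide (an `ENABLE` or `DISABLE` selector, `BLOCK` regex filters, and `return_entity`). Other selector or filter combinations are not covered and should be checked against the API reference.

```python
from typing import Any, Optional


def make_entity_detection(
    selector: Optional[str] = None,
    values: Optional[list[str]] = None,
    block_filters: Optional[dict[str, str]] = None,
    return_entity: Optional[bool] = None,
) -> dict[str, Any]:
    """Build an entity_detection object (hypothetical helper).

    selector: "ENABLE" (redact only `values`) or "DISABLE" (do not redact `values`).
    block_filters: custom entity type -> regex pattern, sent as BLOCK filters.
    """
    detection: dict[str, Any] = {}
    if selector is not None:
        if selector not in ("ENABLE", "DISABLE"):
            raise ValueError(f"unsupported selector type: {selector}")
        detection["entity_types"] = [{"type": selector, "value": list(values or [])}]
    if block_filters:
        detection["filter"] = [
            {"type": "BLOCK", "entity_type": entity_type, "pattern": pattern}
            for entity_type, pattern in block_filters.items()
        ]
    if return_entity is not None:
        detection["return_entity"] = return_entity
    return detection


# Reproduces the entity_detection object of the "SSN only" request above.
print(make_entity_detection("ENABLE", ["SSN"]))
# → {'entity_types': [{'type': 'ENABLE', 'value': ['SSN']}]}
```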
Adding Regex Filters
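You can also use regular expressions (regex) to filter what is detected and redacted as PII (see Filter Configuration). This lets you define company-specific PII with a particular format, such as employee IDs, internal database IDs, or document IDs, as redaction targets (or exclude them).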
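This example uses Entity Selection and Filter Configuration together. It processes the log of an HR call in which an employee reports an injury that is likely to make it difficult to come into the office.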
- Two regex filters redact text matching their patterns as the custom entities EMPLOYEE_ID and BUSINESS_UNIT.
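- INJURY is disabled: in this scenario, the injury details are needed for administrative tasks such as insurance claims, so they should stay readable rather than be redacted.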
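- text in this example is a list, but it is one continuous spoken exchange whose elements are related, so link_batch is set to true.
- processed_text uses a MARKER pattern without sequence numbers, so every occurrence of a type gets the same marker (for example [NAME_GIVEN]).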
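Before sending custom filters, it can be useful to preview locally what the patterns match. A minimal sketch using Python's `re` module on the first three lines of the conversation below; it assumes the service's regex dialect behaves like Python's for simple patterns such as these, which is worth confirming against the filter documentation. Note that the JSON string `"GID-\\d{5}"` decodes to the regex `GID-\d{5}`, which is what the raw Python literal `r"GID-\d{5}"` expresses. `preview_filters` is a hypothetical helper.

```python
import re

# Custom entity type -> regex, as in the BLOCK filters of the request below.
FILTERS = {
    "EMPLOYEE_ID": r"GID-\d{5}",
    "BUSINESS_UNIT": r"Best Corp",
}

# First three lines of the conversation used in this example.
CONVERSATION = [
    "Hello Xavier, can you tell me your employee ID?",
    "Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
    "Okay, thanks Xavier, why are you calling today?",
]


def preview_filters(texts: list[str], filters: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Return (element index, start offset, entity type, matched text) for every regex match."""
    hits = []
    for i, text in enumerate(texts):
        for entity_type, pattern in filters.items():
            for m in re.finditer(pattern, text):
                hits.append((i, m.start(), entity_type, m.group()))
    return sorted(hits)


for hit in preview_filters(CONVERSATION, FILTERS):
    print(hit)
# (1, 8, 'BUSINESS_UNIT', 'Best Corp')
# (1, 24, 'EMPLOYEE_ID', 'GID-45434')
```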
{
"text": [
"Hello Xavier, can you tell me your employee ID?",
"Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
"Okay, thanks Xavier, why are you calling today?",
"I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
"Oh, so sorry to hear that! How can we help?",
"I won't be able to come back to the office in NYC for a while",
"No problem Xavier, I will enter a short term work from home for you. You're all set!",
"Thanks so much Carole!"
],
"link_batch": true,
"entity_detection": {
"entity_types": [
{
"type": "DISABLE",
"value": ["INJURY"]
}
],
"filter": [
{
"type": "BLOCK",
"entity_type": "EMPLOYEE_ID",
"pattern": "GID-\\d{5}"
},
{
"type": "BLOCK",
"entity_type": "BUSINESS_UNIT",
"pattern": "Best Corp"
}
],
"return_entity": true
},
"processed_text": {
"type": "MARKER",
"pattern": "[BEST_ENTITY_TYPE]"
}
}
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '
{
"text": [
"Hello Xavier, can you tell me your employee ID?",
"Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
"Okay, thanks Xavier, why are you calling today?",
"I broke my right leg on the 31st and I'\''m waiting for my x-ray results. dr. zhang, mercer health centre.",
"Oh, so sorry to hear that! How can we help?",
"I won'\''t be able to come back to the office in NYC for a while",
"No problem Xavier, I will enter a short term work from home for you. You'\''re all set!",
"Thanks so much Carole!"
],
"link_batch": true,
"entity_detection": {
"entity_types": [
{
"type": "DISABLE",
"value": [
"INJURY"
]
}
],
"filter": [
{
"type": "BLOCK",
"entity_type": "EMPLOYEE_ID",
"pattern": "GID-\\d{5}"
},
{
"type": "BLOCK",
"entity_type": "BUSINESS_UNIT",
"pattern": "Best Corp"
}
],
"return_entity": true
},
"processed_text": {
"type": "MARKER",
"pattern": "[BEST_ENTITY_TYPE]"
}
}'
import requests
r = requests.post(
url="https://api.private-ai.com/community/v4/process/text",
headers={"x-api-key": "<YOUR API KEY>"},
json={
"text": [
"Hello Xavier, can you tell me your employee ID?",
"Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
"Okay, thanks Xavier, why are you calling today?",
"I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
"Oh, so sorry to hear that! How can we help?",
"I won't be able to come back to the office in NYC for a while",
"No problem Xavier, I will enter a short term work from home for you. You're all set!",
"Thanks so much Carole!",
],
"link_batch": True,
"entity_detection": {
"entity_types": [{"type": "DISABLE", "value": ["INJURY"]}],
"filter": [
{
"type": "BLOCK",
"entity_type": "EMPLOYEE_ID",
"pattern": "GID-\\d{5}",
},
{
"type": "BLOCK",
"entity_type": "BUSINESS_UNIT",
"pattern": "Best Corp",
},
],
"return_entity": True,
},
"processed_text": {"type": "MARKER", "pattern": "[BEST_ENTITY_TYPE]"},
},
)
results = r.json()
print(results)
from privateai_client import PAIClient
from privateai_client import request_objects
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")
filter_employee = request_objects.filter_selector_obj(type="BLOCK", entity_type="EMPLOYEE_ID", pattern="GID-\\d{5}")
filter_bu = request_objects.filter_selector_obj(type="BLOCK", entity_type="BUSINESS_UNIT", pattern="Best Corp")
entity_detection_object = request_objects.entity_detection_obj(entity_types=[request_objects.entity_type_selector_obj(type="DISABLE", value=["INJURY"])],
filter=[filter_employee, filter_bu],
return_entity=True)
processed_text_object = request_objects.processed_text_obj(type="MARKER", pattern="[BEST_ENTITY_TYPE]")
text_request = request_objects.process_text_obj(text=[
"Hello Xavier, can you tell me your employee ID?",
"Yep, my Best Corp ID is GID-45434, and my SIN is 690 871 283",
"Okay, thanks Xavier, why are you calling today?",
"I broke my right leg on the 31st and I'm waiting for my x-ray results. dr. zhang, mercer health centre.",
"Oh, so sorry to hear that! How can we help?",
"I won't be able to come back to the office in NYC for a while",
"No problem Xavier, I will enter a short term work from home for you. You're all set!",
"Thanks so much Carole!"
],
link_batch=True,
entity_detection=entity_detection_object,
processed_text=processed_text_object,
)
response = client.process_text(text_request)
print(response.processed_text)
The result is as follows:
['Hello [NAME_GIVEN], can you tell me your employee ID?', 'Yep, my [BUSINESS_UNIT] ID is [EMPLOYEE_ID], and my SIN is [SSN]', 'Okay, thanks [NAME_GIVEN], why are you calling today?', "I broke my right leg on the [DATE] and I'm waiting for my [MEDICAL_PROCESS] results. [NAME_MEDICAL_PROFESSIONAL], [ORGANIZATION_MEDICAL_FACILITY].", 'Oh, so sorry to hear that! How can we help?', "I won't be able to come back to the office in [LOCATION_CITY] for a while", "No problem [NAME_GIVEN], I will enter a short term work from home for you. You're all set!", 'Thanks so much [NAME_GIVEN]!']
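Because this example uses markers without numbers, the output shows which types were redacted but not which occurrences refer to the same value: Xavier and Carole both become `[NAME_GIVEN]`. A minimal sketch that tallies the markers in the output above with a regex, assuming markers are uppercase letters and underscores inside square brackets:

```python
import re
from collections import Counter

# processed_text list from the result above.
OUTPUT = [
    "Hello [NAME_GIVEN], can you tell me your employee ID?",
    "Yep, my [BUSINESS_UNIT] ID is [EMPLOYEE_ID], and my SIN is [SSN]",
    "Okay, thanks [NAME_GIVEN], why are you calling today?",
    "I broke my right leg on the [DATE] and I'm waiting for my [MEDICAL_PROCESS] results. "
    "[NAME_MEDICAL_PROFESSIONAL], [ORGANIZATION_MEDICAL_FACILITY].",
    "Oh, so sorry to hear that! How can we help?",
    "I won't be able to come back to the office in [LOCATION_CITY] for a while",
    "No problem [NAME_GIVEN], I will enter a short term work from home for you. You're all set!",
    "Thanks so much [NAME_GIVEN]!",
]

# Assumption: a marker is an uppercase/underscore type name in square brackets.
MARKER = re.compile(r"\[([A-Z_]+)\]")

counts = Counter(name for line in OUTPUT for name in MARKER.findall(line))
print(counts.most_common(1))  # → [('NAME_GIVEN', 4)]
# INJURY was disabled, so "broke my right leg" is left as-is and no INJURY marker appears.
print("INJURY" in counts)  # → False
```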
Integrating with Large Language Models: PrivateGPT
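PrivateGPT uses Private AI's de-identification and re-identification capabilities to support secure, seamless interaction with cloud-hosted large language models.
For more details, see the PrivateGPT User Guide and Integrating with Large Language Models.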
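The redact-then-restore idea can be illustrated locally: only the redacted text is sent to the LLM, and the markers in its answer are then mapped back to the original values using the `entities` list from the `process/text` response. The sketch below is a simplified illustration under that assumption, not PrivateGPT's implementation, and it does not call any Private AI re-identification API; the LLM answer is a made-up string and `reidentify` is a hypothetical helper.

```python
import re

# entities from the first process/text sample response (trimmed to the fields used).
ENTITIES = [
    {"processed_text": "ORGANIZATION_1", "text": "Georgia Division of Transportation"},
    {"processed_text": "NAME_GIVEN_1", "text": "Johanna"},
    {"processed_text": "SSN_1", "text": "614-5555 01"},
]


def reidentify(text: str, entities: list[dict]) -> str:
    """Replace numbered [MARKER_N] tokens with the original values; unknown markers stay unchanged."""
    lookup = {ent["processed_text"]: ent["text"] for ent in entities}
    return re.sub(r"\[([A-Z_]+_\d+)\]", lambda m: lookup.get(m.group(1), m.group(0)), text)


# Hypothetical LLM reply that refers to the redacted markers.
llm_answer = "Summary: [NAME_GIVEN_1] called the [ORGANIZATION_1] and verified SSN [SSN_1]."
print(reidentify(llm_answer, ENTITIES))
# → Summary: Johanna called the Georgia Division of Transportation and verified SSN 614-5555 01.
```

This mapping relies on numbered markers such as `[NAME_GIVEN_1]`; with markers that carry no number, occurrences of different names of the same type could not be told apart.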
Generating Synthetic Entities (Beta)
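In addition to redacting with MARKERs, tokens, or MASK characters, Private AI can redact by generating fake synthetic replacements. Using a machine-learning-based approach, it generates realistic replacements based on the surrounding text. This offers the following benefits: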
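- Unlike systems that generate entirely unrelated replacements, Private AI generates replacements equivalent to the original terms. This lowers the chance of losing the original context, so downstream tasks such as sentiment analysis are less likely to be affected.
- Although Private AI is a market leader in PII detection, achieving a 100% detection rate is difficult. Combining detection with synthetic generation gives stronger protection against re-identification attempts, because any PII that is missed blends in with the synthetic values.
- Because synthetic generation produces natural-looking output, it reduces unexpected effects on downstream systems in the workflow, such as machine learning models.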
To generate synthetic replacements, set the type of the processed_text parameter to SYNTHETIC. (This is currently available in beta for text processing only.)
{
"text": [
"Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
],
"processed_text": {
"type": "SYNTHETIC"
}
}
curl --location 'https://api.private-ai.com/community/v4/process/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR KEY HERE>' \
--data '{
"text": [
"Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
],
"processed_text": {
"type": "SYNTHETIC"
}
}'
import requests
r = requests.post(
url="https://api.private-ai.com/community/v4/process/text",
headers={"x-api-key": "<YOUR API KEY>"},
json={
"text": [
"Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."
],
"processed_text": {"type": "SYNTHETIC"},
},
)
results = r.json()
print(results)
from privateai_client import PAIClient
from privateai_client import request_objects
client = PAIClient(url="https://api.private-ai.com/community/v4/", api_key="<YOUR API KEY>")
text_request = request_objects.process_text_obj(text=["Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."],
processed_text=request_objects.processed_text_obj(type="SYNTHETIC"))
response = client.process_text(text_request)
print(response.processed_text)
The result is as follows:
Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.
[
{
"processed_text": "Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.",
"entities": [
{
"processed_text": "Ben",
"text": "May",
"location": {
"stt_idx": 18,
"end_idx": 21,
"stt_idx_processed": 18,
"end_idx_processed": 21
},
"best_label": "NAME_GIVEN",
"labels": {
"NAME_GIVEN": 0.9234,
"NAME": 0.8903
}
},
{
"processed_text": "Michael Morley",
"text": "Jessica Parker",
"location": {
"stt_idx": 40,
"end_idx": 54,
"stt_idx_processed": 40,
"end_idx_processed": 54
},
"best_label": "NAME",
"labels": {
"NAME_GIVEN": 0.4595,
"NAME": 0.9178,
"NAME_FAMILY": 0.4567
}
},
{
"processed_text": "Ekshaku, Sweden",
"text": "Toronto, Canada",
"location": {
"stt_idx": 67,
"end_idx": 82,
"stt_idx_processed": 67,
"end_idx_processed": 82
},
"best_label": "LOCATION",
"labels": {
"LOCATION_CITY": 0.3177,
"LOCATION": 0.9268,
"LOCATION_COUNTRY": 0.3185
}
}
],
"entities_present": true,
"characters_processed": 83,
"languages_detected": {
"en": 0.8507365584373474
}
}
]
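With SYNTHETIC output, each entity's `processed_text` is the replacement value itself (no brackets), and `location` gives its span in the original text. A minimal sketch that rebuilds the synthetic output by splicing the replacements into the original, working right to left so that earlier offsets stay valid; it uses the sample response above and assumes entity spans do not overlap. `splice` is a hypothetical helper.

```python
ORIGINAL = "Hello, my name is May. I am the aunt of Jessica Parker. We live in Toronto, Canada."

# entities from the SYNTHETIC sample response above (trimmed to the fields used).
ENTITIES = [
    {"processed_text": "Ben", "text": "May", "location": {"stt_idx": 18, "end_idx": 21}},
    {"processed_text": "Michael Morley", "text": "Jessica Parker", "location": {"stt_idx": 40, "end_idx": 54}},
    {"processed_text": "Ekshaku, Sweden", "text": "Toronto, Canada", "location": {"stt_idx": 67, "end_idx": 82}},
]


def splice(original: str, entities: list[dict]) -> str:
    """Replace each entity span in `original` with its processed_text, last span first."""
    result = original
    for ent in sorted(entities, key=lambda e: e["location"]["stt_idx"], reverse=True):
        loc = ent["location"]
        if result[loc["stt_idx"]:loc["end_idx"]] != ent["text"]:
            raise ValueError(f"offsets do not match the original text for {ent['text']!r}")
        result = result[:loc["stt_idx"]] + ent["processed_text"] + result[loc["end_idx"]:]
    return result


print(splice(ORIGINAL, ENTITIES))
# → Hello, my name is Ben. I am the aunt of Michael Morley. We live in Ekshaku, Sweden.
```

Splicing right to left matters when a replacement differs in length from the original. In this sample the lengths happen to match, which is why the `*_processed` offsets equal the original ones.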