Performance Tips
Concurrency and Batching
The following code examples outline how to increase throughput when using Private AI.
Title | Description |
---|---|
AsyncIO | How to use Python's AsyncIO library to make concurrent calls to Private AI |
Threading | How to use Python's threading and concurrent.futures libraries to make concurrent calls to Private AI |
Batch Requests | How to process multiple inputs in a single API call using Python |
Please visit the recommended concurrency levels page to set the number of concurrent requests optimally.
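As a minimal sketch of how a concurrency limit can be enforced on the client side, the snippet below caps the number of in-flight requests with an `asyncio.Semaphore`. The names `process_all`, `fake_send`, and `max_concurrency` are hypothetical, and `fake_send` is only a placeholder for a real HTTP call to your Private AI deployment. It is not the AsyncIO example linked above.

```python
import asyncio


async def process_all(texts, send, max_concurrency=4):
    """Await `send(text)` for every text, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text):
        # The semaphore blocks here once `max_concurrency` calls are running.
        async with semaphore:
            return await send(text)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(t) for t in texts))


async def fake_send(text):
    # Placeholder for a real request; swap in your own HTTP client call.
    await asyncio.sleep(0.01)
    return text.upper()


if __name__ == "__main__":
    print(asyncio.run(process_all(["a", "b", "c"], fake_send, max_concurrency=2)))
```

Set `max_concurrency` to the value suggested on the recommended concurrency levels page for your deployment.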
Context
Private AI relies on machine learning to detect PII based on context, rather than on pattern-matching approaches such as regular expressions. For best performance, send text in the largest chunks that still meet your latency requirements. For example, the following chat log should be sent in one call with link_batch enabled, as opposed to line by line:
"Hi John, how are you?"
"I'm good thanks"
"Great, hope Atlanta is treating you well"
The Batch Requests code example provided above shows how to implement this.
Similarly, text documents should be sent through in a single request, rather than by paragraph or sentence. In addition to improving accuracy, this will minimize the number of API calls made.
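As a rough sketch of the chat-log case above, the snippet below builds a single request body that carries every message together, with link_batch enabled. The `text` field name and the overall payload shape are assumptions made for illustration, so check the API reference for the exact request schema. `build_chat_payload` is a hypothetical helper, not part of any Private AI client.

```python
import json


def build_chat_payload(messages, link_batch=True):
    """Bundle all chat messages into one request body instead of one per line."""
    # Assumed shape: a list of strings under "text" plus the link_batch flag.
    return {"text": list(messages), "link_batch": link_batch}


chat_log = [
    "Hi John, how are you?",
    "I'm good thanks",
    "Great, hope Atlanta is treating you well",
]

if __name__ == "__main__":
    print(json.dumps(build_chat_payload(chat_log), indent=2))
```

Sending the three messages in one body lets the model see the whole conversation as context and uses one API call instead of three.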
Capitalization
The PII detection models are optimized for normal English capitalization, e.g. "Robert is from Sydney, Australia. Muhab is from Wales". If this is not the case for your data, please contact Private AI so that we can provide you with the optimal model for your use case. Our solution will still work, but some performance will be lost.
That said, Private AI is optimized for processing text that contains emojis or ASR transcription errors.
ASR Transcripts
When processing audio transcripts, it is recommended to use the following input format:
"<speaker id>: <message>, <speaker id>: <message>,"
Model Tuning
Whilst Private AI's PII detection models generally perform well out of the box, model tuning may be beneficial in order to tailor our solution to unique use cases. The tuning process is outlined below.
1. Client-supplied Data Sample
Prepare a sample of at least 20 examples illustrating the problem to be addressed. See the table at the bottom of this page for examples. The sample should contain examples that are:
- anonymized such that all PII is manually replaced with synthetic entities
- long enough to provide enough context, ideally including ~30 words before and after the problematic entity
- diverse enough to be representative of the issues encountered
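The size and context guidelines above can be checked mechanically before a sample is shared. The sketch below is one possible pre-flight check, assuming each example is a `(text, entity)` pair in which `entity` is the problematic synthetic value. The helper names are hypothetical, and the 30-word threshold is treated as a soft target, in line with the "ideally" wording above.

```python
def words_around(text, entity):
    """Count whitespace-separated words before and after the first match of `entity`."""
    start = text.find(entity)
    if start == -1:
        raise ValueError("entity not found in text")
    before = text[:start].split()
    after = text[start + len(entity):].split()
    return len(before), len(after)


def sample_issues(examples, min_examples=20, min_context_words=30):
    """Return a list of human-readable warnings; an empty list means no issues found."""
    issues = []
    if len(examples) < min_examples:
        issues.append(f"only {len(examples)} examples, expected at least {min_examples}")
    for i, (text, entity) in enumerate(examples):
        n_before, n_after = words_around(text, entity)
        if n_before < min_context_words or n_after < min_context_words:
            issues.append(f"example {i}: {n_before} words before, {n_after} after")
    return issues
```

This check covers sample size and context length only. Anonymization and diversity still need to be reviewed by hand.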
2. Secure Data Transfer
Data samples can be shared with Private AI via a secure transfer mechanism built on Microsoft Azure. Contact us for an access token.
3. Manual Data Anonymization
Private AI's data team manually anonymizes the provided data a second time, to ensure that no personal data is ever stored or used for training.
4. Model Tuning & Delivery
Using few-shot learning techniques, Private AI improves the PII detection models by optimizing them for your use case. Updated models are released via a new container version. Updates are usually delivered in the next regular release, but can be delivered via a patch release in as little as 72 hours, depending on the SLA and severity of the problem.
Example Data Samples
Potential Deidentification Issue* | Illustrative Example |
---|---|
CVV number not redacted | Okay. I'm just pulling it up. All right, I can go ahead and take that card number whenever you're ready. Okay, card number 4622-6542-1425-3511. All right, and the expiration date? 0226. And the three digits on the back. Six. 25. Thank you. I'll be charging your card for 243. Let me see what that amount was. 243 16. Would you like your receipt emailed? |
DOSE entities not redacted | Blood test done, results normal Medications Norvasc (AMLODIPINE 10MG, 1 Tablet(s) PO OD Catapres (CLONIDINE) 75MG, 1 Tablet(s) SL PRN Hydrochlorothiazide 25 MG PO PRN Aspirin 81 MG PO OD Cenolate (ASCORBIC ACID) 500MG, 1 Tablet(s) PO OD Allergies No known allergies Physical Exam Vital signs 200/130 135bpm |
DOB misclassified as DATE | Ms. Richmond, to verify your account can I get your date of birth and your phone number, please? Absolutely, yes. So my number is 907 563 2834 and then February 14, 1990. Okay, February 14, perfect, and it'll just be a moment while I access your file here. |
*Note that these examples are used only for illustration. Private AI correctly picks up the PII in each of these cases.