Performance Tips
Context
Private AI relies on machine learning to detect PII based on context, rather than on pattern-matching approaches such as regular expressions. Because the surrounding text helps the model decide what is and isn't PII, for best results you should send text in the largest chunks that still meet your latency requirements. For example, the following chat log should be sent in a single call with link_batch enabled, rather than line by line:
"Hi John, how are you?"
"I'm good thanks"
"Great, hope Atlanta is treating you well"
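As an illustration, here is a minimal sketch of building one request body that covers the whole chat log instead of one request per line. The helper name build_request and the body layout (a "text" list plus a top-level "link_batch" flag) are assumptions for illustration only; check the Private AI API reference for the exact request schema.

```python
import json

# The chat log above, one string per message.
chat_log = [
    "Hi John, how are you?",
    "I'm good thanks",
    "Great, hope Atlanta is treating you well",
]


def build_request(messages: list[str], link_batch: bool = True) -> dict:
    """Build a single request body covering every message.

    Hypothetical helper. Assumption: the API accepts a JSON object with a
    "text" list and a top-level "link_batch" flag; verify against the API docs.
    """
    return {"text": list(messages), "link_batch": link_batch}


payload = build_request(chat_log)
print(json.dumps(payload, indent=2))
```

Sending all three messages together in one call lets the model use the earlier lines as context for the later ones.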
Similarly, text documents should be sent in a single request rather than paragraph by paragraph or sentence by sentence. In addition to improving accuracy, this minimizes the number of API calls.
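If a document is too large to send in one request within your latency budget, one option is to split it into as few chunks as possible on paragraph boundaries, so that each chunk keeps as much context as it can. The sketch below greedily packs whole paragraphs into chunks; the function name and the max_chars budget are hypothetical stand-ins for whatever limit suits your deployment.

```python
def chunk_paragraphs(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole paragraphs into as few chunks as possible.

    max_chars is a hypothetical size budget standing in for your latency
    requirement. A single paragraph longer than max_chars is kept whole as
    its own chunk rather than being split mid-paragraph.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars or not current:
            # Fits in the budget, or the chunk is empty: keep growing it.
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: start a new chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
print(chunk_paragraphs(doc, max_chars=10_000))  # the whole document as one chunk
print(chunk_paragraphs(doc, max_chars=40))      # two chunks
```

With a generous budget the whole document stays in a single chunk, which is the preferred case; splitting is only a fallback.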
ASR Transcripts
When processing audio transcripts, we recommend the following input format:
"<speaker id>: <message>, <speaker id>: <message>,"
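A minimal sketch of producing this format from a list of (speaker ID, message) pairs. The function name and the example speaker IDs "S1" and "S2" are hypothetical; only the output layout comes from the format above.

```python
def format_transcript(turns: list[tuple[str, str]]) -> str:
    """Render (speaker_id, message) pairs as "<speaker id>: <message>," entries
    separated by single spaces, matching the recommended input format."""
    return " ".join(f"{speaker}: {message}," for speaker, message in turns)


turns = [
    ("S1", "Hi John how are you"),
    ("S2", "I'm good thanks"),
]
print(format_transcript(turns))
# S1: Hi John how are you, S2: I'm good thanks,
```

The whole formatted transcript can then be sent as one piece of text, in line with the advice on chunk size above.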
Capitalization
Finally, the PII detection models are optimized for standard English capitalization, e.g. "Robert is from Sydney, Australia. Muhab is from Wales." If your data does not follow this convention (for example, if it is entirely lowercase or entirely uppercase), please contact Private AI so that we can provide the optimal model for your use case. Our solution will still work, but detection accuracy will be somewhat lower.
That said, Private AI is also optimized for text containing emojis and for text containing ASR transcription errors.
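To judge whether your data departs from standard capitalization before contacting Private AI, one rough check is the share of samples that are entirely lowercase or entirely uppercase. This heuristic and the function names below are hypothetical and not part of Private AI's product; it will not catch subtler issues such as lowercase names inside otherwise normally capitalized sentences.

```python
def is_single_case(text: str) -> bool:
    # str.islower()/str.isupper() ignore uncased characters (digits,
    # punctuation, emojis) and return False if there are no cased letters.
    return text.islower() or text.isupper()


def single_case_fraction(samples: list[str]) -> float:
    """Fraction of samples that are entirely lowercase or entirely uppercase."""
    if not samples:
        return 0.0
    return sum(is_single_case(s) for s in samples) / len(samples)


samples = [
    "Robert is from Sydney, Australia. Muhab is from Wales",
    "robert is from sydney, australia",
    "MUHAB IS FROM WALES",
]
print(single_case_fraction(samples))  # two of the three samples are single-case
```

A high fraction suggests your data may benefit from a model tuned for non-standard capitalization.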