Accuracy Modes

This guide explains the accuracy modes available for selecting the model used to identify Personally Identifiable Information (PII) in text. The default mode is High Automatic (high_automatic), which automatically chooses between the High and High Multilingual models. Although the models in the Private AI solution are highly optimized (approximately 25x faster than a reference transformer implementation), high-throughput workloads can prioritize speed over accuracy by choosing one of the Standard modes.
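
As a rough illustration of selecting a mode per request, the sketch below builds a request body in Python. The mode identifiers come from this guide (the plain `standard` and `high` identifiers are assumed to follow the same lowercase naming as the others). The `text` / `entity_detection` / `accuracy` field layout and the `build_request` helper are hypothetical, so check the API reference for the actual request schema before relying on it.

```python
import json

# Mode identifiers as named in this guide. "standard" and "high" are ASSUMED
# to follow the same lowercase naming as the other modes.
ACCURACY_MODES = {
    "standard",
    "standard_high",
    "standard_high_multilingual",
    "standard_high_automatic",
    "high",
    "high_multilingual",
    "high_automatic",
}

DEFAULT_MODE = "high_automatic"  # the default mode described in this guide


def build_request(texts, accuracy=DEFAULT_MODE):
    """Build a HYPOTHETICAL JSON request body that selects an accuracy mode.

    The "entity_detection" / "accuracy" field layout is an assumption for
    illustration only; the real schema may differ.
    """
    if accuracy not in ACCURACY_MODES:
        raise ValueError(f"unknown accuracy mode: {accuracy!r}")
    return {"text": list(texts), "entity_detection": {"accuracy": accuracy}}


if __name__ == "__main__":
    # Trade some accuracy for speed in a high-throughput, English-only workload.
    body = build_request(["Hi, my name is Jane Doe."], accuracy="standard")
    print(json.dumps(body))
```

Validating the mode name locally catches typos before a request is sent; omitting `accuracy` falls back to the documented default.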

Standard Accuracy

Attribute     Rating
Accuracy      Lower
Speed         Highest
Multilingual  No

The standard mode favors speed over accuracy. It is designed for high-throughput environments where faster processing is required and a slight reduction in accuracy is acceptable. It does not support multilingual data.

When to choose Standard Accuracy?

  • When speed is the primary concern.
  • When processing mainly English text.

Standard High Accuracy

Attribute     Rating
Accuracy      Higher
Speed         Higher
Multilingual  No

The standard_high mode provides higher accuracy than Standard but maintains a fast processing speed. It is optimized for English text and doesn’t support multilingual data.

When to choose Standard High Accuracy?

  • When increased accuracy is required, but speed is still important.
  • When processing English-only text.

Standard High Multilingual Accuracy

Attribute     Rating
Accuracy      Higher
Speed         Higher
Multilingual  Yes

The standard_high_multilingual mode is the multilingual equivalent of the Standard High model. See our documentation for details on which languages we support.

When to choose Standard High Multilingual Accuracy?

  • When higher accuracy than Standard is required but speed is still important.
  • When working with multilingual data, including English.

Standard High Automatic Accuracy

Attribute     Rating
Accuracy      Higher
Speed         Higher
Multilingual  Yes

The standard_high_automatic mode uses a combination of the Standard High and Standard High Multilingual models. It dynamically chooses between the English-only and multilingual models based on the detected language of the input text, provided the multilingual model is available. Each sample is processed by a single model, but different samples in a batch can be routed to different models, which makes this mode well suited to data in mixed languages. See our documentation for details on which languages we support.
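
The per-sample routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Private AI's implementation: `detect_language` is a hypothetical and deliberately crude stand-in (it treats pure-ASCII text as English), and the model names simply reuse this guide's mode identifiers.

```python
ENGLISH_MODEL = "standard_high"
MULTILINGUAL_MODEL = "standard_high_multilingual"


def detect_language(text: str) -> str:
    """HYPOTHETICAL toy language detector.

    Treats pure-ASCII text as English and anything else as "other".
    A real detector would be far more robust than this heuristic.
    """
    return "en" if text.isascii() else "other"


def choose_model(text: str, multilingual_available: bool = True) -> str:
    """Pick exactly one model for a whole sample, based on its detected language."""
    if detect_language(text) == "en":
        return ENGLISH_MODEL
    # Fall back to the English-only model when no multilingual model is available.
    return MULTILINGUAL_MODEL if multilingual_available else ENGLISH_MODEL


if __name__ == "__main__":
    batch = ["Call John at 555-0100.", "Je m'appelle Émilie Dubois."]
    print([choose_model(t) for t in batch])
    # ['standard_high', 'standard_high_multilingual']
```

The decision is made once per sample, so a sample is never split across models, while two samples in the same batch can still land on different models.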

When to choose Standard High Automatic Accuracy?

  • When increased accuracy is required, but speed is still important.
  • When automatic language detection and model selection are needed.

High Accuracy

Attribute     Rating
Accuracy      Highest
Speed         Lower
Multilingual  No

The high mode provides the best accuracy for English-only text and documents, making it suitable for scenarios where precision is critical. It is slightly slower than the Standard modes. GPU support is available for workloads that require both high accuracy and high throughput.

When to choose High Accuracy?

  • When the highest level of accuracy for English text is needed.
  • When slight speed reductions are acceptable for better detection.

High Multilingual Accuracy

Attribute     Rating
Accuracy      Highest
Speed         Lower
Multilingual  Yes

The high_multilingual mode is the multilingual equivalent of the High model. It offers the highest accuracy for multilingual text but is slower than the standard_high_multilingual mode. GPU support is available for workloads that require both high accuracy and high throughput. See our documentation for details on which languages we support.

When to choose High Multilingual Accuracy?

  • When the highest level of accuracy is needed and a slight reduction in speed is acceptable.
  • When working with multilingual data, including English.

High Automatic Accuracy

Attribute     Rating
Accuracy      Highest
Speed         Lower
Multilingual  Yes

The high_automatic mode uses a combination of the High and High Multilingual models. It dynamically chooses between the English-only and multilingual models based on the detected language of the input text, provided the multilingual model is available. Each sample is processed by a single model, but different samples in a batch can be routed to different models, which makes this mode well suited to data in mixed languages. GPU support is available for workloads that require both high accuracy and high throughput. See our documentation for details on which languages we support.
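
Because each sample is handled by one model while a batch may span several, one way to picture this is to split a batch by model, run each model once on its share, and reassemble the results in the original order. The sketch below shows that idea; `partition_batch`, `merge_results`, the ASCII-based `ascii_router`, and the `tagging_run` model call are all hypothetical placeholders, not part of the Private AI API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Groups = Dict[str, List[Tuple[int, str]]]


def partition_batch(texts: List[str], choose_model: Callable[[str], str]) -> Groups:
    """Group samples by the model chosen for each one, remembering original positions.

    Every sample is assigned to exactly one model; a single batch may use several.
    """
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for index, text in enumerate(texts):
        groups[choose_model(text)].append((index, text))
    return dict(groups)


def merge_results(
    groups: Groups,
    run_model: Callable[[str, List[str]], List[str]],
    n: int,
) -> List[str]:
    """Run each model once on its own sub-batch, then restore the original order."""
    results: List[str] = [""] * n
    for model, items in groups.items():
        outputs = run_model(model, [text for _, text in items])
        for (index, _), output in zip(items, outputs):
            results[index] = output
    return results


def ascii_router(text: str) -> str:
    """HYPOTHETICAL router: pure-ASCII text -> English-only model, else multilingual."""
    return "high" if text.isascii() else "high_multilingual"


def tagging_run(model: str, batch: List[str]) -> List[str]:
    """HYPOTHETICAL model call that only tags each text with the model name."""
    return [f"{model}:{text}" for text in batch]


if __name__ == "__main__":
    texts = ["Email jane@example.com", "Llame a José", "SSN 123-45-6789"]
    groups = partition_batch(texts, ascii_router)
    print(merge_results(groups, tagging_run, len(texts)))
    # ['high:Email jane@example.com', 'high_multilingual:Llame a José', 'high:SSN 123-45-6789']
```

Keeping the original index with each sample is what allows the mixed-model batch to come back in the same order it went in.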

When to choose High Automatic Accuracy?

  • When the highest level of accuracy is needed and a slight reduction in speed is acceptable.
  • When automatic language detection and model selection are needed.

Conclusion

Choosing the appropriate accuracy mode depends on your needs for speed, precision, and language support. Standard modes prioritize speed, making them ideal for high-throughput environments; the base Standard mode supports English only. High modes deliver superior accuracy for scenarios where precision is critical, at some cost in speed. For non-English or mixed-language data, the Multilingual modes extend coverage beyond English.

The Automatic modes provide the most convenient balance: they select the English-only or multilingual model for each sample based on its detected language.
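
The guidance on this page can be condensed into a small lookup, shown here as a hypothetical Python helper named `recommend_mode`. The mapping is one reading of the "When to choose" notes above, not an official rule. In particular, sending speed-first non-English work to the Standard High family assumes that Standard's English-only limitation rules it out.

```python
def recommend_mode(priority: str, languages: str) -> str:
    """HYPOTHETICAL helper mapping requirements to an accuracy mode name.

    priority:  "speed", "balanced" (more accuracy, speed still important), or "accuracy"
    languages: "english", "multilingual", or "mixed" (let the mode detect per sample)
    """
    if priority not in {"speed", "balanced", "accuracy"}:
        raise ValueError(f"unknown priority: {priority!r}")
    if languages not in {"english", "multilingual", "mixed"}:
        raise ValueError(f"unknown languages: {languages!r}")

    if priority == "speed" and languages == "english":
        return "standard"  # fastest mode, English only
    # Standard is English-only, so speed-first non-English work uses the Standard High family.
    family = "high" if priority == "accuracy" else "standard_high"
    suffix = {"english": "", "multilingual": "_multilingual", "mixed": "_automatic"}[languages]
    return family + suffix


if __name__ == "__main__":
    print(recommend_mode("accuracy", "mixed"))  # high_automatic
```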

By selecting the right mode, you can optimize performance and ensure accurate PII detection tailored to your data processing needs.

For details on our throughput and latency, please see our benchmarks page.

© Copyright 2024 Private AI.