Reporting Integration

The Private AI container can be configured to send reporting metrics to a Logstash server. To enable this feature, add the following environment variables to the docker run command:

Variable Name: Description
PAI_ENABLE_REPORTING: Enables reporting to a Logstash server
LOGSTASH_HOST: The host of the Logstash server
LOGSTASH_PORT: The port of the Logstash server
LOGSTASH_TTL: Sets the time to live (TTL), in seconds, of data queued for Logstash. Queued data that is not sent successfully before the TTL expires is lost.
PAI_REPORT_ENTITY_COUNTS: Adds entity counts (per piece of text deidentified) to reporting

To run a container with these settings, the following command can be used:

docker run --rm -p 8080:8080 --mount type=bind,src=$PWD/tests/fixtures/licenses/license.json,dst=/app/license/license.json -e PAI_ENABLE_REPORTING=true -e LOGSTASH_HOST= -e LOGSTASH_PORT=50000 -e PAI_REPORT_ENTITY_COUNTS=true -it deid:image-name
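The same command can also be assembled programmatically, which makes it harder to drop a variable by accident. The following is a minimal sketch and not part of the Private AI tooling: build_docker_command is a hypothetical helper, logstash.example.internal and /path/to/license.json are placeholders, and the LOGSTASH_TTL value of 300 seconds is an arbitrary illustration.

```python
import shlex


def build_docker_command(env, license_path, image="deid:image-name"):
    """Return the docker run argument list for the given reporting settings.

    `env` maps environment variable names to values; each entry becomes
    one `-e NAME=value` pair on the command line.
    """
    args = [
        "docker", "run", "--rm", "-p", "8080:8080",
        "--mount",
        f"type=bind,src={license_path},dst=/app/license/license.json",
    ]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args += ["-it", image]
    return args


if __name__ == "__main__":
    settings = {
        "PAI_ENABLE_REPORTING": "true",
        "LOGSTASH_HOST": "logstash.example.internal",  # placeholder host
        "LOGSTASH_PORT": 50000,
        "LOGSTASH_TTL": 300,  # illustrative value, in seconds
        "PAI_REPORT_ENTITY_COUNTS": "true",
    }
    # Placeholder license path; substitute the real location.
    command = build_docker_command(settings, "/path/to/license.json")
    print(shlex.join(command))
```

Keeping the settings in one dictionary means the reporting variables can be reviewed in one place before the container is started.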

The Logstash pipeline that receives the data must be configured to accept JSON objects. A sample pipeline configuration that allows this is:

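# Accept newline-delimited JSON over TCP from the Private AI container
input {
	tcp {
		port => 50000
		codec => json_lines {}
	}
}
# Forward the received records to Elasticsearch
output {
	elasticsearch {
		hosts => "elasticsearch:9200"
		user => "${LOGSTASH_USER}"
		password => "${LOGSTASH_PASSWORD}"
	}
}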
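To check that a pipeline like this accepts input, a test record can be sent to the TCP input by hand. The json_lines codec reads newline-delimited JSON, so each record must be a single-line JSON object terminated by a newline. The sketch below assumes Logstash is reachable on localhost:50000. The helper names and the sample record are hypothetical and do not represent the container's actual payload.

```python
import json
import socket


def encode_json_lines(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    lines = (json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return "".join(lines).encode("utf-8")


def send_records(records, host="localhost", port=50000, timeout=5.0):
    """Open a TCP connection and send the records to the Logstash tcp input."""
    payload = encode_json_lines(records)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical smoke-test record, not a real metering record.
    send_records([{"message": "pipeline smoke test"}])
```

If the record then appears in the configured output, the input side of the pipeline is parsing JSON lines as expected.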

What is Being Sent?

When PAI_ENABLE_REPORTING is enabled in the container, metering records are sent to Logstash in batches at 5-minute intervals. All records include the following fields:

deid.SessionId: a unique id that relates all metering information sent in the interval
deid.Accuracy: the accuracy used in the requests
deid.ProjectId: the project id of the request (the default value is main)
deid.Synthetic: boolean value indicating if synthetic data was used

The following meters are currently in use (one record per meter, per project ID):

deid.api_calls: the number of API calls sent in the interval
deid.api_chars: the number of characters processed in the interval
deid.api_words: the number of words processed in the interval
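Because each meter arrives as its own record, a consumer usually has to combine records to get per-project totals. The sketch below shows one way to do that. It assumes each record is a flat JSON object that carries deid.ProjectId and a numeric value under the meter's name. That record shape is an assumption for illustration and is not specified here, and totals_by_project is a hypothetical helper.

```python
from collections import defaultdict

# Meter names taken from the list above.
METERS = ("deid.api_calls", "deid.api_chars", "deid.api_words")


def totals_by_project(records):
    """Sum each meter per project ID across all records.

    Assumes (hypothetically) that a record looks like
    {"deid.ProjectId": "main", "deid.api_calls": 3, ...}.
    """
    totals = defaultdict(lambda: dict.fromkeys(METERS, 0))
    for record in records:
        # "main" is the documented default project ID.
        project = record.get("deid.ProjectId", "main")
        for meter in METERS:
            if meter in record:
                totals[project][meter] += record[meter]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"deid.ProjectId": "main", "deid.api_calls": 3},
        {"deid.ProjectId": "main", "deid.api_chars": 120},
        {"deid.ProjectId": "demo", "deid.api_words": 7},
    ]
    print(totals_by_project(sample))
```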

If PAI_ENABLE_PII_COUNT_METERING is enabled in the container, meters for each entity found will also be sent. These meters contain the following fields:

deid.Label: the entity found to be the best label (e.g. NAME)
deid.pii-count: the number of times the entity was found to be the best label in the interval

If PAI_REPORT_ENTITY_COUNTS is enabled in the container, a record of the entities found and their counts is sent for each unit of text processed. For example:

deid.NAME: 2
deid.OCCUPATION: 1

indicates that, in one unit of text processed, the entity NAME was the best label twice and OCCUPATION was the best label once.
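Per-text entity counts can be rolled up across many records on the consumer side. The following sketch assumes each record is a flat JSON object in which entity counts appear under keys of the form deid.LABEL with an upper-case label, alongside the common fields. That layout is inferred from the example above rather than confirmed, and tally_entities is a hypothetical helper.

```python
from collections import Counter


def tally_entities(records):
    """Add up per-text entity counts across records.

    A key counts as an entity label when it has the "deid." prefix and
    the remainder is upper case (e.g. "deid.NAME"); mixed-case fields
    such as "deid.ProjectId" are skipped.
    """
    totals = Counter()
    for record in records:
        for key, value in record.items():
            prefix, _, label = key.partition(".")
            if prefix == "deid" and label.isupper() and isinstance(value, int):
                totals[label] += value
    return totals


if __name__ == "__main__":
    sample = [
        {"deid.ProjectId": "main", "deid.NAME": 2, "deid.OCCUPATION": 1},
        {"deid.ProjectId": "main", "deid.NAME": 1},
    ]
    print(tally_entities(sample).most_common())
```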

© Copyright 2022, 2023 Private AI.