Reading files from remote storage
By mounting external storage systems into the Private AI container, teams can draw on a wide range of technologies to source input data. This can simplify the overall design of data pipelines and allows teams to take advantage of the feature sets of the underlying storage providers.
S3 Object storage
When working with S3 buckets, it can be useful to mount the bucket and process files directly, without encoding each file and transferring its contents over the network. The mini-guide below demonstrates how to do this using Mountpoint for Amazon S3.
There are two primary steps required to process files directly from object storage:
- Configure the container / host machine for S3 access
- Process files using the /process/files/uri endpoint
Prerequisites and assumptions:
The /process/files/uri endpoint assumes that all files are available in local storage in the running container.
At the time of writing, no connectors or drivers are deployed with the Private AI container that enable communication over additional protocols.
As such, any storage volume that is mounted into the Docker container is available via the /process/files/uri endpoint.
For S3 object storage this mini-guide assumes that you have:
- An active AWS account
- An S3 bucket
- A running Private AI instance
- AWS CLI installed on the host
- Appropriate software installed on the host to enable mounting an S3 bucket as local storage: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountpoint.html
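Before mounting, it can be worth confirming the prerequisites are in place. A quick sanity check, assuming the AWS CLI and Mountpoint for Amazon S3 are installed on the host:

```shell
# Confirm the AWS CLI and Mountpoint for Amazon S3 are installed
aws --version
mount-s3 --version

# Confirm the CLI can authenticate to your AWS account
aws sts get-caller-identity
```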
Step 1 - Setup S3 mount point
Ensure you have connected your AWS account. Follow the documentation here to assist with authentication: https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#aws-credentials
The format of the command to mount the S3 bucket is straightforward:

mount-s3 s3://MY_BUCKETNAME /PATH/TO/FOLDER/ON/HOST --allow-other --uid $(id -u) --gid $(id -g)

Take special note of the uid and gid options, which set the user and group permitted to access the mount. The command above grants access to the currently logged-in user and group.
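Once the mount command returns, a quick check (using the placeholder mount path from the command above) confirms the bucket contents are visible on the host:

```shell
# The mount should appear in the mount table as type mountpoint-s3
mount | grep mountpoint-s3

# Listing the mount point should show the objects in the bucket
ls /PATH/TO/FOLDER/ON/HOST
```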
Step 2 - Modify the docker run / container start parameters to mount the local folder
The docker run command will now need to be modified to include a new volume linked to the mount point.
This is done with the -v command-line parameter.
Example below:
docker run --rm -v /home/some/local/folder/mounted-s3-input:/home/ubuntu/s3-mount-input \
-v /home/some/local/folder/mounted-s3-output:/home/ubuntu/mounted-s3-output \
-e PAI_OUTPUT_FILE_DIR=/home/ubuntu/mounted-s3-output \
-e PAI_FILE_SUPPORT_ENABLED=True -v /home/some/local/folder/license.json:/app/license/license.json \
-p 8080:8080 -it crprivateaiprod.azurecr.io/deid:4.2.2-cpu

With the steps above complete, the container will start and requests can be issued using the following payload format:
curl -i -X POST \
http://localhost:8080/process/files/uri \
-H 'Content-Type: application/json' \
-d '{
"uri": "/home/ubuntu/s3-mount-input/example.txt",
"entity_detection": {
"return_entity": true
}
}'
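Because PAI_OUTPUT_FILE_DIR points at the mounted output volume, processed files written by the container appear in the corresponding host folder (and, if that folder is itself a Mountpoint for Amazon S3 mount of an output bucket, in that bucket as well). Using the example paths from this guide:

```shell
# Processed output written by the container lands in the host folder
# backing the output volume
ls /home/some/local/folder/mounted-s3-output/
```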