In GPU-powered models, system overhead uses two of the available CPUs. The maximum number of GPU resource quota requests you can make is therefore two less than the total number of CPUs in the node instance. If you request more resources than this limit allows, you receive the following message:
Waiting for resources to become available. The maximum number of allowed instances for model deployment has been reached. Please contact your Domino Administrator to increase this limit or unpublish older models.
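The limit described above is simple subtraction, and it can help to check a planned request against it before deploying. The following is a minimal sketch, not part of Domino: the helper names `max_requestable` and `fits_on_node` are hypothetical, and the only rule taken from the text is that two CPUs per node are reserved for system overhead.

```python
# Illustrative only: these helpers are hypothetical and not a Domino API.
SYSTEM_OVERHEAD_CPUS = 2  # CPUs reserved for system overhead, per the text above


def max_requestable(total_node_cpus: int) -> int:
    """Return the largest request that fits on a node with the given CPU count."""
    if total_node_cpus < 0:
        raise ValueError("total_node_cpus must be non-negative")
    # Never report a negative limit for very small nodes.
    return max(total_node_cpus - SYSTEM_OVERHEAD_CPUS, 0)


def fits_on_node(requested: int, total_node_cpus: int) -> bool:
    """Return True if the request stays within the node's limit."""
    return requested <= max_requestable(total_node_cpus)
```

For example, on a node instance with 8 CPUs, `max_requestable(8)` returns 6, so a request of 7 would exceed the limit and could leave the deployment waiting for resources.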
Set the Compute Resources per instance to at least 1 GB. The R model API container requires at least 720 MB of RAM to run. The other containers that are created alongside it, such as Istio and Fluent Bit, require additional memory.