The hardware available to the pods that host Model APIs is determined by resource quotas that you define. A resource quota sets the CPU and memory resources available to any Model that uses it.
Users select a resource quota from a dropdown menu on the Model deployment page.
From the admin home, click Advanced > Resource Quotas to open the management interface.
From here you can create and edit resource quotas and choose the default quota. Resource quotas cannot be permanently deleted. To make a resource quota unavailable for use, edit it and set Visible to false.
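The management rules above (quotas can be created and edited, one can be made the default, none can be deleted, and hiding is done through Visible) can be modeled in a few lines. This is a minimal sketch, not the product's actual API: the names ResourceQuota, QuotaRegistry, hide, dropdown_options, and default_quota are hypothetical, and the assumption that exactly one quota is the default at a time is inferred from the wording of this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ResourceQuota:
    # Hypothetical record; only the fields relevant to management are modeled.
    name: str
    visible: bool = True
    default: bool = False


class QuotaRegistry:
    """Hypothetical in-memory model of the Resource Quotas admin page."""

    def __init__(self) -> None:
        self._quotas: Dict[str, ResourceQuota] = {}

    def create(self, name: str) -> ResourceQuota:
        if name in self._quotas:
            raise ValueError(f"quota {name!r} already exists")
        quota = ResourceQuota(name=name)
        self._quotas[name] = quota
        return quota

    def set_default(self, name: str) -> None:
        # Assumption: exactly one quota is the default at any time.
        target = self._quotas[name]
        for quota in self._quotas.values():
            quota.default = False
        target.default = True

    def hide(self, name: str) -> None:
        # There is deliberately no delete(): a quota is retired by setting
        # visible to False, so the record itself is kept.
        self._quotas[name].visible = False

    def dropdown_options(self) -> List[str]:
        # Only visible quotas are offered to users publishing Models.
        return [q.name for q in self._quotas.values() if q.visible]

    def default_quota(self) -> Optional[ResourceQuota]:
        return next((q for q in self._quotas.values() if q.default), None)
```

For example, after creating quotas named "small" and "large", calling hide("large") leaves dropdown_options() returning ["small"], while the "large" record still exists in the registry.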
Resource quotas have the following properties:
CPUs requested - The number of cores reserved for a Model that uses this quota.
Memory requested - The amount of RAM reserved for a Model that uses this quota.
CPU limit - If the hosting node has idle cores available, a Model using this quota can use additional cores up to this limit.
Memory limit - If the hosting node has free RAM available, a Model using this quota can use additional memory up to this limit.
Visible - This property must be set to true for the quota to appear in the dropdown selector for users publishing Models.
Default - The resource quota with this property set to true is used by default for all newly published Models.
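Because Model APIs run in pods, the four sizing properties correspond closely to Kubernetes resource requests and limits; that mapping is an assumption here, since this page does not say how quotas are applied to the pod. The sketch below is hypothetical (QuotaSpec, to_k8s_resources, and the use of MiB for memory are illustrative choices) and rejects any limit smaller than its request, which Kubernetes itself also refuses.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class QuotaSpec:
    """Hypothetical model of a quota's sizing properties."""
    cpu_request: float       # CPUs requested: cores reserved for the Model
    memory_request_mib: int  # Memory requested: RAM reserved, in MiB
    cpu_limit: float         # CPU limit: ceiling when the node has idle cores
    memory_limit_mib: int    # Memory limit: ceiling when the node has free RAM, in MiB

    def __post_init__(self) -> None:
        # Each limit is an upper bound above the reservation, never below it.
        if self.cpu_limit < self.cpu_request:
            raise ValueError("CPU limit must be at least CPUs requested")
        if self.memory_limit_mib < self.memory_request_mib:
            raise ValueError("memory limit must be at least memory requested")

    def to_k8s_resources(self) -> Dict[str, Dict[str, str]]:
        # Assumed mapping onto a Kubernetes container "resources" block.
        # CPU is expressed in millicores ("m"), memory in mebibytes ("Mi").
        def cpu(cores: float) -> str:
            return f"{round(cores * 1000)}m"

        return {
            "requests": {"cpu": cpu(self.cpu_request),
                         "memory": f"{self.memory_request_mib}Mi"},
            "limits": {"cpu": cpu(self.cpu_limit),
                       "memory": f"{self.memory_limit_mib}Mi"},
        }
```

For instance, QuotaSpec(1, 2048, 2, 4096).to_k8s_resources() reserves "1000m" of CPU and "2048Mi" of memory and allows bursting up to "2000m" and "4096Mi" when the node has spare capacity.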