The pods that host Model APIs are sized according to resource quotas that you define. A resource quota determines the CPU and memory resources available to any Model that uses it.
Users can access resource quotas from a dropdown menu on the Model deployment page.
From the admin home, click Advanced > Resource Quotas to open the management interface.
From here you can create, edit, and set default resource quotas. Resource quotas cannot be permanently deleted. To make a resource quota unavailable for use, edit it and set Visible to false.
Resource quotas have the following properties:
- CPUs requested - The number of cores that will be reserved for a Model with this quota.
- Memory requested - The amount of RAM that will be reserved for a model with this quota.
- CPU limit - If the hosting node has idle cores available, a model running this quota can make use of additional cores up to this limit.
- Memory limit - If the hosting node has RAM available, a model running this quota can make use of additional memory up to this limit.
- Visible - This property on a resource quota must be set to true for the quota to appear in the dropdown selector for users publishing Models.
- Default - The resource quota with this set to true is the quota that will be used for all newly published Models by default.
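
To make the relationship between these properties concrete, here is a minimal Python sketch of how they could be modeled. It is illustrative only and is not the product's API: the `ResourceQuota` class, its field names, the gigabyte units, and the helper functions are all hypothetical. The check that a limit must not be lower than the requested amount is also an assumption, inferred from the description of limits as "additional" capacity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceQuota:
    """Hypothetical model of a resource quota; not a real product API."""
    name: str
    cpus_requested: float       # cores reserved for the Model
    memory_requested_gb: float  # RAM reserved for the Model (unit is assumed)
    cpu_limit: float            # most cores usable when the node has idle cores
    memory_limit_gb: float      # most RAM usable when the node has RAM available
    visible: bool = True        # shown in the dropdown on the deployment page
    default: bool = False       # applied to newly published Models

    def validate(self) -> None:
        # Assumption: a limit below the reserved amount makes no sense.
        if self.cpu_limit < self.cpus_requested:
            raise ValueError(f"{self.name}: CPU limit is below CPUs requested")
        if self.memory_limit_gb < self.memory_requested_gb:
            raise ValueError(f"{self.name}: memory limit is below memory requested")


def selectable_quotas(quotas: List[ResourceQuota]) -> List[ResourceQuota]:
    """Return the quotas a user would see in the dropdown (Visible is true)."""
    return [q for q in quotas if q.visible]


def default_quota(quotas: List[ResourceQuota]) -> Optional[ResourceQuota]:
    """Return the first quota marked Default, or None if there is none."""
    for q in quotas:
        if q.default:
            return q
    return None


def hide(quota: ResourceQuota) -> None:
    """Quotas cannot be deleted, so retiring one means setting Visible to false."""
    quota.visible = False


quotas = [
    ResourceQuota("small", 1, 2, 2, 4, default=True),
    ResourceQuota("large", 4, 16, 8, 32),
    ResourceQuota("legacy", 2, 4, 2, 4, visible=False),
]
for q in quotas:
    q.validate()
```

In this sketch, `selectable_quotas(quotas)` returns only the `small` and `large` quotas, because `legacy` has been hidden rather than deleted, and `default_quota(quotas)` returns `small`.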
