The pods that host Model APIs have hardware specifications based on resource quotas that you define. A resource quota determines the CPU and memory resources available to any Model that uses it.
Users can access resource quotas from a dropdown menu on the Model deployment page.
From the admin home, click Advanced > Resource Quotas to open the management interface.
From here you can create, edit, and set default resource quotas. Resource quotas cannot be permanently deleted. To make a resource quota unavailable for use, edit it and set Visible to false.
Resource quotas have the following properties:
CPUs requested: The number of cores that will be reserved for a Model with this quota.
Memory requested: The amount of RAM that will be reserved for a Model with this quota.
CPU limit: If the hosting node has idle cores available, a Model running with this quota can make use of additional cores up to this limit.
Memory limit: If the hosting node has RAM available, a Model running with this quota can make use of additional memory up to this limit.
Visible: This property must be set to true for the quota to appear in the dropdown selector for users publishing Models.
Default: The resource quota with this property set to true is the quota that will be used for all newly published Models by default.
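The properties above can be sketched as a small data model. This is an illustration only, not the product's API: the class ResourceQuota, its field names, the helper functions, and the units (cores as floats, memory in MiB) are all hypothetical. The mapping to Kubernetes-style requests and limits is an assumption about how a reservation and a burst ceiling are typically expressed for a pod, and the rule that exactly one quota is the default is inferred from the description of the Default property.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ResourceQuota:
    """Hypothetical in-memory model of a resource quota (names are assumptions)."""
    name: str
    cpus_requested: float       # cores reserved for the Model
    memory_requested_mib: int   # RAM reserved for the Model, in MiB
    cpu_limit: float            # ceiling on cores when the node has idle cores
    memory_limit_mib: int       # ceiling on RAM when the node has free memory
    visible: bool = True        # shown in the dropdown for users publishing Models
    default: bool = False       # used for newly published Models

    def validate(self) -> None:
        # A limit below the reservation would be contradictory, so reject it.
        if self.cpu_limit < self.cpus_requested:
            raise ValueError(f"{self.name}: CPU limit is below CPUs requested")
        if self.memory_limit_mib < self.memory_requested_mib:
            raise ValueError(f"{self.name}: memory limit is below memory requested")

    def to_k8s_resources(self) -> Dict[str, Dict[str, str]]:
        # Assumed mapping: "requested" values become requests, limits become limits.
        # CPU is written in millicores ("500m"), memory in mebibytes ("512Mi").
        return {
            "requests": {
                "cpu": f"{round(self.cpus_requested * 1000)}m",
                "memory": f"{self.memory_requested_mib}Mi",
            },
            "limits": {
                "cpu": f"{round(self.cpu_limit * 1000)}m",
                "memory": f"{self.memory_limit_mib}Mi",
            },
        }


def selectable_quotas(quotas: List[ResourceQuota]) -> List[ResourceQuota]:
    """Quotas a user would see in the dropdown: only those with visible=True."""
    return [q for q in quotas if q.visible]


def default_quota(quotas: List[ResourceQuota]) -> ResourceQuota:
    """Return the quota marked as default, assuming exactly one is marked."""
    defaults = [q for q in quotas if q.default]
    if len(defaults) != 1:
        raise ValueError(f"expected exactly one default quota, found {len(defaults)}")
    return defaults[0]


if __name__ == "__main__":
    small = ResourceQuota("small", 1.0, 1024, 2.0, 2048, default=True)
    # Hidden rather than deleted: it stays defined but is not offered to users.
    legacy = ResourceQuota("legacy", 0.5, 512, 0.5, 512, visible=False)
    for quota in (small, legacy):
        quota.validate()
    print([q.name for q in selectable_quotas([small, legacy])])
    print(default_quota([small, legacy]).to_k8s_resources())
```

In this sketch, a quota whose limits equal its requested values (like the hypothetical "legacy" quota) never uses more than its reservation, while a quota with higher limits can use spare capacity on the node.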